Sample Sheet
Dorado can make use of a MinKNOW-compatible sample sheet containing data used to identify a particular classification of read.
To apply a sample sheet, provide the path to the appropriate CSV file using the --sample-sheet argument:
dorado basecaller dna_r10.4.1_e8.2_400bps_hac@v4.2.0 reads/ \
--kit-name SQK-16S114-24 \
--sample-sheet <path_to_sample_sheet_csv> \
> calls.bam
A sample sheet can also be applied to the demux command in the same way:
dorado demux calls.bam \
--output-dir classified_reads \
--kit-name SQK-16S114-24 \
--sample-sheet <path_to_sample_sheet_csv>
Dorado currently uses the sample sheet only for barcode filtering and aliasing, so a --kit-name argument is required.
In the case of demux, the sample sheet must contain a 1-to-1 mapping of barcode identifiers to flow_cell_id/position_id, i.e. all entries in the barcode column must be unique.
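The uniqueness requirement can be checked before running demux. The following is a minimal sketch, not part of Dorado: the helper name duplicate_barcodes and the sample sheet contents are hypothetical, and it assumes the sheet is a plain comma-separated file with a header row.

```python
import csv
import io
from collections import Counter


def duplicate_barcodes(sheet_text: str) -> list[str]:
    """Return barcode values that appear on more than one row, sorted."""
    rows = csv.DictReader(io.StringIO(sheet_text))
    counts = Counter(row["barcode"] for row in rows if row.get("barcode"))
    return sorted(b for b, n in counts.items() if n > 1)


# Hypothetical sample sheet contents, for illustration only.
EXAMPLE = """\
experiment_id,kit,flow_cell_id,barcode,alias
exp01,SQK-16S114-24,FLC001,barcode01,patient_a
exp01,SQK-16S114-24,FLC001,barcode02,patient_b
exp01,SQK-16S114-24,FLC001,barcode01,patient_c
"""

print(duplicate_barcodes(EXAMPLE))  # ['barcode01']
```

An empty list means every entry in the barcode column is unique.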
Specification
Sample sheet column headers
A sample sheet may only contain the column names below:
| Purpose | Column Name | Notes |
|---|---|---|
| Standard | experiment_id [1] | Required [3] |
| | kit | Required |
| | flow_cell_id [2] | Optional if position_id is set |
| | position_id [2] | Optional if flow_cell_id is set |
| | protocol_run_id | Optional |
| | sample_id | Optional [3] |
| | flow_cell_product_code | Optional |
| Barcoding | alias [4] | Optional [3] |
| | type | Optional |
| | barcode [5] | Optional |
[1] All rows in a sample sheet must contain the same experiment_id.
[2] At a minimum a sample sheet must contain kit, experiment_id and one of position_id or flow_cell_id.
[3] These fields must be a maximum of 40 characters, which must be either alphanumeric (A-Z, a-z, 0-9), _ or -.
[4] See Barcode aliasing
[5] See Barcode filtering
For a full description of the format of the sample sheet, see the MinKNOW Sample Sheet documentation.
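The column rules above lend themselves to a quick pre-flight check. The sketch below is illustrative only and is not how Dorado or MinKNOW validate a sheet: check_sheet, the constant names, and the example contents are hypothetical, and it assumes that empty cells are acceptable in the length-restricted columns.

```python
import csv
import io
import re

ALLOWED_COLUMNS = {
    "experiment_id", "kit", "flow_cell_id", "position_id", "protocol_run_id",
    "sample_id", "flow_cell_product_code", "alias", "type", "barcode",
}
# Columns marked [3]: at most 40 characters from A-Z, a-z, 0-9, _ or -.
RESTRICTED_COLUMNS = ("experiment_id", "sample_id", "alias")
RESTRICTED_VALUE = re.compile(r"[A-Za-z0-9_-]{1,40}")


def check_sheet(sheet_text: str) -> list[str]:
    """Return a list of human-readable problems; empty means none were found."""
    reader = csv.DictReader(io.StringIO(sheet_text))
    columns = set(reader.fieldnames or [])
    rows = list(reader)
    problems = []

    unknown = sorted(columns - ALLOWED_COLUMNS)
    if unknown:
        problems.append(f"unknown columns: {unknown}")
    for required in ("kit", "experiment_id"):
        if required not in columns:
            problems.append(f"missing column: {required}")
    if not columns & {"flow_cell_id", "position_id"}:
        problems.append("need flow_cell_id or position_id")
    if len({row.get("experiment_id") for row in rows}) > 1:
        problems.append("experiment_id differs between rows")
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(rows, start=2):
        for name in RESTRICTED_COLUMNS:
            value = row.get(name)
            if value and not RESTRICTED_VALUE.fullmatch(value):
                problems.append(f"line {line_no}: invalid {name} {value!r}")
    return problems


# Hypothetical sheet with three problems: no flow_cell_id/position_id column,
# two different experiment_id values, and an alias containing a space.
BAD = """\
experiment_id,kit,barcode,alias
exp01,SQK-16S114-24,barcode01,patient a
exp02,SQK-16S114-24,barcode02,patient_b
"""

for problem in check_sheet(BAD):
    print(problem)
```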
Note
Dorado does not currently support dual barcodes.
Barcode aliasing
If a sample sheet contains an alias column, this will be used to replace the barcode identifier for reads matching the flow_cell_id/position_id and experiment_id.
This will be reflected in the read group ID @RG ID in the file header, and in the BC and RG tags of the classified reads.
Note
If both flow_cell_id and position_id are present, both must match the read data for an alias to be applied.
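The matching rule can be pictured as a lookup over the rows of the sheet. This is a sketch of the behaviour described here, not Dorado's implementation: find_alias and the read dictionary are hypothetical stand-ins, and the sheet is assumed to contain at least one of flow_cell_id or position_id, as the specification requires.

```python
import csv
import io


def find_alias(rows, read):
    """Return the alias for a read, or its barcode if no row matches.

    `rows` are sample-sheet rows as dicts; `read` is a hypothetical dict with
    experiment_id, flow_cell_id, position_id and barcode keys.
    """
    for row in rows:
        if row.get("barcode") != read["barcode"]:
            continue
        if row.get("experiment_id") != read["experiment_id"]:
            continue
        # Every ID column filled in on this row must agree with the read.
        ids = [k for k in ("flow_cell_id", "position_id") if row.get(k)]
        if all(row[k] == read[k] for k in ids) and row.get("alias"):
            return row["alias"]
    return read["barcode"]


# Hypothetical sheet and read metadata, for illustration only.
SHEET = """\
experiment_id,kit,flow_cell_id,position_id,barcode,alias
exp01,SQK-16S114-24,FLC001,1A,barcode01,patient_a
"""
rows = list(csv.DictReader(io.StringIO(SHEET)))
read = {"experiment_id": "exp01", "flow_cell_id": "FLC001",
        "position_id": "1A", "barcode": "barcode01"}

print(find_alias(rows, read))                          # patient_a
print(find_alias(rows, dict(read, position_id="2B")))  # barcode01
```

In the second call the position_id differs from the sheet, so the barcode identifier is left unchanged.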
Warning
Values in the alias column must not be valid barcode identifiers (e.g. barcode## or unclassified).
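A simple guard against such aliases might look like the sketch below. It assumes that barcode## means the word barcode followed by exactly two digits, which may not cover every identifier Dorado treats as valid; the helper name alias_is_allowed is hypothetical.

```python
import re

# Assumption: "barcode##" means "barcode" followed by exactly two digits.
BARCODE_ID = re.compile(r"barcode\d{2}")


def alias_is_allowed(alias: str) -> bool:
    """Return False if the alias could be mistaken for a barcode identifier."""
    return alias != "unclassified" and BARCODE_ID.fullmatch(alias) is None


print(alias_is_allowed("patient_a"))     # True
print(alias_is_allowed("barcode07"))     # False
print(alias_is_allowed("unclassified"))  # False
```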
Barcode filtering
If a sample sheet is present and barcoding is requested, Dorado will only attempt to find matches to the barcode identifiers listed in the barcode column (if present).
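To preview which barcodes a given sheet would restrict classification to, the barcode column can be read directly. This is a minimal sketch under the assumption that the sheet is a plain comma-separated file with a header row; allowed_barcodes and the example contents are hypothetical, and the code does not reproduce Dorado's classifier.

```python
import csv
import io


def allowed_barcodes(sheet_text: str):
    """Return the set of listed barcodes, or None if there is no barcode column."""
    reader = csv.DictReader(io.StringIO(sheet_text))
    if "barcode" not in (reader.fieldnames or []):
        return None  # no barcode column: nothing to filter on
    return {row["barcode"] for row in reader if row["barcode"]}


# Hypothetical sample sheet contents, for illustration only.
SHEET = """\
experiment_id,kit,flow_cell_id,barcode,alias
exp01,SQK-16S114-24,FLC001,barcode01,patient_a
exp01,SQK-16S114-24,FLC001,barcode05,patient_b
"""

listed = allowed_barcodes(SHEET)
print(sorted(listed))         # ['barcode01', 'barcode05']
print("barcode03" in listed)  # False
```

A return value of None corresponds to the "if present" case: without a barcode column there is no list to filter against.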