Frequently Asked Questions
Below are some of our most frequently asked technical and support questions. If you have an issue, please check back regularly, as this page is updated often.
If your question is not answered below, please raise a new issue on the Dorado GitHub issues page with as much information as possible, and the Dorado team will aim to respond promptly.
Basecaller
Models
Please check out the Models Introduction and Models List.
Which model should I use?
Since Dorado 0.5.0, the automatic model selection algorithm should be able to select the appropriate model for the input data (POD5 only) given a model speed (e.g. fast, hac, sup). Dorado will automatically download missing models.
In general, the latest basecalling models will be the most performant and most accurate as there are continuous advances in model architecture and training.
Which model did I use?
Dorado writes metadata to the BAM read group (RG) header, as detailed in the SAM specification.
The following command can be used to inspect the header and extract the basecall model or modified bases model.
> samtools view -H calls.bam | grep -oE "\S*models?=\S*"
DS:basecall_model=dna_r10.4.1_e8.2_400bps_hac@v5.2.0
modbase_models=dna_r10.4.1_e8.2_400bps_hac@v5.2.0_5mC_5hmC@v1
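If only the basecall model name is needed, for example in a script, the same header text can be filtered further. The following is a minimal sketch, assuming the model is recorded as a whitespace-delimited basecall_model=<name> entry as in the output above; basecall_models is a hypothetical helper name, not part of Dorado or samtools.

```shell
# Hypothetical helper: print the unique basecall model name(s)
# found in SAM header text read from stdin.
basecall_models() {
  # Keep only the basecall_model=<name> entries, then strip the key.
  grep -oE 'basecall_model=[^[:space:]]+' | cut -d= -f2 | sort -u
}
```

It could be used as samtools view -H calls.bam | basecall_models, which prints one model name per line and nothing if no read group records a model.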
How do I basecall data from legacy sequencing conditions?
Dorado supports basecalling for the DNA R10.4.1 and RNA004 conditions, but doesn't support basecalling for the legacy RNA002, R9.4.1, R10.3, and R10.4 conditions.
The DNA R9.4.1 and RNA002 conditions were deprecated as of Dorado v1.0.0.
Dorado v0.9.6 was the last release which supported DNA R9.4.1 and RNA002 conditions.
For R10.4 and R10.3, please use the legacy Guppy basecaller, which is available from the Nanopore Community Downloads page.
Where are the new models?
New Dorado releases incrementally add support for new models, which are generally not backwards compatible with previous versions. If you can see a model in the Models List but cannot download it, please ensure you have the latest release of Dorado. Instructions on how to download and install it are available here.
Outputs
Why do I have more records than reads?
Dorado reports the number of reads basecalled from your input data, but this number may differ from the number of records in your output because of read splitting. This happens when a single read recorded to the POD5 file contains more than one molecule, and MinKNOW did not detect this and split it into separate records during sequencing.
Dorado annotates split reads by adding the parent_read_id, which is stored in the BAM pi tag.
The parent_read_id is the read id of the original unsplit read. Only reads which are children of unsplit reads have this pi tag.
We can count all BAM records which are split reads using this command:
Note
Unsplit reads may contain an arbitrary number of reads, not just 2 as shown in this example.
Here, we also include [dx]!=1 for completeness in case the data was duplex basecalled.
For more information on the dx tag please see the duplex documentation.
If your output had 1 additional record, the above command would report 2, as an unsplit read will be written as 2 new records with unique read_ids sharing the same parent_read_id.
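The counting rule described above (records that carry a pi tag and are not marked dx:i:1) can also be applied directly to SAM text. The following is a minimal sketch rather than the Dorado team's own command: count_split_records is a hypothetical helper name, and the sketch assumes that a record without a dx tag is not a duplex read.

```shell
# Hypothetical helper: count SAM records on stdin that are split reads,
# i.e. that carry a pi:Z: tag and are not flagged dx:i:1.
count_split_records() {
  awk -F'\t' '
    /^@/ { next }                      # skip header lines
    {
      has_pi = 0; is_duplex = 0
      # Optional tags start at column 12 of a SAM record.
      for (i = 12; i <= NF; i++) {
        if ($i ~ /^pi:Z:/)   has_pi = 1
        if ($i == "dx:i:1")  is_duplex = 1
      }
      if (has_pi && !is_duplex) n++
    }
    END { print n + 0 }                # print 0 when nothing matched
  '
}
```

It could be used as samtools view -h calls.bam | count_split_records. For an output where a single unsplit read was written as 2 child records, it prints 2, matching the explanation above.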