Read Trimming
Dorado can trim adapter and/or primer sequences from the beginning and end of DNA and RNA reads during basecalling.
For DNA basecalls only, trimming can be done as a separate step after basecalling
using the Dorado trim subcommand.
RNA trimming
RNA trimming is always done in-line with basecalling and cannot be done afterwards
using Dorado trim.
Demultiplexing trimmed data
Trimming adapters and primers may result in parts of the barcode flanking regions being removed, which could interfere with demultiplexing.
Trimming while basecalling
Dorado basecaller will attempt to detect any adapter or primer sequences at
the beginning and end of reads, and remove them from the output sequence.
This functionality can be controlled using either the --trim or --no-trim options
with Dorado basecaller.
The --trim option takes as its argument one of the following values:
| Option | Adapters | Primers | Barcodes | Description |
|---|---|---|---|---|
| all | Yes | Yes | Yes | Detected adapters or primers will be trimmed. If barcoding is enabled, detected barcodes will be trimmed. This is the default option. |
| primers | Yes | Yes | No | Detected adapters or primers will be trimmed. If barcoding is enabled, detected barcodes will not be trimmed. |
| adapters | Yes | No | No | Detected adapters will be trimmed, but primers will not be trimmed. If barcoding is enabled, detected barcodes will not be trimmed. |
| none | No | No | No | Nothing will be trimmed. Equivalent to --no-trim. |
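As a sketch of how these values are passed on the command line, the helper below wraps Dorado basecaller with an explicit --trim value. The function name `basecall_with_trim`, the model name `hac`, and the file paths are illustrative assumptions, not part of Dorado.

```shell
# Hypothetical helper: basecall with an explicit --trim value.
#   $1 = model name or path (e.g. hac)
#   $2 = input reads (e.g. a POD5 directory)
#   $3 = trim mode: all | primers | adapters | none
#   $4 = output BAM file
basecall_with_trim() {
    dorado basecaller "$1" "$2" --trim "$3" > "$4"
}

# Example with placeholder paths: trim adapters only, keep primers and barcodes.
# basecall_with_trim hac pod5s/ adapters calls.bam
```

Passing `none` as the mode should behave like replacing `--trim "$3"` with `--no-trim`, as the table states.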
Trimming existing datasets
The Dorado trim subcommand can be used to trim adapters and/or primer sequences in
existing basecalled datasets. Its input, <calls>, can be either an HTS format file
(e.g. FASTQ or BAM) or a stream in an HTS format (e.g. the output of Dorado basecalling).
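The two input modes described above can be sketched as follows. The function names and file paths are hypothetical, and the streaming variant assumes that Dorado trim reads from standard input when no file argument is given.

```shell
# Hypothetical helper: trim an existing call file.
#   $1 = basecalled input (HTS format), $2 = trimmed output BAM
trim_calls() {
    dorado trim "$1" > "$2"
}

# Hypothetical helper: trim a stream of basecalls.
# Basecalling runs with --no-trim so that all trimming happens in the second stage.
# Assumes dorado trim reads standard input when no file is given.
#   $1 = model, $2 = input reads, $3 = trimmed output BAM
basecall_then_trim() {
    dorado basecaller "$1" "$2" --no-trim | dorado trim > "$3"
}

# Examples with placeholder paths:
# trim_calls calls.bam trimmed.bam
# basecall_then_trim hac pod5s/ trimmed.bam
```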
The --no-trim-primers option can be used to prevent the trimming of primer sequences.
In this case only adapter sequences will be trimmed.
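For instance, an adapter-only trim might look like the sketch below. The function name and file paths are placeholders chosen for illustration.

```shell
# Hypothetical helper: trim adapters but leave primer sequences in place.
#   $1 = basecalled input, $2 = trimmed output BAM
trim_adapters_only() {
    dorado trim --no-trim-primers "$1" > "$2"
}

# Example with placeholder paths:
# trim_adapters_only calls.bam adapters_trimmed.bam
```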
If it is also your intention to demultiplex the data, then it is recommended that you demultiplex before trimming any adapters and primers, as trimming adapters and primers first may interfere with correct barcode classification.
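One possible way to follow this ordering is sketched below. It assumes the `dorado demux` subcommand with its `--kit-name` and `--output-dir` options, and that demultiplexed reads are written as BAM files somewhere under the output directory. The function name, directory names, and kit name are placeholders.

```shell
# Hypothetical helper: demultiplex first, then trim each per-barcode file.
#   $1 = basecalled, untrimmed reads
#   $2 = barcode kit name
#   $3 = directory for demultiplexed BAM files
#   $4 = directory for trimmed BAM files
demux_then_trim() {
    dorado demux --kit-name "$2" --output-dir "$3" "$1" || return 1
    mkdir -p "$4"
    # Assumption: demux writes one or more *.bam files under $3.
    # Files sharing a base name in different subdirectories would overwrite each other.
    find "$3" -type f -name '*.bam' | while IFS= read -r bam; do
        dorado trim "$bam" > "$4/$(basename "$bam")"
    done
}

# Example with placeholder names:
# demux_then_trim calls.bam <kit-name> demuxed/ trimmed/
```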
The output of Dorado trim will always be unaligned records, regardless of whether the
input is aligned/sorted or not.
CLI reference
Positional arguments:
reads Path to a file with reads to trim. Can be in any HTS format. [nargs: 0 or more]
Optional arguments:
-h, --help shows help message and exits
-v, --verbose [may be repeated]
-t, --threads Combined number of threads for adapter/primer detection and output generation.
Default uses all available threads.
Input arguments:
-n, --max-reads Maximum number of reads to process.
-l, --read-ids A file with a newline-delimited list of reads to trim.
Output arguments:
--emit-fastq Output in fastq format. Default is BAM.
Main arguments:
--no-trim-primers Skip primer detection and trimming. Only adapters will be detected and trimmed.
--primer-sequences Path to file with custom primer sequences.
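As an illustration of combining several of the options listed above, the sketch below trims a limited number of reads and writes FASTQ instead of BAM. The function name, thread count, read limit, and file paths are arbitrary placeholders.

```shell
# Hypothetical helper: trim at most $3 reads using $2 threads, emitting FASTQ.
#   $1 = basecalled input, $2 = thread count, $3 = maximum reads, $4 = output FASTQ
trim_subset_to_fastq() {
    dorado trim --threads "$2" --max-reads "$3" --emit-fastq "$1" > "$4"
}

# Example with placeholder values:
# trim_subset_to_fastq calls.bam 4 1000 trimmed.fastq
```

To restrict trimming to specific reads instead, `--max-reads "$3"` could be swapped for `--read-ids` followed by a newline-delimited file of read IDs.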
Custom primer trimming
Note
Using the --primer-sequences argument will remove the Oxford Nanopore primer sequences from the
trimming search.
Dorado searches for primer sequences used in Oxford Nanopore kits. However, you can specify
an alternative set of primer sequences to search for when trimming either in-line with basecalling,
or when using Dorado trim directly.
In both cases this is accomplished using the --primer-sequences argument, followed by the
path to a FASTA file containing the primer sequences you want
to search for. The record names of the sequences do not matter.
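A minimal sketch of this workflow follows. The two sequences are made-up placeholders rather than real primers, and the function and file names are hypothetical.

```shell
# Hypothetical helper: write a FASTA file of custom primers to $1.
# The sequences below are placeholders; substitute your own primers.
write_custom_primers() {
    cat > "$1" <<'EOF'
>custom_primer_1
ACGTACGTACGTACGTACGT
>custom_primer_2
TGCATGCATGCATGCATGCA
EOF
}

# Hypothetical helper: trim existing calls using only the custom primers.
#   $1 = basecalled input, $2 = primer FASTA, $3 = trimmed output BAM
trim_with_custom_primers() {
    dorado trim --primer-sequences "$2" "$1" > "$3"
}

# Example with placeholder paths:
# write_custom_primers primers.fasta
# trim_with_custom_primers calls.bam primers.fasta trimmed.bam
```

The same `--primer-sequences` argument can be given to Dorado basecaller for in-line trimming.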
Effect on demultiplexing
If adapter/primer trimming is done while basecalling in combination with demultiplexing, then Dorado will ensure that the trimming of adapters and primers does not interfere with the demultiplexing process.
For example, trimming will not affect demultiplexing when a barcode kit is passed to Dorado basecaller with the --kit-name option.
However, if you intend to demultiplex as a separate step, it is recommended that you
disable trimming during basecalling with the --no-trim option, to ensure that barcode sequences
remain intact in the calls.
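The two workflows can be contrasted with the sketch below. It assumes the `--kit-name` option on Dorado basecaller and the `dorado demux` subcommand with `--kit-name` and `--output-dir`. The function names, model name, and paths are placeholders.

```shell
# Hypothetical helper: basecall, trim, and demultiplex in a single step.
#   $1 = model, $2 = input reads, $3 = barcode kit name, $4 = output BAM
basecall_with_barcoding() {
    dorado basecaller "$1" "$2" --kit-name "$3" > "$4"
}

# Hypothetical helper: basecall without trimming, then demultiplex separately.
# --no-trim keeps barcode sequences intact for the later demux step.
#   $1 = model, $2 = input reads, $3 = barcode kit name,
#   $4 = intermediate untrimmed BAM, $5 = demux output directory
basecall_then_demux() {
    dorado basecaller "$1" "$2" --no-trim > "$4" || return 1
    dorado demux --kit-name "$3" --output-dir "$5" "$4"
}

# Examples with placeholder names:
# basecall_with_barcoding hac pod5s/ <kit-name> calls.bam
# basecall_then_demux hac pod5s/ <kit-name> untrimmed.bam demuxed/
```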