Alignment

Dorado uses the minimap2 aligner to align basecalled sequences to a reference. It can either align existing basecalls or produce aligned output directly during basecalling.

Aligning existing basecalls

To align existing basecalls, run:

dorado aligner <index> <calls> > aligned.bam

where <index> is the reference to align to (in FASTQ, FASTA or .mmi format) and <calls> is a file or folder in any HTS format.

Writing to an output directory

When reading from an input folder, Dorado aligner also supports writing aligned files to an output directory. The output directory is formatted in the MinKNOW output structure.

dorado aligner <index> <calls-dir> --output-dir <output-dir>

Alignment summary

An alignment summary containing alignment statistics for each read can be generated with the --emit-summary argument.

Note

The --emit-summary argument requires that the --output-dir <output-dir> argument is set.

The alignment summary file will be written into <output-dir>.
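The two arguments can be combined as in the following minimal sketch. The helper name align_with_summary, the DRY_RUN switch and the example paths are illustrative assumptions, not part of Dorado. By default the helper only prints the command it would run.

```shell
# Hypothetical helper: prints the aligner command for a folder of basecalls
# with a per-read alignment summary; set DRY_RUN=0 to execute it instead.
align_with_summary() {
    index=$1      # reference (FASTQ/FASTA/.mmi)
    calls_dir=$2  # folder containing basecalled files
    out_dir=$3    # required, because --emit-summary needs --output-dir
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "dorado aligner $index $calls_dir --output-dir $out_dir --emit-summary"
    else
        dorado aligner "$index" "$calls_dir" --output-dir "$out_dir" --emit-summary
    fi
}

# Example paths are placeholders.
align_with_summary ref.mmi calls/ aligned/
```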

Alignment during basecalling

Including alignment during basecalling should not have a significant impact on overall basecalling throughput. Although alignment is a CPU intensive operation, basecalling throughput is generally limited by the GPU while the CPU is under-utilised. Dorado can make efficient use of both the GPU for basecalling and the otherwise under-utilised CPU for alignment, performing both concurrently.

To align during basecalling with Dorado basecaller or Dorado duplex, add the --reference argument:

dorado basecaller <model> <reads> --reference <index> > aligned.bam
dorado duplex     <model> <reads> --reference <index> > aligned.bam

Minimap2 options

Alignment uses minimap2 with the lr:hq preset by default. To override this, pass a minimap2 option string with --mm2-opts, using -x to select a preset and/or individual options such as -k and -w to set the k-mer and window size respectively.

dorado aligner <index> <calls> --output-dir <output-dir> --mm2-opts "-x splice --junc-bed <annotations_file>"
dorado aligner <index> <calls> --output-dir <output-dir> --mm2-opts --help

dorado basecaller <model> <reads> --reference <index> --mm2-opts "-k 15 -w 10" > aligned.bam

For a complete list of supported minimap2 options use '--mm2-opts "--help"'. For example:

$ dorado aligner <index> <calls> --mm2-opts "--help"

Optional arguments:
  -h, --help   shows help message and exits
  -k           minimap2 k-mer size for alignment (maximum 28).
  -w           minimap2 minimizer window size for alignment.
  -I           minimap2 index batch size.
  --secondary  minimap2 outputs secondary alignments
  -N           minimap2 retains at most INT secondary alignments
  -Y           minimap2 uses soft clipping for supplementary alignments
  -r           minimap2 chaining/alignment bandwidth and optionally long-join bandwidth specified as NUM,[NUM]
  --junc-bed   Optional file with gene annotations in the BED12 format (aka 12-column BED), or intron positions in 5-column BED. With this option, minimap2 prefers splicing in annotations.
  -x           minimap2 preset for indexing and mapping. [default: "lr:hq"]

Warning

Not all minimap2 arguments are currently available, and parameter names are not finalized and may change.

Dorado supports split indexes, but the entire index must fit in memory. Aligning to a split index may produce some spurious secondary and/or supplementary alignments, and the mapping score may be less reliable than for a non-split index. If possible, generate your .mmi index files using the minimap2 -I option with a value large enough to produce a non-split index. If you are aligning directly to a large FASTA reference, pass a large enough -I value through --mm2-opts to ensure that the index is not split.
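One way to pick a "large enough" -I value is to count the bases in the reference and round up. The sketch below assumes a FASTA reference and that minimap2 reads the G suffix as 10^9 bases. The helper names ref_bases and batch_size are hypothetical, and whether the resulting index fits in memory still has to be checked separately.

```shell
# Count the bases in a FASTA reference (header lines are excluded).
ref_bases() {
    grep -v '^>' "$1" | tr -d '\r\n' | wc -c | tr -d ' '
}

# Hypothetical helper: round the reference size up to the next whole
# gigabase and print it in the NUM-plus-suffix form taken by -I (e.g. 4G).
batch_size() {
    bases=$(ref_bases "$1")
    echo "$(( bases / 1000000000 + 1 ))G"
}
```

The value could then be used when building the index, for example `minimap2 -x lr:hq -I "$(batch_size ref.fa)" -d ref.mmi ref.fa`, or passed to Dorado as `--mm2-opts "-I $(batch_size ref.fa)"`. Here ref.fa and ref.mmi are placeholder file names.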

Counting overlaps

The --bed-file <bed> argument is available in the Dorado basecaller and Dorado aligner. This argument specifies a .bed filepath which is used to count the number of overlaps between the bed file regions and the alignments generated.

This number is written to the BAM file output as the bh read tag.
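The tag can be inspected per read once the BAM output is converted to SAM text, for example with samtools view. The following is a minimal sketch, assuming the tag is stored as an integer (bh:i:N). The function name bh_counts is hypothetical.

```shell
# Print "read_name<TAB>overlap_count" for each SAM record on stdin.
# Reads without a bh tag are reported as NA.
bh_counts() {
    awk -F'\t' '
        /^@/ { next }                       # skip SAM header lines
        {
            hits = "NA"
            for (i = 12; i <= NF; i++)      # optional tags start at column 12
                if ($i ~ /^bh:i:/) hits = substr($i, 6)
            print $1 "\t" hits
        }'
}
```

A typical invocation would be `samtools view aligned.bam | bh_counts`, where aligned.bam is a placeholder for output produced with --bed-file.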

CLI reference

For reference, here is the slightly reformatted --help output of the Dorado aligner subcommand.

Info

Please check the --help output of your own installation of Dorado, as this page may be outdated. Argument defaults have been omitted because they are platform specific.

> dorado aligner --help

Positional arguments:
  index                       reference in (fastq/fasta/mmi).
  reads                       An input file or the folder containing input file(s) (any HTS format).

Optional arguments:
  -h, --help                  shows help message and exits
  -v, --verbose               [may be repeated]

Input data arguments:
  -r, --recursive             If the 'reads' positional argument is a folder any subfolders will also
                                be searched for input files.
  -n, --max-reads             maximum number of reads to process (for debugging, 0=unlimited).

Alignment arguments:
  --mm2-opts                  Optional minimap2 options string. For multiple arguments surround with double quotes.
  --bed-file                  Optional bed-file. If specified, overlaps between the alignments and bed-file
                                entries will be counted, and recorded in BAM output using the 'bh' read tag.

Output arguments:
  -o, --output-dir            If specified output files will be written to the given folder, otherwise output
                                is to stdout. Required if the 'reads' positional argument is a folder.
  --no-sort                   Disable sorting of output files.
  --emit-sam                  Output in SAM format.
  --emit-summary              If specified, a summary file containing the details of the primary alignments
                                for each read will be emitted to the root of the output folder.
                                This option requires that the '--output-dir' option is also set.

Advanced arguments:
  -t, --threads               number of threads for alignment and BAM writing (0=unlimited).