Model Selection Complex

Definition of Complex

Consisting of many different and connected parts.

The model argument in Dorado tools can specify either a simplex model path or a model complex.

Using a model complex instructs Dorado to automatically select all basecalling models based on the model complex and the data to be basecalled. This includes the simplex, modified-base and stereo duplex models.

Model complex syntax

A model complex must start with the simplex model speed and follow this syntax:

speed[version][,mod[version]]*
  • [] - Square brackets enclose an optional field.
  • * - The asterisk / star shows that a field may be repeated zero or more times.
  • , - All items must be comma-separated.
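As a rough illustration of the syntax above, the following sketch checks only the shape of a model complex string with an extended regular expression. The function name is_valid_model_complex is hypothetical and not part of Dorado, and the character set allowed in mod names is an assumption. The version pattern follows the version field described under Fields. Passing this check does not mean that a matching model exists.

```sh
# Hypothetical helper, not part of Dorado: checks only the shape of a model complex.
is_valid_model_complex() {
    local ver='(@(latest|v[0-9]+(\.[0-9]+){0,2}))?'   # optional @latest or @vX[.Y[.Z]]
    local mod='[A-Za-z0-9_]+'                          # assumed character set for mod names
    # speed[version] followed by zero or more ,mod[version] items
    printf '%s\n' "$1" | grep -Eq "^(fast|hac|sup)${ver}(,${mod}${ver})*\$"
}

for complex in "hac" "sup@v4.2.0,6mA@v1" "sup,6mA,5mCG_5hmCG" "5mC,sup" "hac@4.2.0"; do
    if is_valid_model_complex "$complex"; then
        echo "valid:   $complex"
    else
        echo "invalid: $complex"
    fi
done
```

The last two strings are rejected: the first does not start with a speed, and the second is missing the v in its version.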

Fields

speed

The model speed can be any of fast, hac or sup.

version

The version takes the form of @vX.Y.Z or @latest.

X, Y and Z here are major, minor, and patch version numbers (e.g. @v1.2.3).

Any missing values are assumed to be zero, e.g. @v1.2 -> @v1.2.0.

If @latest is used, the latest available model version is used.
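The zero-padding rule can be sketched in shell as follows. normalize_version is a hypothetical helper written for illustration, not a Dorado command. It leaves @latest untouched, because resolving it requires knowing which models are available.

```sh
# Hypothetical helper, not part of Dorado: pads a version to the full @vX.Y.Z form.
normalize_version() {
    if [ "$1" = "@latest" ]; then
        echo "$1"
        return 0
    fi
    if ! printf '%s\n' "$1" | grep -Eq '^@v[0-9]+(\.[0-9]+){0,2}$'; then
        echo "invalid version: $1" >&2
        return 1
    fi
    local old_ifs="$IFS"
    IFS=.
    set -- ${1#@v}          # split X.Y.Z on dots into $1, $2 and $3
    IFS="$old_ifs"
    # Missing minor and patch components default to zero.
    echo "@v$1.${2:-0}.${3:-0}"
}

normalize_version "@v1.2"     # prints @v1.2.0
normalize_version "@v3"       # prints @v3.0.0
normalize_version "@latest"   # prints @latest
```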

mod

Multiple Modification Models

More than one modification model may be selected at once and each must be separated by a comma.

For example: sup,6mA,5mC@latest

The mod field can be any modification name which is available for the simplex model and can be optionally followed by a version.

Examples: 6mA, m6A, pseU, 5mC@v2 and 5mCG_5hmCG@v1.0.0.

Automatically selected modification models will always match the base simplex model version and will be the latest compatible version unless a specific version is set by the user.

Multiple modification models must use different canonical bases

When selecting multiple modification models, only one modification per canonical base may be active at once.

For example, sup,4mC,5mC is invalid as both modification models operate on the C canonical base context.

This restriction exists because the reported modification probabilities could be nonsensical: two models could each report high confidence in a different modification at the same position.
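The one-modification-per-canonical-base rule can be illustrated with a small pre-flight check. This is a sketch under stated assumptions: canonical_base and check_mod_conflicts are hypothetical helpers, and the lookup covers only a handful of mod names chosen for illustration. Dorado determines the canonical base from the models themselves, so the Model List remains the authoritative source.

```sh
# Hypothetical lookup, not Dorado's own: maps a few mod names to their canonical base.
canonical_base() {
    case "$1" in
        6mA|m6A)            echo A ;;
        4mC|5mC|5mCG_5hmCG) echo C ;;
        *)                  return 1 ;;   # not covered by this sketch
    esac
}

# Succeeds only if no two mods in the model complex share a canonical base.
check_mod_conflicts() {
    local old_ifs="$IFS" seen="" field mod base
    IFS=,
    set -- $1                       # split the model complex on commas
    IFS="$old_ifs"
    [ "$#" -gt 0 ] && shift         # drop the leading speed field
    for field in "$@"; do
        mod="${field%%@*}"          # strip any @version suffix
        if ! base=$(canonical_base "$mod"); then
            echo "unknown mod in this sketch: $mod" >&2
            return 2
        fi
        case "$seen" in
            *"$base"*)
                echo "conflict: more than one mod on canonical base $base" >&2
                return 1 ;;
        esac
        seen="$seen$base"
    done
}

check_mod_conflicts "sup,5mCG_5hmCG,6mA" && echo "ok"
check_mod_conflicts "sup,4mC,5mC" || echo "rejected"
```

The first call passes because the two mods act on C and A respectively. The second fails because 4mC and 5mC both act on C.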

See the Model List for a list of all available models.

Examples of model complexes

Model Complex       Description
fast                Latest compatible fast model
hac                 Latest compatible hac model
sup                 Latest compatible sup model
hac@latest          Latest compatible hac simplex basecalling model
hac@v4.2.0          Simplex basecalling hac model with version v4.2.0
hac@v3.5            Simplex basecalling hac model with version v3.5.0
hac,5mCG_5hmCG      Latest compatible hac simplex model and latest 5mCG_5hmCG modifications model for the chosen basecall model
hac,5mCG_5hmCG@v3   Latest compatible hac simplex model and 5mCG_5hmCG modifications model with version v3.0.0
sup,5mCG_5hmCG,6mA  Latest compatible sup model and latest compatible 5mCG_5hmCG and 6mA modifications models

Here are some examples of model complexes in use:

# Simplex basecalling
dorado basecaller hac                   reads/ > calls.bam # HAC simplex basecalling
dorado basecaller hac@v4.1.0            reads/ > calls.bam # HAC simplex with specific version

# Simplex modification basecalling
dorado basecaller sup,6mA               reads/ > calls.bam # SUP with modifications
dorado basecaller sup,6mA,5mCG_5hmCG    reads/ > calls.bam # Multiple modification models
dorado basecaller sup@v4.2.0,6mA@v1     reads/ > calls.bam # Setting versions

# Duplex basecalling
dorado duplex     sup@v4.1.0            reads/ > calls.bam # SUP duplex basecalling with specific version
dorado duplex     sup,5mC               reads/ > calls.bam # SUP duplex basecalling with modification model