Poly(A) Estimation
Dorado has initial support for estimating poly(A) tail lengths for cDNA (PCS and PCB kits) and RNA, and can be configured for use with custom primer sequences, interrupted tails, and plasmids.
Poly(A) and Poly(T)
Oxford Nanopore cDNA reads are sequenced in two different orientations and Dorado poly(A) tail length estimation handles both (A and T homopolymers).
This feature is disabled by default and can be enabled with the --estimate-poly-a argument.
The estimated tail length is stored in the pt:i tag of the output record.
Reads for which the tail length could not be estimated have a pt:i value of -1 if the primer anchor for the tail was not found, or a value of 0 if the primer anchor was found but the length could not be estimated.
Dorado does not edit the original basecalled sequence using the results of the poly(A/T) estimate.
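As a minimal sketch of how the pt:i values described above might be interpreted downstream, the following Python reads the tag from a single SAM text record. The helper names (get_pt_tag, describe_pt) and the example record are hypothetical and not part of Dorado; only the tag name and the meaning of -1 and 0 come from the text above.

```python
from typing import Optional


def get_pt_tag(sam_line: str) -> Optional[int]:
    """Return the integer value of the pt:i tag in one SAM record, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # A SAM record has 11 mandatory columns; optional tags start at column 12.
    for field in fields[11:]:
        if field.startswith("pt:i:"):
            return int(field[len("pt:i:"):])
    return None


def describe_pt(value: Optional[int]) -> str:
    """Map a pt:i value to the outcome described in the documentation."""
    if value is None:
        return "no pt:i tag present"
    if value == -1:
        return "primer anchor not found"
    if value == 0:
        return "primer anchor found, length not estimated"
    return f"estimated tail length: {value}"


# Hypothetical unaligned record, for illustration only.
record = "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\tpt:i:57"
print(describe_pt(get_pt_tag(record)))
```

Treating -1 and 0 as separate cases matters when filtering reads, since only -1 indicates that the anchor itself was missing.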
Custom poly(A) tail configuration
The default settings for this feature are optimized for non-interrupted poly(A/T) sequences that occur at read ends, but these settings can be changed using a configuration file passed to Dorado with the --poly-a-config argument.
This configuration file can configure parameters for:
- Custom primer sequence for cDNA tail estimation
- Clustering of interrupted poly(A/T) tails
- Estimation of poly(A/T) length in plasmids
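The two arguments mentioned so far can be combined on one command line. The sketch below assembles such a command in Python; the function name, the model value "hac", and the paths are placeholders chosen for illustration, and the layout of the basecaller subcommand (model first, then input data) is an assumption to check against your Dorado version.

```python
import shlex
from typing import List, Optional


def build_dorado_command(model: str, reads: str,
                         poly_a_config: Optional[str] = None) -> List[str]:
    """Assemble a basecalling command with poly(A) estimation enabled."""
    # --estimate-poly-a turns the feature on (it is off by default).
    cmd = ["dorado", "basecaller", model, reads, "--estimate-poly-a"]
    if poly_a_config is not None:
        # --poly-a-config points at the TOML file with custom settings.
        cmd += ["--poly-a-config", poly_a_config]
    return cmd


# Placeholder model and paths, for illustration only.
print(shlex.join(build_dorado_command("hac", "pod5_dir/", "polya.toml")))
```

The resulting list can be passed to subprocess.run, with standard output redirected to the output file of your choice.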
Poly(A/T) reference diagram
cDNA:
5' --- ADAPTER --- FRONT_PRIMER
... --- cDNA
... --- poly(A) --- RC(REAR_PRIMER) --- 3'
OR
5' --- ADAPTER --- REAR_PRIMER --- poly(T)
... --- RC(cDNA)
... --- RC(FRONT_PRIMER) --- 3'
Plasmid:
5' --- ADAPTER
... --- DNA
... --- FRONT_FLANK --- poly(A) --- REAR_FLANK
... --- DNA --- 3'
OR
5' --- ADAPTER
... --- RC(DNA)
... --- RC(REAR_FLANK) --- poly(T) --- RC(FRONT_FLANK)
... --- RC(DNA) --- 3'
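To make the two cDNA orientations in the diagram concrete, the sketch below builds both layouts from toy sequences and shows that, apart from the leading adapter, the poly(T) read is the reverse complement of the poly(A) read. All sequences and function names here are made up for illustration; RC() in the diagram corresponds to reverse_complement.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def poly_a_read(adapter: str, front: str, rear: str, cdna: str, tail: int) -> str:
    """5' ADAPTER, FRONT_PRIMER, cDNA, poly(A), RC(REAR_PRIMER) 3'"""
    return adapter + front + cdna + "A" * tail + reverse_complement(rear)


def poly_t_read(adapter: str, front: str, rear: str, cdna: str, tail: int) -> str:
    """5' ADAPTER, REAR_PRIMER, poly(T), RC(cDNA), RC(FRONT_PRIMER) 3'"""
    return (adapter + rear + "T" * tail
            + reverse_complement(cdna) + reverse_complement(front))


# Toy values, for illustration only.
ADAPTER, FRONT, REAR, CDNA, TAIL = "TTGG", "ATCG", "CGTA", "GATTACAGG", 12

forward = poly_a_read(ADAPTER, FRONT, REAR, CDNA, TAIL)
reverse = poly_t_read(ADAPTER, FRONT, REAR, CDNA, TAIL)

# Ignoring the adapter, each orientation is the reverse complement of the other.
print(reverse_complement(forward[len(ADAPTER):]) == reverse[len(ADAPTER):])
```

This is why the same tail shows up as an A homopolymer next to RC(REAR_PRIMER) in one orientation and as a T homopolymer next to REAR_PRIMER in the other.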
Configuration format
The poly(A) configuration file uses the TOML format.
The content of the file depends on the application, i.e. cDNA or plasmids.
Overrides
Configuration options can be overridden for individual barcodes. We generate a default configuration as normal, and then add overrides of specific values for each barcode by adding an [[overrides]] section labelled by the barcode name.
[anchors]
front_primer = "ATCG"
rear_primer = "CGTA"
[threshold]
flank_threshold = 0.6
[tail]
tail_interrupt_length = 5
[[overrides]]
barcode_id = "Custom-Kit_barcode01"
[overrides.threshold]
flank_threshold = 0.5 # overrides 0.6
[[overrides]]
barcode_id = "Custom-Kit_barcode02"
[overrides.anchors]
front_primer = "AACC" # overrides ATCG
rear_primer = "GGTT" # overrides CGTA
[overrides.tail]
tail_interrupt_length = 10 # overrides 5
This creates three configurations:
- a default configuration with custom front and rear primers and an interrupt length of 5
- a configuration to use for barcode01 from kit Custom-Kit, almost identical to the main custom settings (i.e. with the custom front and rear primers and the interrupt length), with an additional change to the flank_threshold
- a configuration to use for barcode02 from kit Custom-Kit with different primers and an interrupt length of 10, but with no change to the flank threshold
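The override behaviour described above can be pictured as a per-group merge of each barcode's values over the defaults. The Python below is only an illustration of that idea using plain dictionaries that mirror the example file; it is not Dorado's implementation, and the effective_config helper is hypothetical.

```python
import copy
from typing import Optional

# Plain-dict mirror of the example configuration above.
base_config = {
    "anchors": {"front_primer": "ATCG", "rear_primer": "CGTA"},
    "threshold": {"flank_threshold": 0.6},
    "tail": {"tail_interrupt_length": 5},
}
overrides = [
    {"barcode_id": "Custom-Kit_barcode01",
     "threshold": {"flank_threshold": 0.5}},
    {"barcode_id": "Custom-Kit_barcode02",
     "anchors": {"front_primer": "AACC", "rear_primer": "GGTT"},
     "tail": {"tail_interrupt_length": 10}},
]


def effective_config(barcode_id: Optional[str] = None) -> dict:
    """Return the defaults with any matching barcode override applied on top."""
    config = copy.deepcopy(base_config)
    for override in overrides:
        if override["barcode_id"] != barcode_id:
            continue
        for group, options in override.items():
            if group != "barcode_id":
                # Only the listed options change; the rest keep their defaults.
                config.setdefault(group, {}).update(options)
    return config


for name in (None, "Custom-Kit_barcode01", "Custom-Kit_barcode02"):
    print(name, effective_config(name))
```

Reads whose barcode has no matching override fall back to the default configuration in this sketch.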
Configuration options
Config Group | Option | Description |
---|---|---|
anchors | front_primer | Front primer sequence for cDNA[1] |
anchors | rear_primer | Rear primer sequence for cDNA[1] |
anchors | plasmid_front_flank | Front flanking sequence of poly(A) in plasmid[2] |
anchors | plasmid_rear_flank | Rear flanking sequence of poly(A) in plasmid[2] |
anchors | primer_window | Window of bases at the front and rear of the read within which to look for primer sequences |
threshold | flank_threshold | Threshold to use for detection of the flank/primer sequences. Equates to (1 - edit distance / flank sequence length) |
tail | tail_interrupt_length | Combine tails that are within this distance of each other (default is 0, i.e. don't combine any) |
[1] For cDNA only. Values are ignored if either plasmid_front_flank or plasmid_rear_flank is set.
[2] For plasmids only.
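Two of the options above describe small computations that are easier to see in code. The sketch below is a rough reading of the table, not Dorado's algorithm: it assumes the denominator in the flank_threshold formula is the length of the flank sequence, that a score equal to the threshold counts as a match, and that the tail_interrupt_length distance is the number of bases between the end of one homopolymer run and the start of the next. All function names are hypothetical.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def flank_score(flank: str, candidate: str) -> float:
    """1 - edit distance / flank length, following the table's formula."""
    return 1 - edit_distance(flank, candidate) / len(flank)


def passes_flank_threshold(flank: str, candidate: str, flank_threshold: float) -> bool:
    # Assumption: a score equal to the threshold is accepted.
    return flank_score(flank, candidate) >= flank_threshold


def combine_tails(runs: List[Tuple[int, int]],
                  tail_interrupt_length: int) -> List[Tuple[int, int]]:
    """Merge half-open [start, end) runs separated by at most tail_interrupt_length bases."""
    combined: List[Tuple[int, int]] = []
    for start, end in sorted(runs):
        gap_ok = (combined and tail_interrupt_length > 0
                  and start - combined[-1][1] <= tail_interrupt_length)
        if gap_ok:
            combined[-1] = (combined[-1][0], max(combined[-1][1], end))
        else:
            # A value of 0 (the default) never combines runs.
            combined.append((start, end))
    return combined


print(flank_score("ATCG", "ATCC"))
print(combine_tails([(100, 120), (123, 140), (160, 170)], 5))
```

With the example primer ATCG, a candidate with one mismatch scores 0.75 and passes a threshold of 0.6, while a candidate with two mismatches scores 0.5 and passes only the lower threshold of 0.5.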