Custom Adapter and Primer Sequences
Dorado will automatically detect and trim any adapter or primer sequences it finds.
The specific sequences it searches for depend on the specified sequencing kit.
Dorado basecaller can get this information from the read metadata in the input pod5.
Dorado trim, however, requires that the sequencing kit be specified on the command line.
In some cases, it may be necessary to find and remove adapter and/or primer sequences that would
not normally be associated with the sequencing kit that was used, or you may be working with
older data for which the sequencing kit and/or primers being used are no longer directly
supported by Dorado (for example, anything prior to kit14). In such cases, you can specify a
custom adapter/primer file with the command-line option --primer-sequences. If this option
is used, the sequences in the specified file will be used instead of the default sequences.
Custom adapter/primer file format
The custom adapter/primer file uses the FASTA file format, where the desired adapter/primer sequences are specified with additional metadata to define how each sequence should be used.
The following is an example adapter sequence:
>LSK109_front et:Z:adapter sk:Z:any
AATGTACTTCGTTCAGTTACGTATTGCT
The syntax rules are as follows:
- Record name: The record name must be of the form [id]_front or [id]_rear. The id must be unique, other than for the _front and _rear pair.
- Type: The HTS-style tag et:Z:, with a value of either adapter or primer.
- Kits: The HTS-style tag sk:Z:, with a value of either any, or a comma-separated list of sequencing kits (e.g., [kit1],[kit2],[kit3]).
Note that the HTS-style tags should be tab-delimited.
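The syntax rules above can be checked mechanically before a file is passed to Dorado. The following is a minimal Python sketch, not Dorado's own parser: parse_header is a hypothetical helper, and its strictness (requiring both tags, and rejecting any tag that is not of the form key:Z:value) is an assumption based only on the rules listed above.

```python
import re

# Record names must end in _front or _rear. \S+ rejects names containing
# whitespace, which is what happens when spaces are used instead of tabs.
NAME_PATTERN = re.compile(r"(\S+)_(front|rear)")
VALID_TYPES = {"adapter", "primer"}


def parse_header(line):
    """Parse one header line into its id, end, type and kit list.

    Hypothetical helper: raises ValueError when the line breaks one of
    the syntax rules described in this section.
    """
    if not line.startswith(">"):
        raise ValueError("header line must start with '>'")
    # The record name and the tags are separated by tab characters.
    name, *tags = line[1:].rstrip("\r\n").split("\t")

    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"bad record name: {name!r}")

    values = {}
    for tag in tags:
        parts = tag.split(":", 2)  # key, type, value
        if len(parts) != 3 or parts[1] != "Z":
            raise ValueError(f"bad tag: {tag!r}")
        values[parts[0]] = parts[2]

    if values.get("et") not in VALID_TYPES:
        raise ValueError("et:Z: must be 'adapter' or 'primer'")
    if not values.get("sk"):
        raise ValueError("sk:Z: tag is required")

    return {
        "id": match.group(1),
        "end": match.group(2),
        "type": values["et"],
        "kits": values["sk"].split(","),
    }


record = parse_header(">LSK109_front\tet:Z:adapter\tsk:Z:SQK-PSK004,SQK-LSK109")
print(record)
```

In this sketch, a header whose fields are separated by spaces rather than tabs fails the record-name check, and a tag written without the Z type (for example sk:any) is reported as a bad tag.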
How Dorado searches for adapters/primers
The _front and _rear record name suffixes and the type designator define how Dorado will
search for the sequence.
- For adapters: Dorado will search for the front sequence near the beginning of the read, and for the rear sequence near the end of the read.
- For primers: Dorado will search for the front sequence near the beginning of the read, and the reverse-complement of the rear sequence near the end of the read. Dorado will also search for the rear sequence near the beginning of the read, and for the reverse-complement of the front sequence near the end of the read.
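The two search rules above can be written out as a small Python sketch. It only lists which sequences would be looked for at each end of a read. It does not perform any alignment and is not Dorado's implementation, and the function names search_targets and reverse_complement are hypothetical.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse-complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def search_targets(kind, front, rear):
    """Return (near_start, near_end): the sequences searched for near the
    beginning and near the end of a read, following the rules above."""
    if kind == "adapter":
        # Adapters: front near the beginning, rear near the end.
        return [front], [rear]
    if kind == "primer":
        # Primers: both orientations are considered.
        near_start = [front, rear]
        near_end = [reverse_complement(rear), reverse_complement(front)]
        return near_start, near_end
    raise ValueError(f"unknown type: {kind!r}")


near_start, near_end = search_targets("primer", "AAC", "GGT")
print(near_start)  # ['AAC', 'GGT']
print(near_end)    # ['ACC', 'GTT']
```

The short sequences AAC and GGT are placeholders chosen so that the reverse-complements are easy to verify by eye.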
The et:Z: tag is required to designate whether the sequence is an adapter or a primer sequence, so that Dorado knows
how it should be used. The sk:Z: tag is required to indicate which sequencing kits the adapter or primer sequence may
be used with. A sequence will only be searched for if the sequencing-kit information in the read matches one of the kit
names listed in that sequence's sk:Z: tag. If the sk:Z: tag has the value any, then the sequence will be searched for in all reads,
regardless of the kit that was used. Note that the kit names are case-insensitive.
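The kit-matching rule described above can be sketched in Python as follows. This is an illustration of the stated behaviour, not Dorado's code. The function name is_searched is hypothetical, and treating the value any as an exact, lower-case match is an assumption, since the text only states that kit names are case-insensitive.

```python
def is_searched(sk_value, read_kit):
    """Decide whether a record's sequence applies to a read.

    sk_value is the value of the record's sk:Z: tag; read_kit is the
    sequencing-kit name associated with the read.
    """
    if sk_value == "any":
        # Assumption: "any" is matched exactly as written.
        return True
    # Kit names are compared case-insensitively.
    kits = {kit.strip().lower() for kit in sk_value.split(",")}
    return read_kit.lower() in kits


print(is_searched("any", "SQK-LSK114"))                     # True
print(is_searched("SQK-PSK004,SQK-LSK109", "sqk-lsk109"))   # True
print(is_searched("SQK-PSK004", "SQK-LSK109"))              # False
```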
Example custom adapter/primer file
The following could be used to detect the PCR_PSK_rev1 and PCR_PSK_rev2 primers, along with
the LSK109 adapters, for older data.
>LSK109_front et:Z:adapter sk:Z:any
AATGTACTTCGTTCAGTTACGTATTGCT
>LSK109_rear et:Z:adapter sk:Z:any
AGCAATACGTAACTGAACGAAGT
>PCR_PSK_front et:Z:primer sk:Z:any
ACTTGCCTGTCGCTCTATCTTCGGCGTCTGCTTGGGTGTTTAACC
>PCR_PSK_rear et:Z:primer sk:Z:any
AGGTTAAACACCCAAGCAGACGCCGCAATATCAGCACCAACAGAAA
In this case, the above adapters and primers would be searched for in all reads, regardless of the sequencing-kit information encoded in the read file or, in the case of dorado trim, regardless of the sequencing kit specified on the command line.
To search for the primers only in reads where SQK-PSK004 is specified as the kit name,
and for the adapters only in reads from either SQK-PSK004 or SQK-LSK109, the following could be used:
>LSK109_front et:Z:adapter sk:Z:SQK-PSK004,SQK-LSK109
AATGTACTTCGTTCAGTTACGTATTGCT
>LSK109_rear et:Z:adapter sk:Z:SQK-PSK004,SQK-LSK109
AGCAATACGTAACTGAACGAAGT
>PCR_PSK_front et:Z:primer sk:Z:SQK-PSK004
ACTTGCCTGTCGCTCTATCTTCGGCGTCTGCTTGGGTGTTTAACC
>PCR_PSK_rear et:Z:primer sk:Z:SQK-PSK004
AGGTTAAACACCCAAGCAGACGCCGCAATATCAGCACCAACAGAAA
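Because the tags must be separated by tab characters, which are easy to lose when copying text, it can be safer to generate the file than to type it by hand. The following Python sketch builds the kit-restricted example above with explicit tabs. The helper name format_record is hypothetical, and the resulting text would still need to be written to a file of your choosing and passed via --primer-sequences.

```python
def format_record(record_id, end, kind, kits, sequence):
    """Format one FASTA record with tab-delimited HTS-style tags."""
    header = "\t".join([
        f">{record_id}_{end}",      # record name: [id]_front or [id]_rear
        f"et:Z:{kind}",             # adapter or primer
        "sk:Z:" + ",".join(kits),   # "any" or a list of kit names
    ])
    return f"{header}\n{sequence}\n"


adapter_kits = ["SQK-PSK004", "SQK-LSK109"]
primer_kits = ["SQK-PSK004"]

records = [
    ("LSK109", "front", "adapter", adapter_kits,
     "AATGTACTTCGTTCAGTTACGTATTGCT"),
    ("LSK109", "rear", "adapter", adapter_kits,
     "AGCAATACGTAACTGAACGAAGT"),
    ("PCR_PSK", "front", "primer", primer_kits,
     "ACTTGCCTGTCGCTCTATCTTCGGCGTCTGCTTGGGTGTTTAACC"),
    ("PCR_PSK", "rear", "primer", primer_kits,
     "AGGTTAAACACCCAAGCAGACGCCGCAATATCAGCACCAACAGAAA"),
]

# Concatenate all records into the text of the custom file.
text = "".join(format_record(*record) for record in records)
print(text, end="")
```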