Move Table
The move table is a record of the model's base emissions in strided signal space and
gives a coarse sequence-to-signal mapping. A 1 in the move table indicates the emission
of a base at the indexed position in the signal, while a 0 indicates that the
corresponding block of signal is not associated with the emission of a base.
The move table length is equal to the number of strided signal blocks.
The move table is stored in the original signal direction, starting after any
trimmed signal (the ts tag). Because the move table is always
in signal direction, it is not changed for alignments to reverse-complement references or
for 3'->5' reads, whose sequences are reversed to the 5'->3' direction at write time.
It can be added to SAM/BAM/CRAM outputs by setting the --emit-moves flag.
Move table metadata format
The format of the move table metadata SAM/BAM/CRAM tag is as follows:
mv:B:c,block_stride,signal_block_move_list
block_stride: An int8_t containing the number of source signal samples which each
element in the signal_block_move_list corresponds to.
This will be set to the input stride of the model.
signal_block_move_list: A comma-separated list of int8_t values, each one containing a single move
table element (unless overflow has occurred, see implementation details below).
Each element corresponds to block_stride samples of the raw source signal.
The move table entries will be stored in order in successive int8_ts of the signal_block_move_list.
For example: mv:B:c,5,0,0,1,0,1
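The tag layout described above can be illustrated with a minimal Python sketch that splits the textual form of the tag into its block stride and raw element list. The function name `parse_mv_tag` is hypothetical and is not part of any tool; the sketch assumes the tag is available as plain SAM text and does not resolve overflow entries.

```python
def parse_mv_tag(tag: str) -> tuple[int, list[int]]:
    """Split a textual mv tag into (block_stride, raw_elements).

    Hypothetical helper for illustration only. Overflow entries
    (-128 / 127) are returned as-is and are not combined here.
    """
    prefix = "mv:B:c,"
    if not tag.startswith(prefix):
        raise ValueError(f"not an mv:B:c tag: {tag!r}")
    values = [int(v) for v in tag[len(prefix):].split(",")]
    for v in values:
        # Every element must fit in a signed 8-bit integer.
        if not -128 <= v <= 127:
            raise ValueError(f"value {v} does not fit in int8_t")
    # The first value is the block stride, the rest is the move list.
    return values[0], values[1:]


stride, moves = parse_mv_tag("mv:B:c,5,0,0,1,0,1")
print(stride, moves)  # 5 [0, 0, 1, 0, 1]
```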
Implementation details
As the metadata elements are signed 8-bit integers, each individual element supports values in the range -128 to 127. To store a value outside this range, an element with the value -128 or 127 indicates that the next entry in the metadata should be added to it in order to reconstruct the original value.
For example, under this rule a value of 200 is stored as the two entries 127,73 (127 + 73 = 200).
Note that the exact value -128 or 127 (or multiples thereof) requires a trailing zero for the format to be encoded correctly.
For example, under this rule a value of exactly 127 is stored as 127,0 and a value of 254 as 127,127,0.
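The overflow rule can be sketched in Python as a pair of helpers. This is an illustration of the rule as stated in this section, not the actual implementation; the names `encode_value` and `decode_elements` are hypothetical.

```python
INT8_MIN, INT8_MAX = -128, 127


def encode_value(value: int) -> list[int]:
    """Encode one value as one or more int8_t entries (illustrative only)."""
    out = []
    # Emit saturated entries until the remainder no longer overflows.
    while value >= INT8_MAX:
        out.append(INT8_MAX)
        value -= INT8_MAX
    while value <= INT8_MIN:
        out.append(INT8_MIN)
        value -= INT8_MIN
    # The final entry is never -128 or 127, so it terminates the run.
    # For exact multiples this is the trailing zero.
    out.append(value)
    return out


def decode_elements(elements: list[int]) -> list[int]:
    """Rebuild the original values from a list of int8_t entries."""
    values, total, pending = [], 0, False
    for e in elements:
        total += e
        if e in (INT8_MIN, INT8_MAX):
            pending = True  # saturated: the next entry must be added
            continue
        values.append(total)
        total, pending = 0, False
    if pending:
        raise ValueError("list ends on a saturated entry with no terminator")
    return values


print(encode_value(200))                  # [127, 73]
print(encode_value(127))                  # [127, 0]
print(decode_elements([127, 73, 0, 1]))   # [200, 0, 1]
```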
Example
Given the above example move table: mv:B:c,5,0,0,1,0,1
The block stride is 5 (the first value), and the remaining values 0,0,1,0,1 state that
bases were emitted in the 3rd and 5th strided blocks.
Converting strided blocks into signal space (zero-based sample ranges [0-4,5-9,10-14,15-19,20-24]), we can state that these bases were emitted
from signal samples 10-14 and 20-24 respectively.
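The conversion from strided blocks to signal samples can be sketched in Python as follows. The function name `emission_sample_ranges` and the `trimmed_samples` parameter are hypothetical; the offset assumes that the ts value is the number of samples trimmed from the start of the signal, and the move list is assumed to contain only 0 and 1 entries.

```python
def emission_sample_ranges(
    block_stride: int, moves: list[int], trimmed_samples: int = 0
) -> list[tuple[int, int]]:
    """Return inclusive, zero-based (start, end) sample ranges of blocks with a move of 1.

    trimmed_samples is an assumed offset for signal trimmed before the
    first block; leave it at 0 to work relative to the trimmed signal.
    """
    ranges = []
    for block_index, move in enumerate(moves):
        if move == 1:
            start = trimmed_samples + block_index * block_stride
            ranges.append((start, start + block_stride - 1))
    return ranges


print(emission_sample_ranges(5, [0, 0, 1, 0, 1]))  # [(10, 14), (20, 24)]
```

Because the move table is always in signal direction, the ranges are returned in signal order regardless of how the read sequence is oriented in the output.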