Q-scores

Phred Quality Score, often shortened to Q-score, is a metric describing the probability of an incorrect base call in a sequence. The higher the Q-score, the better.

Q-score Q is defined below where P is the basecalling error probability:

Q = -10 * log10(P)

Conversely:

P = 10 ^ (-Q/10)

For example, if the basecalling error probability P is 1/1000 := 10^(-3) then we get Q30.

Q = -10 * log10( 1/1000 )
  = -10 * log10( 10^(-3) )
  = -10 * (-3)
  = 30

Alternatively, if the basecalling accuracy A is 99% we get Q20 as A = 1 - P:

Q = -10 * log10 ( 1 - (99/100) )
  = -10 * log10 ( 1 - 0.99 )
  = -10 * log10 ( 0.01 )
  = -10 * log10 ( 10^(-2) )
  = -10 * (-2)
  = 20

Q-string

Per-base Q-scores for a given sequence are encoded as ASCII characters which forms a string. This is known as the Q-string. The encoding spans ! to ~ representing 0 to 93 (inclusive) respectively.

In practice, Dorado reports a maximum Q-score of 50 which is S (upper case).

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
| ------------- Dorado Range [0, 50] ------------ |

To calculate the the Q-score from a Q-string character subtract 33 from the ASCII decimal value (ASCII(33) = !).

Tip

In Python - use the ord built-in to convert a ASCII character to its decimal value.

assert ord("S") == 50

Mean Q-score calculation in Dorado

Given a Q-string which is generated by Dorado during basecalling and represents Dorado's confidence in the basecall, the mean Q-score calculation is performed as follows:

1.  Trim the leading 60 bases if the sequence is longer than 60 bases.
2.  Convert the Q-scores into error probabilities
3.  Calculate the mean of the error probabilities
4.  Convert the mean error probability into mean Q-score

The leading 60 bases are trimmed to account for the higher than normal noise at the beginning of sequencing.

Below is an example Python snippet calculating the mean Q-score from a Q-string.

import math

def mean_qscore(qstring: str) -> float:
    """Calculates mean Q-score from a Q-string"""
    # Truncate string if sufficiently long
    qstr = qstring[60:] if len(qstring) > 60 else qstring
    # Convert ASCII to Phred quality scores, then to estimated error
    errors = [10**(-(ord(char) - 33)/10) for char in qstr]
    # Calculate mean error
    mean_error = sum(errors) / len(errors)
    # Convert back to q-score
    mean_qscore = -10 * math.log10(mean_error)
    return mean_qscore

# Example quality string
qstring = ";;;;;?<<<?GJI>>>>>>=<=<<=>>A@A??@==<8889?>=<<<;<<;;?==<;::;;<SS:::7666444<2+')933;<<=???CA?>=>=>>??51)))+&&(('&&&&''))HELLO;READER;'()))))&%%$"
print(mean_qscore(qstring))
# 10.380942270316051