NGS basics: Quality metrics for NGS library preparation
The aim of every NGS library prep workflow is to convert input DNA or cDNA molecules, by ligating vendor-specific adapters, into a format that can be sequenced on an NGS platform. The concept appears straightforward, but achieving it is more complex than it looks, and only rigorous quality control during and after the experiment can help ensure optimal outcomes.
How can you evaluate the quality of an NGS library?
Two determinants of successful sequencing are the quality and quantity of the library material. The fragment size distribution and an accurate library concentration, typically measured by qPCR, are two critical parameters that help you evaluate the quality of your library prior to sequencing. Besides these two standard quality metrics, the following additional metrics help you judge the overall quality of your library prep.
The conversion rate describes how many input molecules were converted into “sequenceable” fragments, that is, fragments that have adapters attached to both ends. One simple way to calculate the conversion rate is to divide the measured yield of specific library by the theoretical maximum yield. This calculation relies on assumptions about the efficiency of the library amplification PCR and the DNA lost in cleanup steps. Note also that qPCR-based library quantification does not discriminate between specific library fragments and adapter dimers, so it is not suitable for libraries with high adapter-dimer contamination. For such libraries, we recommend quantification with an electrophoresis-based method such as QIAxcel, Agilent Bioanalyzer or Agilent TapeStation.
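For example, assuming 95% PCR efficiency and 20% sample loss in cleanup steps, the theoretical maximum yield is the input amount multiplied by the expected fold amplification over all PCR cycles, reduced by the expected cleanup loss.
The conversion rate is then defined as the ratio of the measured yield to the theoretical maximum yield:
conversion rate = measured yield / theoretical maximum yield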
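A minimal sketch of this calculation in Python may help make the assumptions explicit. It is not a vendor formula: it assumes that 95% PCR efficiency means each cycle multiplies the template by 1.95, that the 20% cleanup loss is applied once to the final product, and that the mass added by adapters is ignored. The function names, the input amount and the cycle number are hypothetical.

```python
def theoretical_max_yield(input_ng, pcr_cycles, pcr_efficiency=0.95, cleanup_loss=0.20):
    """Upper bound on library yield (ng) if every input molecule were converted.

    Assumptions (not from the original text): each PCR cycle multiplies the
    template by (1 + pcr_efficiency), and cleanup_loss is applied once overall.
    """
    fold_amplification = (1.0 + pcr_efficiency) ** pcr_cycles
    return input_ng * fold_amplification * (1.0 - cleanup_loss)


def conversion_rate(measured_yield_ng, input_ng, pcr_cycles,
                    pcr_efficiency=0.95, cleanup_loss=0.20):
    """Ratio of the measured yield to the theoretical maximum yield."""
    max_yield = theoretical_max_yield(input_ng, pcr_cycles, pcr_efficiency, cleanup_loss)
    if max_yield <= 0:
        raise ValueError("theoretical maximum yield must be positive")
    return measured_yield_ng / max_yield


if __name__ == "__main__":
    # Hypothetical run: 10 ng input, 6 PCR cycles, 220 ng measured library.
    rate = conversion_rate(measured_yield_ng=220.0, input_ng=10.0, pcr_cycles=6)
    print(f"conversion rate: {rate:.2f}")
```

Because the result depends strongly on the assumed efficiency and loss, it is most useful for comparing library preps processed under the same assumptions rather than as an absolute figure.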
The conversion rate is largely influenced by the ligation efficiency. A good library prep chemistry has enzyme and buffer formulations optimized to maximize ligation efficiency.
A high conversion rate and low bias in library prep and PCR amplification capture more unique molecules from your sample, which makes the library more complex. The more unique molecules that are sequenced, the fewer duplicate reads will be present in your data set. Duplicate reads do not add meaningful information to the NGS data set and can lead to skewed variant frequencies. As a result, duplicate reads are removed during data analysis.
Another important factor determining the complexity of an NGS library is the amount of sample used for library prep. The lower the sample input, the less complex an NGS library becomes. Thus, high ligation efficiency and a high conversion rate are especially important for sub-nanogram input amounts, to capture the maximum possible complexity of a limited sample.
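As a rough illustration of how duplication can be quantified, the sketch below treats reads that share the same chromosome, start position and strand as copies of one original molecule. This is a simplification chosen for the example: established tools such as Picard MarkDuplicates or samtools markdup also take mate positions and other information into account. The function name and the read tuples are hypothetical.

```python
from collections import Counter


def duplication_rate(read_keys):
    """Fraction of reads that are duplicates of another read.

    read_keys: iterable of hashable keys, here (chromosome, start, strand).
    Reads with an identical key are assumed to derive from the same molecule.
    """
    counts = Counter(read_keys)
    total_reads = sum(counts.values())
    if total_reads == 0:
        return 0.0
    unique_molecules = len(counts)
    return (total_reads - unique_molecules) / total_reads


if __name__ == "__main__":
    # Hypothetical single-end alignments.
    reads = [
        ("chr1", 100, "+"),
        ("chr1", 100, "+"),  # same key as the first read: counted as a duplicate
        ("chr1", 100, "-"),  # opposite strand: treated as a distinct molecule
        ("chr2", 5, "+"),
    ]
    print(f"duplication rate: {duplication_rate(reads):.2f}")
```

A lower duplication rate at a given sequencing depth suggests that more unique molecules made it into the library.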
The coverage uniformity describes how evenly reads are distributed along the genome or across a set of target regions. The more uniform the coverage, the less sequencing is required to reach sufficient depth in all regions of interest. Coverage bias is usually introduced in the library prep and library amplification stages. Very often, coverage shows a strong GC bias, i.e., regions receive less or more coverage depending on their GC content. Examining coverage as a function of GC content helps uncover this bias. With a good library prep, coverage shows little dependence on GC content.
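One way to look at both ideas numerically is sketched below, assuming you already have the mean read depth and the reference sequence for a set of genomic windows. The coefficient of variation is used here as one possible uniformity summary (lower is more uniform), and mean normalized coverage per GC bin as a simple view of GC bias. The function names, the bin count and the example windows are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def coverage_cv(depths):
    """Coefficient of variation of per-window depth; 0 means perfectly even."""
    depths = list(depths)
    average = mean(depths)
    if average == 0:
        return float("nan")
    return pstdev(depths) / average


def normalized_coverage_by_gc(windows, n_bins=10):
    """Mean normalized coverage per GC bin.

    windows: iterable of (sequence, mean_depth) pairs.
    Returns {bin_index: mean depth / overall mean depth}, where bin i covers
    GC fractions from i / n_bins up to (i + 1) / n_bins.
    """
    windows = list(windows)
    overall = mean(depth for _, depth in windows)
    if overall == 0:
        raise ValueError("overall mean depth is zero")
    bins = defaultdict(list)
    for sequence, depth in windows:
        sequence = sequence.upper()
        gc = sequence.count("G") + sequence.count("C")
        called = gc + sequence.count("A") + sequence.count("T")
        if called == 0:
            continue  # window contains only N or other ambiguous bases
        # Integer arithmetic avoids floating-point surprises at bin edges.
        index = min(gc * n_bins // called, n_bins - 1)
        bins[index].append(depth / overall)
    return {index: mean(values) for index, values in sorted(bins.items())}


if __name__ == "__main__":
    # Hypothetical windows: (reference sequence, mean depth).
    example = [("ATATATATAT", 10.0), ("ATGCATGCAT", 20.0), ("GCGCGCGCGC", 30.0)]
    print(coverage_cv(depth for _, depth in example))
    print(normalized_coverage_by_gc(example))
```

Values close to 1 in every GC bin indicate little GC bias; values that rise or fall steadily across bins indicate that coverage depends on GC content.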
The higher the accuracy of your NGS library prep, the more you can trust your variant reporting. Nucleotide errors are typically introduced during library amplification PCR and during sequencing itself. Sequencing errors typically remain below 1%. Library amplification PCR errors can be minimized by using high-fidelity PCR reagents. NGS reference samples with well-defined variants and known variant frequencies help assess the accuracy of your NGS workflow.
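A comparison against a reference sample could be sketched as follows, assuming you know the expected allele frequency of each reference variant and have counted the supporting and total reads at each site. The variant identifiers, counts and function name are hypothetical, and the absolute error is only one of several ways to summarize agreement.

```python
def compare_to_reference(expected_vaf, observed_counts):
    """Compare observed variant allele frequencies with expected ones.

    expected_vaf: {variant_id: expected allele frequency between 0 and 1}
    observed_counts: {variant_id: (alt_reads, total_reads)}
    Variants without coverage are reported with observed=None.
    """
    report = {}
    for variant_id, expected in expected_vaf.items():
        alt_reads, total_reads = observed_counts.get(variant_id, (0, 0))
        observed = alt_reads / total_reads if total_reads > 0 else None
        report[variant_id] = {
            "expected": expected,
            "observed": observed,
            "abs_error": None if observed is None else abs(observed - expected),
        }
    return report


if __name__ == "__main__":
    # Hypothetical reference variants and read counts.
    expected = {"variant_A": 0.05, "variant_B": 0.50}
    counts = {"variant_A": (6, 100)}  # variant_B has no coverage in this run
    for variant_id, row in compare_to_reference(expected, counts).items():
        print(variant_id, row)
```

Large or systematic deviations from the expected frequencies point to errors or bias somewhere in the workflow, although this sketch alone cannot say at which step they arose.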
During our library prep product development cycle, these key quality indicators are taken into account and used to optimize our QIAseq library prep products to obtain better end results.