NGS data can be extremely complex, and technology that helps improve the accuracy of data analysis can make the difference between drawing the right conclusions and drawing the wrong ones. Sample and molecular barcodes serve roughly the same purpose as barcodes in any other context – they’re unique identifiers that distinguish items and group like items together. But unless you’ve used both types before, you might not be entirely clear on how they differ, or when each one is used.
Sample barcodes – identifying different samples for multiplex sequencing
First, let’s talk about sample barcodes. If you have a smaller genomic region that you want to analyze, such as when using a targeted DNA or RNA panel, you can boost your efficiency by grouping your samples together in a single sequencing run, i.e., multiplexing your NGS experiment. To ensure that you can distinguish the reads that belong to each sample, you ligate a short DNA sequence, a sample-specific “barcode”, to the fragments from each sample during library preparation. All of the fragments belonging to a given sample will share the same barcode. Once you’ve finished your multiplex run, you can then “de-multiplex” during data analysis by sorting the reads according to their sample barcodes and analyzing each sample separately.
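As a rough sketch of the de-multiplexing step described above, the Python snippet below sorts reads into per-sample bins. It assumes, purely for illustration, that each read starts with a fixed-length 6-base sample barcode; the barcode sequences, sample names, and the `demultiplex` function are hypothetical, and real pipelines typically take barcodes from dedicated index reads and tolerate mismatches.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample table; real kits define their own sequences.
SAMPLE_BARCODES = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}
BARCODE_LEN = 6


def demultiplex(reads):
    """Group reads by the sample barcode at the start of each read.

    Returns a dict mapping sample name -> list of reads with the barcode
    trimmed off. Reads whose barcode is not in the table are collected
    under "undetermined".
    """
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:BARCODE_LEN], read[BARCODE_LEN:]
        sample = SAMPLE_BARCODES.get(barcode, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


reads = ["ACGTACGGATTC", "TGCATGCCTAAG", "ACGTACTTGACA", "NNNNNNAAAAAA"]
bins = demultiplex(reads)
for sample, inserts in bins.items():
    print(sample, inserts)
```

Each bin can then be analyzed on its own, as if the samples had been sequenced in separate runs.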
So why do you need molecular barcodes?
Sample barcodes and molecular barcodes aren’t mutually exclusive, and both can be necessary for an NGS experiment. This is because molecular barcodes, or Unique Molecular Indices (UMIs), have nothing to do with multiplexing samples. Instead, their purpose is to correct a previously tricky problem with the use of PCR in NGS workflows, such as for targeted enrichment of specific regions prior to sequencing, or for library amplification to obtain sufficient material for quantification. PCR inherently introduces errors such as amplification bias, and without a control, the ratio of reads can give a skewed picture, leading to incorrect conclusions. Here’s an illustration of the problem:
In contrast to sample barcoding, where every molecule from a given sample carries the same sequence, molecular barcoding assigns a unique sequence to each individual molecule before it is amplified and sequenced. So in the above diagram, for example, Sample A contains 4 copies of the gene of interest, while Sample B contains only 1, so there’s a 4:1 ratio between the two samples. Uneven PCR amplification produces 12 copies of the fragment from Sample A and 6 from Sample B. If you were just looking at the raw reads, this would lead you to believe that the gene was present at a 2:1 ratio between the samples. When the unique molecular barcodes are counted rather than the raw reads, however, the true 4:1 ratio is recovered despite the duplicates introduced in the PCR step.
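The arithmetic in this example can be checked with a short Python sketch that counts raw reads and distinct molecular barcodes per sample. The UMI sequences and the `count_reads_and_molecules` function are hypothetical, and two simplifications are assumed: each of Sample A’s four molecules yields three reads, and the UMIs contain no sequencing errors.

```python
from collections import Counter


def count_reads_and_molecules(reads):
    """Count raw reads and distinct UMIs per sample.

    reads: iterable of (sample, umi) tuples, one tuple per sequenced read.
    Returns (read_counts, molecule_counts), both dicts keyed by sample.
    """
    reads = list(reads)
    read_counts = Counter(sample for sample, _ in reads)
    umis_per_sample = {}
    for sample, umi in reads:
        umis_per_sample.setdefault(sample, set()).add(umi)
    molecule_counts = {s: len(u) for s, u in umis_per_sample.items()}
    return dict(read_counts), molecule_counts


# Sample A: 4 original molecules, assumed to give 3 reads each (12 reads).
reads = [("A", umi) for umi in ("AAC", "GTT", "CGA", "TCG") for _ in range(3)]
# Sample B: 1 original molecule amplified to 6 reads.
reads += [("B", "GGA")] * 6

read_counts, molecule_counts = count_reads_and_molecules(reads)
read_ratio = read_counts["A"] / read_counts["B"]
molecule_ratio = molecule_counts["A"] / molecule_counts["B"]
print(f"raw reads A:B = {read_ratio:g}:1")
print(f"unique molecules A:B = {molecule_ratio:g}:1")
```

With these inputs, the raw read counts give a 2:1 ratio, while counting distinct UMIs gives 4:1, matching the original number of molecules in each sample.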
The other advantage of molecular barcodes concerns variant detection. PCR artifacts can sometimes be misread as sequence variants, which makes identifying low-frequency variants difficult, because you can’t be sure whether the difference you’re looking at is a true variant or not. With molecular barcodes, if the majority of the reads for a particular molecule don’t contain the “variant”, you can be more confident that it’s just an artifact; conversely, if all of the reads for that molecule do contain it, you can be more confident that it’s real, even if the variant is present at a low frequency in the sample.
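One simple way to picture this logic is a majority vote within each group of reads that share a molecular barcode. The Python sketch below illustrates that idea only; it is not the method of any particular analysis pipeline. The `consensus_by_umi` function, the UMI sequences, and the base calls are hypothetical, and only a single genomic position is considered.

```python
from collections import Counter, defaultdict


def consensus_by_umi(observations):
    """Majority-vote base call per UMI family at one genomic position.

    observations: iterable of (umi, base) pairs, one pair per read.
    Returns a dict mapping umi -> (consensus_base, supporting_fraction).
    """
    families = defaultdict(list)
    for umi, base in observations:
        families[umi].append(base)
    result = {}
    for umi, bases in families.items():
        base, count = Counter(bases).most_common(1)[0]
        result[umi] = (base, count / len(bases))
    return result


# Assumed reference base is "C"; "T" is the candidate variant.
observations = [
    ("AAC", "C"), ("AAC", "C"), ("AAC", "T"), ("AAC", "C"),  # one stray T
    ("GTT", "T"), ("GTT", "T"), ("GTT", "T"),                # T in every read
    ("CGA", "C"), ("CGA", "C"), ("CGA", "C"),
]
calls = consensus_by_umi(observations)
for umi, (base, support) in calls.items():
    print(umi, base, f"{support:.2f}")
```

Here the single T in family `AAC` is outvoted and looks like an artifact, whereas every read in family `GTT` carries the T, so that molecule most likely contained the variant.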
For these reasons, molecular barcoding with UMIs is essential to accuracy in any NGS experiment involving PCR. The QIAseq Targeted DNA and RNAscan Panels both use molecularly barcoded adapters and gene-specific primers to tag individual DNA or cDNA molecules prior to any PCR amplification, while the QIAseq Targeted RNA Panels have the molecular barcode on the gene-specific primer, ensuring that every molecule is correctly quantified.