Barcodes in NGS: sample vs molecular

NGS data can be extremely complex, and technology that improves the accuracy of data analysis can make the difference between drawing the right conclusions and the wrong ones. Sample and molecular barcodes serve roughly the same purpose as barcodes in any other context: they’re unique identifiers that let you distinguish items and group like items together. But unless you’ve used both types before, you might not be entirely clear on how they differ, or when each one is used.

Sample barcodes – identifying different samples for multiplex sequencing

First, let’s talk about sample barcodes. If you have a smaller genomic region that you want to analyze, such as when using a targeted DNA or RNA panel, you can boost your efficiency by grouping your samples together in a single sequencing run, i.e., multiplexing your NGS experiment. To ensure that you can distinguish the reads that belong to each sample, you ligate a short DNA sequence, a sample-specific “barcode”, to the fragments from each sample during library preparation. All of the fragments belonging to a given sample share the same barcode. Once you’ve finished your multiplex run, you can then “de-multiplex” during data analysis by sorting the reads according to their sample barcodes and analyzing each sample separately.
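As a rough illustration of de-multiplexing, here is a minimal Python sketch. It assumes, purely for simplicity, that an 8-base sample barcode sits at the start of each read and must match exactly; the barcode sequences, sample names and the demultiplex function are hypothetical, and real instruments and kits typically handle index reads and mismatches differently.

```python
from collections import defaultdict

# Hypothetical 8-base sample barcodes; real kits define their own sets.
SAMPLE_BARCODES = {
    "ACGTACGT": "sample_A",
    "TGCATGCA": "sample_B",
}
BARCODE_LENGTH = 8


def demultiplex(reads, barcodes=SAMPLE_BARCODES, barcode_length=BARCODE_LENGTH):
    """Group reads by the sample barcode assumed to sit at the start of each read."""
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        # Reads whose barcode is not recognized are set aside, not guessed at.
        sample = barcodes.get(barcode, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


reads = [
    "ACGTACGTGGGTTTAAA",
    "TGCATGCACCCAAATTT",
    "ACGTACGTGGGTTTAAC",
    "NNNNNNNNGATTACAGA",
]
demultiplexed = demultiplex(reads)
for sample, inserts in demultiplexed.items():
    print(sample, inserts)
```

Each sample’s reads can then be analyzed on their own, while reads with an unrecognized barcode are kept apart instead of being assigned to a sample.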

So why do you need molecular barcodes? 

Sample barcodes and molecular barcodes aren’t mutually exclusive, and both can be necessary for an NGS experiment. This is because molecular barcodes, also called unique molecular indices (UMIs), have nothing to do with multiplexing samples. Instead, their purpose is to correct a long-standing problem with the use of PCR in NGS workflows, whether for targeted enrichment of specific regions prior to sequencing or for library amplification to obtain sufficient material for quantification. PCR does not amplify every molecule equally (amplification bias), and without a control, the ratio of reads can be skewed, leading to incorrect conclusions. Here’s an illustration of the problem:

[Figure: PCR amplification skewing the read ratio between Sample A and Sample B]

In contrast to sample barcoding, which assigns one sequence to all the molecules from a given sample, molecular barcoding assigns a unique sequence to each individual molecule before it is amplified and sequenced. In the above diagram, for example, Sample A contains 4 copies of the gene of interest, while Sample B contains only 1, so there’s a 4:1 ratio between the two samples. Uneven PCR amplification yields 12 copies of the fragment from Sample A and 6 from Sample B. If you were just looking at the raw reads, this would lead you to believe that the gene was present at a 2:1 ratio between the samples. With molecular barcoding, however, you count the unique barcodes rather than the raw reads: 4 in Sample A and 1 in Sample B. The true 4:1 ratio therefore remains intact despite the duplicates introduced in the PCR step.
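The arithmetic of the 4:1 example can be sketched in a few lines of Python. The UMI sequences and the way the 12 reads from Sample A are split across its 4 molecules are invented for illustration; only the totals (12 and 6 reads, 4 and 1 molecules) come from the example above.

```python
from collections import Counter


def count_reads_and_molecules(reads):
    """reads: list of (sample, umi) tuples, one tuple per sequenced read."""
    read_counts = Counter(sample for sample, _ in reads)
    # Collapsing identical (sample, umi) pairs removes PCR duplicates.
    molecule_counts = Counter(sample for sample, _ in set(reads))
    return read_counts, molecule_counts


# Sample A: 4 original molecules amplified unevenly into 12 reads.
# Sample B: 1 original molecule amplified into 6 reads.
reads = (
    [("A", "AAC")] * 5 + [("A", "GTT")] * 3 + [("A", "CCA")] * 2 + [("A", "TGG")] * 2
    + [("B", "ACT")] * 6
)

read_counts, molecule_counts = count_reads_and_molecules(reads)
print("raw read ratio A:B =", read_counts["A"] / read_counts["B"])
print("molecule ratio A:B =", molecule_counts["A"] / molecule_counts["B"])
```

Counting raw reads gives the misleading 2:1 ratio, while counting distinct UMIs per sample recovers 4:1. This sketch treats every distinct UMI as a distinct molecule and ignores sequencing errors within the UMI itself, which real pipelines usually have to handle.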

The other advantage of molecular barcodes lies in variant detection. PCR artifacts can sometimes be misread as sequence variants, which makes identifying low-frequency variants difficult, because you can’t be sure whether the difference you’re looking at is a true variant or not. With molecular barcodes, you can compare all of the reads that carry the same barcode and therefore derive from the same original molecule. If the majority of those reads don’t contain the “variant”, you can be more confident that it’s just an artifact; conversely, if all of them do contain it, you can be more confident that it’s real, even if the variant is present at a low frequency in the sample.
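One way to picture this is a per-molecule majority vote at a single position. The sketch below is a simplified assumption, not the method used by any particular pipeline: the UMI labels, the reference base and the observations are made up, and production tools generally add quality filters and minimum family sizes.

```python
from collections import Counter, defaultdict


def consensus_base_per_molecule(observations):
    """observations: list of (umi, base) pairs seen at one genomic position.

    Returns {umi: most frequent base among the reads sharing that UMI}.
    """
    families = defaultdict(list)
    for umi, base in observations:
        families[umi].append(base)
    return {umi: Counter(bases).most_common(1)[0][0] for umi, bases in families.items()}


REFERENCE_BASE = "C"  # hypothetical reference base at this position

observations = (
    [("UMI1", "C")] * 4 + [("UMI1", "T")]  # one discordant read: likely a PCR artifact
    + [("UMI2", "T")] * 3                  # every read agrees: likely a real variant
    + [("UMI3", "C")] * 4
)

consensus = consensus_base_per_molecule(observations)
variant_molecules = [umi for umi, base in consensus.items() if base != REFERENCE_BASE]
print(consensus)
print("molecules supporting the variant:", variant_molecules)
```

The single T read in the UMI1 family is outvoted by the other reads from the same molecule, whereas the UMI2 family supports the variant in every read, so only that molecule is counted as carrying it.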

For these reasons, molecular barcoding is essential to accuracy in any NGS experiment involving PCR. The QIAseq Targeted DNA and RNAscan Panels both use molecularly barcoded adapters and gene-specific primers to tag individual DNA or cDNA molecules prior to any PCR amplification, while the QIAseq Targeted RNA Panels carry the molecular barcode on the gene-specific primer, ensuring that every molecule is correctly quantified. Learn more about the system – download the flyer!

Ali Bierly

Ali Bierly, PhD is a Global Market Manager in Translational Sciences at QIAGEN, and has written on a number of scientific topics in the biotech industry as the author of QIAGEN's Reviews Online. She received her PhD from Cornell University in 2009, studying the immune response to a protozoan parasite, Toxoplasma gondii. Ali has a keen interest in the emerging importance of microRNA and other circulating nucleic acids as biomarkers for disease.

zubia

Hi, I am currently working on a metagenomic analysis of the chicken gut under dietary modulation. If I have five study groups, will the sample-specific barcode be unique to each sample, or to the bacterial communities I want to target?

Christine Davis

Hi Zubia,

Thanks for commenting! Although we don’t have any targeted panels available for chicken species, I can answer your question regarding barcoding.
When adding a sample index, the sample-specific barcode will be unique to each sample in your run. With targeted panels, you have the option of multiplexing up to 384 samples per sequencing run, depending on your instrument.

The unique molecular index, which corrects for PCR errors downstream, is a 12-base random barcode. This provides about 16.8 million index possibilities (4^12 = 16,777,216), so there is no worry of saturating all possible indices in a run.

For metagenomics, please check out the QIAseq 1-Step Amplicon Kit for 16S rRNA community profiling, or the QIAseq FX DNA Library Kit for whole genome sequencing.

Let us know if you have any additional questions!

Best,
Christine
