Community matters. Whether in a gut or in a soil sample, the composition of the microbes present dictates the health (or dysbiosis) of the ecosystem. However, obtaining an accurate, quantitative measurement of species abundance in complex sample types can be difficult.
One of the most widespread methods for identifying bacterial species is sequencing the 16S gene. However, while next-generation sequencing (NGS) that uses only short reads can identify species present at high abundance in a complex sample, it can miss species present at low abundance. Furthermore, short-read 16S sequencing can over-represent certain species because of PCR amplification bias during library preparation. As a result, any traditional implementation of NGS that relies exclusively on short reads may not provide reliable, reproducible measurements of relative species abundance.
To address these issues, our team has developed the LoopSeq Microbiome, a sample prep kit that delivers a precise quantitation of relative microbial abundance in a complex sample.
During traditional NGS library preparation, certain sequences can be amplified more often than others, distorting estimates of their relative abundance. “Duplicate counting” is when the same 16S molecule is amplified during PCR and creates multiple identical clusters on the sequencer. Without a way to avoid “duplicate counting” of identical short reads, there is no way to tell whether identical short reads represent several independent samplings of the same species or an over-amplification of a single 16S molecule, confounding quantitative microbiome profiling.
As a solution to the inherent limitation of traditional PCR-based NGS for microbial quantitation, LoopSeq Microbiome “counts” individual 16S molecules before amplification occurs. This counting uses a unique molecular index (UMI) that we refer to as a barcode. Each full-length 16S molecule is tagged with a unique barcode prior to PCR amplification. When a tagged molecule is prepared for sequencing, the barcode it contains is retained throughout the prep. Afterwards, all the short reads bearing the same barcode must have come from the same parent molecule and can be grouped into a single “cloud” of reads. Each short read can therefore be matched back to an original 16S molecule in the sample, which corrects for any amplification bias that may have occurred during library preparation. The resulting data enable a highly accurate measurement of relative species abundance (Figure 1).
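The counting logic described above can be illustrated with a minimal Python sketch. It is not the LoopSeq pipeline: the function names, the toy barcodes, and the per-read species labels are all hypothetical, and the sketch assumes each short read already carries a species call, whereas a real workflow would classify the molecule only after its reads have been grouped.

```python
from collections import Counter, defaultdict


def count_molecules(reads):
    """Count one molecule per barcode from (barcode, species) short reads.

    Hypothetical helper for illustration; not part of any LoopSeq software.
    """
    # Group short reads into "clouds" keyed by their barcode (UMI).
    clouds = defaultdict(list)
    for barcode, species in reads:
        clouds[barcode].append(species)

    # Each cloud represents a single parent molecule, however many reads
    # it contains. Assign it the most frequent species call in the cloud.
    molecules = Counter()
    for species_calls in clouds.values():
        consensus = Counter(species_calls).most_common(1)[0][0]
        molecules[consensus] += 1
    return molecules


def relative_abundance(counts):
    """Convert raw counts into fractions that sum to 1."""
    total = sum(counts.values())
    return {species: n / total for species, n in counts.items()}


# Toy data (made up): one E. coli molecule was over-amplified into four
# reads, while two B. subtilis molecules produced one read each.
reads = [
    ("AAAC", "E. coli"), ("AAAC", "E. coli"),
    ("AAAC", "E. coli"), ("AAAC", "E. coli"),
    ("GGTA", "B. subtilis"),
    ("TTCG", "B. subtilis"),
]

# Counting reads directly lets the over-amplified molecule dominate.
naive = relative_abundance(Counter(species for _, species in reads))
# Counting barcodes recovers the underlying molecule ratio of 1 to 2.
corrected = relative_abundance(count_molecules(reads))
```

In this toy example, counting reads puts E. coli at two thirds of the sample, while counting barcodes puts it at one third, which matches the one-to-two ratio of the original molecules.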
To validate how robustly LoopSeq Microbiome reports relative species abundance in a microbial community, eight aliquots of ZymoBIOMICS™ Microbial Community Standard were prepared separately using the LoopSeq Microbiome kit, sequenced, and analyzed on the Loop Genomics cloud-based data processing pipeline. Comparing the measured composition of the microbial community standard with its theoretical composition shows strong agreement between measured and expected values (Figure 2).
For additional validation using a more complex sample, we used LoopSeq Microbiome to measure relative species abundance in an ATCC 20 Strain Staggered Mix standard (Cat.# MSA-1003). This comparison of the ten most abundant microbes in the sample shows good agreement between measured and expected abundance (Figure 3).
In another validation, we took the long-read data and reduced them to isolated short reads to simulate a traditional NGS approach. We used this modified data set to estimate species abundance based only on the frequency of short reads, and then compared those estimates with the corresponding long-read measurements from LoopSeq Microbiome (Figure 4).
As shown in Figure 4, for highly abundant species (5%-50% of 16S molecules in the population), the short-read and long-read estimates of relative abundance agreed (no significant fold-difference). However, for species at low abundance (<0.05% of 16S molecules in the population), which typically account for 60%-80% of unique species in complex samples (e.g. 404 out of 525 identified species), the abundance estimates from short-read quantification differed from those from long-read LoopSeq molecule counting by 2-fold to 20-fold, demonstrating the poor ability of short-read NGS alone to faithfully quantify low-abundance species.
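A fold-difference comparison of this kind could be computed as in the sketch below. The text does not specify how the fold-difference was calculated, so the symmetric ratio used here (the larger estimate divided by the smaller) is an assumption, and the function name and example values are hypothetical rather than taken from Figure 4.

```python
def fold_difference(short_read, long_read):
    """Per-species fold difference (>= 1) between two abundance estimates.

    Assumed definition: the larger of the two estimates divided by the
    smaller, so over- and under-estimation are treated symmetrically.
    """
    result = {}
    for species, reference in long_read.items():
        estimate = short_read.get(species, 0.0)
        if estimate <= 0 or reference <= 0:
            # A species missed entirely by one method has no finite ratio.
            result[species] = float("inf")
        else:
            result[species] = max(estimate / reference, reference / estimate)
    return result


# Made-up relative abundances, chosen only to show the calculation.
long_read = {"abundant sp.": 0.40, "rare sp.": 0.0004, "missed sp.": 0.0002}
short_read = {"abundant sp.": 0.40, "rare sp.": 0.004}

folds = fold_difference(short_read, long_read)
```

With these illustrative numbers, the abundant species has a fold difference of 1 (the estimates agree), the rare species differs by roughly 10-fold, and the species absent from the short-read estimate is reported as infinite.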
Studies are only as meaningful as the data behind them are reliable, and generating reliable data requires a method that is technically robust and expertly designed. For the most accurate microbial population measurements by 16S rDNA sequencing, LoopSeq Microbiome provides a streamlined, tested, and cost-effective approach that will take you from concept to breakthrough in a few simple steps.
To unlock your research potential with LoopSeq Microbiome, visit our shop for more information.