Long Read Genomics Blog
Long Read Viral Genomics
If our recent history behind a mask has taught us anything, it is that novel viruses will always be with us, like it or not. Our best defense is science-based knowledge. Enter viral genome sequencing. Traditional short read sequencing methods, when applied to the study of viruses, can be a powerful tool for the development of vaccines and the design of treatment regimens. However, because short read sequencing breaks a genome into small sections and then attempts to align those short reads to reconstruct the overall genome, it has limitations in correctly assembling the reads into haplotypes that represent the real viral haplotypes. Specifically, short read sequencing is limited in its ability to associate mutations that are more than a read length apart in the genome, and it struggles with assembling highly similar or repetitive elements. The complete sequence of a single viral genome, or haplotype, as well as its often complex transcriptional pattern, is determined by a myriad of features such as promoters, transcription start sites (TSS), transcription termination sites (TTS), transcript editing, random mutations and RNA splicing isoforms, to name a few.
The limitations of short read sequencing make it challenging to analyze the function of viruses holistically, taking all of these features into consideration, because it is difficult or even impossible to tell from short reads alone which features exist on the same genome. Instead, we are left with collections of short reads that cannot be reliably assembled into the actual full-length viral genome haplotypes.
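To make the linkage problem concrete, here is a minimal Python sketch. The genome length, variant positions, read lengths and tiling step are all hypothetical numbers chosen for illustration, and the function names are ours, not part of any sequencing toolkit. The sketch simply counts how many reads physically span two variant sites.

```python
def spans_both(read_start, read_len, pos_a, pos_b):
    """Return True if the read covering [read_start, read_start + read_len)
    contains both variant positions."""
    read_end = read_start + read_len  # exclusive end coordinate
    return read_start <= min(pos_a, pos_b) and max(pos_a, pos_b) < read_end


def count_linking_reads(read_starts, read_len, pos_a, pos_b):
    """Count the reads that cover both positions and can therefore link them."""
    return sum(spans_both(s, read_len, pos_a, pos_b) for s in read_starts)


# Hypothetical toy genome: 10,000 bp with two variant sites 5,000 bp apart.
GENOME_LEN = 10_000
SITE_A, SITE_B = 1_000, 6_000
STEP = 50  # assumed spacing between read start positions

short_starts = range(0, GENOME_LEN - 150 + 1, STEP)    # tiled 150 bp reads
long_starts = range(0, GENOME_LEN - 8_000 + 1, STEP)   # tiled 8,000 bp reads

short_links = count_linking_reads(short_starts, 150, SITE_A, SITE_B)
long_links = count_linking_reads(long_starts, 8_000, SITE_A, SITE_B)

print("short reads linking both sites:", short_links)
print("long reads linking both sites:", long_links)
```

With these toy numbers no 150 bp read covers both sites, while some of the 8,000 bp reads do, so only the long reads can say whether the two mutations sit on the same genome.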
Long read sequencing (LRS) comes to the rescue, providing advantages over short read methods. Long read sequencing solutions obviate many of the issues associated with viral genome and transcriptome sequencing, creating new opportunities for more detailed studies. Long read sequencing methods have been used with great success in the study of herpesviruses (Tombácz et al., 2019), retroviruses (Moldován et al., 2018) and circoviruses (Moldován et al., 2017). Long read sequencing of viral genomes reveals new information on novel splicing sites (Moldován et al., 2018), transcriptional overlaps (Moldován et al., 2017) and RNA editing (Prazsák et al., 2018), all of which are important for viruses that rely heavily on alternative splicing and are characterized by overlapping genes. LRS-derived data have helped reveal hitherto unknown complexities of these viruses, especially their alternatively spliced isoforms, which short read sequencing methods cannot identify.
So, when it comes to identifying and characterizing the next pandemic virus, longer is better for genome sequencing. Long read sequencing enhances our understanding of the evolutionary relationships between viruses, their functions, and how they impact human health and medicine.
Loop Publications Links:
Prazsák, I., Moldován, N., Balázs, Z., Tombácz, D., Megyeri, K., Szűcs, A., ... & Boldogkői, Z. (2018). Long-read sequencing uncovers a complex transcriptome topology in varicella zoster virus. BMC Genomics, 19(1), 1-20.
Tombácz, D., Moldován, N., Balázs, Z., Gulyás, G., Csabai, Z., Boldogkői, M., ... & Boldogkői, Z. (2019). Multiple long-read sequencing survey of herpes simplex virus dynamic transcriptome. Frontiers in Genetics, 10, 834.
Moldován, N., Szűcs, A., Tombácz, D., Balázs, Z., Csabai, Z., Snyder, M., & Boldogkői, Z. (2018). Multiplatform next-generation sequencing identifies novel RNA molecules and transcript isoforms of the endogenous retrovirus isolated from cultured cells. FEMS Microbiology Letters, 365(5), fny013.
Moldován, N., Balázs, Z., Tombácz, D., Csabai, Z., Szűcs, A., Snyder, M., & Boldogkői, Z. (2017). Multi-platform analysis reveals a complex transcriptome architecture of a circovirus. Virus Research, 237, 37-46.
Amplicon Sequence Variant (ASV) and Clonotype Analysis
Just when you think you don’t need another way to cluster sequence variants, you find out that you do, and that it’s better than what you’ve been doing. Next-generation sequencing (NGS) technologies have revolutionized how we analyze DNA, enabling the study of complex genetically encoded systems such as microbiomes and immune repertoires.
Of these technologies, long-read sequencing (LRS) provides unique insights into genetic diversity often overlooked by short read technologies. Imagine a million DNA sequences retrieved from a high-throughput sequencing analysis. Now take one sequence and compare it with all the others in order to find the sequences that match it 100% along the entire length of the long read. The resulting bins of clustered, identical long sequences are the goal of amplicon sequence variant (ASV) or Clonotype analysis, and they represent the true diversity present in a sample. While ASV analysis is the gold standard for short read amplicon sequencing, the recent advent of High Accuracy Long Reads (HALRs) has made ASV analysis possible for long read technologies, which were previously excluded by their historically high error rates.
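The binning idea can be sketched in a few lines of Python. This is a toy illustration, not how any production ASV tool works: it assumes the reads are already error-corrected, trimmed and in the same orientation, and the function name and example sequences are made up.

```python
from collections import Counter


def exact_match_bins(reads):
    """Group reads that are identical over their full length.

    Returns (sequence, count) pairs, most abundant first.
    """
    return Counter(reads).most_common()


# Hypothetical, very short "reads" standing in for full-length long reads.
toy_reads = ["ACGT", "ACGT", "ACGA", "ACGT", "ACGA", "TTTT"]

bins = exact_match_bins(toy_reads)
for sequence, count in bins:
    print(sequence, count)
```

Because a single mismatch anywhere in the read puts it in a different bin, this only reflects real diversity when the input reads are highly accurate, which is why low-accuracy long reads could not be analyzed this way.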
The starting point for bioinformatics algorithms for ASV analysis is a set of long reads that have been demultiplexed. The reads are then processed to remove adapters, linkers and primers, and "denoised" to computationally correct sequencing errors and infer the true variants. As a result, long read ASV analysis resolves diversity at single-nucleotide resolution over full-length reads spanning up to thousands of bp.
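As a rough illustration of one of these preprocessing steps, the Python sketch below trims primers from a read. It is deliberately simplistic: it requires exact primer matches at both ends of the read, whereas real trimming tools usually tolerate mismatches, and it does not attempt denoising at all. The primer and read sequences are hypothetical.

```python
def trim_primers(read, fwd_primer, rev_primer_rc):
    """Return the insert between the two primers, or None if the read does not
    start with fwd_primer and end with rev_primer_rc (exact matches only)."""
    if len(read) <= len(fwd_primer) + len(rev_primer_rc):
        return None  # too short to contain both primers plus an insert
    if not (read.startswith(fwd_primer) and read.endswith(rev_primer_rc)):
        return None  # one or both primers missing
    return read[len(fwd_primer):len(read) - len(rev_primer_rc)]


# Hypothetical primers; rev_primer_rc is the reverse primer as it appears
# at the 3' end of a forward-oriented read.
FWD = "AAAC"
REV_RC = "CCAA"

raw_reads = ["AAACGGGTTTCCAA", "AAACGGGTTT", "AAACGGGTTACCAA"]
inserts = [trim_primers(r, FWD, REV_RC) for r in raw_reads]
kept = [seq for seq in inserts if seq is not None]
print(kept)
```

Reads that lack either primer are dropped here rather than rescued, a conservative choice that keeps only amplicons observed from end to end.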
The amplicon sequence variants are then assembled into a matrix containing the occurrence counts for each ASV in each sample, enabling a comparative, cross-sample ASV or Clonotype analysis. An ASV sequence can represent, for example, a bacterial genomic sequence or a sequenced antibody library member, and relevant information can be assigned to each ASV (e.g. taxonomy for microbes, CDR annotation for antibodies).
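The count matrix can be pictured with a small Python sketch. The sample names, sequences and function name are hypothetical, and the input is assumed to be reads that have already been denoised so that identical strings correspond to the same ASV.

```python
from collections import Counter


def build_asv_table(sample_reads):
    """Build an ASV-by-sample count matrix.

    sample_reads maps a sample name to its list of denoised read sequences.
    Returns (asvs, samples, matrix), where matrix[i][j] is the number of
    times asvs[i] was observed in samples[j].
    """
    per_sample = {name: Counter(reads) for name, reads in sample_reads.items()}
    samples = sorted(per_sample)
    asvs = sorted({seq for counts in per_sample.values() for seq in counts})
    # Counter returns 0 for sequences that are absent from a sample.
    matrix = [[per_sample[name][seq] for name in samples] for seq in asvs]
    return asvs, samples, matrix


# Hypothetical toy input: two samples sharing one ASV.
toy_samples = {
    "sample_1": ["ACGT", "ACGT", "TTTT"],
    "sample_2": ["ACGT", "GGGG", "GGGG"],
}

asvs, samples, matrix = build_asv_table(toy_samples)
for seq, row in zip(asvs, matrix):
    print(seq, row)
```

Annotations such as taxonomy or CDR labels could then be stored alongside the table, keyed by ASV sequence.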
Long read ASV analysis is particularly useful for characterizing genetic variation in systems where critical diversity is written in multiple positions that are more than a short read length apart. Applications that benefit from this technology include microbiome species and strain identification (Callahan et al., 2020), scFv and Fab antibody sequencing (Smakaj et al., 2020) and enzyme engineering, where ASVs can be used to identify bacterial species, antibody Clonotypes and enzyme variants, respectively, with great resolution. Popular and powerful methods for resolving ASVs include DADA2 (Callahan et al., 2020) and UNOISE (Edgar, 2016), both of which aim to discern true diversity from sequencing errors. And here you thought clustering algorithms were all the same.
Callahan, B. J., Grinevich, D., Thakur, S., Balamotis, M. A., & Ben Yehezkel, T. (2020). Ultra-accurate Microbial Amplicon Sequencing Directly from Complex Samples with Synthetic Long Reads.
Edgar, R. C. (2016). UNOISE2: improved error-correction for Illumina 16S and ITS amplicon sequencing. BioRxiv, 081257.
Smakaj, E., Babrak, L., Ohlin, M., Shugay, M., Briney, B., Tosoni, D., Galli, C., Grobelsek, V., D’Angelo, I., Olson, B., Reddy, S., Greiff, V., Trück, J., Marquez, S., Lees, W., & Miho, E. (2020). Benchmarking immunoinformatic tools for the analysis of antibody repertoire sequences. Bioinformatics, 36(6), 1731-1739.