Overview
Long-read sequencing technology has enabled the detection and characterization of structural variations (SVs) at a higher resolution than ever before. SVs include deletions, insertions, duplications, inversions, and translocations, and can have significant impacts on gene expression and disease development. The technology generates reads that are tens of thousands of base pairs long, allowing for the detection of larger SVs that may be missed by short-read sequencing. Although challenges such as the high error rate remain, the benefits of haplotype-resolved assemblies and accurate detection of SVs have already led to new insights into the genomic basis of diseases such as cancer, autism, and schizophrenia. Long-read sequencing has the potential to greatly improve our understanding of health and disease.
PacBio is an example of a long-read sequencing technology that can be used to elucidate structural variants and their impact on colorectal cancer. Colorectal cancer (CRC) is one of the most common cancers, impacting millions of individuals worldwide, and is associated with high mortality rates. Structural variants are important alterations in CRC because they can affect gene expression and function through gene amplification or deletion, alter gene structure, and produce gene fusions. These alterations can subsequently affect oncogenes or tumor suppressors, leading to CRC malignancy. However, short-read sequencing lacks the specificity to detect these events accurately, particularly long, complex structural variants and those located in repetitive regions. Long-read sequencing has filled this gap and continues to yield novel findings in CRC as well as other common cancers.
(Image) IGV screenshot showing methylation data obtained through long-read sequencing
Introduction, Past and Present: Long Reads for Detection of Structural Variants
Long-read sequencing technology has reshaped the research landscape by allowing for the generation of high-quality, contiguous, and accurate genome assemblies. This in turn enables the detection and characterization of structural variations at a much higher resolution than was previously possible. Structural variations are genomic alterations that involve the rearrangement of DNA segments, including deletions, insertions, duplications, inversions, and translocations. These alterations can have significant impacts on gene expression and disease development.
Long-read sequencing can detect SVs because it generates reads that are tens of thousands of base pairs in length. This allows for the detection of larger SVs that may be missed by short-read sequencing, which typically generates reads that are only a few hundred base pairs in length. Additionally, long-read sequencing can provide haplotype-resolved assemblies, which enable the accurate identification and characterization of SVs in a single genome. The ability to accurately detect SVs using long-read sequencing has already led to new insights into the genomic basis of a range of diseases, including cancer, autism, and schizophrenia.
For example, recent studies have identified large-scale structural variations that are associated with certain types of cancer and may represent potential therapeutic targets. Overall, long-read sequencing has the potential to greatly improve our understanding of the role of structural variations in health and disease.
The Technology Behind Long-Read Sequencing
One of the most widely used long-read sequencing technologies is currently provided by Oxford Nanopore Technologies (ONT), a leading manufacturer of nanopore-based sequencing devices. One of the key advantages of ONT’s long-read sequencing technology is its ability to generate reads that are tens of thousands of base pairs in length, allowing for the detection of larger structural variations that may be missed by short-read sequencing. Additionally, ONT’s technology can produce haplotype-resolved assemblies, which can help to accurately identify and characterize SVs in a single genome.
However, long-read sequencing technology is not without its challenges. For example, the high error rate of long-read sequencing can make it difficult to distinguish true structural variations from sequencing errors.
Performance Characteristics of Long-Read Sequencing for SV Detection
The following characteristics shape the accuracy, sensitivity, and specificity of long-read sequencing for SV detection:
- Read length: Long-read technologies can generate reads that are tens of thousands of base pairs in length.
- Accuracy: Long-read technologies have historically had higher per-base error rates than short-read technologies, but their reads span repeats and large rearrangements, which can lead to more accurate assemblies and variant detection.
- Throughput: Long-read technologies typically have lower throughput than short-read technologies, although this is improving with newer technologies such as Oxford Nanopore’s PromethION.
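To make terms such as sensitivity concrete, here is a minimal Python sketch of how an SV call set might be scored against a truth set. It is an illustration under stated assumptions, not a published benchmarking method: the dictionary fields, the `max_dist` tolerance of 500 bp, and the example coordinates are all hypothetical. Because true negatives are hard to count for SVs, the sketch reports precision and recall (sensitivity) rather than specificity.

```python
def matches(call, truth, max_dist=500):
    """Hypothetical matching rule: same chromosome, same SV type,
    and breakpoints within max_dist base pairs of each other."""
    return (
        call["chrom"] == truth["chrom"]
        and call["svtype"] == truth["svtype"]
        and abs(call["pos"] - truth["pos"]) <= max_dist
    )


def benchmark(calls, truth_set, max_dist=500):
    """Count true/false positives and false negatives, then derive
    precision, recall (sensitivity), and F1."""
    matched_truth = set()  # indices of truth SVs already explained by a call
    tp = 0
    for call in calls:
        for i, truth in enumerate(truth_set):
            if i not in matched_truth and matches(call, truth, max_dist):
                matched_truth.add(i)
                tp += 1
                break
    fp = len(calls) - tp       # calls with no matching truth SV
    fn = len(truth_set) - tp   # truth SVs that no call recovered
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Made-up example data, for illustration only.
truth_set = [
    {"chrom": "chr1", "pos": 1000, "svtype": "DEL"},
    {"chrom": "chr1", "pos": 5000, "svtype": "INS"},
    {"chrom": "chr2", "pos": 2000, "svtype": "DEL"},
    {"chrom": "chr3", "pos": 7000, "svtype": "INV"},
]
calls = [
    {"chrom": "chr1", "pos": 1100, "svtype": "DEL"},  # matches the first truth SV
    {"chrom": "chr1", "pos": 5200, "svtype": "INS"},  # matches the second truth SV
    {"chrom": "chr2", "pos": 9000, "svtype": "DEL"},  # too far away: false positive
]
result = benchmark(calls, truth_set)
print(result)
```

In this toy example two of the three calls match a truth SV, so precision is 2/3 and recall is 2/4. Real benchmarks usually also compare SV length and sequence, which this sketch leaves out.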
Utilizing The Advantages of Long-Read Sequencing for Enhanced Understanding of Genomic Diversity and Disease
Population genomics: Long-read sequencing is being used to generate high-quality reference genomes for diverse populations, which can aid in the discovery of disease-associated genetic variants.
The detection of SVs using long-read sequencing data involves several steps.
First, the long-read sequences must be aligned to a reference genome, or de novo assembled to generate a new reference. Then, SV detection tools can be used to identify differences between the long-read sequences and the reference genome. These tools can identify several types of SVs, including insertions, deletions, inversions, and translocations.
A popular tool for detecting SVs is called Sniffles, which uses long-read sequencing data to identify SVs by clustering reads with similar breakpoints. Other tools such as SVIM and pbsv also use long-read sequencing data to detect SVs, while Manta is designed for short-read data. Each tool has its own strengths and weaknesses.
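The idea of clustering reads with similar breakpoints can be sketched in a few lines of Python. This is a simplified illustration and not the actual algorithm of Sniffles or any other caller: the `Signature` record, the `max_dist` and `min_support` parameters, and the example reads are assumptions made for the example. Requiring support from several reads is one simple way to separate real SVs from isolated sequencing errors.

```python
from collections import namedtuple

# Hypothetical record for one SV signature seen in a single read alignment.
Signature = namedtuple("Signature", ["chrom", "pos", "svtype", "length", "read"])


def cluster_signatures(signatures, max_dist=50, min_support=2):
    """Group same-type signatures whose breakpoints lie within max_dist
    of the previous one, then keep clusters backed by enough reads."""
    ordered = sorted(signatures, key=lambda s: (s.chrom, s.svtype, s.pos))
    clusters, current = [], []
    for sig in ordered:
        # Start a new cluster when chromosome or type changes, or the gap is too large.
        if current and (
            sig.chrom != current[-1].chrom
            or sig.svtype != current[-1].svtype
            or sig.pos - current[-1].pos > max_dist
        ):
            clusters.append(current)
            current = []
        current.append(sig)
    if current:
        clusters.append(current)

    calls = []
    for group in clusters:
        reads = {s.read for s in group}
        if len(reads) < min_support:
            continue  # too few supporting reads: treated as noise
        positions = sorted(s.pos for s in group)
        lengths = sorted(s.length for s in group)
        calls.append({
            "chrom": group[0].chrom,
            "pos": positions[len(positions) // 2],   # middle element as consensus
            "svtype": group[0].svtype,
            "length": lengths[len(lengths) // 2],
            "support": len(reads),
        })
    return calls


# Made-up signatures, for illustration only.
signatures = [
    Signature("chr1", 10_000, "DEL", 500, "read_a"),
    Signature("chr1", 10_012, "DEL", 495, "read_b"),
    Signature("chr1", 10_020, "DEL", 510, "read_c"),
    Signature("chr1", 55_000, "INS", 300, "read_d"),  # seen in one read only
]
calls = cluster_signatures(signatures)
print(calls)
```

Here the three deletion signatures collapse into one call supported by three reads, while the insertion seen in a single read is dropped. Production callers use more elaborate distance measures and genotype the resulting clusters.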
Detecting structural variants for pan-genomics is a rapidly evolving field, and long-read sequencing technologies have revolutionized our ability to detect SVs. As more genomes are sequenced, new SV detection tools are likely to be developed, and we can expect to gain a better understanding of the role of SVs in human health and disease.
Genomic diversity: Researchers are utilizing the advantages of long-read sequencing to further enhance our understanding of genomic diversity and disease.
Conclusion: Future Directions for Research Utilizing Long-Read Sequencing for SV Detection and Analysis
As long-read sequencing technologies continue to advance, the detection and analysis of structural variations (SVs) in the human genome will become increasingly precise and accurate. One area of research that is poised to benefit greatly from these advances is cancer genomics, as SVs are a common feature of many types of cancer and can provide important insights into cancer development and progression. Moving forward, research utilizing long-read sequencing for SV detection and analysis will likely focus on several key areas, including:
Development of improved bioinformatics tools for SV analysis: As the amount of long-read sequencing data generated increases, there is a need for more efficient and accurate bioinformatics tools for SV detection and analysis.
Integration of long-read and short-read sequencing data: Combining long-read and short-read sequencing data can provide a more comprehensive view of the genome and improve the accuracy of SV detection and analysis.
Application of long-read sequencing to larger cohorts: As the cost of long-read sequencing continues to decrease, it will become more feasible to apply these technologies to larger cohorts, enabling the identification of rare SVs and the discovery of novel disease-associated genetic variants.
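As a rough illustration of the integration of long-read and short-read data mentioned above, the Python sketch below merges two SV call sets and records which technology supports each event. It is a minimal sketch under stated assumptions rather than an established merging method: the dictionary fields, the `max_dist` tolerance, and the example calls are hypothetical.

```python
def same_event(a, b, max_dist=500):
    """Hypothetical rule: two calls describe the same SV if they share
    chromosome and type and their positions differ by at most max_dist."""
    return (
        a["chrom"] == b["chrom"]
        and a["svtype"] == b["svtype"]
        and abs(a["pos"] - b["pos"]) <= max_dist
    )


def merge_callsets(long_calls, short_calls, max_dist=500):
    """Return the union of both call sets, tagging each SV with its sources."""
    merged = []
    used_short = set()  # indices of short-read calls already paired
    for lc in long_calls:
        sources = ["long"]
        for i, sc in enumerate(short_calls):
            if i not in used_short and same_event(lc, sc, max_dist):
                used_short.add(i)
                sources.append("short")
                break
        merged.append({**lc, "sources": sources})
    # Keep short-read calls that no long-read call explained.
    for i, sc in enumerate(short_calls):
        if i not in used_short:
            merged.append({**sc, "sources": ["short"]})
    return merged


# Made-up call sets, for illustration only.
long_calls = [
    {"chrom": "chr1", "pos": 10_000, "svtype": "DEL"},
    {"chrom": "chr2", "pos": 5_000, "svtype": "INS"},
]
short_calls = [
    {"chrom": "chr1", "pos": 10_100, "svtype": "DEL"},
    {"chrom": "chr3", "pos": 800, "svtype": "DUP"},
]
merged = merge_callsets(long_calls, short_calls)
for sv in merged:
    print(sv)
```

Events seen by both technologies can then be treated as higher confidence, while events seen by only one can be reviewed separately.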
Geneyx supports long-read sequencing, which is a method of DNA sequencing that produces longer reads than traditional short-read sequencing.
Long-read sequencing technologies, such as PacBio and Oxford Nanopore, can generate reads that are tens of thousands of bases long, making it possible to sequence through complex regions of the genome that were previously inaccessible. Geneyx provides bioinformatics analysis services to help researchers interpret the data generated by long-read sequencing technologies. This includes services such as genome assembly, variant calling, and structural variant analysis.
Geneyx’s platform offers several advantages for researchers using long-read sequencing technologies. For example, the platform can handle large datasets with ease and provide rapid turnaround times for analysis. Additionally, the platform offers a user-friendly interface and advanced visualization tools that make it easy for researchers to explore and interpret their data.
Overall, the future of research utilizing long-read sequencing for SV detection and analysis is promising, and we can expect continued advancements in this field to have significant impacts on our understanding of human genetics and disease.
Geneyx supports long reads on Oxford Nanopore Technologies.