Structural variants (SVs) are genomic alterations that change the organization of DNA and can drive carcinogenesis. Detecting SVs is critical for evaluating cancer, guiding treatment choice, and understanding cancer etiology. Nevertheless, biological challenges such as intratumor heterogeneity, polyploidy, and contamination from normal tissue make SVs difficult to identify. The article discusses several methods for identifying somatic SVs in short-read whole-genome sequencing (WGS) data, as well as combinatorial algorithms that integrate multiple read-alignment patterns, and the value of integrating calls from multiple algorithms at the SV level to increase precision. Finally, the article discusses the potential impact of long-read sequencing and orthogonal sequencing methods on somatic SV identification.
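To make the idea of SV-level integration concrete, here is a minimal Python sketch of consensus calling: an SV is kept only if at least two different algorithms report it with nearby breakpoints. The `SVCall` record, the `consensus_calls` function, the 100 bp tolerance, and the two-caller threshold are all illustrative assumptions; they are not taken from the article or from any specific tool, and real pipelines typically work on VCF files with more careful breakpoint matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVCall:
    """Hypothetical minimal SV record; real callers emit richer VCF entries."""
    chrom: str
    start: int
    end: int
    svtype: str   # e.g. "DEL", "DUP", "INV"
    caller: str   # label of the algorithm that produced the call


def same_event(a: SVCall, b: SVCall, max_dist: int = 100) -> bool:
    """Treat two calls as one event if type, chromosome, and both breakpoints agree."""
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.start - b.start) <= max_dist
        and abs(a.end - b.end) <= max_dist
    )


def consensus_calls(calls, min_callers: int = 2, max_dist: int = 100):
    """Greedily cluster calls, then keep clusters seen by >= min_callers distinct callers."""
    clusters = []
    ordered = sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start, c.end))
    for call in ordered:
        for cluster in clusters:
            # Compare against the first member so a cluster cannot drift.
            if same_event(cluster[0], call, max_dist):
                cluster.append(call)
                break
        else:
            clusters.append([call])
    return [c for c in clusters if len({x.caller for x in c}) >= min_callers]


if __name__ == "__main__":
    calls = [
        SVCall("chr1", 1000, 5000, "DEL", "caller_A"),
        SVCall("chr1", 1020, 4990, "DEL", "caller_B"),
        SVCall("chr2", 300, 900, "DUP", "caller_C"),
    ]
    for cluster in consensus_calls(calls):
        print(cluster[0].chrom, cluster[0].svtype, sorted(x.caller for x in cluster))
```

Requiring agreement between callers trades sensitivity for precision: the single-caller duplication on `chr2` is dropped even though it might be real, which is the trade-off the article describes.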
Precise identification of somatic SVs in cancer genomes is critical to understanding the genomic changes that drive cancer development. To distinguish somatic from germline SVs, paired tumor-normal specimens are frequently used. Several algorithms have been developed for somatic SV identification, including DELLY, LUMPY, SvABA, Manta, and GRIDSS, each with its own approach to detecting tumor-specific variants. Still, the analysis of tumor-normal paired specimens is complicated by cancer-specific challenges such as polyploidy, heterogeneity, and contamination. To account for these challenges and improve the precision of somatic SV identification, specialized tools such as Lancet and Varlociraptor have been developed.
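The basic logic of a paired tumor-normal comparison can be sketched in a few lines of Python: a tumor call is treated as a somatic candidate only if no matching call appears in the matched normal sample. This is a deliberately simplified illustration, not the method used by any of the tools named above; the `SV` record, the `somatic_candidates` function, and the 50 bp tolerance are assumptions made for the example, and it ignores the polyploidy, heterogeneity, and contamination issues that specialized tools are designed to handle.

```python
from typing import NamedTuple


class SV(NamedTuple):
    """Hypothetical minimal SV record used only for this illustration."""
    chrom: str
    start: int
    end: int
    svtype: str


def matches(a: SV, b: SV, max_dist: int = 50) -> bool:
    """Two calls match if type, chromosome, and both breakpoints agree within max_dist."""
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.start - b.start) <= max_dist
        and abs(a.end - b.end) <= max_dist
    )


def somatic_candidates(tumor, normal, max_dist: int = 50):
    """Return tumor calls that have no matching call in the normal sample."""
    return [
        t for t in tumor
        if not any(matches(t, n, max_dist) for n in normal)
    ]


if __name__ == "__main__":
    tumor = [SV("chr3", 10_000, 12_000, "DEL"), SV("chr7", 500, 90_000, "INV")]
    normal = [SV("chr3", 10_010, 11_995, "DEL")]  # germline deletion seen in both samples
    print(somatic_candidates(tumor, normal))
```

In this toy input the `chr3` deletion is present in both samples and is therefore treated as germline, leaving only the `chr7` inversion as a somatic candidate. Tumor cells contaminating the normal sample would cause a filter like this to discard true somatic events, which is one reason the simple subtraction is not sufficient in practice.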
Because somatic SVs are more complex than germline variants, detecting them in cancer genomes is computationally challenging. Furthermore, the technical limitations of short-read WGS hamper SV identification, especially in repeat-rich regions, which account for roughly half of the human genome. While the single-molecule long-read sequencing methods of Pacific Biosciences and Oxford Nanopore Technologies are useful for SV identification, they have drawbacks such as higher costs and lower nucleotide accuracy compared with short-read WGS. Despite these drawbacks, long-read sequencing has improved SV identification and offers additional advantages such as variant haplotype phasing and de novo assembly of complex rearrangements.
However, reliable tools for analyzing long-read data are still being developed, and incorporating multi-platform data could improve the accuracy and sensitivity of SV identification. Combining RNA sequencing and WGS data may help identify gene fusions and link SVs to altered gene expression. Furthermore, combining WGS data from short-read and long-read sequencing can offset each technology’s drawbacks and provide a more complete picture of the genome. Finally, combining sequencing data with complementary assays such as Hi-C chromatin conformation capture and Bionano Genomics optical mapping can aid in the detection of large and complex rearrangements.
Precision oncology requires the identification and interpretation of SVs in cancer genomes. However, due to technical and biological constraints, identifying tumor-specific SVs (TSSVs) from sequencing data is difficult. Short-read sequencing and targeted assays, for example, cannot capture the entire spectrum of TSSVs. As a result, a multi-platform approach combining long-read and short-read sequencing data is required. Implementing these advances in clinical settings, however, requires demonstrated added value, affordability, and faster turnaround times. Precision oncology also requires consistent variant classification and prioritization. Combining data from multiple platforms and detection tools allows TSSVs to be used in precision oncology and their function in cancer to be investigated.