Annotations and Algorithms

Annotations #

Within each inheritance model are annotations and algorithms that help identify the candidate variants most highly associated with the patient’s phenotype. By default, the Geneyx Analysis window provides eight different categories with rich context for variant analysis.

Genomic and Genetic Data: This category is used to understand the variant nomenclature. Subcategories include:

  • LOCATION – genomic location according to the selected genome assembly (GRCh37/38). Clicking the location opens the IGV genome browser at the corresponding position. To view the UCSC genome browser, select the UCSC icon in the top right.
  • GENE – the gene of the specific variant. The model of inheritance is indicated: AR = Autosomal Recessive; AD = Autosomal Dominant; XL = X linked; DG = Digenic.
    • By clicking on the gene, you will be directed to view clinical information, pathways and drugs associated with the gene. This includes OMIM/ClinVar, Expression, Pathways, and Coverage. The following details are provided:
    • Tolerance RVIS: The Residual Variation Intolerance Score is a gene-based score that helps in the interpretation of human sequencing data. The intolerance score in its current form is based upon allele frequency data as represented in whole exome sequence data from the NHLBI-ESP6500 data set. The score is designed to rank genes in terms of whether they have more or less common functional genetic variation relative to the genome-wide expectation, given the amount of apparently neutral variation the gene has. A gene with a positive score has more common functional variation, and a gene with a negative score has less and is referred to as “intolerant”. By convention all genes are ranked in order from most intolerant to least. As an example, a gene such as ATP1A3 has an RVIS score of -1.53 and a percentile of 3.37%, meaning it is amongst the 3.37% most intolerant of human genes.
    • GDI: The human Gene Damage Index is the accumulated mutational damage of each human gene in the healthy human population, based on the gene variations of healthy individuals in the 1000 Genomes Project database (Phase 3) and on the CADD score for calculating impact. Highly damaged human genes are unlikely to be disease-causing, so GDI is effective for filtering out variants harbored in highly damaged (high GDI) genes.
    • GnomAD Missense Z-score: The pLI and Z-scores of the deviation of observed variant counts relative to the expected number are intended to measure how constrained or intolerant a gene or transcript is to a specific type of variation. Genes or transcripts that are particularly depleted of a specific class of variation (as observed in the gnomAD data set) are considered intolerant of that specific type of variation. Z-scores are available for the missense and synonymous categories, and pLI scores are available for loss-of-function variation.
      • Missense and Synonymous: Positive Z-scores indicate more constraint (fewer observed variants than expected), and negative scores indicate less constraint (more observed variants than expected). A greater Z-score indicates more intolerance to the class of variation.
    • pLI: The probability that a gene or transcript is intolerant of loss-of-function variation. Like the Z-scores, it is derived from the deviation of observed variant counts relative to the expected number in the gnomAD data set.
      • Loss-of-function: pLI closer to 1 indicates that the gene or transcript cannot tolerate protein truncating variation (nonsense, splice acceptor and splice donor variation). The gnomAD team recommends transcripts with a pLI >= 0.9 for the set of transcripts extremely intolerant to truncating variants.
    • OMIM/ClinVar: This interface will display gene-phenotype associations according to OMIM and the summary of pathogenic/likely pathogenic SNVs and CNVs in the gene according to ClinVar.
      • Phenotypes Associated with Gene: This will display the known conditions associated with the gene according to OMIM phenotypes and will provide the associated inheritance model and hyperlink to relevant documentation.
      • Summary of pathogenic/likely pathogenic SNVs in ClinVar: For the given conditions associated with the gene, the total counts of observed pathogenic or likely pathogenic single nucleotide variants in ClinVar will be displayed according to the associated effect.
      • Summary of pathogenic/likely pathogenic CNVs in ClinVar: For the given conditions associated with the gene, the total counts of observed pathogenic or likely pathogenic copy number variants in ClinVar will be displayed according to the associated effect.
  • Coverage: Displays the coverage of the gene, if applicable
  • REF – the allele present in the reference genome  
  • ALT – the alternative call for the sample 
  • AA – the amino acid change if applicable 
  • HGVS – Representation of the variant in HGVS (Human Genome Variation Society) nomenclature  
  • ZYG – the zygosity of the variant (heterozygous, homozygous, or hemizygous) 
  • REFSEQ – Transcript(s) relevant for the protein effect 
  • Exon # – The impacted exon of the gene
  • CODON – Codon change 
  • DBSNP – dbSNP ID 
  • DBSNP version – the earliest dbSNP version which included the variant 
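
The gene-level intolerance scores described above (RVIS, missense Z-score, pLI) can be read side by side. The following is a minimal sketch of that reading, not part of the Geneyx software: the function name and the returned keys are hypothetical, and the only cutoff taken from the text is the gnomAD recommendation of pLI >= 0.9. The missense Z-score and pLI values in the example call are illustrative, not real values for any gene.

```python
def summarize_gene_constraint(rvis_score, missense_z, pli):
    """Return plain-language notes for a gene's intolerance scores.

    Hypothetical helper for illustration only; it is not a Geneyx API.
    """
    notes = {}
    # RVIS: a negative score means less common functional variation
    # than expected, i.e. the gene is referred to as "intolerant".
    notes["rvis"] = "intolerant" if rvis_score < 0 else "not intolerant"
    # Missense Z-score: positive means fewer observed variants than expected.
    notes["missense_z"] = (
        "more constrained" if missense_z > 0 else "less constrained"
    )
    # pLI: the text cites pLI >= 0.9 for transcripts that are extremely
    # intolerant to truncating variants.
    notes["pli"] = (
        "extremely LoF intolerant" if pli >= 0.9 else "below the 0.9 cutoff"
    )
    return notes


# RVIS value from the ATP1A3 example in the text; other values are made up.
example = summarize_gene_constraint(rvis_score=-1.53, missense_z=2.0, pli=0.95)
print(example)
```

These labels only restate the direction of each score; they do not replace reviewing the underlying percentiles and counts in the gene view.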

ACMG: The American College of Medical Genetics and Genomics (ACMG) developed guidance for the interpretation of sequence variants. These recommendations are used to classify variants based on criteria that draw on different types of variant evidence (e.g., population data, computational data, functional data, segregation data).

  • Dom – This is the ACMG classification for dominant inheritance
  • Rec – This is the ACMG classification for recessive inheritance

Clicking one of these options opens the full ACMG classification panel. The left side presents the different categories of the ACMG guidelines, and the middle dialog displays the ACMG criteria. If an ACMG criterion is applicable, it is shown solid. An empty criterion indicates that there is not sufficient evidence for it to be applied. A crossed-out criterion indicates that it is not applicable, or that the variant evidence argues against it. Hovering over an individual criterion provides more details, and clicking on the cell manually overrides the automatic classification. The supporting evidence for the criteria is available on the right-hand side.

ACMG interface for variant classification

Variant Calling Q&R: This category will display information that is derived from the VCF file.

  • Q&R: Classifies calls by quality. Low = coverage < 10x and GQ < 15; Med = coverage < 20x and GQ < 50; High = coverage ≥ 20x and GQ ≥ 50.
  • Depth: Read depth (total reads) 
  • DP4: Number of 1) forward reference alleles; 2) reverse reference alleles; 3) forward non-reference alleles; 4) reverse non-reference alleles
  • %ALT: The percentage of reads showing the alternative allele 
  • GQ: Genotype quality, the variant caller's Phred-scaled confidence in the genotype call at the variant location
  • FILTER: Quality filter status based on the variant caller 
  • PL: Phred-scaled genotype likelihoods (HOM REF, HET, HOM ALT)
  • AMP SCORE: Amplification score (coverage/median coverage) 
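
The quality fields above can be illustrated with a short sketch. It is not the Geneyx implementation. The Q&R function applies the three rules literally and returns None for coverage/GQ combinations the text does not cover (for example high coverage with a mid-range GQ). Deriving %ALT from the DP4 counts is one plausible calculation and is an assumption here. All function names are hypothetical.

```python
def classify_call_quality(coverage, gq):
    """Apply the Low / Med / High rules exactly as listed in the text."""
    if coverage >= 20 and gq >= 50:
        return "High"
    # Low is a subset of the Med condition, so it is checked first.
    if coverage < 10 and gq < 15:
        return "Low"
    if coverage < 20 and gq < 50:
        return "Med"
    return None  # combination not described in the text


def percent_alt_from_dp4(dp4):
    """Percentage of reads supporting the alternative allele.

    dp4 = (ref forward, ref reverse, alt forward, alt reverse).
    Assumption: %ALT is taken over the DP4 reads only.
    """
    ref_fwd, ref_rev, alt_fwd, alt_rev = dp4
    total = ref_fwd + ref_rev + alt_fwd + alt_rev
    if total == 0:
        return 0.0
    return 100.0 * (alt_fwd + alt_rev) / total


def amp_score(coverage, median_coverage):
    """Amplification score as defined in the text: coverage / median coverage."""
    return coverage / median_coverage


print(classify_call_quality(35, 99))
print(percent_alt_from_dp4((30, 30, 20, 20)))
print(amp_score(60, 30))
```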

Evidence: This category displays the clinical evidence associated with the variant based on annotation sources.

  • PHENO: The score of the association between the selected phenotypes and the gene. This algorithm utilizes the Geneyx Knowledgebase and Geneyx Phenotyper (Pheneyx), which consolidates major clinical data sources (OMIM, ClinVar, OrphaNet and HPO) and uses advanced matching capabilities with direct and indirect associations between genes and biomedical information.
  • MATCHED PHENOTYPES: The number of matching terms to the selected phenotypes.  
  • CLINVAR: Clinical significance of matching ClinVar entries.
  • OMIM: Displays disorders known in Online Mendelian Inheritance in Man (OMIM)
  • CLNAA: Clinical significance of matching ClinVar entries based on amino acid change. Click to view clinical relevance and summaries for pathogenic entries.
  • CIVIC: Clinical Interpretation of Variants in Cancer
  • PUBS: Number of relevant publications at MasterMind 

IN HOUSE: Displays the frequency of the variant across internally observed samples.

  • V: In-house annotation for this variant
  • G: In-house annotation for this gene
  • AF (%): The allele frequency of the variant in all samples of the current account.

Effect & Prediction: Displays information relevant to the effect of the mutation including functional and conservation annotations.

  • EFFECT: Variant’s effect on splicing and protein  
  • SEV: Severity of the impact on the protein, based on the effect and predicted damage. High: nonsense, frameshift, splicing site, missense (REVEL ≥ 0.75 or all prediction tools classify as ‘damaging’); Med: codon indels, stop loss, missense (REVEL ≥ 0.5 or at least one prediction tool classifies as ‘damaging’); Low: synonymous, splice site region, missense (REVEL < 0.5 or none of the prediction tools classify as ‘damaging’)
  • REVEL: Rare Exome Variant Ensemble Learner score for prediction of missense variants pathogenicity. Scores range from 0 to 1 and variants with higher scores are more likely pathogenic 
  • PHYLOP: measures evolutionary conservation at individual alignment sites
  • GERP_NR: Nucleotide conservation score (Neutral Rate) 
  • GERP_RS: Nucleotide conservation score (Rejected Substitution) – Positive scores represent a substitution deficit (i.e., fewer substitutions than the average neutral site) and thus indicate that a site may be under evolutionary constraint 
  • LRT PRED: Likelihood Ratio Test. Predicts the impact of non-synonymous mutations on the protein (N = Neutral; D = Deleterious; U = Uncertain) 
  • MUTTASTER: Predicted impact of missense variants on the protein based on MutationTaster (D = Disease causing – Deleterious; A = Disease causing automatic – known to be deleterious; N = Polymorphism – probably harmless; P = Polymorphism Automatic – known to be harmless)
  • POLYPHEN2 HDIV/HVAR: Predicted impact of missense variants on the protein based on PolyPhen-2 (D = Probably Damaging; P = Possibly Damaging; B = Benign). “HumDiv” is the default classifier model used by the probabilistic predictor; it is preferred for evaluating rare alleles, dense mapping of regions identified by genome-wide association studies, and analysis of natural selection. “HumVar” is better suited for diagnostics of Mendelian diseases, which requires distinguishing mutations with drastic effects from all the remaining human variation, including abundant mildly deleterious alleles.
  • SIFT: Predicted impact of missense variants on the protein based on SIFT (Sorting Intolerant From Tolerant). The score is the normalized probability that the amino acid change is tolerated so scores nearer zero are more likely to be deleterious. Substitutions with a score < 0.05 are called ‘deleterious’ and all others are called ‘tolerated’. 
  • UNIPROT: The accession(s) # of the protein isoform(s) in UniProt. 
  • ADA SCORE: dbscSNV adaptive boosting (ADA) splicing score. dbscSNV is a comprehensive database of all potential human SNVs within splicing consensus regions and their functional annotations
  • RF SCORE: dbscSNV Random Forests (RF) splicing score 
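
The SEV tiers and the SIFT cutoff described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the Geneyx algorithm: the effect labels are hypothetical strings, and because the missense rules as written can overlap (for example REVEL < 0.5 with one tool calling ‘damaging’), the sketch checks the highest tier first.

```python
HIGH_EFFECTS = {"nonsense", "frameshift", "splice_site"}  # hypothetical labels
MED_EFFECTS = {"codon_indel", "stop_loss"}
LOW_EFFECTS = {"synonymous", "splice_region"}


def classify_severity(effect, revel=None, tool_calls=()):
    """Return "High", "Med", "Low", or None for effects not listed in the text.

    tool_calls: booleans, True where a prediction tool says 'damaging'.
    """
    if effect in HIGH_EFFECTS:
        return "High"
    if effect in MED_EFFECTS:
        return "Med"
    if effect in LOW_EFFECTS:
        return "Low"
    if effect == "missense":
        calls = list(tool_calls)
        # Assumption: tiers are tested from most to least severe.
        if (revel is not None and revel >= 0.75) or (calls and all(calls)):
            return "High"
        if (revel is not None and revel >= 0.5) or any(calls):
            return "Med"
        return "Low"
    return None


def sift_call(score):
    """SIFT: scores below 0.05 are 'deleterious', all others 'tolerated'."""
    return "deleterious" if score < 0.05 else "tolerated"


print(classify_severity("missense", revel=0.62, tool_calls=[True, False]))
print(sift_call(0.01))
```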

Frequency: Displays the frequency across different population databases.

  • MAX AF (%): Maximum observed allele frequency in 1000 Genomes, ESP and gnomAD
  • #HOM: The count of samples homozygous for this allele in gnomAD
  • #HET: The count of samples heterozygous for this allele in gnomAD
  • #HEMI: The count of samples hemizygous for this allele in gnomAD
  • 1K GENOME AF: The allele frequency of the variant in 1000 Genomes
  • ESP African AF: The allele frequency of the variant in the ESP (Exome Sequencing Project) African American population
  • ESP European AF %: The allele frequency of the variant in the ESP (Exome Sequencing Project) European American population
  • DBSNP MAF %: The minor allele frequency of the variant in dbSNP
  • HAN AF %: The allele frequency of the variant in the CONVERGE project for the Han Chinese population
  • GNE AF: The gnomAD Exomes allele frequency
  • GNG AF: The gnomAD Genomes allele frequency
  • GNE AMR: The gnomAD Exomes Latino/Admixed American allele frequency
  • GNG AMR: The gnomAD Genomes Latino/Admixed American allele frequency
  • GNE AFR: The gnomAD Exomes African/African American allele frequency
  • GNG AFR: The gnomAD Genomes African/African American allele frequency
  • GNE ASJ: The gnomAD Exomes Ashkenazi Jewish allele frequency
  • GNG ASJ: The gnomAD Genomes Ashkenazi Jewish allele frequency
  • GNE EAS: The gnomAD Exomes East Asian allele frequency
  • GNG EAS: The gnomAD Genomes East Asian allele frequency
  • GNE FIN: The gnomAD Exomes European (Finnish) allele frequency 
  • GNG FIN: The gnomAD Genomes European (Finnish) allele frequency 
  • GNE NFE: The gnomAD Exomes European (non-Finnish) allele frequency 
  • GNG NFE: The gnomAD Genomes European (non-Finnish) allele frequency 
  • GNE OTH: The gnomAD Exomes Other allele frequency 
  • GNG OTH: The gnomAD Genomes Other allele frequency 
  • GNE SAS: The gnomAD Exomes South Asian allele frequency 
  • GNG SAS: The gnomAD Genomes South Asian allele frequency 
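
As a small illustration of how a maximum allele frequency such as MAX AF can be derived from several population sources, the sketch below takes the largest available value and skips sources where the variant is absent. The function name and the dictionary keys are hypothetical, the values are invented, and the sketch assumes all inputs use the same unit (all fractions or all percentages).

```python
def max_allele_frequency(frequencies):
    """Return the largest non-missing allele frequency, or None if all are missing."""
    observed = [af for af in frequencies.values() if af is not None]
    return max(observed) if observed else None


# Invented example values; None marks a source with no record of the variant.
example = {
    "1K GENOME AF": 0.001,
    "ESP": None,
    "GNE AF": 0.0004,
    "GNG AF": 0.002,
}
print(max_allele_frequency(example))
```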

CNV Annotations #

The copy number variant (CNV) workflow offers unique insights into CNV analysis. This includes a variety of annotation sources and visualization capabilities, as well as the ability to deep dive into the individual genes of the event.

GENOMIC AND GENETIC DATA: This category identifies the size of the structural variation.

  • LOCATION: Genetic location of the structural variation. Selecting the location will provide details in the IGV genome browser.
  • LEN: Length of the structural variation
  • START: Starting position of the structural variation
  • END: End position of the structural variation

EFFECT & PREDICTION: Displays information relevant to the effect of the mutation.

  • EFFECT: The effect of the variant: insertion (INS), deletion (DEL), duplication (DUP)
  • ZYG: The genotype call in the proband sample: heterozygous (HET), homozygous (HOM), hemizygous (HEMI)
  • GENES: Number of genes affected by the event
  • M.EXONS: Maximum number of exons affected within genes
  • ENHS: Number of enhancers impacted by this event
  • REPEAT: If it is a repeat variant, information about the repeat will be presented as follows: (# in wild-type) : (repeat sequence) : (# in allele 1) /(# in allele 2)
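
The REPEAT field layout described above can be split into its parts with a short parser. This is a sketch under assumptions: the exact spacing of the displayed string is not specified in this guide, so the pattern tolerates optional spaces, and the example string is invented. The function name and dictionary keys are hypothetical, and a value with only one allele count (for example a hemizygous call) is not handled.

```python
import re

# (# in wild-type) : (repeat sequence) : (# in allele 1) / (# in allele 2)
REPEAT_PATTERN = re.compile(
    r"^\s*(\d+)\s*:\s*([A-Za-z]+)\s*:\s*(\d+)\s*/\s*(\d+)\s*$"
)


def parse_repeat(text):
    """Split a REPEAT value into wild-type count, sequence, and allele counts."""
    match = REPEAT_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized REPEAT value: {text!r}")
    wild_type, sequence, allele_1, allele_2 = match.groups()
    return {
        "wild_type": int(wild_type),
        "sequence": sequence.upper(),
        "allele_1": int(allele_1),
        "allele_2": int(allele_2),
    }


# Invented example value, for illustration only.
print(parse_repeat("20:CAG:18/45"))
```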

EVIDENCE: This category displays the clinical evidence associated with the variant based on annotation sources.

  • PHENO (MAX): The score of the association between the selected phenotypes and the gene. This algorithm utilizes the Geneyx Knowledgebase and Geneyx Phenotyper (Pheneyx), which consolidates major clinical data sources (OMIM, ClinVar, OrphaNet and HPO) and uses advanced matching capabilities with direct and indirect associations between genes and biomedical information.
  • SCORE: Structural variation quality score

FREQUENCY: Frequency of the structural variation in population databases.

  • MAX AF (GAIN): Maximum allele frequency for CNV duplications in reference populations
  • MAX AF (LOSS): Maximum allele frequency for CNV deletions in reference populations
  • DGV GAIN #: Counts of events which overlap CNV duplications in DGV (Database of Genomic Variants)
  • DGV GAIN AN: Counts of CNV duplications that overlap this event in DGV
  • DGV GAIN AF: Allele frequency of CNV duplications that overlap this event in DGV
  • DGV LOSS #: Counts of CNV deletions which overlap this event in DGV
  • DGV LOSS AN: Counts of CNV deletions which overlap this event in DGV
  • DGV LOSS AF: Allele frequency of CNV deletions which overlap this event in DGV
  • DDD DUP AN: Counts of duplication events that overlap this event in Deciphering Developmental Disorders (DDD)
  • DDD DUP AF: Allele frequency of CNV duplications that overlap this event in DDD
  • DDD DEL AN: Counts of CNV deletions which overlap this event in DDD
  • DDD DEL AF: Allele frequency of CNV deletions which overlap this event in DDD
  • GD AN: Counts of events which overlap this event in gnomAD
  • GD AF: Allele frequency of events which overlap this event in gnomAD
  • IMH AF: Allele frequency of events which overlap this event in IMH
Annotations available for CNV analysis

CNV Workflow #

Once a CNV of interest has been identified, an investigation can be performed on the individual gene level. Once a CNV has been selected, the bottom window will update to display the affected genes.

CNV analysis

The information on a gene level includes:

  • GENE: Displays the impacted gene. Clicking on the gene displays clinical information, pathways, and drugs associated with this gene
  • OVERLAP: Displays if event is a full or partial overlap of the gene
  • PHENO: The score of the association between the selected phenotypes and the gene. This algorithm utilizes the Geneyx Knowledgebase and Geneyx Phenotyper (Pheneyx), which consolidates major clinical data sources (OMIM, ClinVar, OrphaNet and HPO) and uses advanced matching capabilities with direct and indirect associations between genes and biomedical information.
  • MATCHED PHENOTYPES: The number of matching terms and the relevant list of phenotype terms associated with the gene
  • #EXONS: Number of exons affected within this gene by this site
  • ENH SCORE: Enhancer confidence score
  • ENH-GENE SCORE: Gene-Enhancer associated score
  • CLINVAR: Clinical significance of matching ClinVar entries. Click to view relevance and summary for pathogenic entries
  • #GNF: Features in gnomAD-SV which fully overlap this gene/enhancer
  • NUMOVERLAPGENEPARTIAL: Number of CNVs that partially overlap this gene
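
The distinction between a full and a partial overlap in the OVERLAP column can be expressed with a simple coordinate comparison. The sketch below assumes closed intervals on the same chromosome and is not taken from the Geneyx implementation. The function name and the coordinates in the example are hypothetical.

```python
def classify_overlap(cnv_start, cnv_end, gene_start, gene_end):
    """Return "full", "partial", or "none" for a CNV relative to a gene.

    Assumes both intervals are closed and lie on the same chromosome.
    """
    # No shared positions at all.
    if cnv_end < gene_start or cnv_start > gene_end:
        return "none"
    # The CNV spans the whole gene.
    if cnv_start <= gene_start and cnv_end >= gene_end:
        return "full"
    # Otherwise only part of the gene is covered.
    return "partial"


print(classify_overlap(100, 500, 200, 300))
print(classify_overlap(100, 250, 200, 300))
```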

The CNV event can also be visualized using the IGV genome browser, which can be expanded using the icon in the upper right corner. If preferred, this event can also be plotted in the UCSC browser using the hyperlink in the upper right corner.

IGV browser

Selecting Variants for Reporting #

To the left of each variant row there are three annotation fields that can be defined per candidate variant: Relevance, Pathogenic, Notes.

Annotation fields are available for reporting

Clicking on Relevance lets the user categorize the relevance of the variant and add information to be rendered in the report. This window includes:

  • Inheritance Model: Recessive or Dominant
  • Relevance: High (1), Medium (2), Low (3), and Not Relevant
  • Pathogenicity: Pathogenic, Likely Pathogenic, Uncertain Significance, Likely Benign, Benign, Risk Factor
  • Validation: Requires Validation, Positive Validation, Negative Validation
  • Notes: Free text layout
  • Phenotype and Evidence: Allows selection of relevant disorders from OMIM and associated evidence to be selected and rendered into the report. Selecting the check box to the right of the entry will incorporate the information into the report.
    • OMIM: This will display the known conditions associated with the gene according to OMIM phenotypes and will provide the associated inheritance model and hyperlink to relevant documentation.
    • Evidence: This will show diseases, gene summaries, publications, and related variant and analysis terms for the given gene.
      • Diseases related to gene: This advanced search identifies all publications in OrphaNet, OMIM, UniProt, and PubMed that contain conditions matching the entered phenotype and the given gene.
      • Gene Summaries: Gene summaries are provided for the given gene. If a phenotypic term is matched, it will be highlighted in the text.
      • Publications related to gene: This will show publications in ClinVar and Entrez that are related to the given gene.
      • Variants related to gene: This will show all variants in ClinVar and OMIM that are present in the given gene (ATR in the example shown).
      • Analysis terms related to gene: Terms that are rarely related to the phenotype entered.
The Relevance interface to include information into the report

Once information has been entered, the fields will be populated in the variant table accordingly.

Populated fields with relevant information displayed in the variant table

Creating A Report #

Once the clinically relevant variants have been selected and the relevance score and related notes have been applied, the next step is to review the findings using the ‘Report Preview’ button on the top right.

To view a mock report, click the green Report Preview icon in the top right

This will generate a visual summary of all the findings in addition to the general info. Here the user can choose which of the evidence sections presented will enter the final report. 

Mock report generated from Report Preview

From the Report Preview you can generate the report by clicking Save. This will render a full report in PDF format with all the information and supporting evidence, including publications, and an Excel file displaying the full variant tables that were analyzed.

Report rendered in PDF and Excel
