The Data Management interface supports high-throughput environments by enabling the secondary and tertiary analysis of next-generation sequencing (NGS) data.
The Processing Tasks interface shows the progress of FASTQ-to-VCF processing. To upload FASTQ files for secondary analysis, select Batch Upload or Upload Single Sample in the top right corner.
Available fields in the Processing Tasks interface include:
- DATE: Date task was initiated
- SAMPLE ID: Sample ID
- SUBJECT: Subject
- BATCH: Title of the batch workflow
- STATUS: The status of the pipeline
- DETAILS: Provides details of the pipeline
- PIPELINE: Pipeline that was used for variant calling and annotation
- GENOME BUILD: Reference genome used for variant calling
The Upload Single Sample feature imports a single sample from its FASTQ files for variant calling and VCF file generation. Once the VCF file is generated, the output is available in the VCF Samples window of the dashboard. The steps for processing the FASTQ files of a single sample are listed below.
Geneyx Analysis is a subject-centric platform, so the FASTQ-to-VCF process begins with entering the subject information.
- SUBJECT (required): A unique identifier for the subject. Commonly this is the patient identifier in an existing EMR or LIMS system, but it can be any identifier.
- NAME (optional): The name of the subject
- DATE OF BIRTH (optional): The date of birth of the subject
- GENDER (optional): Male / Female. Leave blank for unknown / unspecified
- CONSENT–PERSONAL DATA: Consent for personal data
- CONSENT–CLINICAL DATA: Consent for clinical data
- CONSANGUINITY (optional): Commonly used to specify parental consanguinity in cases of rare disease analysis
- ETHNICITY: Ethnicity of subject
- PATERNAL ANCESTRY: Paternal ancestry
- MATERNAL ANCESTRY: Maternal ancestry
- FAMILY HISTORY: Any additional family history
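When subject information is prepared outside the interface (for example, exported from an EMR or LIMS), it can help to check the required field before entering it. The following is a minimal sketch only: the `SubjectRecord` class and its attribute names are hypothetical and are not part of Geneyx Analysis. It models a few of the fields listed above and enforces that SUBJECT is present.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SubjectRecord:
    """Hypothetical local model of the subject form; not a Geneyx API."""

    subject: str                           # SUBJECT (required)
    name: Optional[str] = None             # NAME (optional)
    date_of_birth: Optional[date] = None   # DATE OF BIRTH (optional)
    gender: Optional[str] = None           # GENDER: "Male", "Female", or None if unknown
    consanguinity: Optional[str] = None    # CONSANGUINITY (optional)

    def __post_init__(self):
        # SUBJECT is the only required field, so reject empty identifiers.
        self.subject = self.subject.strip()
        if not self.subject:
            raise ValueError("SUBJECT is required")
        # The form offers Male / Female; blank means unknown or unspecified.
        if self.gender not in (None, "Male", "Female"):
            raise ValueError("GENDER must be 'Male', 'Female', or left blank")


record = SubjectRecord(subject="PATIENT-0001", gender="Female")
```

A record with an empty identifier, such as `SubjectRecord(subject="  ")`, raises `ValueError` before anything is entered in the form.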
The Samples section is used to enter clinical information on the sample level and specify how many FASTQ files are available for the given sample.
- Serial Number: Specify a unique identifier for this analysis. A default value is automatically generated but can be replaced with your own identifier (for example, ID of the case in your system)
- Use consent: Consent for the sample to be used by colleagues
- Expected # fastq files: Number of fastq files that are available for the sample
- Sequencing Target: The NGS data type. Options include:
  - Whole Genome
  - Gene Panel
  - Target Region
  - Clinical Exome
- Enrichment Kit: Defines the target capture regions. Users can also supply their own BED file. Options include:
  - Agilent SureSelect Clinical Research Exome V2
  - Agilent SureSelect Human All Exon V5
  - Agilent SureSelect Human All Exon V6 r2
  - Agilent SureSelect Human All Exon V7
  - Agilent V4
  - Agilent V5
  - Default – Exons only
  - IDT xGen Exome Research Panel v1.0
  - Illumina TruSeq Exome Targeted Regions Manifest v1.2
  - NimbleGen EZ V2
  - NimbleGen EZ V3
  - Twist Human Core Exome
  - Twist Human Core Exome Plus
- Taken Date: Date sample was obtained
- Sequence Date: Date sample was sequenced
- Received Date: Date sequencing data was received
- Sequencing Machine: The machine used to automate the DNA sequencing process
- Sample Source: Method by which the DNA was collected
- Notes: Any additional comments
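The value for "Expected # fastq files" can be checked against the files on disk before upload. The sketch below is an illustration under stated assumptions: the function names, the list of file name endings, and the convention that a sample's files share a common prefix are all hypothetical and should be adapted to your own naming scheme.

```python
import os

# Assumed FASTQ file name endings; adjust to match your sequencer's output.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def fastq_files_for_sample(file_names, sample_prefix):
    """Return the sorted FASTQ file names that start with sample_prefix."""
    return sorted(
        name
        for name in file_names
        if name.startswith(sample_prefix) and name.endswith(FASTQ_SUFFIXES)
    )


def count_fastq_in_directory(directory, sample_prefix):
    """Count the FASTQ files for one sample in a local directory."""
    return len(fastq_files_for_sample(os.listdir(directory), sample_prefix))


# Example with hypothetical file names: two paired-end files belong to sample S1.
names = ["S1_R1.fastq.gz", "S1_R2.fastq.gz", "S1.bam", "S2_R1.fastq.gz"]
matches = fastq_files_for_sample(names, "S1_")
```

Including the separator in the prefix (`"S1_"` rather than `"S1"`) avoids also matching files from a sample such as `S10`.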
File Source indicates where the fastq files are located. Options include:
- Upload local files: Use this option when the files are located in a local environment
- Fetch remote files: Use this option when the files are located on a cloud instance. If selected, a New Data Source can be entered, or an Existing Data Source can be selected.
  - Data Source: FTP, Amazon S3, BaseSpace
  - Data Source Type: FTP, Amazon S3, BaseSpace
The last step in uploading a single sample is defining the Secondary Pipeline. The options available include:
- DRAGEN Exome- hg19
- DRAGEN Exome- hg38
- GATK Exome- hg19
- GATK Exome- hg38
- DRAGEN Whole Genome – hg38
- DRAGEN Whole Genome – hg38
After you click Save, a dialog displays a summary of the workflow.
Geneyx Analysis Data batch upload accepts a TSV (tab-separated values) file that describes a set of samples held in external storage (FTP, BaseSpace, Amazon S3), together with their associated sample and patient information. Once the file is submitted, the system downloads the samples and automatically starts processing them.
To download a template with an example, click here.
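A batch file of this kind can be generated with a short script. The sketch below only illustrates writing a tab-separated file: the column names in `COLUMNS`, the `build_batch_tsv` function, and the example paths are hypothetical. The actual column layout must be taken from the downloadable template.

```python
import csv
import io

# Hypothetical column names; replace them with the headers from the real template.
COLUMNS = ["SubjectId", "SampleId", "DataSourceType", "FastqPath1", "FastqPath2", "Pipeline"]


def build_batch_tsv(samples):
    """Return TSV text with a header row and one row per sample dictionary."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for sample in samples:
        # Refuse rows with empty or missing values rather than writing a partial line.
        missing = [column for column in COLUMNS if not sample.get(column)]
        if missing:
            raise ValueError(f"missing values for: {', '.join(missing)}")
        writer.writerow(sample)
    return buffer.getvalue()


# Example row; the bucket and file paths are placeholders.
tsv_text = build_batch_tsv([
    {
        "SubjectId": "PATIENT-0001",
        "SampleId": "S1",
        "DataSourceType": "Amazon S3",
        "FastqPath1": "s3://example-bucket/S1_R1.fastq.gz",
        "FastqPath2": "s3://example-bucket/S1_R2.fastq.gz",
        "Pipeline": "GATK Exome- hg38",
    }
])
```

The returned text can be written to a `.tsv` file with `open(path, "w", newline="")` and then submitted through Batch Upload.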