Secondary Analysis¶

Secondary analysis takes raw sequencing reads (FASTQ files) and runs a GPU-accelerated variant calling pipeline powered by NVIDIA Parabricks. The pipeline calls SNVs and indels, with optional add-ons for copy number variation (CNV) and structural variant (SV) calling. All results are automatically loaded into AIVA for analysis, along with BAM files for visual review in IGV.

Subscription required

Secondary analysis is available on Pro and Enterprise tiers. Alignment (FASTQ to BAM) costs 3 credits (WES) or 4 credits (WGS). Enabling SNV/Indel calling adds 2 (WES) or 3 (WGS) credits. Each optional add-on (PGx, CNV, SV) adds 1 credit. See Credit costs for a full breakdown. Pipeline outputs (tables and BAMs) count toward your tier's storage slots.

What the pipeline produces¶

Output	Description
Small variants (VCF)	SNVs and indels, automatically loaded into AIVA for analysis. A VCF download link is available on the sample card.
BAM	Aligned reads for visual review via IGV links
CNV calls	Copy number variation calls (optional add-on), annotated and loaded into AIVA
SV calls	Structural variant calls (optional add-on), annotated and loaded into AIVA
PGx star alleles	Pharmacogenomic star-allele assignments for key pharmacogenes

Storage slot usage

Each pipeline output counts as a storage slot. A germline pipeline uses 2 slots (1 table + 1 BAM). A somatic tumor-only pipeline uses 2 slots (1 table + 1 BAM). A somatic paired pipeline uses 3 slots (1 table + 2 BAMs). Check your available slots on the Subscription Tiers page.

Pipeline workflow¶

graph LR
    A[FASTQ Files] --> B[Parabricks Secondary Analysis]
    B --> C[BAM]
    C --> D[SNV/Indel Calling]
    C --> E[PGx Analysis]
    C --> J[CNV Calling]
    C --> K[SV Calling]
    D --> F[Annotation]
    F --> H[Ready for Analysis]
    E --> I[Star Alleles & Recommendations]
    I --> H
    J --> L[CNV Annotation]
    L --> H
    K --> M[SV Annotation]
    M --> H

Download: FASTQ files are downloaded from the provided cloud URLs.
Parabricks secondary analysis: Reads are aligned to the reference genome and small variants (SNVs and indels) are called using GPU-accelerated tools (DeepVariant for germline, DeepSomatic for tumor workflows). Read phasing is automatically enabled for germline and somatic paired pipelines, adding haplotype information to all variant calls. Aligned reads are saved as BAM files with IGV links for visual review.
Annotation + downstream analysis (parallel): Once Parabricks completes, multiple processes run in parallel:
- Annotation: Small Variant Annotation enriches the output VCF, which is then parsed and loaded into the AIVA database.
- PGx analysis: BAM files are used to assign pharmacogenomic star alleles, predict metabolizer phenotypes, and generate CPIC drug recommendations for 88 pharmacogenes. See Pharmacogenomics.
- CNV calling (optional): Copy number variation analysis. Results are annotated and loaded into AIVA.
- SV calling (optional): Structural variant calling. Results are annotated and loaded into AIVA.

You can monitor each stage in real time using the Job Manager.

Starting a pipeline run¶

Navigate to the Samples tab, click Upload, and select the Secondary Analysis tab.

Step 1: Name your sample¶

Enter a Sample Name. This is the name that will appear in your sample list and is used for @samples: mentions in AIVA Chat.

Step 2: Configure the pipeline¶

Setting	Options	Description
Analysis Type	Germline, Somatic Paired, Somatic Tumor-Only	Germline for inherited variant analysis. Somatic for cancer workflows (paired tumor-normal or tumor-only).
Sequencer	Illumina, PacBio, ONT, Ultima, MGI, Element	Both short-read and long-read sequencing platforms are supported.
Data Type	WGS (Whole Genome), WES (Whole Exome)	Determines pipeline parameters. WES runs may require an interval file.
Reference	GRCh38 (hg38)	The reference genome build for alignment.
SNV/Indel Calling	Enabled (default), Disabled	When disabled, only BAM files and raw VCF are produced.
CNV Calling	Off (default), On	Optional add-on for copy number variation analysis (+1 credit).
SV Calling	Off (default), On	Optional add-on for structural variant calling (+1 credit).

Optionally assign the sample to a Project for team collaboration.

Step 3: Provide FASTQ files¶

Enter the cloud storage URLs for your Read 1 (R1) and Read 2 (R2) FASTQ files. Supported URL schemes:

Scheme	Provider
`gs://`	Google Cloud Storage
`s3://`	Amazon S3
`az://`	Azure Blob Storage
`https://`	Any publicly accessible URL

For paired-end sequencing, both R1 and R2 URLs are required. Click + Add Lane if your sample was sequenced across multiple flow cell lanes.

Files must be publicly accessible

AIVA downloads the FASTQ files from the URLs you provide. The files must be publicly readable so that AIVA's server can access them. Private or authenticated URLs will fail with a download error in the Job Manager.

Step 4: Optional Interval File¶

Interval File: Upload a BED file to restrict analysis to targeted regions (recommended for WES data).

Step 5: Start the pipeline¶

Click Start Pipeline to submit. The pipeline run is queued and processed in the background.

Somatic analysis¶

Somatic paired (tumor-normal)¶

Select Somatic Paired as the analysis type for matched tumor-normal samples. You will need to provide FASTQ files for both the tumor and normal samples, each labeled with the appropriate role.

Somatic tumor-only¶

Select Somatic Tumor-Only when a matched normal sample is not available. Only tumor FASTQ files are required.

Supported FASTQ formats¶

Standard FASTQ format (.fastq, .fastq.gz, .fq.gz)
Gzip-compressed files are recommended to reduce download time
Both single-end and paired-end reads are supported

Troubleshooting¶

My pipeline run failed. What should I do?

Open the Job Manager and check the error message on the failed job. Common causes include:

Inaccessible FASTQ URLs: Verify the URLs are valid and the files are accessible from the AIVA server.
Corrupted FASTQ files: Ensure the files are valid FASTQ format and not truncated.
Mismatched R1/R2 files: For paired-end data, the R1 and R2 files must contain the same number of reads in the same order.

How long does a pipeline run take?

Processing time depends on file size and data type. A typical 30x whole-genome sample takes approximately 2 hours with GPU acceleration. Whole-exome samples are faster due to the smaller target region.

Can I run multiple pipeline jobs at once?

Jobs are queued and processed based on available GPU resources. You can submit multiple jobs, and they will be processed in order.