Upload strategy and model tips
How you upload your data and which model you choose both affect the quality and speed of your analysis. This page covers the tradeoffs to help you make the right choice.
Single vs. multiple VCF uploads
When working with multiple samples, you can upload a single multi-sample VCF or individual per-sample VCFs. Each approach has tradeoffs.
| Consideration | Single multi-sample VCF | Individual per-sample VCFs |
|---|---|---|
| Cross-sample queries | All samples in one table; mention families or sample names and AIVA scans a single table | Must tag each sample separately (@samples:s1, @samples:s2, ...) and AIVA joins them; tagging many samples is tedious and joins are slower |
| Per-sample analysis | Must specify sample name in genotype filters | Each sample is its own table; simpler per-sample queries |
| Upload and processing | One upload, one processing job | Multiple uploads, one per file |
| Query performance | Every query scans the full table (10M+ rows for WGS cohorts), even if you only need one sample | Each table is 100K to 5M rows, so per-sample queries are fast; cross-sample queries require joining multiple tables which adds overhead |
Split by sample
If you upload a multi-sample VCF and want per-sample tables, use the Split by sample checkbox during upload. This creates a separate table for each sample (up to 100 samples). See VCF Upload for details.
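To see what per-sample splitting means at the file level, here is a minimal local sketch in Python. It is not AIVA's implementation of the Split by sample option, which this page does not describe. The function name `split_vcf_by_sample` is hypothetical, and the sketch assumes a tab-delimited, uncompressed VCF with the standard nine fixed columns before the sample columns. It copies every record to every sample (including reference and missing genotypes) and does not recompute INFO fields such as allele counts.

```python
def split_vcf_by_sample(vcf_text: str) -> dict[str, str]:
    """Return {sample_name: single-sample VCF text} for a multi-sample VCF.

    Hypothetical helper for illustration; assumes tab-delimited VCF text
    with nine fixed columns (#CHROM..FORMAT) followed by sample columns.
    """
    meta_lines = []      # "##" metadata lines, copied to every output
    fixed_header = []    # the nine fixed column names from the #CHROM line
    samples = []         # sample names, in file order
    records = {}         # sample name -> list of record lines

    for line in vcf_text.splitlines():
        if line.startswith("##"):
            meta_lines.append(line)
        elif line.startswith("#CHROM"):
            columns = line.split("\t")
            fixed_header, samples = columns[:9], columns[9:]
            records = {sample: [] for sample in samples}
        elif line.strip():
            fields = line.split("\t")
            fixed, genotypes = fields[:9], fields[9:]
            # Pair each genotype column with its sample name.
            for sample, genotype in zip(samples, genotypes):
                records[sample].append("\t".join(fixed + [genotype]))

    output = {}
    for sample in samples:
        header = "\t".join(fixed_header + [sample])
        output[sample] = "\n".join(meta_lines + [header] + records[sample]) + "\n"
    return output
```

Each value in the returned dictionary can be written to its own file, for example `s1.vcf`, and uploaded individually.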
Recommendations
Use a single multi-sample VCF when:
- Cross-sample comparison is the primary goal (shared variants, segregation analysis, family-based queries)
- You want the convenience of referencing one table without tagging individual samples
- Your data is WES, where the combined table size is manageable
Use individual per-sample VCFs when:
- Each sample needs independent analysis
- You are working with WGS data, where scanning a single sample table (100K to 5M rows) is much faster than scanning a combined multi-sample table (10M+ rows)
- You want faster per-sample queries and can accept the extra effort of tagging samples for cross-sample comparisons
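Before choosing an upload strategy, it can help to check how large a file is. The sketch below counts variant records and sample columns in a local VCF. The function name `count_vcf_records` is hypothetical, and how AIVA maps VCF records to table rows is not specified on this page, so treat the record count as a rough proxy for table size rather than an exact row count.

```python
import gzip


def count_vcf_records(path: str) -> tuple[int, int]:
    """Return (variant_records, sample_columns) for a .vcf or .vcf.gz file.

    Hypothetical helper for illustration; assumes a tab-delimited VCF.
    """
    opener = gzip.open if path.endswith(".gz") else open
    n_records = 0
    n_samples = 0
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # metadata line
            if line.startswith("#CHROM"):
                # Columns after the nine fixed ones are sample names.
                n_columns = len(line.rstrip("\n").split("\t"))
                n_samples = max(0, n_columns - 9)
                continue
            if line.strip():
                n_records += 1
    return n_records, n_samples
```

A file with many samples and a record count well into the millions is a likely candidate for per-sample uploads, following the recommendations above.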
Using individual VCFs with clinical data
When uploading samples individually, you can still use a clinical data file to help AIVA find the right tables. Name your VCF files consistently with the IDs in your clinical data. Then ask:
"@samples:clinical_data I'm interested in families 1 and 2. Can you find info about them along with the variant table names?"
AIVA will try to match sample IDs from the clinical data to your uploaded VCF table names.
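Because this matching relies on consistent names, it can be worth checking your file names against the clinical data before uploading. The following is a minimal sketch, not AIVA's matching logic. The function name `match_samples_to_vcfs` and the `sample_id` column name are assumptions, and the check requires an exact match between the ID and the file name with its `.vcf` or `.vcf.gz` suffix removed.

```python
import csv
import io
from pathlib import Path


def match_samples_to_vcfs(clinical_csv_text, vcf_filenames, id_column="sample_id"):
    """Return (matched, missing) for clinical sample IDs vs. VCF file names.

    Hypothetical helper for illustration. `matched` maps each sample ID to
    its VCF file name; `missing` lists IDs that have no matching file.
    """
    stems = {}
    for name in vcf_filenames:
        stem = Path(name).name
        for suffix in (".vcf.gz", ".vcf"):
            if stem.endswith(suffix):
                stem = stem[: -len(suffix)]
                break
        stems[stem] = name

    reader = csv.DictReader(io.StringIO(clinical_csv_text))
    sample_ids = [row[id_column] for row in reader]

    matched = {sid: stems[sid] for sid in sample_ids if sid in stems}
    missing = [sid for sid in sample_ids if sid not in stems]
    return matched, missing
```

Any IDs returned in `missing` point to files that should be renamed or uploaded before you ask AIVA to look up the variant tables.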
Model selection tips
The default model is Claude Sonnet 4.6, which handles most queries well. For complex multi-step analysis, consider switching to a more capable model.
| Scenario | Recommended model |
|---|---|
| Simple counts, filters, and lookups | Claude Sonnet 4.6 (default, fast) |
| Multi-step prompt chains with many follow-ups | Claude Opus 4.6 or Gemini 3.1 Pro |
| Cross-referencing clinical + genomic data | Claude Opus 4.6 or Gemini 3.1 Pro |
| Large result set analysis | Gemini 3.1 Pro (1M-token context window) |
| Exploratory questions | Start with Sonnet, then switch to a more capable model when you find something worth investigating |
Claude Opus 4.6 and Gemini 3.1 Pro are thinking models that perform better on complex reasoning tasks. If the default model gives poor results on a multi-step analysis, switching to one of these models often helps.
Switching models
You can switch models at any time during a conversation. Previous messages remain unchanged; only new messages use the selected model. See Model Selection for details.
Next steps
- Multi-sample analysis: Querying multi-sample VCFs for per-sample and cohort-wide results
- Clinical data integration: Cross-reference clinical metadata with variant data
- VCF Upload: Upload and configure your variant data