7 min read · Reviewed by the DNA Info Lab editorial team

How to interpret your 23andMe raw data

If you've downloaded your raw DNA file from 23andMe (or AncestryDNA or MyHeritage; the principle is the same), you have a plain text file with around 650,000 rows. Each row is a genetic position the testing chip measured. The natural question: what can you actually do with it?

This guide walks through three approaches: looking up specific variants by hand, cross-referencing your whole file against published research databases, and whole-genome sequencing for when a chip isn't enough. It also covers the honest limits.

What's actually in the file

The 23andMe raw file is a tab-separated text file with four columns:

rsid          chromosome   position    genotype
rs4988235     2            136608646   AG
rs6025        1            169519049   TT
rs429358      19           45411941    CT
  • rsid: the unique ID of a genetic variant in dbSNP (e.g. rs429358, one of the two APOE variants).
  • chromosome: where the variant sits (1-22, X, Y, or MT for mitochondrial DNA).
  • position: the exact base-pair position on that chromosome.
  • genotype: the two letters you have at that position (one inherited from each parent).

That's all. It's not your full genome: it's a sample of ~650,000 positions out of 3 billion (~0.02%). But those positions are chosen deliberately to cover many of the best-studied disease-associated variants.
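As a rough illustration of the four-column layout described above, here is a minimal Python sketch that parses rows of this shape. The names `Call` and `parse_raw` are made up for this example. It assumes comment lines start with `#` and that a header row, if present, begins with `rsid`; exports from other providers may differ.

```python
from typing import Iterable, Iterator, NamedTuple


class Call(NamedTuple):
    rsid: str
    chromosome: str
    position: int
    genotype: str


def parse_raw(lines: Iterable[str]) -> Iterator[Call]:
    """Yield one Call per data row, skipping blanks, comments and the header."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 4 or fields[0] == "rsid":
            continue  # header row or a line that is not four tab-separated fields
        rsid, chromosome, position, genotype = fields
        yield Call(rsid, chromosome, int(position), genotype)


# The three example rows from above, plus a comment and a header line.
sample = [
    "# example comment line\n",
    "rsid\tchromosome\tposition\tgenotype\n",
    "rs4988235\t2\t136608646\tAG\n",
    "rs6025\t1\t169519049\tTT\n",
    "rs429358\t19\t45411941\tCT\n",
]
calls = list(parse_raw(sample))
print(len(calls), calls[0])
```

For a real file you would pass an open file handle instead of the `sample` list, since a handle is also an iterable of lines.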

Method 1: Look up specific variants by hand

The most direct way is to look up specific rsids you've heard about. For each variant, you can check what your letters mean in three databases:

  • SNPedia: a wiki of consumer-relevant variants with plain-language summaries
  • dbSNP: the official NCBI database with all known variants
  • GWAS Catalog: peer-reviewed disease associations with effect sizes

Famous variants people look up

  • rs429358 + rs7412: combine to define APOE alleles, the strongest common genetic risk factor for Alzheimer's.
  • rs6025: Factor V Leiden, blood-clotting risk.
  • rs1801133: MTHFR variant, folate metabolism (controversial; see SNPedia).
  • rs4988235: LCT regulatory variant (located in the neighbouring MCM6 gene), lactose tolerance.
  • rs9939609: FTO, obesity association.
  • rs1051730: CHRNA3, nicotine dependence.
  • rs1815739: ACTN3, the "sprint gene" (sport performance).

This is informative but slow: you can interpret maybe 5-10 variants in an evening, and you have to be careful about ancestry-specific effects and about overinterpreting single SNPs.
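If you'd rather not scroll through the file by hand, the lookup step itself can be scripted. The sketch below pulls the genotypes for a short list of rsids out of a raw file. `lookup` and `WANTED` are hypothetical names, and the in-memory `demo` text stands in for a real download. It only finds your letters; working out what they mean still takes the databases listed above.

```python
import csv
import io

# A few of the rsids mentioned above; edit this set to taste.
WANTED = {"rs429358", "rs7412", "rs6025", "rs4988235"}


def lookup(handle, wanted):
    """Return {rsid: genotype} for the requested rsids found in the file."""
    found = {}
    data_lines = (line for line in handle if not line.startswith("#"))
    for row in csv.reader(data_lines, delimiter="\t"):
        if len(row) == 4 and row[0] in wanted:
            found[row[0]] = row[3]
    return found


# Stand-in for a real file opened with open(...).
demo = io.StringIO(
    "# comment\n"
    "rs4988235\t2\t136608646\tAG\n"
    "rs6025\t1\t169519049\tTT\n"
    "rs429358\t19\t45411941\tCT\n"
)
genotypes = lookup(demo, WANTED)
for rsid in sorted(WANTED):
    print(rsid, genotypes.get(rsid, "not found in this file"))
```

An rsid that is missing from the output simply wasn't measured on your chip version; it says nothing about your genotype there.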

Method 2: Cross-reference against curated databases

Several tools take your full raw file and automatically match it against curated, peer-reviewed databases. The three databases that matter most are:

  1. GWAS Catalog → polygenic risk scores. Sums the small effects of hundreds of disease-associated variants for traits like type 2 diabetes, heart disease, Alzheimer's, depression.
  2. PharmGKB → pharmacogenomics. Tells you how your genotype may affect drug response for warfarin, statins, codeine, clopidogrel, SSRIs, metformin and others.
  3. ClinVar → clinical variants. Flags positions where you carry a variant labs have classified as pathogenic or likely pathogenic for a specific condition.
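To make "sums the small effects" concrete, here is a toy version of the arithmetic behind a polygenic risk score: for each variant, count how many copies of the effect allele you carry (0, 1 or 2) and multiply by that variant's weight. This is only a sketch of the idea, not how any particular tool computes its scores. The rsids and weights below are invented placeholders, and `risk_score` is a hypothetical helper.

```python
def risk_score(genotypes, weights):
    """Sum weight * (copies of the effect allele) over variants present in the file."""
    score = 0.0
    used = 0
    for rsid, (effect_allele, weight) in weights.items():
        genotype = genotypes.get(rsid)
        if genotype is None or "-" in genotype:
            continue  # variant not on the chip, or a no-call
        score += weight * genotype.count(effect_allele)
        used += 1
    return score, used


# Placeholder rsids and weights for illustration only; not taken from any study.
WEIGHTS = {
    "rs0000001": ("A", 0.10),
    "rs0000002": ("T", 0.25),
    "rs0000003": ("G", -0.05),
}
GENOTYPES = {"rs0000001": "AG", "rs0000002": "TT", "rs0000003": "--"}

score, used = risk_score(GENOTYPES, WEIGHTS)
print(f"score={score:.2f} from {used} of {len(WEIGHTS)} variants")
```

A raw sum like this means little on its own. In practice the weights come from a published study, and the result is usually compared against a reference population before it is read as higher or lower risk.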

This is what our service does. Upload your raw file (zipped or extracted) and get back a structured report covering 11 trait categories with peer-reviewed sources and plain-language explanations. Free, takes 30 seconds.

Method 3: Whole-genome sequencing

If you want every position in your genome, microarray-based testing is not enough. Whole-genome sequencing services (Nebula, Dante Labs, Sequencing.com and others) deliver the complete picture for a few hundred euros. For most consumer questions, however, microarray data from 23andMe or AncestryDNA is more than sufficient; you only really need WGS if you're investigating rare variants in your family.

The big caveats

Before you go variant-hunting, three things to keep in mind:

  1. Most studies are biased toward European ancestry. ~90% of GWAS participants are of European descent. Risk multipliers you read about may not apply to you with the same magnitude.
  2. A single variant rarely means much. Even strong variants like APOE ε4 only modulate risk by 3-12×. Most disease risk is polygenic: hundreds of small effects added together.
  3. This is not medical advice. Your raw data is a starting point for a conversation with a doctor, not a diagnosis. Anything that looks alarming deserves a clinical follow-up, not a panic.

Doing it the easy way

Manual lookup teaches you a lot but takes hours. If you just want the answer, upload your raw file to our free tool. We cross-reference it against the GWAS Catalog (~60,000 variants), PharmGKB and ClinVar, then explain each result in plain language. Equivalent to several hours of manual research, in under a minute.

Frequently asked questions

Can I open my 23andMe raw data file in Excel?

Technically yes: it's a tab-separated text file. But Excel can be sluggish with ~650,000 rows on older machines and may auto-format columns on import (reinterpreting values it takes for numbers or dates, for example). For a one-off look it's fine; for analysis use a text editor like VS Code or a script.

Is it safe to upload my raw DNA data online?

It depends on the service. Read their privacy policy before uploading anything. We process files in Europe (Germany), never sell or share your data, and you can export or delete it anytime via your account page. If a service can't answer "what happens to my data" in one sentence, don't upload there.

What's the difference between rs429358 and rs7412?

Both are positions in the APOE gene. Together they define the ε2 / ε3 / ε4 alleles. You need both to interpret your APOE status: rs429358 alone is ambiguous because its more common allele can mean either ε2 or ε3 depending on what's at rs7412. Any decent tool combines them automatically.
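For the curious, the combination step can be sketched as a small lookup table. This assumes the file reports forward-strand letters (T or C at rs429358, C or T at rs7412), which is worth checking against SNPedia for your own export. `apoe_status` and `APOE_TABLE` are names invented for this example.

```python
# Assumption: forward-strand letters, rs429358 is T or C and rs7412 is C or T.
# Keys are (rs429358 genotype, rs7412 genotype) with letters sorted alphabetically.
APOE_TABLE = {
    ("TT", "TT"): "ε2/ε2",
    ("TT", "CT"): "ε2/ε3",
    ("TT", "CC"): "ε3/ε3",
    ("CT", "CT"): "ε2/ε4",  # could also be the very rare ε1/ε3; unphased data cannot tell
    ("CT", "CC"): "ε3/ε4",
    ("CC", "CC"): "ε4/ε4",
}


def apoe_status(rs429358, rs7412):
    """Map the two unphased genotypes to an APOE allele pair, if recognised."""
    key = ("".join(sorted(rs429358)), "".join(sorted(rs7412)))
    return APOE_TABLE.get(key, "undetermined")


print(apoe_status("CT", "CC"))
print(apoe_status("TT", "TC"))
print(apoe_status("--", "CC"))
```

Anything outside the table, including no-calls written as `--`, comes back as "undetermined" rather than a guess.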

How do I know which chip version 23andMe used for my sample?

Look at the comments at the top of your raw file. 23andMe writes the chip version there (e.g., "v5" or "GSA-MD"). Newer chips include more medically-relevant variants and pharmacogenomic markers. The interpretation tools work with any version.
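If the file is too large to open comfortably, a few lines of Python can print just those leading comments. The sketch assumes comment lines start with `#`. `header_comments` is a hypothetical helper, and the demo header text is invented, so your file's wording will differ.

```python
import io


def header_comments(handle):
    """Collect the leading '#' comment lines, stopping at the first data row."""
    comments = []
    for line in handle:
        if not line.startswith("#"):
            break
        comments.append(line.lstrip("#").strip())
    return comments


# Invented header text standing in for a real file opened with open(...).
demo = io.StringIO(
    "# This is an invented header line\n"
    "# Another invented comment\n"
    "rs4988235\t2\t136608646\tAG\n"
    "# a stray comment after the data starts is ignored\n"
)
for comment in header_comments(demo):
    print(comment)
```

Because the loop stops at the first data row, it reads only the top of the file rather than all ~650,000 lines.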

Will my report be different if I use AncestryDNA or MyHeritage data instead?

Slightly different but very similar. 23andMe covers more pharmacogenomic and ClinVar variants; AncestryDNA covers more ancestry-informative markers. For polygenic risk scores both have enough overlap that the scores are practically identical. We auto-detect the format and adapt.

Interpret your raw data automatically

Upload your 23andMe, AncestryDNA or MyHeritage file (zipped or extracted) and get a structured genetic report in minutes. Free, private, no kit needed.

This article is for educational purposes only and does not constitute medical advice, diagnosis or treatment. Always consult a healthcare professional for decisions about your health.