# Document 397456

## Genotype Refinement Pipeline
Using additional data to improve genotype calls and likelihoods

## Why Genotypes?
•  Many analyses are only interested in variants
•  Medical geneticists are interested in genotypes
   –  Do any patients have two copies of a LOF mutation?
   –  Are the parents of a diseased child likely to have more afflicted children?

## Genotype Call Quality Is Important!
•  Poor genotype calls for some samples
   –  Can have low confidence
   –  Might be the wrong call
•  Can additional (independent) data improve genotype calls?
   –  Use high quality data (like 1000G) as priors
   –  Use pedigree (if available)
   –  Calculate posterior genotype probabilities

## We are here in the Best Practices workflow
[Workflow diagram, highlighted step: Genotype Refinement]

## Genotype Refinement Pipeline
[Pipeline diagram: Recalibrated Variants, CalculateGenotypePosteriors (Population Priors, Family Priors), VariantAnnotator (PossibleDeNovos)]

## POPULATION PRIORS

## Bayes’s Rule Review
•  Given that your coworker just walked in with an umbrella, what is the probability that it is raining?
•  Observation = umbrella
•  Θ = probability of rain

$$
\underbrace{P(\Theta \mid \text{Obs})}_{\text{posterior probability}}
= \frac{\overbrace{P(\text{Obs} \mid \Theta)}^{\text{likelihood}} \; \overbrace{P(\Theta)}^{\text{prior}}}{\underbrace{P(\text{Obs})}_{\text{(normalize)}}}
$$

## CGP Corrects HOM_VAR Call with Low Frequency Priors
Locus 20:3011778
1) Baseline HOM_VAR call
2) Priors w/ low allele frequency applied
3) Posterior genotype called HET
4) In agreement w/ NIST and BAMs

Likelihoods x Priors = Posterior Probabilities
•  Likelihoods [HOM_REF, HET, HOM_VAR]: [895,3,0]
•  Priors: AF=0.002
•  Posterior probabilities [HOM_REF, HET, HOM_VAR]: [868,0,27]

Genotype corrected. Confidence improved from Q3 to Q27.

## CGP Corrects HET Call with High Frequency Priors
Locus 8:3978061
1) Baseline HET call
2) Priors w/ high allele frequency applied
3) Posterior genotype called HOM_VAR
4) In agreement w/ NIST and BAMs

Likelihoods x Priors = Posterior Probabilities
•  Likelihoods [HOM_REF, HET, HOM_VAR]: [894,0,0]
•  Priors: AF=0.987
•  Posterior probabilities [HOM_REF, HET, HOM_VAR]: [932,16,0]

Genotype corrected. Confidence improved from Q0 to Q16.
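
The arithmetic behind the two loci above can be sketched in a few lines of Python. This is a minimal illustration, not the CalculateGenotypePosteriors implementation: it assumes the bracketed values are Phred-scaled, that the site is biallelic, and that genotype priors follow Hardy-Weinberg proportions derived from the stated AF. All function names are hypothetical. In Phred space, multiplying likelihoods by priors becomes addition, and normalizing becomes subtracting the smallest value.

```python
import math

GENOTYPES = ["HOM_REF", "HET", "HOM_VAR"]

def hardy_weinberg_priors(alt_af):
    """Priors for [HOM_REF, HET, HOM_VAR]. Assumption: Hardy-Weinberg, 0 < alt_af < 1."""
    ref_af = 1.0 - alt_af
    return [ref_af ** 2, 2.0 * ref_af * alt_af, alt_af ** 2]

def phred(probability):
    """Convert a probability to the Phred scale."""
    return -10.0 * math.log10(probability)

def posterior_pls(likelihood_pls, alt_af):
    """Apply priors (addition in Phred space), then normalize so the best genotype is 0."""
    priors = hardy_weinberg_priors(alt_af)
    unnormalized = [pl + phred(prior) for pl, prior in zip(likelihood_pls, priors)]
    best = min(unnormalized)
    return [round(value - best) for value in unnormalized]

def call(pls):
    """Return (genotype, quality): lowest value wins; quality is the gap to the runner-up."""
    ordered = sorted(pls)
    return GENOTYPES[pls.index(ordered[0])], ordered[1] - ordered[0]

examples = [("20:3011778", [895, 3, 0], 0.002), ("8:3978061", [894, 0, 0], 0.987)]
for locus, pls, alt_af in examples:
    posterior = posterior_pls(pls, alt_af)
    print(locus, call(pls), "->", posterior, call(posterior))
# 20:3011778 ('HOM_VAR', 3) -> [868, 0, 27] ('HET', 27)
# 8:3978061 ('HET', 0) -> [932, 16, 0] ('HOM_VAR', 16)
```

Under these assumptions the sketch reproduces the posterior values and the quality changes (Q3 to Q27, Q0 to Q16) shown on the slides. For the second locus the baseline likelihoods tie between HET and HOM_VAR; the sketch breaks the tie by taking the first minimum.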

## Population Priors Improve Genotype Confidence
[Figure: three panels titled Hom Ref, Het, Hom Var]
•  Baseline HR calls are underconfident, but posterior calls are more accurate
•  Baseline HV calls are overconfident, but posterior calls are improved

## Assessing Confidence and Correctness
[Plot: Posterior Genotype Quality vs. Baseline Genotype Quality, Homozygous Reference Calls]
•  Average Q10 increase for correct calls, Q≤30
•  Intercept = 9.9612, Slope = 0.9302
•  Plot label: "the same"

## FAMILY PRIORS

## Parental Genotypes Inform Child Genotypes
•  Child can only inherit alleles present in parents
•  Parent genotypes determine possible child genotypes (assuming no mutations)

| Child | Mother | Father |
|-------|--------|--------|
| HR    | HR     | HR     |
| HR    | HR     | HET    |
| HR    | HET    | HR     |
| HR    | HET    | HET    |
| HET   | HET    | HET    |
| HET   | HR     | HET    |
| HET   | HET    | HR     |
| HET   | HV     | HET    |
| HET   | HET    | HV     |
| HET   | HR     | HV     |
| HET   | HV     | HR     |
| HV    | HV     | HV     |
| HV    | HV     | HET    |
| HV    | HET    | HV     |
| HV    | HET    | HET    |

•  HaplotypeCaller gives: [formula not recovered from slide]
•  Given trio data we can derive: [formula not recovered from slide]

## Bayesian Priors Applied to Trios
•  Recall Bayes’s Rule: posterior = likelihood × prior (apply prior), then normalize
•  Establish genotype configuration probabilities
•  Apply family priors

## Family Priors Improve Genotype Confidence
[Figure: three panels titled Hom Ref, Het, Hom Var]
•  Baseline HR calls are underconfident, but posterior calls are more accurate
•  Posterior HR and HV calls are higher confidence

## Assessing Confidence and Correctness
[Plot: Posterior Genotype Quality vs. Baseline Genotype Quality, Homozygous Reference Calls]
•  Average Q13 increase for correct calls, Q≤30
•  Intercept = 12.831, Slope = 1.238
•  Plot label: "the same"
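
The parent/child genotype combinations listed earlier in this section follow directly from Mendelian inheritance, and they can be enumerated programmatically. The sketch below assumes a biallelic site and no mutation; the function names are hypothetical, and it illustrates the inheritance rule only, not how CalculateGenotypePosteriors builds its family priors.

```python
from itertools import product

# Genotype name -> number of alt alleles carried
GENOTYPES = {"HR": 0, "HET": 1, "HV": 2}
NAMES = list(GENOTYPES)  # index = alt-allele count

def transmit(alt_count):
    """Probability that a parent passes on 0 or 1 alt alleles."""
    return {0: 1.0 - alt_count / 2.0, 1: alt_count / 2.0}

def child_probs(mother, father):
    """P(child genotype | parents) under Mendelian inheritance, no mutation."""
    probs = {name: 0.0 for name in NAMES}
    from_mother = transmit(GENOTYPES[mother]).items()
    from_father = transmit(GENOTYPES[father]).items()
    for (m_allele, m_prob), (f_allele, f_prob) in product(from_mother, from_father):
        probs[NAMES[m_allele + f_allele]] += m_prob * f_prob
    return probs

# For each child genotype, collect the (mother, father) pairs that can produce it.
consistent = {
    child: [(m, f) for m, f in product(NAMES, repeat=2) if child_probs(m, f)[child] > 0]
    for child in NAMES
}

for child, parents in consistent.items():
    print(child, len(parents), parents)
# HR 4, HET 7, HV 4 parental combinations, matching the table

print(child_probs("HET", "HET"))
# {'HR': 0.25, 'HET': 0.5, 'HV': 0.25}
```

Of the 27 possible trio configurations, 15 are consistent with inheritance alone. The remaining 12 require at least one mutation, which is where a mutation prior would enter.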

## DE NOVO MUTATION TAGGING

## De Novo Mutation Definition
•  De novo mutations are culprits in many rare, Mendelian disorders
•  Parents are homozygous reference
•  Child is het (one copy of alt allele)

## Properties of Sequenced De Novos
•  Novelty
   –  Child has the only alt allele in the trio; it is not inherited
•  Rarity
   –  Allele frequency across all samples sequenced is low
•  Confidence
   –  Set GQ threshold for parents and child
   –  (GQ improvement tools help A LOT here!)

## Priors Can Be Tuned For Sensitivity
Mutation prior is a parameter in genotype configuration probability: [formula not recovered from slide]
•  Sensitivity and specificity can be tuned as in VQSR

## Genotype Refinement Yields More High-Quality Genotypes
•  Initial genotype calls may be ambiguous or wrong
•  Applying population and family priors improves confidence
•  More high confidence genotypes means more data for downstream analysis!

## Further reading
http://www.broadinstitute.org/gatk/guide/
http://www.broadinstitute.org/gatk/gatkdocs/
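
As a closing illustration, the three de novo properties from the talk (novelty, rarity, confidence) can be combined into a simple filter. This is a hedged sketch only: the `Call` type, the function name, and the default thresholds `min_gq=20` and `max_alt_af=0.001` are hypothetical values chosen for the example, not settings taken from the slides or from the PossibleDeNovos annotation.

```python
from dataclasses import dataclass

@dataclass
class Call:
    genotype: str  # "HR", "HET" or "HV"
    gq: int        # genotype quality (Phred-scaled)

def is_possible_de_novo(mother, father, child, cohort_alt_af,
                        min_gq=20, max_alt_af=0.001):
    """Flag a site as a candidate de novo; thresholds are illustrative assumptions."""
    # Novelty: both parents homozygous reference, child heterozygous
    novelty = (mother.genotype == "HR" and father.genotype == "HR"
               and child.genotype == "HET")
    # Rarity: alt allele is rare across all sequenced samples
    rarity = cohort_alt_af <= max_alt_af
    # Confidence: every trio member meets the GQ threshold
    confidence = min(mother.gq, father.gq, child.gq) >= min_gq
    return novelty and rarity and confidence

print(is_possible_de_novo(Call("HR", 45), Call("HR", 50), Call("HET", 60), 0.0005))  # True
print(is_possible_de_novo(Call("HR", 8), Call("HR", 50), Call("HET", 60), 0.0005))   # False
```

The second call fails only on the confidence check, which is why raising parental GQ through genotype refinement can recover candidates that would otherwise be discarded.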