A method and server for predicting damaging missense mutations

Adzhubei, Ivan A; Schmidt, Steffen; Peshkin, Leonid; Ramensky, Vasily E; Gerasimova, Anna; Bork, Peer; Kondrashov, Alexey S; Sunyaev, Shamil R

doi:10.1038/nmeth0410-248

Correspondence
Published: April 2010

Nature Methods volume 7, pages 248–249 (2010)

To the editor:

Applications of rapidly advancing sequencing technology exacerbate the need to interpret individual sequence variants. Sequencing of phenotyped clinical subjects will soon become a method of choice in studies of the genetic causes of Mendelian and complex diseases. New exon-capture techniques will direct sequencing efforts to the most informative and easily interpretable protein-coding fraction of the genome. Thus, the demand for computational predictions of the impact of protein sequence variants will continue to grow.

Here we present a new method and the corresponding software tool, PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/, Supplementary Software), for predicting damaging effects of missense mutations. PolyPhen-2 is different from the earlier tool PolyPhen¹ in the set of predictive features, the alignment pipeline and the method of classification (Fig. 1a). PolyPhen-2 uses eight sequence-based and three structure-based predictive features (Supplementary Table 1), which were selected automatically by an iterative greedy algorithm (Supplementary Methods). The majority of these features involve comparison of a property of the wild-type (ancestral, normal) allele and the corresponding property of the mutant (derived, disease-causing) allele. The alignment pipeline selects a set of homologous sequences using a clustering algorithm and then constructs and refines its multiple alignment (Supplementary Fig. 1). The most informative predictive features characterize how likely the two human alleles are to occupy the site given the pattern of amino-acid replacements in the multiple-sequence alignment; how distant the protein harboring the first deviation from the human wild-type allele is from the human protein; and whether the mutant allele originated at a hypermutable site². The functional importance of an allele replacement is predicted from its individual features (Supplementary Figs. 2, 3, 4) by a naive Bayes classifier (Supplementary Methods).
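To illustrate the classification step, the following minimal Python sketch computes a naive Bayes posterior probability from discrete features. It is not the PolyPhen-2 implementation: the function names, the feature values ("conserved", "buried" and so on), the toy training data and the Laplace smoothing constant are all assumptions made for this example. The actual features and classifier details are given in the Supplementary Methods.

```python
import math
from collections import defaultdict


def train_naive_bayes(samples, labels, alpha=1.0):
    """Count discrete feature values per class.

    samples: list of tuples of discrete feature values (hypothetical features)
    labels:  list of 0 (nondamaging) or 1 (damaging); both classes must occur
    alpha:   Laplace smoothing constant (an assumption of this sketch)
    """
    n_features = len(samples[0])
    class_counts = {0: 0, 1: 0}
    # counts[c][j][v] = number of class-c samples whose feature j equals v
    counts = {c: [defaultdict(int) for _ in range(n_features)] for c in (0, 1)}
    values = [set() for _ in range(n_features)]
    for x, y in zip(samples, labels):
        class_counts[y] += 1
        for j, v in enumerate(x):
            counts[y][j][v] += 1
            values[j].add(v)
    return {"class_counts": class_counts, "counts": counts,
            "values": values, "alpha": alpha}


def posterior_damaging(model, x):
    """Return P(damaging | x), treating features as independent given the class."""
    alpha = model["alpha"]
    total = sum(model["class_counts"].values())
    log_score = {}
    for c in (0, 1):
        n_c = model["class_counts"][c]
        lp = math.log(n_c / total)  # log prior
        for j, v in enumerate(x):
            k = len(model["values"][j])  # number of distinct values seen
            seen = model["counts"][c][j].get(v, 0)
            lp += math.log((seen + alpha) / (n_c + alpha * k))
        log_score[c] = lp
    # Normalize in log space to avoid underflow with many features.
    m = max(log_score.values())
    e0 = math.exp(log_score[0] - m)
    e1 = math.exp(log_score[1] - m)
    return e1 / (e0 + e1)


# Toy data: (conservation, burial) pairs are illustrative only.
toy_samples = [
    ("conserved", "buried"), ("conserved", "buried"), ("conserved", "exposed"),
    ("variable", "exposed"), ("variable", "exposed"), ("variable", "buried"),
]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_model = train_naive_bayes(toy_samples, toy_labels)
print(round(posterior_damaging(toy_model, ("conserved", "buried")), 3))  # 0.857
```

Working in log space keeps the product of many small likelihoods numerically stable, which matters more as the number of features grows.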

**Figure 1: PolyPhen-2 pipeline and prediction accuracy.**

We used two pairs of datasets to train and test PolyPhen-2. We compiled the first pair, HumDiv, from all 3,155 damaging alleles annotated in the UniProt database as causing human Mendelian diseases and affecting protein stability or function, together with 6,321 differences between human proteins and their closely related mammalian homologs, assumed to be nondamaging (Supplementary Methods). The second pair, HumVar³, consists of all the 13,032 human disease-causing mutations from UniProt and 8,946 human nonsynonymous single-nucleotide polymorphisms (nsSNPs) without annotated involvement in disease, which we treated as nondamaging.

We found that PolyPhen-2 performance, as presented by its receiver operating characteristic curves, was consistently superior to that of PolyPhen (Fig. 1b) and it also compared favorably with that of three other popular prediction tools⁴,⁵,⁶ (Fig. 1c). For a false positive rate of 20%, PolyPhen-2 achieved true positive prediction rates of 92% and 73% on HumDiv and HumVar datasets, respectively (Supplementary Table 2).
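The operating point quoted above (true positive rate at a fixed false positive rate) can be read off a receiver operating characteristic curve. The sketch below shows one simple way to do this from a list of scores and known labels. The function name, the toy scores and the toy labels are hypothetical and are not taken from the HumDiv or HumVar data.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.20):
    """Find the score threshold with the highest TPR whose FPR <= max_fpr.

    scores: predicted probabilities of being damaging
    labels: 1 for damaging, 0 for nondamaging
    Returns (threshold, tpr, fpr); a variant is called damaging
    when its score is >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one example of each class")
    best = (float("inf"), 0.0, 0.0)  # calling nothing damaging: TPR = FPR = 0
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        tpr, fpr = tp / n_pos, fp / n_neg
        if fpr <= max_fpr and tpr >= best[1]:
            best = (t, tpr, fpr)
    return best


# Toy example (illustrative values only).
toy_scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_truth = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
print(tpr_at_fpr(toy_scores, toy_truth, max_fpr=0.20))  # (0.6, 0.8, 0.2)
```

The loop rescans the data for every threshold, which is quadratic in the number of variants. That is fine for a sketch, but a sorted cumulative count would be preferable for datasets of realistic size.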

One reason for the lower accuracy of predictions on HumVar is that nsSNPs assumed to be nondamaging in the HumVar dataset included a sizable fraction of mildly deleterious alleles. In contrast, most amino-acid replacements assumed nondamaging in the HumDiv dataset must be close to selective neutrality. Because alleles that are mildly but unconditionally deleterious may not be fixed in the evolving lineage, no method based on comparative sequence analysis is ideal for discriminating between drastically and mildly deleterious mutations, which were assigned to opposite categories in HumVar data. Another reason is that the HumDiv dataset uses extra criteria (Supplementary Methods) to avoid possible erroneous annotations of damaging mutations.

PolyPhen-2 calculates the naive Bayes posterior probability that a given mutation is damaging and reports estimates of false positive (the chance that the mutation is classified as damaging when it is in fact nondamaging) and true positive (the chance that the mutation is classified as damaging when it is indeed damaging) rates. A mutation is also appraised qualitatively, as benign, possibly damaging or probably damaging (Supplementary Methods).
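The qualitative appraisal can be pictured as a pair of cutoffs on the posterior probability. The sketch below is only an illustration of that idea: the function name and the two cutoff values (0.5 and 0.9) are placeholders chosen for this example, not the thresholds used by PolyPhen-2, which are defined in the Supplementary Methods.

```python
def appraise(posterior, possibly_cutoff=0.5, probably_cutoff=0.9):
    """Map a posterior probability of damage to a qualitative label.

    The default cutoffs are hypothetical placeholders, not PolyPhen-2 values.
    """
    if not 0.0 <= posterior <= 1.0:
        raise ValueError("posterior must lie in [0, 1]")
    if possibly_cutoff > probably_cutoff:
        raise ValueError("possibly_cutoff must not exceed probably_cutoff")
    if posterior >= probably_cutoff:
        return "probably damaging"
    if posterior >= possibly_cutoff:
        return "possibly damaging"
    return "benign"


for p in (0.03, 0.62, 0.98):
    print(p, appraise(p))
```

In practice the cutoffs would be chosen from the estimated false positive and true positive rates of the trained classifier, so they may differ between the HumDiv- and HumVar-trained models.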

The user can choose between HumDiv- and HumVar-trained PolyPhen-2. Diagnostics of Mendelian diseases require distinguishing mutations with drastic effects from other human variation, including abundant mildly deleterious alleles. Thus, HumVar-trained PolyPhen-2 should be used for this task. In contrast, HumDiv-trained PolyPhen-2 should be used to evaluate rare alleles at loci potentially involved in complex phenotypes, for dense mapping of regions identified by genome-wide association studies and for analysis of natural selection from sequence data, in which even mildly deleterious alleles must be treated as damaging.

Note: Supplementary information is available on the Nature Methods website.

References

1. Ramensky, V., Bork, P. & Sunyaev, S. Nucleic Acids Res. 30, 3894–3900 (2002).
2. Schmidt, S. et al. PLoS Genet. 4, e1000281 (2008).
3. Capriotti, E., Calabrese, R. & Casadio, R. Bioinformatics 22, 2729–2734 (2006).
4. Ng, P.C. & Henikoff, S. Nucleic Acids Res. 31, 3812–3814 (2003).
5. Bromberg, Y., Yachdav, G. & Rost, B. Bioinformatics 24, 2397–2398 (2008).
6. Yue, P., Melamud, E. & Moult, J. BMC Bioinformatics 7, 166 (2006).

Acknowledgements

We thank Y. Bromberg for help with the SNAP analysis. V.E.R. acknowledges support by the Russian Academy of Sciences Program in Molecular and Cellular Biology. This work was supported by the US National Institutes of Health (R01 GM078598 and in part by R01 MH084676).

Author information

Ivan A Adzhubei, Steffen Schmidt and Leonid Peshkin: These authors contributed equally to this work.

Authors and Affiliations

Division of Genetics, Brigham & Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
Ivan A Adzhubei & Shamil R Sunyaev
Department of Biochemistry, Max Planck Institute for Developmental Biology, Tübingen, Germany
Steffen Schmidt
Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA
Leonid Peshkin
Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
Vasily E Ramensky
Life Sciences Institute and Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, USA
Anna Gerasimova & Alexey S Kondrashov
European Molecular Biology Laboratory, Heidelberg, Germany
Peer Bork

Corresponding author

Correspondence to Shamil R Sunyaev.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–4, Supplementary Tables 1–2, Supplementary Methods (PDF 646 kb)

Supplementary Software

PolyPhen-2 standalone software for Linux/Mac OS X (ZIP 414 kb)

About this article

Cite this article

Adzhubei, I., Schmidt, S., Peshkin, L. et al. A method and server for predicting damaging missense mutations. Nat Methods 7, 248–249 (2010). https://doi.org/10.1038/nmeth0410-248

Issue Date: April 2010
DOI: https://doi.org/10.1038/nmeth0410-248
