Our software and server accept one of the following formats (see here for annotating VCF files):
<protein> <substitution>
dbSNP rs identifiers
<protein> is the protein identifier and <substitution> is the amino acid substitution in the conventional one-letter format. Multiple substitutions can be entered on a single line and should be separated by commas. Our server accepts SwissProt/TrEMBL, RefSeq and Ensembl protein identifiers, e.g.:
P43026 L441P
ENSP00000325527 N548I,E1073K,C2307S
But note that the sequences FATHMM uses (accessible here) may now be out of date.
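The input format described above can be checked before submission with a small parser. The following is a minimal sketch in Python, not the parser FATHMM itself uses: the names `parse_line`, `Query` and `Substitution` are hypothetical, and it assumes that substitutions use only the 20 standard one-letter amino acid codes and that a dbSNP identifier appears alone on its line as `rs` followed by digits.

```python
import re
from typing import List, NamedTuple

# Assumption: only the 20 standard one-letter amino acid codes are allowed.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SUBSTITUTION = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")
# Assumption: a dbSNP identifier is "rs" followed by digits, alone on the line.
RS_ID = re.compile(r"^rs\d+$")


class Substitution(NamedTuple):
    wild_type: str  # residue in the reference sequence
    position: int   # 1-based position in the protein
    mutant: str     # substituted residue


class Query(NamedTuple):
    identifier: str                    # protein identifier or rs identifier
    substitutions: List[Substitution]  # empty for rs identifiers


def parse_line(line: str) -> Query:
    """Parse one input line: '<protein> <substitution>[,...]' or an rs identifier."""
    fields = line.split()
    if len(fields) == 1 and RS_ID.match(fields[0]):
        return Query(fields[0], [])
    if len(fields) != 2:
        raise ValueError(f"expected '<protein> <substitution>': {line!r}")
    protein, raw = fields
    substitutions = []
    for token in raw.split(","):
        match = SUBSTITUTION.match(token)
        if match is None:
            raise ValueError(f"malformed substitution {token!r} in {line!r}")
        wild_type, position, mutant = match.groups()
        substitutions.append(Substitution(wild_type, int(position), mutant))
    return Query(protein, substitutions)


if __name__ == "__main__":
    for example in ("P43026 L441P", "ENSP00000325527 N548I,E1073K,C2307S"):
        print(parse_line(example))
```

The sketch only validates syntax. It does not check that the protein identifier exists or that the wild-type residue matches the sequence at the given position.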
As described in our paper, our software comprises two algorithms: one is based on sequence conservation alone (unweighted), and the other
combines sequence conservation with pathogenicity weights (weighted). In short, our weighted algorithm adjusts
our conservation-based predictions to account for the tolerance of related sequences to mutations. For example, mutations falling within
diverse regions of the Cellular Tumor Antigen P53 can be up-weighted according to the critical role the protein plays in cell regulation.
In contrast, mutations falling within conserved regions of the MHC Antigen-Recognition Domain can be down-weighted according to the
hypervariable nature of the domain.
For more information on our coding predictions, please refer to the following publication:
Shihab HA, Gough J, Cooper DN, Stenson PD, Barker GLA, Edwards KJ, Day INM, Gaunt TR. (2013). Predicting the Functional, Molecular and Phenotypic Consequences of Amino Acid Substitutions using Hidden Markov Models. Hum. Mutat., 34, 57-65.
Our software not only predicts the potentially deleterious nature of protein variants but can also annotate
the molecular and phenotypic consequences of these mutations. Here, the consequences of mutations are statistically
inferred by mapping SUPERFAMILY domains onto the Gene Ontology, the
Human Phenotype Ontology and the Mammalian Phenotype Ontology (and more).
For more information on these mappings, please refer to the following publications:
Fang H, Gough J. (2012). dcGO: database of domain-centric ontologies on functions, phenotypes, diseases and more. Nucleic Acids Res., 41, D536-544.
Gough J, Karplus K, Hughey R, Chothia C. (2001). Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J. Mol. Biol., 313, 903-919.