Downloads
Software
fathmm-MKL
Instructions on how to install our MKL-based algorithm, which predicts the effects of both coding and non-coding variants using nucleotide-based HMMs, can be found on our fathmm-MKL GitHub repository.
fathmm
For instructions on how to install our original algorithm, designed specifically for non-synonymous single nucleotide variants (nsSNVs), please visit our fathmm GitHub repository.
License
Our software is licensed under the GNU General Public License (v3).
Datasets
Inherited Disease (weighted)
We use the Human Gene Mutation Database (HGMD) and SwissProt/TrEMBL to train our inherited disease model. We are unable to circulate our pathogenic training data because it is drawn from HGMD. However, we observe performance similar to that reported in our publication when using SwissProt/TrEMBL alone. The SwissProt/TrEMBL dataset used in our inherited disease model (along with the associated pathogenic variants) can be found here.
Cancer
Our pathogenic training dataset (i.e. cancer-associated mutations) can be found here. Variation data is recorded in the header of each sequence: variants starting with "cs" (cancer-associated variants) are those used in our training, whereas those starting with "rs" (neutral polymorphisms) are ignored.
Note: The neutral dataset used in our cancer model is the same one used in our inherited disease model (see above).
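As a rough illustration of the "cs"/"rs" convention described above, the following Python sketch collects the "cs"-prefixed variants from each sequence header and skips everything else. It is only a sketch under stated assumptions: the exact header layout is not specified here, so the code assumes FASTA-style headers that begin with ">", with the sequence identifier as the first token and variant tokens separated by whitespace, commas, semicolons or pipes. The function name training_variants and the token pattern are hypothetical and should be adapted to the actual file.

```python
import re
from typing import Dict, Iterable, List

# ASSUMPTION: header fields are separated by whitespace, commas,
# semicolons or pipes. The real delimiter may differ.
TOKEN_SPLIT = re.compile(r"[\s,;|]+")


def training_variants(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each sequence identifier to its "cs"-prefixed variant tokens.

    Hypothetical helper: assumes FASTA-style headers starting with ">"
    whose first token is the sequence identifier.
    """
    result: Dict[str, List[str]] = {}
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence lines carry no variant data
        tokens = [t for t in TOKEN_SPLIT.split(line[1:].strip()) if t]
        if not tokens:
            continue  # empty header, nothing to record
        seq_id, fields = tokens[0], tokens[1:]
        # Keep cancer-associated variants ("cs"); tokens starting with
        # "rs" (neutral polymorphisms) and any other fields are dropped.
        result[seq_id] = [t for t in fields if t.startswith("cs")]
    return result
```

Passing an open file handle (for example, training_variants(open(path))) returns a dictionary keyed by sequence identifier. Sequences whose headers list only "rs" variants map to an empty list.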
Disease-Specific
SwissProt/TrEMBL (2014_05) variant data can be found here, and the corresponding disease concepts used in our analysis can be found here.
Note: The neutral dataset used in our disease-specific model is the same one used in our inherited disease model (see above).
If you use these datasets for your analysis, please cite the following publication:
Shihab HA, Gough J, Mort M, Cooper DN, Day INM, Gaunt TR. (2014). Ranking Non-Synonymous Single Nucleotide Polymorphisms based on Disease Concepts. Human Genomics, 8:11.