fathmm

Software

fathmm-MKL

Instructions on how to install our MKL-based algorithm, which predicts the effects of both coding and non-coding variants using nucleotide-based HMMs, can be found on our fathmm-MKL GitHub repository.

For instructions on how to install our original algorithm, which is specifically designed for non-synonymous single nucleotide variants (nsSNVs), please visit our fathmm GitHub repository.

License

Our software is licensed under the GNU General Public License (v3).

Datasets

Inherited Disease (weighted)

We use the Human Gene Mutation Database (HGMD) and SwissProt/TrEMBL to train our inherited disease model. We are unable to circulate our pathogenic training data (HGMD). However, we observe performance similar to that reported in our publication when using the pathogenic variants in SwissProt/TrEMBL instead. The SwissProt/TrEMBL dataset used in our inherited disease model (along with the associated pathogenic variants) can be found here.

Cancer

Our pathogenic training dataset (i.e. cancer-associated mutations) can be found here. Variation data is recorded in the header of each sequence: variants starting with "cs" (cancer-associated variants) are those used in our training, whereas those starting with "rs" (neutral polymorphisms) are ignored.
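As an illustration of the "cs"/"rs" convention described above, here is a minimal Python sketch that reads FASTA-style records and keeps only the variants starting with "cs". The exact header layout is not documented on this page, so the sketch assumes that variant entries appear in the header as tokens separated by whitespace, commas, semicolons or pipes. The function names and the sample header in the usage note are hypothetical and should be adapted to the actual file.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record begins: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def training_variants(header: str) -> List[str]:
    """Return the header tokens that start with "cs".

    ASSUMPTION: variants are separate tokens in the header, split by
    whitespace, commas, semicolons or pipes. Tokens starting with "rs"
    (neutral polymorphisms) and all other tokens are dropped.
    """
    tokens = [t for t in re.split(r"[\s,;|]+", header.strip()) if t]
    return [t for t in tokens if t.startswith("cs")]


def cancer_variants_by_record(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map the first header token of each record to its "cs" variants."""
    result: Dict[str, List[str]] = {}
    for header, _sequence in read_fasta(lines):
        fields = header.split()
        record_id = fields[0] if fields else header
        result[record_id] = training_variants(header)
    return result
```

For example, calling `cancer_variants_by_record(open(path))` on a local copy of the dataset returns one entry per sequence. With a made-up header such as `>P00001 cs1:A2V rs2:G3D`, only `cs1:A2V` would be kept for that record.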

Note: The neutral dataset used in our cancer model is the same one used in our inherited disease model (see above).

Disease-Specific

SwissProt/TrEMBL (2014_05) variant data can be found here and the corresponding disease concepts used in our analysis can be found here.

Note: The neutral dataset used in our disease-specific model is the same one used in our inherited disease model (see above).

If you use these datasets for your analysis, please cite the following publication:
