Publication:

The following paper describes the algorithm in more detail:

Shihab HA, Rogers MF, Gough J, Mort M, Cooper DN, Day INM, Gaunt TR, Campbell C (2014). An Integrative Approach to Predicting the Functional Consequences of Non-coding and Coding Sequence Variation. Bioinformatics 2015 May 15;31(10):1536-43.

Download supplementary file

Input Format:

Our software accepts comma-separated mutation data, one variant per line, with the following fields in order:

Chromosome
Position
Reference Base
Mutant Base

Note that FATHMM-MKL predictions are based on the GRCh37/hg19 genome build.

For example:

1,916549,A,G
1,935222,C,A
1,11854785,C,T
1,11854786,C,T

Note: the 'Chr' prefix (e.g. Chr1) is not required when specifying the chromosome, and all our predictions are derived using the forward strand.
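As an illustration of the format above, the following is a minimal sketch of how input lines could be checked before submission. It is not part of the FATHMM-MKL software: the function name parse_variant is hypothetical, and the restriction to the bases A, C, G and T, the upper-casing of bases, the rejection of lines where the reference and mutant base are identical, and the stripping of an optional 'Chr' prefix are all assumptions made for this example.

```python
# Hypothetical pre-submission check; not part of FATHMM-MKL itself.
VALID_BASES = {"A", "C", "G", "T"}  # assumption: only unambiguous bases


def parse_variant(line):
    """Parse one 'chromosome,position,ref,alt' line into a tuple."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 comma-separated fields: {line!r}")
    chrom, pos_text, ref, alt = fields

    # The 'Chr' prefix is not required, so drop it if present.
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    if not chrom:
        raise ValueError(f"missing chromosome: {line!r}")

    try:
        pos = int(pos_text)
    except ValueError:
        raise ValueError(f"position is not an integer: {line!r}") from None
    if pos < 1:
        raise ValueError(f"position must be positive: {line!r}")

    ref, alt = ref.upper(), alt.upper()
    if ref not in VALID_BASES or alt not in VALID_BASES:
        raise ValueError(f"bases must be one of A, C, G, T: {line!r}")
    if ref == alt:
        # Assumption: a variant must differ from the reference base.
        raise ValueError(f"reference and mutant base are identical: {line!r}")
    return chrom, pos, ref, alt


if __name__ == "__main__":
    for example in ["1,916549,A,G", "Chr1,935222,C,A"]:
        print(parse_variant(example))
```

A check like this only validates the syntax of each line. It cannot confirm that the reference base matches GRCh37/hg19 at the given position.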

Prediction Interpretation:

Predictions are given as p-values in the range [0, 1]: values above 0.5 are predicted to be deleterious, while those below 0.5 are predicted to be neutral or benign. P-values close to the extremes (0 or 1) are the highest-confidence predictions that yield the highest accuracy.
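The thresholding rule described above can be sketched as follows. This is an illustrative helper only: the function name interpret_score is hypothetical, the handling of a value of exactly 0.5 is an assumption (the text does not define it), and the distance-from-threshold figure is a simple stand-in for "closeness to the extremes", not a confidence measure produced by FATHMM-MKL.

```python
# Hypothetical helper; the labels follow the text, the rest is assumed.
def interpret_score(p_value, threshold=0.5):
    """Return (label, distance) for a prediction in the range [0, 1]."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError(f"p-value outside [0, 1]: {p_value}")

    if p_value > threshold:
        label = "deleterious"
    elif p_value < threshold:
        label = "neutral"
    else:
        # Assumption: the source does not say how exactly 0.5 is treated.
        label = "undetermined"

    # Scaled distance from the threshold: 0 at 0.5, 1 at either extreme
    # (valid for the default threshold of 0.5).
    distance = abs(p_value - threshold) * 2
    return label, distance


if __name__ == "__main__":
    for score in (0.97, 0.52, 0.08):
        print(score, interpret_score(score))
```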

Feature groups (letters A-J) are described in the Supplementary detail of the main paper, and summarised in section 2.1 of the paper.

We use distinct predictors for positions in coding regions (positions within coding-sequence exons) and in non-coding regions (positions in intergenic regions, introns or non-coding genes). The coding predictor is based on 10 groups of features, labeled A-J; the non-coding predictor uses a subset of 4 of these feature groups, A-D (see our related publication for details on the groups and their sources).

Annotations are not yet available in all feature groups for all genomic positions. To produce a p-value for these positions, we adjust our weights relative to the features that are available. For example, if our weights for A-D were 0.5, 0.1, 0.1 and 0.3, respectively, and there were no annotations for group A, then the missing weight would be distributed proportionally across remaining weights, which would become 0.2, 0.2 and 0.6. This allows us to make predictions for any combination of feature groups while yielding p-values in the [0,1] range.
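The worked example above amounts to dividing each remaining weight by the sum of the remaining weights. The sketch below reproduces that arithmetic only. The function name renormalise_weights is hypothetical, the weights are the illustrative values from the text rather than the trained model's weights, and this is not claimed to be how the FATHMM-MKL code itself is organised.

```python
# Hypothetical illustration of proportional weight redistribution.
def renormalise_weights(weights, available):
    """Rescale the weights of the available feature groups to sum to 1."""
    kept = {group: w for group, w in weights.items() if group in available}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("no available feature group has a positive weight")
    # Dividing by the remaining total spreads the missing weight
    # across the remaining groups in proportion to their own weights.
    return {group: w / total for group, w in kept.items()}


if __name__ == "__main__":
    # Illustrative weights from the text; group A has no annotations.
    weights = {"A": 0.5, "B": 0.1, "C": 0.1, "D": 0.3}
    adjusted = renormalise_weights(weights, available={"B", "C", "D"})
    for group, w in adjusted.items():
        print(group, round(w, 3))
```

Because the rescaled weights always sum to 1, a weighted combination of per-group scores that each lie in [0, 1] stays in [0, 1], whichever groups are missing.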

Note that predictions based only on a subset of features may not be as accurate as those based on complete feature sets. In particular, predictions that are missing the conservation score features (groups A and E) will tend to be less accurate than other predictions. To aid in interpreting these predictions, we provide a list of the feature groups that contributed to each prediction.

Work that uses these data should cite the following publications:

Shihab HA, Rogers MF, Gough J, Mort M, Cooper DN, Day INM, Gaunt TR, Campbell C (2014). An Integrative Approach to Predicting the Functional Consequences of Non-coding and Coding Sequence Variation. Bioinformatics 2015 May 15;31(10):1536-43.

Shihab HA, Gough J, Cooper DN, Stenson PD, Barker GLA, Edwards KJ, Day INM, Gaunt TR (2013). Predicting the Functional, Molecular and Phenotypic Consequences of Amino Acid Substitutions using Hidden Markov Models. Hum. Mutat., 34:57-65.

Download:

Instructions on how to install our MKL-based algorithm, which predicts the effects of both coding and non-coding variants using nucleotide-based HMMs, can be found on our fathmm-MKL GitHub repository.

Predict the Functional Consequences of Non-Coding and Coding Single Nucleotide Variants (SNVs)
