FATHMM-XF (FATHMM with eXtended Features) represents a substantial improvement over our earlier predictor, FATHMM-MKL. By using an extended set of feature groups and exploring an expanded set of possible models, the new method achieves greater accuracy than its predecessor on independent test sets. As with FATHMM-MKL, FATHMM-XF predicts whether single nucleotide variants (SNVs) in the human genome are likely to be functional or non-functional in inherited diseases. Also like its predecessor, it uses distinct models for coding and non-coding regions to improve overall accuracy. Unlike FATHMM-MKL, FATHMM-XF models are built on single-kernel datasets. The models can then learn interactions between data sources that help to boost their accuracy in all regions of the genome.
Our paper describes the algorithm in detail; links will be available upon publication.
Rogers MF, Shihab HA, Mort M, Cooper DN, Gaunt TR, Campbell C. FATHMM-XF: enhanced accuracy in the prediction of pathogenic sequence variants via an extended feature set. (journal submission)
Predictions are given as p-values in the range [0, 1]:
values above 0.5 are predicted to be deleterious, while those below
0.5 are predicted to be neutral or benign.
P-values close to the extremes (0 or 1) are the highest-confidence predictions
and yield the highest accuracy.
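The thresholding rule above can be sketched in a few lines of Python. This is only an illustration of how a user might interpret a score, not part of FATHMM-XF itself: the function names `classify` and `confidence` are hypothetical, the "undetermined" label for a score of exactly 0.5 is an assumption (the text does not say how that case is handled), and the confidence measure is simply the scaled distance from 0.5.

```python
def classify(score: float) -> str:
    """Label a prediction score using the 0.5 cutoff described above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in the range [0, 1]")
    if score > 0.5:
        return "deleterious"
    if score < 0.5:
        return "neutral"
    # Assumption: the text does not define the outcome at exactly 0.5.
    return "undetermined"


def confidence(score: float) -> float:
    """Hypothetical confidence measure: 0 at the cutoff, 1 at either extreme."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in the range [0, 1]")
    return abs(score - 0.5) * 2.0


if __name__ == "__main__":
    for s in (0.02, 0.45, 0.5, 0.93):
        print(f"{s:.2f}  {classify(s):<12}  confidence={confidence(s):.2f}")
```

A user interested only in high-confidence calls could, for example, keep variants whose `confidence` exceeds a cutoff of their own choosing.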
We use distinct predictors for positions either in coding regions (positions
within coding-sequence exons) or non-coding regions (positions in intergenic
regions, introns or non-coding genes). The coding predictor is based on
six groups of features representing sequence conservation, nucleotide sequence
characteristics, genomic features (codons, splice sites, etc.), amino acid features
and expression levels in different tissues.
The non-coding predictor uses five feature groups that encompass nearly the same
kinds of data, the primary exception being evidence for open chromatin.
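As a rough sketch of the region-based split described above, the snippet below routes a position to the coding or non-coding predictor according to its annotation. The `Region` enum and `select_predictor` function are hypothetical names introduced for illustration; the actual FATHMM-XF implementation may determine the region differently.

```python
from enum import Enum


class Region(Enum):
    """Region types named in the text (hypothetical enum for illustration)."""
    CDS_EXON = "coding-sequence exon"
    INTERGENIC = "intergenic"
    INTRON = "intron"
    NONCODING_GENE = "non-coding gene"


def select_predictor(region: Region) -> str:
    """Return which predictor applies to a position in the given region."""
    # Only positions within coding-sequence exons use the coding predictor;
    # intergenic, intronic and non-coding-gene positions use the other one.
    return "coding" if region is Region.CDS_EXON else "non-coding"


if __name__ == "__main__":
    for region in Region:
        print(f"{region.value}: {select_predictor(region)} predictor")
```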
Because this work is related to FATHMM and FATHMM-MKL, publications that use these data
should cite the following:
Rogers MF, Shihab HA, Mort M, Cooper DN, Gaunt TR, Campbell C. FATHMM-XF:
enhanced accuracy in the prediction of pathogenic sequence variants via an extended feature set.
(journal submission)
Shihab HA, Rogers MF, Gough J, Mort M, Cooper DN, Day INM, Gaunt TR, Campbell C (2015). An Integrative Approach to Predicting the Functional Consequences of Non-coding and Coding Sequence Variation. Bioinformatics, 31(10):1536-43.
Shihab HA, Gough J, Cooper DN, Stenson PD, Barker GLA, Edwards KJ, Day INM, Gaunt TR (2013). Predicting the Functional, Molecular and Phenotypic Consequences of Amino Acid Substitutions using Hidden Markov Models. Hum. Mutat., 34:57-65.