If you use the data on this website, please cite the following publication:
Rogers MF, Shihab HA, Mort M, Cooper DN, Gaunt TR, Campbell C. FATHMM-XF: enhanced accuracy in the prediction of pathogenic sequence variants via an extended feature set, Bioinformatics, September 2017.
Our web form accepts comma-separated mutation data in the following format:
Chromosome
Position
Reference Base
Mutant Base
Note that FATHMM-XF predictions are based on the GRCh37/hg19 genome build.
11,247320,G,T
11,614524,G,C
11,703018,C,A
11,902253,C,T
12,6939178,G,C
12,9309954,G,C
12,15730463,T,C
12,21565627,T,C
12,48135388,G,A
12,49054144,C,A
12,56510998,A,C
12,96778306,T,C
12,96854510,T,A
12,96900701,T,G
12,113435561,G,C
12,119627522,G,C
13,19751346,T,G
18,321728,C,A
18,3814080,A,G
18,3814231,C,G
18,3879547,C,T
19,984557,G,A
19,1042353,C,T
19,5739354,T,G
19,6495520,C,G
21,33867336,G,C
21,46217346,A,G
22,17881712,T,A
22,20976770,A,C
22,34160210,A,C
22,35120191,C,T
22,38024684,G,C
22,50704735,C,A
6,7585420,T,G
9,2639891,T,C
9,4117959,A,G
9,5304109,A,C
9,6012911,G,A
9,6554754,C,T
9,8518267,G,T
Note: 'Chr' should be omitted when specifying the chromosome above (e.g. '1', not 'Chr1'). All predictions are derived using the forward strand.
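The format rules above can be checked before submitting a batch. The following is a minimal sketch, not part of FATHMM-XF itself: the function name parse_query_line is hypothetical, and the strict checks (uppercase single bases A, C, G or T, and a positive integer position) are assumptions about what counts as a well-formed line.

```python
VALID_BASES = {"A", "C", "G", "T"}  # assumption: single uppercase bases only


def parse_query_line(line):
    """Parse one 'chromosome,position,reference,mutant' line into a tuple.

    Raises ValueError when the line does not match the format described above.
    """
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 comma-separated fields: {line!r}")
    chrom, pos, ref, alt = fields
    if not chrom:
        raise ValueError(f"missing chromosome: {line!r}")
    if chrom.lower().startswith("chr"):
        # The page asks for '1', not 'Chr1'.
        raise ValueError(f"omit the 'Chr' prefix: {line!r}")
    if not (pos.isascii() and pos.isdigit()) or int(pos) < 1:
        raise ValueError(f"position must be a positive integer: {line!r}")
    if ref not in VALID_BASES or alt not in VALID_BASES:
        raise ValueError(f"bases must be one of A, C, G, T: {line!r}")
    return chrom, int(pos), ref, alt


if __name__ == "__main__":
    print(parse_query_line("11,247320,G,T"))
```

Rejecting malformed lines up front makes it easier to see which entries in a large batch need fixing before they reach the web form.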
VCF-style input is also accepted, with the following columns:
Chromosome
Position
Identifier
Reference Base
Mutant Base
1	20915172	.	C	T
2	48025976	.	G	T
4	80977297	.	T	A
5	1293898	.	G	A
6	51713769	.	C	T
9	79852917	.	G	C
11	1094690	.	C	T
11	14992735	.	C	G
The VCF format specification requires eight columns, but here only the
chromosome, position, reference and mutant bases are used and reported.
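Since only four of the VCF columns are used, a VCF-style line maps directly onto the comma-separated form. The sketch below illustrates that mapping; the function name vcf_to_csv_query is hypothetical, and splitting on any whitespace (tabs or spaces) is an assumption made for convenience.

```python
def vcf_to_csv_query(vcf_line):
    """Convert 'chrom pos id ref alt ...' into 'chrom,pos,ref,alt'.

    The identifier column and any columns after the fifth are dropped,
    because only chromosome, position, reference and mutant bases are used.
    """
    fields = vcf_line.split()  # assumption: any whitespace separates columns
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns: {vcf_line!r}")
    chrom, pos, _identifier, ref, alt = fields[:5]
    return f"{chrom},{pos},{ref},{alt}"


if __name__ == "__main__":
    print(vcf_to_csv_query("1 20915172 . C T"))
```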
Predictions are given as p-values in the range [0, 1]:
values above 0.5 are predicted to be deleterious, while those below
0.5 are predicted to be neutral or benign.
P-values close to the extremes (0 or 1) are the highest-confidence
predictions and the most accurate.
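The thresholding rule above can be written down directly. This is a minimal sketch under stated assumptions: the function name interpret_score is hypothetical, the label for a value of exactly 0.5 is an assumption (the text does not say how that case is treated), and the distance-from-0.5 figure is only an illustrative stand-in for confidence, not a quantity reported by FATHMM-XF.

```python
def interpret_score(score, threshold=0.5):
    """Return (label, distance) for a prediction in the range [0, 1].

    Values above the threshold are labelled deleterious and values below it
    neutral or benign, following the rule stated above.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must lie in [0, 1]: {score!r}")
    if score > threshold:
        label = "deleterious"
    elif score < threshold:
        label = "neutral or benign"
    else:
        label = "undetermined"  # assumption: the source does not cover exactly 0.5
    # Hypothetical confidence proxy: 0 at the threshold, 1 at either extreme.
    distance = abs(score - threshold) / 0.5
    return label, distance


if __name__ == "__main__":
    for value in (1.0, 0.75, 0.25, 0.0):
        print(value, interpret_score(value))
```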
We use distinct predictors for positions in either coding regions (positions
within coding-sequence exons) or non-coding regions (positions in intergenic
regions, introns or non-coding genes). The coding predictor is based on
six groups of features representing sequence conservation, nucleotide sequence
characteristics, genomic features (codons, splice sites, etc.), amino acid features
and expression levels in different tissues.
The non-coding predictor uses five feature groups that encompass nearly the same
kinds of data, the primary exception being evidence for open chromatin.
Publications that use these data should cite the following publication:
Rogers MF, Shihab HA, Mort M, Cooper DN, Gaunt TR, Campbell C. FATHMM-XF:
enhanced accuracy in the prediction of pathogenic sequence variants via an extended feature set,
Bioinformatics, September 2017.
Because this work is related to FATHMM and FATHMM-MKL, publications that use these data
may also wish to cite the following:
Shihab HA, Rogers MF, Gough J, Mort M, Cooper DN, Day INM, Gaunt TR, Campbell C (2015). An Integrative Approach to Predicting the Functional Consequences of Non-coding and Coding Sequence Variation. Bioinformatics 2015 May 15;31(10):1536-43.
Shihab HA, Gough J, Cooper DN, Stenson PD, Barker GLA, Edwards KJ, Day INM, Gaunt TR (2013). Predicting the Functional, Molecular and Phenotypic Consequences of Amino Acid Substitutions using Hidden Markov Models. Hum. Mutat., 34:57-65.
Usage: fathmm_xf_query.py query-file [options]

Predict the pathogenic potential of single nucleotide variants (SNVs).
The query file must be a list of queries in VCF format.
Note: the id column and columns beyond the first five are ignored.

    chromosome <tab> position <tab> id <tab> reference <tab> mutant <tab> ...

Example:
    1	69094	.	G	A
    11	168961	.	T	A
    18	119888	.	G	A

Options:
  -h, --help  show this help message and exit
  -c CDB      CScape coding database [default: fathmm_xf_coding.vcf.gz]
  -n NDB      CScape noncoding database [default: fathmm_xf_noncoding.vcf.gz]
  -o OUTPUT   Output file [default: stdout]
  -v          Verbose mode [default: False]
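A query file in the tab-separated layout described in the usage text can be generated from a list of variants. The sketch below is illustrative only: the function name format_query_lines is hypothetical, and using "." as the placeholder identifier follows the example in the usage text.

```python
import sys


def format_query_lines(variants):
    """Render (chromosome, position, reference, mutant) tuples as query lines.

    Each line is 'chromosome<tab>position<tab>.<tab>reference<tab>mutant',
    with '.' standing in for the ignored id column.
    """
    lines = []
    for chrom, pos, ref, alt in variants:
        lines.append("\t".join([str(chrom), str(pos), ".", ref, alt]))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    example = [("1", 69094, "G", "A"), ("11", 168961, "T", "A"), ("18", 119888, "G", "A")]
    # Redirect this output to a file and pass that file as query-file.
    sys.stdout.write(format_query_lines(example))
```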