Statistically Identifying Tumor Suppressors and Oncogenes from Pan-Cancer Genome Sequencing Data.

Bioinformatics. 2015 Jul 25. pii: btv430. [Epub ahead of print]


MOTIVATION: Several tools exist to identify cancer driver genes from somatic mutation data. However, these tools do not account for the two subclasses of cancer genes: oncogenes, which undergo gain-of-function events, and tumor suppressor genes (TSGs), which undergo loss-of-function events. A method that accounts for these subclasses could improve performance while also suggesting a mechanism of action for new putative cancer genes.

RESULTS: We develop a panel of five complementary statistical tests and assess their performance against a curated set of 99 high-confidence cancer genes (HiConf), using a pan-cancer dataset of 1.7 million mutations. We identify patient bias as a novel signal for cancer gene discovery and use it to significantly improve detection of oncogenes over existing methods (AUROC = 0.894). Additionally, our test of truncation event rate separates oncogenes and TSGs from one another (AUROC = 0.922). Finally, a random forest integrating the five tests further improves performance and identifies new cancer genes, including CACNG3, HDAC2, HIST1H1E, NXF1, GPS2 and HLA-DRB1.

AVAILABILITY: All mutation data, instructions, and functions for computing and integrating the statistics, as well as the HiConf gene panel, are available at {{}}.
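To make the truncation-rate idea and the reported AUROC values concrete, here is a minimal Python sketch. It assumes that a gene's score is simply the fraction of its somatic mutations that are truncating (the `TRUNCATING` class set is a hypothetical choice) and computes AUROC with the rank-based Mann-Whitney formulation. It is not the authors' statistical test, and the gene names and mutation lists in the usage lines are invented toy inputs, not data from the study.

```python
from typing import Dict, List

# Hypothetical set of variant classes treated as truncating; the paper's
# exact definition may differ.
TRUNCATING = {"nonsense", "frameshift", "splice_site"}


def truncation_fraction(mutations: List[str]) -> float:
    """Fraction of a gene's somatic mutations that are truncating."""
    if not mutations:
        return 0.0
    return sum(m in TRUNCATING for m in mutations) / len(mutations)


def auroc(pos_scores: List[float], neg_scores: List[float]) -> float:
    """AUROC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); ties count as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Invented toy genes: TSGs are the positive class, oncogenes the negative.
    tsgs: Dict[str, List[str]] = {
        "TSG_A": ["nonsense", "frameshift", "missense", "nonsense"],
        "TSG_B": ["frameshift", "missense", "splice_site"],
    }
    oncogenes: Dict[str, List[str]] = {
        "ONC_A": ["missense", "missense", "nonsense", "missense"],
        "ONC_B": ["missense", "missense"],
    }
    pos = [truncation_fraction(m) for m in tsgs.values()]
    neg = [truncation_fraction(m) for m in oncogenes.values()]
    print(round(auroc(pos, neg), 3))  # prints 1.0
```

An AUROC of 1.0 means every toy TSG scores above every toy oncogene, while 0.5 would be chance. The pairwise loop is O(n·m), which is adequate for a gene panel on the order of 100 genes.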


Kumar RD, Searleman AC, Swamidass SJ, Griffith OL, Bose R.

Institute Authors

Obi Griffith, Ph.D.