Mathematics, Physics, and Computer Science Faculty Articles and Research

Integration of Random Forest Classifiers and Deep Convolutional Neural Networks for Classification and Biomolecular Modeling of Cancer Driver Mutations

Steve Agajanian, Chapman University
Odeyemi Oluyemi, Chapman University
Gennady M. Verkhivker, Chapman University

Document Type

Article

Publication Date

6-11-2019

Abstract

The development of machine learning solutions for predicting the functional and clinical significance of cancer driver genes and mutations is paramount in modern biomedical research and has gained significant momentum in the past decade. In this work, we integrate different machine learning approaches, including tree-based methods, namely random forest and gradient boosted tree (GBT) classifiers, along with deep convolutional neural networks (CNN), for prediction of cancer driver mutations in genomic datasets. The feasibility of using CNN on raw nucleotide sequences for classification of cancer driver mutations was initially explored by employing label encoding, one-hot encoding, and embedding to preprocess the DNA information. These classifiers were benchmarked against their tree-based alternatives in order to evaluate performance on a relative scale. We then integrated DNA-based scores generated by CNN with various categories of conservation, evolutionary, and functional features into a generalized random forest classifier. The results of this study demonstrate that CNN can learn high-level features from genomic information that are complementary to the ensemble-based predictors often employed for classification of cancer mutations. By combining the deep learning-generated score with only two main ensemble-based functional features, we can achieve superior performance of various machine learning classifiers. Our findings also suggest that the synergy of nucleotide-based deep learning scores and integrated metrics derived from protein sequence conservation scores can allow for robust classification of cancer driver mutations with a limited number of highly informative features. Machine learning predictions are leveraged in molecular simulations, protein stability, and network-based analysis of cancer mutations in the protein kinase genes to obtain insights about molecular signatures of driver mutations and enhance the interpretability of cancer-specific classification models.
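
To make the preprocessing step mentioned in the abstract more concrete, the following is a minimal sketch of label encoding and one-hot encoding applied to a short nucleotide string. The alphabet order (A, C, G, T), the function names, and the example sequence are assumptions chosen for illustration; they are not taken from the article, which may use a different mapping, window length, or handling of ambiguous bases.

```python
# Illustrative sketch only: the alphabet order and helper names below are
# assumptions, not the encoding scheme used by the article's authors.
NUCLEOTIDES = "ACGT"
LABELS = {base: index for index, base in enumerate(NUCLEOTIDES)}


def label_encode(seq):
    """Map each base to an integer index (here A=0, C=1, G=2, T=3)."""
    return [LABELS[base] for base in seq.upper()]


def one_hot_encode(seq):
    """Map each base to a length-4 indicator vector with a single 1."""
    encoded = []
    for index in label_encode(seq):
        row = [0] * len(NUCLEOTIDES)
        row[index] = 1
        encoded.append(row)
    return encoded


if __name__ == "__main__":
    example = "GATC"  # hypothetical sequence fragment
    print(label_encode(example))
    print(one_hot_encode(example))
```

Label encoding yields one integer per position, which implies an ordering among bases that has no biological meaning; one-hot encoding avoids that at the cost of a wider input. An embedding layer, the third option named in the abstract, would instead learn a dense vector per label during training.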

Comments

This article was originally published in Frontiers in Molecular Biosciences, volume 6, in 2019. DOI: 10.3389/fmolb.2019.00044

Recommended Citation

Agajanian S, Oluyemi O and Verkhivker GM (2019) Integration of Random Forest Classifiers and Deep Convolutional Neural Networks for Classification and Biomolecular Modeling of Cancer Driver Mutations. Front. Mol. Biosci. 6:44. doi: 10.3389/fmolb.2019.00044

Peer Reviewed

Copyright

The authors

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Included in

Cancer Biology Commons, Genetic Phenomena Commons, Genetic Processes Commons, Genetic Structures Commons, Medical Biochemistry Commons, Medicinal-Pharmaceutical Chemistry Commons, Nucleic Acids, Nucleotides, and Nucleosides Commons, Other Computer Sciences Commons, Other Forestry and Forest Sciences Commons

Chapman University Digital Commons