Computational and Data Sciences (MS) Theses

Development of Machine Learning Models for Generation and Activity Prediction of the Protein Tyrosine Kinase Inhibitors

Ryan Kassab, Chapman University

Date of Award

Summer 8-2022

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computational and Data Sciences

First Advisor

Gennady Verkhivker

Second Advisor

Mohamed Allali

Third Advisor

Cyril Rakovski

Abstract

The field of computational drug discovery and development continues to grow at a rapid pace, with generative machine learning approaches offering solutions to high-dimensional, complex problems in drug discovery and design. In this work, we present a platform of machine-learning-based approaches for generating and scoring novel kinase inhibitor molecules. We used a binary Random Forest classification model to develop a machine-learning-based scoring function that evaluates generated molecules on Kinase Inhibition Likelihood. By training the model on several chemical features of each known kinase inhibitor, we created a metric that captures the differences between a SRC Kinase Inhibitor and a non-SRC Kinase Inhibitor. We incorporated the scoring function into Biased and Unbiased Bayesian Optimization frameworks to generate molecules based on features of SRC Kinase Inhibitors. We then used similarity metrics such as Tanimoto Similarity to assess how close the generated molecules are to known SRC Kinase Inhibitors. The generated molecules showed potential for belonging to the SRC Kinase Inhibitor family, though chemical synthesis would be needed to confirm the results. The top molecules from the Unbiased and Biased Bayesian Optimization experiments had Tanimoto Similarity scores of 0.711 and 0.709, respectively, to known SRC Kinase Inhibitors. With calculated Kinase Inhibition Likelihood scores of 0.586 and 0.575, these top molecules reveal a disconnect between similarity to known SRC Kinase Inhibitors and the calculated Kinase Inhibition Likelihood score. Introducing a bias into the Bayesian Optimization process had little effect on the quality of the generated molecules. In addition, several molecules generated by the Bayesian Optimization process were sent to the School of Pharmacy for chemical synthesis, to give the experiment more concrete results. The results of this study demonstrate that Bayesian Optimization techniques could aid in generating molecules for a specific kinase family, but further extension of the techniques would be needed for substantial results.
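As an illustration of the Tanimoto Similarity metric mentioned in the abstract, the following is a minimal sketch and not the thesis's implementation. It assumes each molecule is represented as the set of "on" bit indices of a binary fingerprint; the function names `tanimoto` and `max_similarity` and the toy fingerprints are hypothetical, and how the thesis computed fingerprints or aggregated similarity over known inhibitors is not stated in this record.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_similarity(candidate: Set[int], references: Iterable[Set[int]]) -> float:
    """Highest Tanimoto similarity between a candidate and any reference."""
    return max((tanimoto(candidate, ref) for ref in references), default=0.0)


# Toy fingerprints (hypothetical bit indices, not real molecules).
known_inhibitors = [{1, 4, 7, 9, 12}, {2, 4, 7, 11}]
generated = {1, 4, 7, 9, 15}

print(round(max_similarity(generated, known_inhibitors), 3))  # prints 0.667
```

A score of 1.0 means the two fingerprints are identical and 0.0 means they share no bits, so values near 0.71 indicate substantial but partial overlap with a known inhibitor.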

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Recommended Citation

R. Kassab, "Development of machine learning models for generation and activity prediction of the protein tyrosine kinase inhibitors," M.S. thesis, Chapman University, Orange, CA, 2022. https://doi.org/10.36837/chapman.000391

DOI

https://doi.org/10.36837/chapman.000391

Chapman University Digital Commons
