Date of Award
Summer 8-2021
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Computational and Data Sciences
First Advisor
Dr. O. Maduka Ogba
Second Advisor
Dr. Gennady Verkhivker
Third Advisor
Dr. Lindsay Waldrop
Abstract
Computational investigation of the structures and reactions of molecules of biological and pharmaceutical interest remains a grand scientific challenge because of the size and conformational flexibility of these systems. Such work requires parsing and analyzing thousands of conformations in each molecular state to extract meaningful chemical information, and then subjecting the ensemble to costly quantum chemical calculations. The status quo is typically a manual process in which the investigator inspects each conformation and assigns it to a structural family. This process is tedious and time-intensive, which makes it infeasible in some cases and limits the ability of theoreticians to study these systems. Computational software, however, can enable the necessary exhaustive investigation without the bottleneck of a brute-force approach to each flexible system.
I aim to address this problem. In my thesis project, I develop a Python program that will (i) automate the parsing of each conformation within a conformational ensemble, (ii) use principal component analysis (PCA) and clustering to identify and investigate conformational families within the ensemble, (iii) separate and visualize conformational families in a user-friendly manner, and (iv) convey to the user how the conformational families were delineated, by way of the features found in the data. The results explored in this work show that the program can separate conformational families in cases of varying difficulty.
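The abstract does not specify which geometric features or implementation details the program uses. The following is only a minimal sketch of the general PCA-then-K-means idea, with several assumptions. It assumes each conformer has already been parsed into a fixed-length feature vector. Here those features are three hypothetical interatomic distances in Å, drawn as synthetic data for two made-up families. The sketch uses plain NumPy rather than any specific library. The function names `pca` and `kmeans` are illustrative and are not the thesis software's API. For point (iv), the magnitudes of the PC1 loadings show which input features drive the separation along the dominant axis.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD of the mean-centered feature matrix (rows = conformers)."""
    Xc = X - X.mean(axis=0)                  # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components]             # one row of feature weights per PC
    scores = Xc @ loadings.T                 # each conformer's coordinates in PC space
    explained = S**2 / np.sum(S**2)          # fraction of total variance per PC
    return scores, loadings, explained[:n_components]

def kmeans(Z, k, n_iter=100):
    """Lloyd's K-means with deterministic farthest-point initialization
    (a simple stand-in for k-means++ seeding)."""
    centers = [Z[0]]
    for _ in range(1, k):
        # distance of every point to its nearest already-chosen center
        dist = np.min([np.linalg.norm(Z - c, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # assign each point to its nearest center
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # move each center to the mean of its assigned points
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic ensemble: two hypothetical conformational families, 50 conformers each,
# described by three hypothetical interatomic distances (Å).
rng = np.random.default_rng(0)
family_a = rng.normal([2.5, 3.8, 4.1], 0.05, size=(50, 3))
family_b = rng.normal([3.6, 2.7, 5.0], 0.05, size=(50, 3))
X = np.vstack([family_a, family_b])

scores, loadings, explained = pca(X, n_components=2)
labels, _ = kmeans(scores, k=2)

print("variance explained:", np.round(explained, 3))
print("cluster sizes:", np.bincount(labels))
print("PC1 loadings:", np.round(loadings[0], 2))  # feature contributions to PC1
```

Clustering in the reduced PC space rather than on the raw features is a common design choice. It suppresses low-variance noise directions and makes the clusters easy to plot in two dimensions. Real conformer features such as dihedral angles are periodic, so they would typically need an encoding (for example, sine and cosine) before PCA.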
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Recommended Citation
M. Nwerem, "Automated parsing of flexible molecular systems using principal component analysis and K-means clustering techniques," M. S. thesis, Chapman University, Orange, CA, 2021. https://doi.org/10.36837/chapman.000293