Date of Award

Summer 8-2021

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computational and Data Sciences

First Advisor

Erik Linstead

Second Advisor

Elizabeth Stevens

Third Advisor

Elizabeth Davison

Abstract

Advancements in genetic sequencing methods for microbiomes in recent decades have made it possible to collect taxonomic and functional profiles of microbial communities. This has accelerated the discovery of the microbiome's functional roles and increased clinicians' interest in applying these techniques to patients. This progress has coincided with software and hardware improvements in machine learning and deep learning. Together, these advancements suggest further potential for progress in diagnosing and treating human disease. Two goals are valuable both for diagnosing disease and for understanding its pathology: classifying a human microbiome profile into a disease category, and identifying which factors in the profile differentiate diseased from healthy individuals. This can be particularly important for diseases of unknown etiology. In those cases, it offers the potential to develop accurate diagnostic tools for clinicians who currently diagnose based on limited available research or by exclusion. Human microbiome studies such as the Human Microbiome Project generate data that can support important findings in health care and in disease diagnosis and treatment. However, this data is highly sparse and has a large feature space relative to the number of samples. This makes it challenging to use in machine learning models, especially when the number of samples is small. Here, inflammatory bowel disease (IBD) microbiome profiles from the Human Microbiome Project are used to classify disease. We show the use of dimensionality reduction and variational autoencoders (VAEs) to generate synthetic microbiome profiles as a potential way to address this issue and improve the performance of existing disease classification models.
Results are compared across various baseline machine learning models using traditional supervised and unsupervised dimensionality reduction techniques. We show that supplementing a dataset with VAE-generated artificial microbiome data improves classification results in a specific setting: small datasets with a large feature space relative to the sample size and highly imbalanced class sizes. This approach may be used to increase classification accuracy in microbiome-based diagnostic tools.
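The abstract does not detail the VAE architecture or training procedure. The NumPy code below is a minimal sketch, under stated assumptions, of the generic pieces such an approach relies on. It shows the reparameterization trick and the closed-form KL term of the VAE objective. It also shows how latent samples drawn from the prior can be decoded into synthetic relative-abundance profiles and appended to the minority class. Several elements are hypothetical and do not reflect the thesis's actual method:

- the one-layer softmax decoder;
- the randomly initialized placeholder weights, which stand in for a decoder trained only on minority-class profiles;
- the toy Dirichlet-sampled data;
- the helper `augment_minority`.

The encoder and training loop are omitted.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """VAE reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def gaussian_kl(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)) per sample, summed over latent dims."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)


def decode(z, W, b):
    """Hypothetical one-layer decoder: maps latent codes to relative-abundance profiles via softmax."""
    logits = z @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)  # each row sums to 1, like a relative-abundance profile


def augment_minority(X, y, minority_label, n_new, W, b, rng):
    """Hypothetical helper: decode n_new prior samples and append them as minority-class rows.

    Assumes W, b belong to a decoder trained only on minority-class profiles.
    """
    z = rng.standard_normal((n_new, W.shape[0]))  # z ~ N(0, I), the VAE prior
    X_syn = decode(z, W, b)
    X_aug = np.vstack([X, X_syn])
    y_aug = np.concatenate([y, np.full(n_new, minority_label, dtype=y.dtype)])
    return X_aug, y_aug


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_features, latent_dim = 500, 8

    # Toy data (not from the thesis): 20 profiles, 16 labeled 0 ("healthy") and 4 labeled 1 ("IBD").
    X = rng.dirichlet(np.full(n_features, 0.5), size=20)
    y = np.array([0] * 16 + [1] * 4)

    # Placeholder decoder weights; a real pipeline would learn these by training the VAE.
    W = rng.normal(scale=0.5, size=(latent_dim, n_features))
    b = np.zeros(n_features)

    X_aug, y_aug = augment_minority(X, y, minority_label=1, n_new=12, W=W, b=b, rng=rng)
    print(X_aug.shape, np.bincount(y_aug))
```

Running the script prints `(32, 500) [16 16]`: the 12 synthetic rows balance the toy 16/4 split. In a full pipeline, the augmented set would then be passed to the baseline classifiers. Synthetic samples would typically be kept out of any test split so that evaluation uses only real profiles.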

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Recommended Citation

C. Manughian-Peter, "Enhancing microbiome host disease prediction with variational autoencoders," M. S. thesis, Chapman University, Orange, CA, 2021. https://doi.org/10.36837/chapman.000297

Included in

Data Science Commons, Digestive System Diseases Commons, Other Computer Sciences Commons

DOI

https://doi.org/10.36837/chapman.000297

Chapman University Digital Commons

Computational and Data Sciences (MS) Theses

Enhancing Microbiome Host Disease Prediction with Variational Autoencoders