Date of Award
Spring 5-28-2019
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Computational and Data Sciences
First Advisor
Erik Linstead, Ph.D.
Second Advisor
Elizabeth Stevens, Ph.D.
Third Advisor
Samuel E. Ehrenreich, Ph.D.
Abstract
Techniques based on artificial neural networks represent the current state-of-the-art in machine learning due to the availability of improved hardware and large data sets. Here we employ doc2vec, an unsupervised neural network, to capture the semantic content of text messages sent by adolescents during high school, and encode this semantic content as numeric vectors. These vectors effectively condense the text message data into highly leverageable inputs to a logistic regression classifier in a matter of hours, as compared to the tedious and often quite lengthy task of manually coding data. Using our machine learning approach, we are able to train a logistic regression model to predict adolescents' engagement in substance abuse during distinct life phases with accuracy ranging from 76.5% to 88.1%. We show the effects of grade level and text message aggregation strategy on the efficacy of document embedding generation with doc2vec. Additional examination of the vectorizations for specific terms extracted from the text message data adds quantitative depth to this analysis. We demonstrate the ability of the method used herein to overcome traditional natural language processing concerns related to unconventional orthography. These results suggest that the approach described in this thesis is a competitive and efficient alternative to existing methodologies for predicting substance abuse behaviors. This work reveals the potential for the application of machine learning-based manipulation of text messaging data to development of automatic intervention strategies against substance abuse and other adolescent challenges.
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Recommended Citation
A. Bergh, "A machine learning approach to predicting alcohol consumption in adolescents from historical text messaging data," M. S. thesis, Chapman University, Orange, CA, 2019. https://doi.org/10.36837/chapman.000072