Computational and Data Sciences (PhD) Dissertations

Advances in NLP Algorithms on Unstructured Medical Notes Data and Approaches to Handling Class Imbalance Issues

Hanna Lu, Chapman University

Date of Award

Fall 12-2022

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computational and Data Sciences

First Advisor

Cyril Rakovski

Second Advisor

Daniel Alpay

Third Advisor

Chun Hsien Chiang

Fourth Advisor

Alex Barrett

Abstract

This dissertation comprises three research projects, each addressing one aspect of text classification on unstructured medical notes.

The first study compared sequence deep learning models widely used in NLP tasks (RNN, GRU, LSTM, and Bi-LSTM), a CNN, and the newer attention-based algorithms, namely the Transformer Encoder and BERT-Base. Each model was evaluated with and without pre-trained word embeddings. The Transformer Encoder was the best model for all tasks, and the CNN produced comparable performance when the classes were relatively balanced.

As an extension of the first study, the second study explored the effects of 20 text data augmentation methods on the same data to address class imbalance and small sample size. It also investigated different strategies for how much augmentation to apply. The results showed that the Splitting Augmenter consistently improved model performance under all strategies for most tasks, with the largest improvements being 0.13 in F1 score and 0.34 in AUC-ROC. For highly imbalanced tasks, the strategy that augments the minority class until the classes are balanced improved model performance by the largest margin. For the other tasks, the best-performing strategy was the one that augments the minority class until balanced and then augments both classes by an additional 10%.
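The two augmentation-amount strategies named above can be illustrated with a small sketch. It is not the dissertation's code: the function name `augmentation_targets`, the parameter `extra_percent`, and the reading of "an additional 10%" as 10% of the balanced class size (rounded down) are all assumptions made for illustration.

```python
def augmentation_targets(n_minority, n_majority, extra_percent=0):
    """Count the augmented samples to generate for each class.

    Hypothetical helper, not taken from the dissertation.
    extra_percent=0  -> augment the minority class until balanced.
    extra_percent=10 -> balance first, then add 10% to both classes
                        (assumed here to mean 10% of the balanced size).
    """
    if n_minority > n_majority:
        raise ValueError("n_minority must not exceed n_majority")
    # Samples needed to bring the minority class up to the majority size.
    to_balance = n_majority - n_minority
    # Extra samples added to each class after balancing (integer arithmetic).
    extra = (n_majority * extra_percent) // 100
    return {"minority": to_balance + extra, "majority": extra}


balance_only = augmentation_targets(100, 900)
balance_plus_10 = augmentation_targets(100, 900, extra_percent=10)
```

With 100 minority and 900 majority notes, the first strategy calls for 800 augmented minority notes, while the second calls for 890 augmented minority notes and 90 augmented majority notes.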

The third study aimed to predict suicidal or self-injurious events in order to improve the efficiency of triage for health care services and to prevent such events in the Orange County Jails. This study showed that the medical and mental health progress notes contain more information about the inmates’ mental health state, as it pertains to suicidal or self-injurious tendencies, than the structured data available in the database. Two different ways of incorporating the information from the notes into model building were introduced, and under-sampling was used to mitigate the impact of extremely imbalanced classes.
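The abstract does not specify which under-sampling scheme was used. As a minimal sketch, assuming simple random under-sampling of the majority (negative) class, the idea looks like this; the function name `undersample`, the `ratio` parameter, and the fixed seed are hypothetical.

```python
import random


def undersample(labels, ratio=1.0, seed=0):
    """Return sorted indices that keep every positive example and a
    random subset of the negatives.

    Hypothetical helper: random under-sampling is an assumption, not the
    dissertation's documented procedure. `ratio` is the number of
    negatives kept per positive.
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    # Never request more negatives than are available.
    n_keep = min(len(negatives), int(len(positives) * ratio))
    kept_negatives = rng.sample(negatives, n_keep)
    return sorted(positives + kept_negatives)


# Toy labels: 5 positive events among 100 records.
labels = [1] * 5 + [0] * 95
kept = undersample(labels)
```

Here `kept` holds 10 indices, the 5 positives plus 5 randomly chosen negatives. In practice, under-sampling would normally be applied to the training split only, so that evaluation still reflects the original class distribution.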

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Recommended Citation

H. Lu, "Advances in NLP algorithms on unstructured medical notes data and approaches to handling class imbalance issues," Ph.D. dissertation, Chapman University, Orange, CA, 2022. https://doi.org/10.36837/chapman.000414

Included in

Community Health and Preventive Medicine Commons, Diagnosis Commons, Health Services Research Commons

DOI

https://doi.org/10.36837/chapman.000414

Chapman University Digital Commons
