Mathematics, Physics, and Computer Science Faculty Articles and Research

A Comparative Study on Deep Learning Models for Text Classification of Unstructured Medical Notes with Various Levels of Class Imbalance

Hongxia Lu, Chapman University
Louis Ehwerhemuepha, Children’s Hospital of Orange County
Cyril Rakovski, Chapman University

Document Type

Article

Publication Date

7-2-2022

Abstract

Background

Discharge medical notes written by physicians contain important information about the health condition of patients. Many deep learning algorithms have been successfully applied to extract information from unstructured medical notes, and this information can lead to actionable results in the medical domain. This study explores how various deep learning algorithms perform in text classification tasks on medical notes under different disease class imbalance scenarios.

Methods

In this study, we employed seven artificial intelligence models to classify the presence or absence of 16 disease conditions from patients’ discharge summary notes: a CNN (Convolutional Neural Network), a Transformer encoder, a pretrained BERT (Bidirectional Encoder Representations from Transformers), and four typical sequence neural network models, namely, RNN (Recurrent Neural Network), GRU (Gated Recurrent Unit), LSTM (Long Short-Term Memory), and Bi-LSTM (Bi-directional Long Short-Term Memory). We treated the task as 16 separate binary classification problems. The performance of the seven models on each of the 16 datasets, which have various levels of imbalance between classes, was compared in terms of AUC-ROC (Area Under the Receiver Operating Characteristic Curve), AUC-PR (Area Under the Precision-Recall Curve), F1 Score, and Balanced Accuracy, as well as training time. Model performance was also compared across different word embedding approaches (GloVe, BioWordVec, and no pre-trained word embeddings).
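To make the imbalance-aware metrics named above concrete, the following is a minimal sketch in plain Python, not the authors' evaluation code. The labels and scores are invented for illustration (10 notes, 2 of them positive), and the function names and the 0.5 decision threshold are assumptions.

```python
# Illustrative only: toy labels and scores, not data from the study.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2

def auc_roc(y_true, scores):
    """Probability that a random positive outscores a random negative
    (ties count as half), which equals the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical condition with 20% prevalence.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed threshold
all_negative = [0] * len(y_true)                  # trivial baseline

print("prevalence:", sum(y_true) / len(y_true))
print("model     :", accuracy(y_true, y_pred), f1_score(y_true, y_pred),
      balanced_accuracy(y_true, y_pred), auc_roc(y_true, scores))
print("always-neg:", accuracy(y_true, all_negative),
      f1_score(y_true, all_negative),
      balanced_accuracy(y_true, all_negative))
```

In this toy case the always-negative baseline matches the model on plain accuracy but drops to an F1 Score of 0 and a Balanced Accuracy of 0.5, which is one reason imbalance-aware metrics are preferred when disease prevalence is low.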

Results

The analyses of these 16 binary classification problems showed that the Transformer encoder model performed the best in nearly all scenarios. In addition, when the disease prevalence was close to or greater than 50%, the Convolutional Neural Network model achieved performance comparable to that of the Transformer encoder, and its training time was 17.6% shorter than that of the second fastest model, 91.3% shorter than that of the Transformer encoder, and 94.7% shorter than that of the pre-trained BERT-Base model. The BioWordVec embeddings slightly improved the performance of the Bi-LSTM model in most disease prevalence scenarios, while the CNN model performed better without pre-trained word embeddings. In addition, the training time was significantly reduced with the GloVe embeddings for all models.
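The reported "percent shorter" figures can also be read as ratios of training time. The sketch below assumes each percentage is measured relative to the slower model's training time, so that a CNN time that is p% shorter means the other model took 100 / (100 - p) times as long. The helper name is hypothetical, and only the three percentages come from the abstract.

```python
def slowdown_factor(percent_shorter):
    """Ratio of the slower model's training time to the faster model's,
    given that the faster model was `percent_shorter` percent shorter."""
    if not 0 <= percent_shorter < 100:
        raise ValueError("percent_shorter must be in [0, 100)")
    return 100.0 / (100.0 - percent_shorter)

# Percentages as reported for the CNN in the Results paragraph.
reported = {
    "second fastest model": 17.6,
    "Transformer encoder": 91.3,
    "pre-trained BERT-Base": 94.7,
}

for name, pct in reported.items():
    print(f"{name}: about {slowdown_factor(pct):.1f}x the CNN training time")
```

Under that reading, the Transformer encoder took roughly 11.5 times as long to train as the CNN, and the pre-trained BERT-Base model roughly 18.9 times as long.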

Conclusions

For classification tasks on medical notes, Transformer encoders are the best choice if computational resources are not a constraint. Otherwise, when the classes are relatively balanced, CNNs are a leading candidate because of their competitive performance and computational efficiency.

Comments

This article was originally published in BMC Medical Research Methodology, volume 22, in 2022. https://doi.org/10.1186/s12874-022-01665-y

Recommended Citation

Lu, H., Ehwerhemuepha, L. & Rakovski, C. A comparative study on deep learning models for text classification of unstructured medical notes with various levels of class imbalance. BMC Med Res Methodol 22, 181 (2022). https://doi.org/10.1186/s12874-022-01665-y

12874_2022_1665_MOESM1_ESM.docx (142 kB)

Peer Reviewed

Copyright

The authors

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Included in

Artificial Intelligence and Robotics Commons, Data Science Commons, Other Medicine and Health Sciences Commons

Chapman University Digital Commons