Document Type

Article

Publication Date

8-18-2021

Abstract

Principal Component Analysis (PCA) is a commonly used technique that uses the correlation structure of the original variables to reduce the dimensionality of the data. This reduction is achieved by considering only the first few principal components for a subsequent analysis. The usual inclusion criterion is that the proportion of the total variance explained by the retained principal components exceeds a predetermined threshold. We show that in certain classification problems, even an extremely high inclusion threshold can negatively impact the classification accuracy. The omission of small-variance principal components can severely diminish the performance of the models. We noticed this phenomenon in classification analyses of high-dimensional ECG data, where the most common classification methods lost between 1% and 6% of accuracy even when using a 99% inclusion threshold. However, this issue can occur even in low-dimensional data with a simple correlation structure, as our numerical example shows. We conclude that the exclusion of any principal components should be carefully investigated.
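
As a hedged illustration of the variance-threshold inclusion rule and of how it can hurt classification, the following minimal Python sketch uses a hypothetical two-feature data set constructed here for illustration. It is not the authors' numerical example or their ECG data. Both classes are spread along the direction x1 = x2 and differ only along x1 - x2, so almost all of the variance lies in the first principal component while all of the class information lies in the second. The helper names `num_components`, `make_toy_data`, `pca_2d` and `nearest_centroid_accuracy` are hypothetical. The classifier is a simple nearest-class-mean rule scored on the training points, chosen only for brevity, and the covariance uses divisor n, which does not change the variance proportions.

```python
import math


def num_components(eigenvalues, threshold):
    """Smallest k such that the k largest eigenvalues explain >= threshold of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, lam in enumerate(ordered, start=1):
        cumulative += lam
        if cumulative / total >= threshold:
            return k
    return len(ordered)


def make_toy_data(e=0.5, half_range=10):
    """Hypothetical data: both classes spread along x1 = x2; the label only shifts points along x1 - x2."""
    X, y = [], []
    for t in range(-half_range, half_range + 1):
        X.append((t + e, t - e))
        y.append(1)
        X.append((t - e, t + e))
        y.append(0)
    return X, y


def pca_2d(X):
    """PCA of 2-D points via the closed-form eigendecomposition of the 2x2 covariance matrix."""
    n = len(X)
    mx = sum(p[0] for p in X) / n
    my = sum(p[1] for p in X) / n
    centered = [(px - mx, py - my) for px, py in X]
    a = sum(dx * dx for dx, _ in centered) / n   # var(x1), divisor n
    c = sum(dy * dy for _, dy in centered) / n   # var(x2)
    b = sum(dx * dy for dx, dy in centered) / n  # cov(x1, x2); assumed nonzero below
    mid, rad = (a + c) / 2, math.hypot((a - c) / 2, b)
    components = []
    for lam in (mid + rad, mid - rad):  # eigenvalues in descending order
        vx, vy = b, lam - a  # solves [[a, b], [b, c]] v = lam v when b != 0
        norm = math.hypot(vx, vy)
        components.append((lam, (vx / norm, vy / norm)))
    # Principal component scores: projections of the centered points onto each eigenvector.
    scores = [[dx * vx + dy * vy for _, (vx, vy) in components] for dx, dy in centered]
    return components, scores


def nearest_centroid_accuracy(Z, y):
    """Training-set accuracy of a nearest-class-mean rule on feature rows Z."""
    means = {}
    for label in set(y):
        rows = [z for z, lab in zip(Z, y) if lab == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    correct = 0
    for z, lab in zip(Z, y):
        pred = min(means, key=lambda m: sum((zi - mi) ** 2 for zi, mi in zip(z, means[m])))
        correct += pred == lab
    return correct / len(y)


X, y = make_toy_data()
components, scores = pca_2d(X)
eigenvalues = [lam for lam, _ in components]
k = num_components(eigenvalues, threshold=0.99)
print("PC1 share of variance:", round(eigenvalues[0] / sum(eigenvalues), 4))  # about 0.9932
print("components kept at 99%:", k)  # 1
print("accuracy, kept PCs:", nearest_centroid_accuracy([row[:k] for row in scores], y))  # near chance
print("accuracy, all PCs:", nearest_centroid_accuracy(scores, y))  # 1.0
```

In this toy setting the 99% rule keeps only PC1 (about 99.3% of the variance), and the nearest-mean rule on that component performs near chance, while using both components separates the classes perfectly. Because PCA is only a centering plus a rotation, keeping all components gives the same result as the original features for a Euclidean distance-based rule like this one.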

Comments

This article was originally published in Data Science Journal, volume 20, issue 1, in 2021. http://doi.org/10.5334/dsj-2021-026

Recommended Citation

Zheng, J. and Rakovski, C., 2021. On the Application of Principal Component Analysis to Classification Problems. Data Science Journal, 20(1), p.26. http://doi.org/10.5334/dsj-2021-026

Peer Reviewed

Copyright

The authors

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Included in

Numerical Analysis and Computation Commons, Other Mathematics Commons

Chapman University Digital Commons

Mathematics, Physics, and Computer Science Faculty Articles and Research

On the Application of Principal Component Analysis to Classification Problems
