Date of Award
Spring 5-2021
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Computational and Data Sciences
First Advisor
Cyril Rakovski
Second Advisor
Daniel Alpay
Third Advisor
Anthony Chang
Abstract
In this work, we first investigate the use of the ECG signal as a biometric in human identification systems using deep learning models. We train convolutional neural network models on ECG samples from approximately 81k patients, and our models achieve an overall accuracy of 95.69%. Further, we assess the accuracy of our ECG identification model for distinct groups of patients with particular heart conditions and combinations of such conditions. For example, identification accuracy was highest (99.7%) for patients with both ST changes and supraventricular tachycardia, and lowest (49%) for patients diagnosed with both atrial fibrillation and complete right bundle branch block.
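To make the identification setup concrete, the following is a minimal sketch of a 1D convolutional classifier of the kind used for ECG-based identification; the input shape, layer sizes, and class count are illustrative placeholders, not the architecture evaluated in this work.

# Illustrative sketch only: a small 1D-CNN that treats each patient as a class.
# The sample length, lead count, and n_patients are assumptions for illustration.
import tensorflow as tf

n_patients = 81_000              # assumed number of identity classes
sample_len, n_leads = 5000, 12   # assumed 10 s, 500 Hz, 12-lead ECG segment

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(sample_len, n_leads)),
    tf.keras.layers.Conv1D(32, kernel_size=7, activation="relu"),
    tf.keras.layers.MaxPooling1D(4),
    tf.keras.layers.Conv1D(64, kernel_size=5, activation="relu"),
    tf.keras.layers.MaxPooling1D(4),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(n_patients, activation="softmax"),  # one output per patient
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, validation_data=(x_val, y_val)) with labelled ECG samples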
Next, we discuss the implications of these findings for patients' re-identification risk and show how seemingly anonymized ECG datasets can leak private information. For hypothetical scenarios, such as a patient contributing to two different research datasets, we quantify the privacy risk by estimating how uniquely and accurately patients with a specific heart condition can be re-identified across multiple ECG datasets containing fields such as age, gender, and location. We also discuss emerging ECG-based demographics detection technology and how it might compromise patients' privacy, even to the point where a patient's residence could be located from an ECG sample alone. The implications of our findings for privacy regulations such as HIPAA and GDPR are discussed as well.
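As a toy illustration of why such quasi-identifiers are risky (the numbers and the uniform-bin assumption are hypothetical, not the risk model developed in this work): if patients fall into k equally likely (age, gender, location) combinations, the chance that none of n other patients shares a given patient's combination is (1 - 1/k)**n.

# Hypothetical back-of-the-envelope uniqueness estimate, not the dissertation's model.
def p_unique(k_bins, n_others):
    """Probability that a patient's quasi-identifier combination is unique,
    assuming the n_others are spread uniformly over k_bins combinations."""
    return (1 - 1 / k_bins) ** n_others

# e.g. 90 ages x 2 genders x 50 locations = 9000 bins, 10,000 other patients
print(round(p_unique(9000, 10_000), 3))   # ~0.33: roughly a third are unique on these fields alone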
In contrast to the common belief that aggregate statistics or anonymized databases are safe to share, it can be shown that even aggregation does not guarantee privacy, and individuals can be re-identified from published aggregate results. Differential privacy is a privacy-preserving data analysis technique that protects the individuals in a database by adding a calibrated amount of noise to perturb the released results. In the last chapter, we discuss an end-to-end application of differential privacy to an ECG dataset in order to safely share useful statistics with the public.
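As a brief sketch of the underlying idea, the Laplace mechanism releases a statistic with noise scaled to its sensitivity divided by the privacy budget epsilon; the statistic (mean heart rate), clipping bounds, and epsilon below are illustrative assumptions, not the setup used in the dissertation.

# Minimal sketch of the Laplace mechanism for an epsilon-differentially-private mean.
import numpy as np

def dp_mean(values, lower, upper, epsilon, rng=np.random.default_rng()):
    """Release the mean of `values` with epsilon-differential privacy."""
    clipped = np.clip(values, lower, upper)        # bound each patient's influence
    sensitivity = (upper - lower) / len(clipped)   # max change from altering one record
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return clipped.mean() + noise

heart_rates = np.array([72, 95, 60, 110, 85])      # toy per-patient values
print(dp_mean(heart_rates, lower=40, upper=180, epsilon=1.0))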
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Recommended Citation
A. Ghazarian, "Assessing the re-identification risk in ECG datasets and an application of privacy preserving techniques in ECG analysis," Ph.D. dissertation, Chapman University, Orange, CA, 2021. https://doi.org/10.36837/chapman.000276