Document Type

Article

Publication Date

5-29-2024

Abstract

The exponential growth of data, coupled with the widespread application of artificial intelligence (AI), presents organizations with challenges in upholding data accuracy, especially within data engineering functions. While the Extraction, Transformation, and Loading (ETL) process addresses error-free data ingestion, validating the content within data streams remains a challenge. Prompt detection and remediation of data issues are crucial, especially in automated analytical environments driven by AI. To address these issues, this study focuses on detecting drifts in data distributions and divergence within data fields processed from different sample populations. Using a hypothetical banking scenario, we illustrate the impact of data drift on automated decision-making processes. We propose a scalable method leveraging the Kullback-Leibler (KL) divergence measure, specifically the Population Stability Index (PSI), to detect and quantify data drift. Through comprehensive simulations, we demonstrate the effectiveness of PSI in identifying and mitigating data drift issues. This study contributes to enhancing data engineering functions in organizations by offering a scalable solution for early drift detection in data ingestion pipelines. We discuss related research, identify gaps, and present the methodology and experimental results, underscoring the importance of robust data governance practices in mitigating risks associated with data drift and improving data observability.
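For readers unfamiliar with the measure named in the abstract, the following is a minimal sketch of a PSI calculation. It is not the authors' implementation. It uses the commonly stated form PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are the baseline and new bin proportions, which equals the sum of the two directed KL divergences between the binned distributions. The function name `psi`, the equal-width binning over the baseline range, the default of 10 bins, and the `eps` floor for empty bins are all assumptions made for illustration.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a baseline and a new sample.

    Illustrative sketch only: equal-width bins over the baseline range,
    and an eps floor for empty bins, are assumptions of this example.
    """
    lo, hi = min(expected), max(expected)
    # Fall back to a width of 1.0 if the baseline is constant.
    width = (hi - lo) / bins or 1.0

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            # Values outside the baseline range are clamped into the edge bins.
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        n = len(sample)
        # Floor at eps so empty bins do not cause log(0) or division by zero.
        return [max(c / n, eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    # Each term is non-negative: (a - e) and log(a / e) always share a sign.
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


# Hypothetical data: a baseline sample and a copy shifted upward by 3.
baseline = [i / 100 for i in range(1000)]
shifted = [x + 3 for x in baseline]
print(psi(baseline, baseline))  # identical samples give zero
print(psi(baseline, shifted))   # a shifted sample gives a positive value
```

A value of zero means the binned distributions match; larger values indicate a larger shift. Any alerting threshold is a policy choice and is not specified here.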

Comments

This article was originally published in Journal of Data, Information and Management in 2024. https://doi.org/10.1007/s42488-024-00119-y

Copyright

The authors

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
