A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment

Authors

  • Rupali Gill, Assistant Professor, School of Computer Sciences, School of Computer Applications, Chitkara University, Punjab, India
  • Jaiteg Singh, Associate Professor, School of Computer Applications, CU, Punjab, India

DOI:

https://doi.org/10.15415/jotitt.2014.22012

Keywords:

Data inconsistency, identification of errors, organization growth, ETL, data quality

Abstract

Extraction-transformation-loading (ETL) tools have become important pieces of software, responsible for integrating heterogeneous information from several sources. Carrying out the ETL process is potentially complex, hard and time-consuming. Organisations nowadays are concerned about the quality of the vast amounts of data they hold, and data quality is tied to technical issues in the data warehouse environment. Research over the last few decades has placed increasing stress on data quality issues in the data warehouse ETL process. Data quality can be ensured by cleaning the data before loading it into the warehouse. Because the data is collected from various sources, it arrives in various formats, so standardizing formats and cleaning the data are prerequisites for a clean data warehouse environment. Data quality attributes such as accuracy, correctness, consistency and timeliness are required for a knowledge discovery process. The purpose of the present research work is to address data quality issues at all stages of data warehousing, namely 1) data sources, 2) data integration, 3) data staging and 4) data warehouse modelling and schema design, and to formulate a descriptive classification of their causes. The discovered knowledge is used to repair the data deficiencies. This work proposes a framework for the quality of extraction, transformation and loading of data into a warehouse.
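
As an illustration of cleaning data before it is loaded, the following is a minimal sketch and not the framework proposed in the paper. It assumes records arrive as dictionaries with hypothetical fields id, name and date, and that source systems use one of three hypothetical date layouts. It standardizes the date format and rejects rows that fail simple completeness and consistency checks.

```python
from datetime import datetime

# Hypothetical date layouts that different source systems might emit.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def standardize_date(raw):
    """Return the date as an ISO 8601 string, or None if no layout matches."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_records(records):
    """Split staged records into rows ready to load and rows set aside for repair."""
    clean, rejected, seen_ids = [], [], set()
    for rec in records:
        rec_id = rec.get("id")
        name = (rec.get("name") or "").strip()
        date = standardize_date(rec.get("date") or "")
        # Completeness: id, name and a parseable date are all required.
        # Consistency: an id may appear only once in the batch.
        if rec_id is None or not name or date is None or rec_id in seen_ids:
            rejected.append(rec)
            continue
        seen_ids.add(rec_id)
        clean.append({"id": rec_id, "name": name, "date": date})
    return clean, rejected


if __name__ == "__main__":
    staged = [
        {"id": 1, "name": " Alice ", "date": "30/12/2014"},
        {"id": 2, "name": "Bob", "date": "2014-12-30"},
        {"id": 2, "name": "Bob", "date": "20141230"},    # duplicate id
        {"id": 3, "name": "", "date": "2014-12-30"},     # missing name
        {"id": 4, "name": "Dan", "date": "not a date"},  # unparseable date
    ]
    clean, rejected = clean_records(staged)
    print(len(clean), "rows ready to load;", len(rejected), "rows rejected")
```

Rejected rows are kept rather than discarded so that the deficiencies can be inspected and repaired at the source or in staging.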

Published

2014-12-30

How to Cite

Rupali Gill, & Jaiteg Singh. (2014). A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment. Journal on Today’s Ideas - Tomorrow’s Technologies, 2(2), 153–160. https://doi.org/10.15415/jotitt.2014.22012

Section

Articles
