A comprehensive review on data preprocessing techniques in data analysis


Çetin V., YILDIZ O.

PAMUKKALE UNIVERSITY JOURNAL OF ENGINEERING SCIENCES-PAMUKKALE UNIVERSITESI MUHENDISLIK BILIMLERI DERGISI, vol.28, no.2, pp.299-312, 2022 (ESCI)

Abstract

As technology advances, the amount of data stored on computers is growing very rapidly. Data analysis has become an important research area for evaluating these data correctly and transforming them into useful information. The performance of an analysis model, however, depends heavily on the characteristics of the data it is given. For this reason, it is essential to preprocess the data before starting any analysis. Data preprocessing produces accurate and useful datasets by resolving erroneous, incomplete, or otherwise unwanted data. In this study, papers on data preprocessing published in the last 5 years were reviewed systematically, and the widely used preprocessing methods were found to fall under three main branches: data cleaning, data transformation, and data reduction. These methods and their various algorithms are examined, their frequency of use is presented, and they are compared in terms of accuracy. The results of the study show that when preprocessing is not applied to raw data, or when unsuitable preprocessing methods are applied, data analysis methods alone cannot achieve sufficient performance.
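To make the three branches concrete, the sketch below applies one simple technique from each to a tiny made-up table. The specific techniques (mean imputation for cleaning, min-max scaling for transformation, dropping zero-variance columns for reduction), the function names, and the sample values are illustrative assumptions; the abstract does not state which algorithms the review covers.

```python
from statistics import mean, pvariance


def impute_mean(column):
    """Data cleaning (assumed example): replace missing values (None) with the column mean."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def min_max_scale(column):
    """Data transformation (assumed example): rescale values linearly to [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def drop_low_variance(columns, threshold=0.0):
    """Data reduction (assumed example): keep columns whose population variance exceeds threshold."""
    return {name: col for name, col in columns.items() if pvariance(col) > threshold}


# Hypothetical raw data with missing entries and an uninformative column.
raw = {
    "age": [20, None, 40, 60],
    "income": [1000, 2000, None, 3000],
    "constant": [1, 1, 1, 1],
}

cleaned = {name: impute_mean(col) for name, col in raw.items()}
scaled = {name: min_max_scale(col) for name, col in cleaned.items()}
reduced = drop_low_variance(scaled)

print(sorted(reduced))  # the constant column is removed
```

The order used here (clean, then transform, then reduce) is one common arrangement, not a rule taken from the paper; in practice the appropriate methods and their order depend on the dataset and the analysis model.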