A Study on the impact of Data Preprocessing in Traditional Machine Learning Pipeline using Column Transformer
Keywords:
Data Preprocessing, Column Transformer, MinMaxScaler, OneHotEncoder, StandardScaler, Logistic Regression, Support Vector Machine
Abstract
Data preprocessing is a crucial step that strongly influences a model's results, irrespective of the algorithm used. However, there has been limited exploration of how individual preprocessing techniques, and combinations of them, affect model performance. The challenge lies in identifying which preprocessing methods best improve model performance on real-world datasets such as the Adult Income dataset. This study explores how much impact each preprocessing technique has when it is combined with the others. StandardScaler, MinMaxScaler, and OneHotEncoder are paired with Logistic Regression and Support Vector Machine to determine their effectiveness. Each preprocessing method influences the results in its own way. The goal is to compare the methods and observe which preprocessing techniques enhance the performance of machine learning models, and under which circumstances these techniques are most effective.
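The kind of comparison the abstract describes can be sketched with scikit-learn's ColumnTransformer. This is a minimal illustration, not the study's actual code. The synthetic table, its column names (`age`, `hours_per_week`, `workclass`), the label rule, and the 5-fold accuracy scoring are all assumptions made here so the example runs without downloading the Adult Income dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Tiny synthetic stand-in for a mixed-type table (NOT the Adult Income data).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "age": rng.integers(18, 70, size=n),
    "hours_per_week": rng.integers(10, 60, size=n),
    "workclass": rng.choice(["private", "self_emp", "gov"], size=n),
})
# Hypothetical label rule, used only so the models have something to learn.
y = ((X["age"] > 40) & (X["hours_per_week"] > 35)).astype(int).to_numpy()

numeric_cols = ["age", "hours_per_week"]
categorical_cols = ["workclass"]

scalers = {"standard": StandardScaler(), "minmax": MinMaxScaler()}
models = {
    "logreg": LogisticRegression(max_iter=1000),
    "svm": SVC(kernel="rbf"),
}

results = {}
for scaler_name, scaler in scalers.items():
    for model_name, model in models.items():
        # Scale numeric columns and one-hot encode categorical ones in one step.
        preprocess = ColumnTransformer([
            ("num", scaler, numeric_cols),
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ])
        pipe = Pipeline([("preprocess", preprocess), ("model", model)])
        # cross_val_score clones the pipeline, so each fold is fitted from scratch.
        scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
        results[(scaler_name, model_name)] = scores.mean()

for key, acc in sorted(results.items()):
    print(key, round(acc, 3))
```

Wrapping the ColumnTransformer and the classifier in a single Pipeline means the scaler and encoder are fitted only on each training fold, so the comparison between preprocessing choices is not affected by leakage from the held-out data.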
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


