Analyzing the Impact of Feature Selection Techniques on the Predictive Performance of Supervised Machine Learning Models
Keywords:
Feature Selection, Supervised Learning, Predictive Performance, Dimensionality Reduction, Machine Learning
Synopsis
Feature selection plays a pivotal role in enhancing the predictive performance of supervised machine learning models by reducing dimensionality, eliminating noise, and improving generalization. This study evaluates the effect of four feature selection techniques (a variance-threshold filter, Recursive Feature Elimination as a wrapper, LASSO as an embedded method, and tree-based selection) on three classification models: Logistic Regression, Support Vector Machine (SVM), and Random Forest. Using the UCI Wine dataset as a benchmark, the results show that feature selection improves accuracy and reduces training time for all three models, with the embedded method achieving the best overall performance.
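The following is a minimal sketch of the comparison described above, assuming scikit-learn. The selector thresholds, hyperparameters, and 70/30 train/test split are illustrative assumptions rather than the exact settings used in the study, and the "Embedded (LASSO)" selector is realized here as L1-penalized logistic regression, the usual classification analogue of the LASSO.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import VarianceThreshold, RFE, SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Filter, wrapper, embedded, and tree-based selectors (settings are illustrative).
selectors = {
    "filter (variance threshold)": VarianceThreshold(threshold=0.1),
    "wrapper (RFE)": RFE(LogisticRegression(solver="liblinear"),
                         n_features_to_select=6),
    "embedded (L1 / LASSO)": SelectFromModel(
        LogisticRegression(penalty="l1", solver="liblinear", C=0.5)),
    "tree-based": SelectFromModel(
        RandomForestClassifier(n_estimators=200, random_state=0)),
}

models = {
    "Logistic Regression": LogisticRegression(max_iter=5000),
    "SVM": SVC(kernel="rbf"),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

for sel_name, selector in selectors.items():
    for model_name, model in models.items():
        # Select on the raw features (the variance filter is scale-sensitive),
        # then standardise the kept features before fitting the classifier.
        pipe = Pipeline([
            ("select", selector),
            ("scale", StandardScaler()),
            ("clf", model),
        ])
        pipe.fit(X_train, y_train)
        acc = accuracy_score(y_test, pipe.predict(X_test))
        n_kept = pipe.named_steps["select"].get_support().sum()
        print(f"{sel_name:28s} + {model_name:19s}: "
              f"{n_kept:2d} features, accuracy = {acc:.3f}")
```

Applying the selector before scaling keeps the variance filter meaningful on the raw feature scales; adding an unselected baseline pipeline alongside these would reproduce the accuracy and training-time comparison the study reports.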
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.