Advanced Data Mining and Knowledge Discovery Techniques for High Dimensional Big Data Analytics
Keywords:
High-dimensional data, data mining, big data analytics, dimensionality reduction, machine learning, feature selection, knowledge discovery
Synopsis
The increasing volume and complexity of big data have created a need for advanced data mining and knowledge discovery techniques, particularly for high-dimensional datasets. High dimensionality not only raises computational cost but also degrades the performance of traditional models because of the "curse of dimensionality." This paper reviews and contextualizes state-of-the-art data mining approaches as of 2024, including dimensionality reduction, ensemble learning, and deep learning-based methods. We analyze recent applications of these methods in fields such as healthcare, cybersecurity, and genomics. The paper also presents a comparative framework of key algorithms, reviews major literature up to 2023, and discusses future research directions for handling heterogeneous and sparse high-dimensional data.
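One concrete symptom of the "curse of dimensionality" is distance concentration: as the number of dimensions grows, the nearest and farthest neighbours of a point end up at almost the same distance, which weakens distance-based methods such as k-nearest neighbours and many clustering algorithms. The sketch below is an illustration only, not a method from this paper. It assumes synthetic points drawn uniformly from the unit hypercube, and the function name `relative_contrast` and its parameters are hypothetical. It measures the relative contrast (d_max - d_min) / d_min from one random query point to a set of random points.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min for Euclidean distances from one random
    query point to n_points random points, all drawn uniformly from [0, 1]^dim.

    Hypothetical helper for illustration; synthetic uniform data is assumed.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast shrinks as dim grows: distances "concentrate".
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:>4}  relative contrast={relative_contrast(dim):.3f}")
```

Running the script should show the contrast falling from a large value in two dimensions to a small fraction in a thousand dimensions, which is one reason dimensionality reduction and feature selection are applied before distance-based mining.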
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.