Consensus of Feature Selection Methods and Reduced Generalization Gap Model to Improve Diagnosis of Heart Disease
DOI: https://doi.org/10.3329/jsr.v13i3.53290
Abstract
Improving the diagnostic ability of Machine Learning (ML) models so that their predictions are acceptable to the healthcare community remains a concern. Critical care disease datasets are available online, and researchers have used them with differing numbers of instances and features to predict similar diseases. Further, different ML models have different preprocessing requirements. The Framingham heart disease data is multicollinear and has missing values. The proposed model therefore explores the differing preprocessing needs of ML models. It then applies feature selection in consensus with domain experts, followed by feature extraction to resolve multicollinearity. Missing values have been imputed differently for each feature. The work also identifies the optimal training set size by plotting a learning curve and choosing the size that gives the minimum generalization gap. When the hyperparameter-tuned model is tested, performance improves in terms of the F score weighted by support, with stratification applied because the data is imbalanced. Experimental results show that, with the proposed model, the Logistic Regression classifier improves by up to 3% in weighted F score, precision, recall, and accuracy, and by 8% in F1 score. Further, the time required for hyperparameter tuning is reduced by 50% for tree-based models, particularly Classification and Regression Tree (CART).
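The abstract names two quantitative ideas without giving formulas: the F score weighted by support, and picking the training set size with the minimum generalization gap on a learning curve. The following is a minimal sketch of both, not the authors' code. The function names `weighted_f1` and `min_gap_train_size` are hypothetical. The selection rule here is the smallest absolute difference between mean training and mean validation score, which is one plausible reading of "minimum generalization gap". In practice the per-size scores would come from a tool such as scikit-learn's `learning_curve` with `scoring="f1_weighted"` and a stratified splitter.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """F1 averaged over classes, each class weighted by its support
    (number of true instances of that class)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        # With no true positives both precision and recall are 0, so F1 is 0.
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        score += f1 * n_true / total  # weight this class by its share of the support
    return score


def min_gap_train_size(
    train_sizes: Sequence[int],
    train_scores: Sequence[float],
    val_scores: Sequence[float],
) -> Tuple[int, float]:
    """Return (size, gap) for the training-set size whose mean training score
    and mean validation score differ the least (hypothetical selection rule)."""
    gaps = [abs(t - v) for t, v in zip(train_scores, val_scores)]
    best = min(range(len(gaps)), key=gaps.__getitem__)  # ties -> smaller size
    return train_sizes[best], gaps[best]
```

Because the imbalanced minority class contributes to the weighted F score only in proportion to its support, this metric can differ noticeably from the plain (binary) F1 score reported separately in the abstract.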
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
© Journal of Scientific Research
Articles published in the "Journal of Scientific Research" are Open Access articles under a Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA 4.0). This license permits use, distribution, and reproduction in any medium, provided the original work and its initial publication in this journal are properly cited. In addition, users must provide a link to the license and indicate if changes were made. If they remix, transform, or build upon the original content, they must distribute the result under the same license as the original.