Consensus of Feature Selection Methods and Reduced Generalization Gap Model to Improve Diagnosis of Heart Disease

Authors

  • S. Gupta, Department of Computer Engineering, Thakur College of Engineering and Technology, Mumbai, India
  • R. R. Sedamkar, Department of Computer Engineering, Thakur College of Engineering and Technology, Mumbai, India

DOI:

https://doi.org/10.3329/jsr.v13i3.53290

Abstract

Enhancing the diagnostic ability of Machine Learning (ML) models so that their predictions are acceptable to the healthcare community remains a concern. Critical care disease datasets are available online, and researchers have experimented on them with differing numbers of instances and features when predicting similar diseases. Further, different ML models have different preprocessing requirements. The Framingham heart disease data is multicollinear and has missing values. The proposed model therefore explores the differing preprocessing needs of ML models, followed by feature selection in consensus with domain experts and feature extraction to resolve multicollinearity. Missing values have been imputed differently for each feature. The work also identifies the optimal training set size by plotting a learning curve and choosing the size that gives the minimum generalization gap. When this hyperparameter-tuned model is tested, performance improves with respect to the F score weighted by support, and stratification is used because the data is imbalanced. Experimental results demonstrate improvements of up to 3% in weighted F score, precision, recall, and accuracy, and of 8% in F1 score, for the Logistic Regression Classifier with the proposed model. Further, the time required for hyperparameter tuning is reduced by 50% for tree-based models, particularly Classification and Regression Tree (CART).
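
Two quantities in the abstract can be stated precisely: the F score weighted by class support, and the generalization gap read off a learning curve. The following is a minimal, standard-library-only Python sketch of both, assuming the training and validation scores have already been computed (for example by cross-validation over stratified splits). The function names `weighted_f1` and `min_gap_train_size` are hypothetical, and measuring the gap as the absolute difference between training and validation scores is an assumption, not the authors' stated definition.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Per-class F1, averaged with each class weighted by its support.

    Support is the number of true instances of a class, so on imbalanced
    data the majority class contributes proportionally more to the score.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n_true / total  # weight this class's F1 by its support
    return score


def min_gap_train_size(
    train_sizes: List[int],
    train_scores: List[float],
    val_scores: List[float],
) -> Tuple[int, float]:
    """Return (train_size, gap) for the learning-curve point with the smallest gap.

    ASSUMPTION: gap = |training score - validation score| at each size.
    Ties resolve to the earliest entry, i.e. the smallest size if the
    sizes are listed in ascending order.
    """
    if not train_sizes or not (len(train_sizes) == len(train_scores) == len(val_scores)):
        raise ValueError("inputs must be non-empty and equal in length")
    gaps = [abs(tr - va) for tr, va in zip(train_scores, val_scores)]
    best = min(range(len(gaps)), key=gaps.__getitem__)
    return train_sizes[best], gaps[best]
```

In a full pipeline the inputs to `min_gap_train_size` could come from scikit-learn's `learning_curve`, averaging its per-fold training and test score arrays across folds; this pairing is an illustration only, not the pipeline used in the paper.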


Published

2021-09-01

How to Cite

Gupta, S., & Sedamkar, R. R. (2021). Consensus of Feature Selection Methods and Reduced Generalization Gap Model to Improve Diagnosis of Heart Disease. Journal of Scientific Research, 13(3), 901–913. https://doi.org/10.3329/jsr.v13i3.53290

Section

Section A: Physical and Mathematical Sciences