Identification of Predisposing Risk Factors for Chronic Kidney Disease and Optimizing Disease Prediction Using a Stacking Machine Learning Algorithm
DOI:
https://doi.org/10.3329/ijss.v25i2.85732
Keywords:
Chronic Kidney Disease (CKD), predisposing risk factors, stacking machine learning algorithm, disease prediction, feature selection.
Abstract
Chronic kidney disease (CKD) remains a major global health concern, often progressing silently until advanced stages. Early detection is therefore critical to improving outcomes. This study focuses on identifying the most important predisposing risk factors for CKD and optimizing prediction performance using a stacking machine learning ensemble. Two datasets were analyzed: the UCI CKD dataset and a synthetically simulated CKD dataset designed to mirror real-world variability. A leakage-safe preprocessing pipeline was implemented, including median and mode imputation for missing values, Z-score capping for outliers, min–max normalization, and class balancing through the Synthetic Minority Over-sampling Technique (SMOTE). Feature selection was performed using six complementary approaches: Logistic Regression (LR), Recursive Feature Elimination (RFE), Random Forest (RF), Mutual Information (MI), Chi-Square (χ²), and Principal Component Analysis (PCA), with a majority-vote strategy used to identify features consistently recognized as predictive. For the UCI CKD dataset, the common risk factors were serum creatinine, hemoglobin, packed cell volume, red blood cell count, specific gravity, albumin, sugar, hypertension, diabetes mellitus, and appetite. For the simulated CKD dataset, key predictors included blood glucose random, blood urea, serum creatinine, potassium, hemoglobin, packed cell volume, specific gravity, albumin, red blood cells, pus cell clumps, appetite, pedal edema, and anemia. Using these selected features, the stacking ensemble achieved 100.0% accuracy on the UCI CKD dataset and 96.7% accuracy on the simulated dataset, both with negligible misclassification rates. Bootstrap confidence intervals confirmed the robustness of these results. The findings highlight that combining systematic feature selection with stacking significantly improves predictive accuracy while maintaining interpretability.
This integrated framework offers a reliable tool for early CKD detection and can support clinical decision-making in real-world healthcare settings. Future work will focus on expanding validation across multi-site data and developing a clinical decision support interface.
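As a rough illustration of two steps named in the abstract, the sketch below shows one way a majority vote over several feature selectors and a percentile bootstrap confidence interval for accuracy could be computed. It is not the authors' implementation: the function names `majority_vote` and `bootstrap_accuracy_ci`, the strict-majority threshold (more than half of the selectors), the percentile method, and the toy votes in the usage note are all assumptions made for illustration only.

```python
import random
from collections import Counter


def majority_vote(selections, min_votes=None):
    """Return the features picked by at least `min_votes` selectors.

    `selections` maps a selector name to the features it chose.
    Assumption: when `min_votes` is omitted, a strict majority
    (more than half of the selectors) is required.
    """
    if min_votes is None:
        min_votes = len(selections) // 2 + 1
    # Count each feature at most once per selector.
    votes = Counter(f for chosen in selections.values() for f in set(chosen))
    return sorted(f for f, n in votes.items() if n >= min_votes)


def bootstrap_accuracy_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for classification accuracy.

    Assumption: test cases are resampled with replacement and the
    interval is read off the sorted bootstrap accuracies.
    """
    rng = random.Random(seed)
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    n = len(correct)
    stats = []
    for _ in range(n_boot):
        # Resample n per-case outcomes with replacement.
        resample = [correct[rng.randrange(n)] for _ in range(n)]
        stats.append(sum(resample) / n)
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

For example, with hypothetical votes such as `{"LR": ["serum_creatinine", "hemoglobin"], "RF": ["serum_creatinine"], "MI": ["serum_creatinine", "albumin"]}`, `majority_vote` keeps only `serum_creatinine`, since it is the only feature chosen by at least two of the three selectors.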
IJSS, Vol. 25(2), November, 2025, pp 1-32
License
Copyright (c) 2025 Department of Statistics, University of Rajshahi, Rajshahi

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.