Robust Variable Selection in High-Dimensional Data: Mitigating Cellwise Contamination Through Comparative Analysis

Authors

  • Nadia Mehjabeen Oyshi, Department of Statistics, University of Dhaka, Dhaka-1000, Bangladesh
  • Md Tuhin Rana, Department of Statistics, University of Dhaka, Dhaka-1000, Bangladesh
  • Md Jamil Hasan Karami, Department of Statistics, University of Dhaka, Dhaka-1000, Bangladesh

DOI:

https://doi.org/10.3329/dujs.v73i2.82773

Keywords:

Cellwise contamination, Robust variable selection, Gaussian rank correlation, High-dimensional regression, Independent contamination model, Sparse robust regression

Abstract

The proliferation of high-dimensional data has heightened the challenges posed by cellwise outliers, where contamination in individual cells distorts analyses more pervasively than traditional rowwise outliers. This study compares robust variable selection methods under cellwise contamination, evaluating four rank-based techniques (ALGR, ALRP, LGR, LRP) against traditional approaches (Lasso, Adaptive Lasso, sLTS). Simulations under varying correlation structures, contamination rates (2%, 5%, 10%), and outlier magnitudes (γ = 2, 6, 10) show that the Gaussian rank correlation-based methods (ALGR, LGR) achieve superior F1 scores, balancing high true positives and low false positives. Real-data applications on life expectancy and crime datasets corroborate these findings, with ALGR and LGR remaining robust in both low- and high-dimensional settings. The results underscore the need for methods resilient to cellwise contamination in fields that rely on accurate high-dimensional data analysis, such as healthcare and genomics.
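
As a rough illustration of two quantities named in the abstract, the sketch below computes a Gaussian rank correlation (the Pearson correlation of normal scores, Φ⁻¹(rank / (n + 1))) and an F1 score for a selected variable set. It is not the authors' implementation of ALGR or LGR: the function names, the toy data, and the no-ties assumption are all hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def normal_scores(x):
    """Map values to normal scores Phi^{-1}(rank / (n + 1)). Assumes no ties."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    inv = NormalDist().inv_cdf
    return [inv(r / (n + 1)) for r in ranks]


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


def gaussian_rank_corr(x, y):
    """Gaussian rank correlation: Pearson correlation of the normal scores."""
    return pearson(normal_scores(x), normal_scores(y))


def f1_selection(selected, truth):
    """F1 score of a selected variable set against the true active set."""
    selected, truth = set(selected), set(truth)
    tp = len(selected & truth)   # true positives
    fp = len(selected - truth)   # false positives
    fn = len(truth - selected)   # false negatives
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Toy data (hypothetical): one contaminated cell at two outlier magnitudes.
x = [float(v) for v in range(1, 11)]
y_mild = list(x)
y_mild[4] = 100.0
y_wild = list(x)
y_wild[4] = 1e6

print("Pearson:", pearson(x, y_mild), pearson(x, y_wild))
print("Gaussian rank:", gaussian_rank_corr(x, y_mild), gaussian_rank_corr(x, y_wild))
print("F1:", f1_selection({0, 1, 2, 5}, {0, 1, 2, 3}))
```

Because the normal scores depend only on ranks, the Gaussian rank correlation is the same for both outlier magnitudes in this toy example, whereas the Pearson correlation changes with the size of the contaminated cell.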

Dhaka Univ. J. Sci. 73(2): 143-150, 2025 (July)

Published

2025-07-12

How to Cite

Oyshi, N. M., Rana, M. T., & Karami, M. J. H. (2025). Robust Variable Selection in High-Dimensional Data: Mitigating Cellwise Contamination Through Comparative Analysis. Dhaka University Journal of Science, 73(2), 143–150. https://doi.org/10.3329/dujs.v73i2.82773

Section

Articles