Sparse regression with clustered predictors

Authors

  • Aiden Kenny, Franklin & Marshall College, P.O. Box 3003, Lancaster, PA 17604-3003, USA
  • Danielle Solomon, St. John's University, 8000 Utopia Parkway, Queens, NY 11439, USA
  • Mohammad S A Patwary, Butler University, 4600 Sunset Ave, Indianapolis, IN 46208, USA
  • Kumer P Das, University of Louisiana at Lafayette, 104 E University Ave, Lafayette, LA 70504, USA

DOI:

https://doi.org/10.3329/jsr.v56i1.63945

Keywords:

High dimensional data; Logistic regression; Sparse regression; Regularization; Principal Component Analysis; Clustering.

Abstract

Gene expression data can be challenging to analyze because of its high dimensionality. Regularization techniques help reduce the number of predictors and highlight the significant genes, here the genes that may indicate the presence of cancer. This study examines whether grouping the genes before applying regularization reduces the prediction error of classification. We investigate the effectiveness of clustering algorithms for generating a grouping structure in high-dimensional data sets. Using various regularization techniques, we assess whether the generated groups are truly relevant to the response and whether the accuracy and interpretability of the models can be improved. We apply the clustered group structure to two real-world data sets, and we use simulation studies to compare the performance of different regularization methods with and without clustering.
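
The cluster-then-regularize idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm or penalty the authors use, so everything here is an assumption: a greedy correlation-based clustering stands in for the clustering step, a group lasso penalty on a logistic loss stands in for the regularization step, and the function names (`cluster_predictors`, `fit_group_lasso_logistic`) and the toy data are hypothetical and not taken from the paper.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def cluster_predictors(X, threshold=0.8):
    """Greedy clustering (an assumed stand-in for the paper's method):
    a column joins the first group whose first member it correlates
    with at |r| >= threshold, otherwise it starts a new group."""
    p = len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    groups = []
    for j in range(p):
        for g in groups:
            if abs(pearson(cols[j], cols[g[0]])) >= threshold:
                g.append(j)
                break
        else:
            groups.append([j])
    return groups

def fit_group_lasso_logistic(X, y, groups, lam=0.05, step=0.1, iters=300):
    """Proximal gradient descent on the average logistic loss plus
    lam * sum_g sqrt(|g|) * ||beta_g||_2. No intercept, for brevity."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        for row, yi in zip(X, y):
            z = sum(b * x for b, x in zip(beta, row))
            z = max(-30.0, min(30.0, z))  # clip to keep exp() stable
            prob = 1.0 / (1.0 + math.exp(-z))
            for j in range(p):
                grad[j] += (prob - yi) * row[j] / n
        beta = [b - step * g for b, g in zip(beta, grad)]
        # Group soft-thresholding: shrink each group's norm, possibly to zero.
        for g in groups:
            norm = math.sqrt(sum(beta[j] ** 2 for j in g))
            t = step * lam * math.sqrt(len(g))
            scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
            for j in g:
                beta[j] *= scale
    return beta

# Toy data (invented for illustration): columns 0 and 1 carry the signal,
# columns 2 and 3 are a correlated pair unrelated to the class label.
X = [
    [-2.0, -1.9,  1.0,  2.0],
    [-1.0, -1.1, -1.0, -2.0],
    [-1.5, -1.4,  1.0,  2.0],
    [-0.5, -0.6, -1.0, -2.0],
    [ 0.5,  0.6,  1.0,  2.0],
    [ 1.5,  1.4, -1.0, -2.0],
    [ 1.0,  1.1,  1.0,  2.0],
    [ 2.0,  1.9, -1.0, -2.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

groups = cluster_predictors(X)           # [[0, 1], [2, 3]]
beta = fit_group_lasso_logistic(X, y, groups)
beta_null = fit_group_lasso_logistic(X, y, groups, lam=1e6)  # all zeros
```

The penalty acts on whole groups, so a cluster of correlated predictors is kept or dropped together; with a very large `lam` every group is shrunk to exactly zero. Whether such groups are relevant to the response is the question the study itself addresses, and this sketch makes no claim about its results.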

Journal of Statistical Research 2022, Vol. 56, No. 1, pp. 37-53

Published

2023-02-01

How to Cite

Kenny, A., Solomon, D., Patwary, M. S. A., & Das, K. P. (2023). Sparse regression with clustered predictors. Journal of Statistical Research, 56(1), 37–53. https://doi.org/10.3329/jsr.v56i1.63945

Section

Articles