Sparse regression with clustered predictors

Aiden Kenny; Danielle Solomon; Mohammad S A Patwary; Kumer P Das

doi:10.3329/jsr.v56i1.63945

Authors

Aiden Kenny, Franklin & Marshall College, P.O. Box 3003, Lancaster, PA 17604-3003, USA
Danielle Solomon, St. John's University, 8000 Utopia Parkway, Queens, NY 11439, USA
Mohammad S A Patwary, Butler University, 4600 Sunset Ave, Indianapolis, IN 46208, USA
Kumer P Das, University of Louisiana at Lafayette, 104 E University Ave, Lafayette, LA 70504, USA

DOI:

https://doi.org/10.3329/jsr.v56i1.63945

Keywords:

High dimensional data; Logistic regression; Sparse regression; Regularization; Principal Component Analysis; Clustering.

Abstract

Gene expression data can be challenging to analyze because of their high dimensionality. Regularization techniques are useful for reducing the number of predictors and highlighting the significant genes, which in this case are genes that may indicate the presence of cancer. This study examines whether grouping the genes before applying regularization reduces the prediction error of classification. We investigate the effectiveness of using clustering algorithms to generate a grouping structure for high-dimensional data sets. Using various regularization techniques, we seek to determine whether the generated groups are truly relevant to the response and whether the accuracy and interpretability of the models can be improved. We apply the clustered group structure to two real-world data sets. We also use simulation studies to assess the performance of different regularization methods with and without clustering.

Journal of Statistical Research 2022, Vol. 56, No. 1, pp. 37-53
