Sparse regression with clustered predictors
DOI:
https://doi.org/10.3329/jsr.v56i1.63945
Keywords:
High dimensional data; Logistic regression; Sparse regression; Regularization; Principal Component Analysis; Clustering.
Abstract
Gene expression data can be challenging to analyze because of their high dimensionality. Regularization techniques help reduce the number of predictors and highlight the significant genes, in this case genes that may indicate the presence of cancer. This study examines whether grouping the genes before applying regularization reduces the prediction error of classification. We investigate the effectiveness of using clustering algorithms to generate a grouping structure for high-dimensional data sets. Using various regularization techniques, we seek to determine whether the generated groups are truly relevant to the response and whether the accuracy and interpretability of the models can be improved. We apply the clustered group structure to two real-world data sets. We also use simulation studies to assess the performance of different regularization methods with and without clustering.
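The cluster-then-regularize idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract does not name the clustering algorithm or the penalties used, so the sketch assumes hierarchical feature clustering with cluster averaging (scikit-learn's `FeatureAgglomeration`) followed by an L1-penalized logistic regression, and compares it with the same penalty applied without clustering. The data are synthetic, and the block sizes, noise level, and penalty strength `C` are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for expression data (assumption): 6 latent factors,
# each driving a block of 10 correlated "genes".
n_samples, n_blocks, block_size = 200, 6, 10
latent = rng.normal(size=(n_samples, n_blocks))
noise = 0.5 * rng.normal(size=(n_samples, n_blocks * block_size))
X = np.repeat(latent, block_size, axis=1) + noise

# Only the first two blocks influence the binary response.
logits = 2.0 * latent[:, 0] - 2.0 * latent[:, 1]
y = (rng.random(n_samples) < 1.0 / (1.0 + np.exp(-logits))).astype(int)


def l1_logistic():
    # L1-penalized logistic regression; C=0.5 is an arbitrary choice.
    return LogisticRegression(penalty="l1", solver="liblinear", C=0.5)


# Baseline: sparse logistic regression on the individual predictors.
no_clustering = make_pipeline(StandardScaler(), l1_logistic())

# Clustered variant: group correlated predictors, average within each
# group, then fit the same sparse model on the group averages.
with_clustering = make_pipeline(
    StandardScaler(),
    FeatureAgglomeration(n_clusters=n_blocks),
    l1_logistic(),
)

# 5-fold cross-validated classification accuracy for both variants.
scores = {
    "no clustering": cross_val_score(no_clustering, X, y, cv=5).mean(),
    "with clustering": cross_val_score(with_clustering, X, y, cv=5).mean(),
}
for name, acc in scores.items():
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```

Averaging within clusters is only one way to use a grouping structure; a group-wise penalty applied to the original predictors would be another, and the two need not give the same selected genes or the same prediction error.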
Journal of Statistical Research 2022, Vol. 56, No. 1, pp. 37-53
License
Copyright (c) 2022 Journal of Statistical Research
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.