Clustering gene expression time series data embedded in a nonparametric setup

Authors

  • Mukti Khetan, Department of Mathematics, Indian Institute of Technology Bombay, Mumbai 400 076, India
  • Savita Pareek, Department of Mathematics, Indian Institute of Technology Bombay, Mumbai 400 076, India
  • Siuli Mukhopadhyay, Department of Mathematics, Indian Institute of Technology Bombay, Mumbai 400 076, India
  • Kalyan Das, Department of Mathematics, Indian Institute of Technology Bombay, Mumbai 400 076, India

Keywords:

Dirichlet Process; Monte Carlo EM algorithm; Mixed effect model; Autoregressive process

Abstract

A clustering methodology for time series data is proposed. The idea arose when a subset of a gene expression dataset was used to build a system model by compressing the information through clustering and then tracing out the inherent patterns in the data. A linear mixed model that accommodates time-dependent components is considered. The temporal effects are modelled through an autoregressive process that arises in the dispersion of the random component. The joint distribution of the coefficients of the time-dependent quadratic function and the random effects is embedded within a non-parametric prior (a Dirichlet process prior), which induces clustering in the data. A Monte Carlo EM (MCEM) based technique is used to estimate the parameters. The best cluster is selected using heterogeneity measures. A rigorous simulation study is carried out before the analysis of a gene expression time series dataset.
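As a rough illustration of two ingredients named in the abstract, the sketch below shows how a Dirichlet process prior can induce a clustering and what an autoregressive dispersion structure looks like. It is not the authors' implementation. It assumes the Chinese restaurant process representation of the Dirichlet process and a first-order (AR(1)) process with marginal variance sigma2 and correlation rho, neither of which is specified in the abstract. The function names crp_partition and ar1_covariance and all parameter values are hypothetical.

```python
import random


def crp_partition(n_items, alpha, seed=0):
    """Draw cluster labels from a Chinese restaurant process (assumed DP representation)."""
    rng = random.Random(seed)
    labels = []
    counts = []  # counts[k] = number of items already assigned to cluster k
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1  # join an existing cluster
        labels.append(k)
    return labels


def ar1_covariance(n_times, sigma2, rho):
    """Covariance of a stationary AR(1) process: sigma2 * rho ** |s - t|.

    sigma2 is taken to be the marginal variance (an assumption of this sketch).
    """
    return [
        [sigma2 * rho ** abs(s - t) for t in range(n_times)]
        for s in range(n_times)
    ]


# Hypothetical values chosen only for illustration.
labels = crp_partition(n_items=50, alpha=1.0, seed=1)
n_clusters = len(set(labels))
cov = ar1_covariance(n_times=4, sigma2=2.0, rho=0.5)
```

In this sketch the number of clusters is not fixed in advance: it grows with the data, and larger values of alpha tend to produce more clusters. That is the sense in which a non-parametric prior of this kind induces clustering.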

Journal of Statistical Research 2021, Vol. 55, No. 1, pp. 207-224

Published

2021-12-09

How to Cite

Khetan, M., Pareek, S., Mukhopadhyay, S., & Das, K. (2021). Clustering gene expression time series data embedded in a nonparametric setup. Journal of Statistical Research, 55(1), 207–224. Retrieved from https://banglajol.info/index.php/JStR/article/view/56589

Section

Articles