A multi-stage computational and bioinformatics framework for the identification and validation of hub toxicogenomic biomarkers

Mohammad Nazmol Hasan; Mohammad Shah Alam; Md Mamunur Rahman

doi:10.3329/aba.v29i2.84008

Authors

Mohammad Nazmol Hasan, Department of Agricultural and Applied Statistics, Gazipur Agricultural University, Gazipur 1706, Bangladesh
Mohammad Shah Alam, Department of Anatomy and Histology, Gazipur Agricultural University, Gazipur 1706, Bangladesh
Md Mamunur Rahman, Department of Entomology, Gazipur Agricultural University, Gazipur 1706, Bangladesh

Keywords:

chemical toxicity, toxicogenomic biomarker, statistical methods, machine learning approaches, bioinformatics approaches, protein-protein interaction network

Abstract

Chemical toxicity is challenging to mitigate, which calls for revisiting seed compound screening. Safety is crucial in approving drugs, pesticides, and cosmetics, so safety biomarkers such as toxicogenomic biomarkers (ToxBGs) are needed to predict potential toxicity. To this end, we propose a sequence of computational and bioinformatics approaches to identify key/hub ToxBGs (HToxBGs) for predicting chemical toxicity. We first identified ToxBGs using three statistical approaches, the t-test, the Wilcoxon signed-rank test (WSR-test), and linear models for microarray data analysis (LIMMA), applied to chemically treated and control samples of gene expression data collected from the online database "Toxygates." In the treatment group, rat samples were treated with chemicals (acetaminophen, bromobenzene, coumarin, methapyrilene, and nitrofurazone) at three dose levels, and gene expression data were collected at multiple time points. The t-test, WSR-test, and LIMMA identified 3,856, 3,232, and 3,377 ToxBGs, respectively. Of these, 2,877 were common to all three methods and were considered second-stage ToxBGs. The second-stage ToxBGs were validated using four machine learning (ML) approaches. Among these, the support vector machine (SVM) outperformed the other methods in classifying treated and control samples, with a sensitivity of 0.98, specificity of 0.97, accuracy of 0.98, and AUC of 0.99. The second-stage ToxBGs were also co-clustered with their associated chemicals. Protein-protein interaction (PPI) network analysis predicted that the second-stage ToxBGs were enriched in biological pathways that perform important functions. These ToxBGs were also enriched in different diseases, such as liver cirrhosis, HIV coinfection, gastric cancer, generalized hypotonia, and neoplasm of the liver.
Of the 2,877 common ToxBGs, 160 key/hub ToxBGs (HToxBGs) were identified: 70 associated with disease states and 90 involved in critical biological pathways, enabling the study of chemical toxicity. The proposed sequence of computational and bioinformatics approaches can therefore be used to identify HToxBGs and predict chemical toxicity.
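
The two bookkeeping steps described in the abstract, keeping only the genes selected by every statistical method and summarizing a classifier by sensitivity, specificity, and accuracy, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, gene labels, and toy predictions below are hypothetical, and the statistical tests and the SVM themselves are assumed to have been run elsewhere.

```python
from typing import Dict, Iterable, Sequence, Set


def consensus_genes(selections: Dict[str, Iterable[str]]) -> Set[str]:
    """Return the genes selected by every method (set intersection)."""
    sets = [set(genes) for genes in selections.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy for binary labels.

    Labels are assumed to be 1 = treated and 0 = control.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "accuracy": (tp + tn) / total if total else float("nan"),
    }


# Hypothetical gene lists standing in for the output of each test.
selections = {
    "t_test": {"GeneA", "GeneB", "GeneC", "GeneD"},
    "wsr_test": {"GeneB", "GeneC", "GeneD", "GeneE"},
    "limma": {"GeneC", "GeneD", "GeneF"},
}
common = consensus_genes(selections)

# Hypothetical true labels and classifier predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)

print(sorted(common))
print(metrics)
```

Taking the intersection is a conservative choice: a gene is retained only if all methods agree, which trades some sensitivity for fewer method-specific false positives.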

Ann. Bangladesh Agric. 29(2): 93-109
