Star Classification Using Machine Learning: A Comparative Analysis of Random Forest and LightGBM on SDSS Data

Authors

  • Yasir Arafat, Department of Statistics, Hajee Mohammad Danesh Science and Technology University, Dinajpur-5200, Bangladesh
  • Rasna Begum, Department of Statistics, Hajee Mohammad Danesh Science and Technology University, Dinajpur-5200, Bangladesh
  • Md Saifur Rahman, Department of Statistics, Hajee Mohammad Danesh Science and Technology University, Dinajpur-5200, Bangladesh
  • Md Kaderi Kibria, Department of Statistics, Hajee Mohammad Danesh Science and Technology University, Dinajpur-5200, Bangladesh

DOI:

https://doi.org/10.3329/ijss.v25i2.85778

Keywords:

Star Classification, Harvard Spectral Types, Sloan Digital Sky Survey (SDSS), Random Forest, LightGBM, Accuracy, Computational efficiency, Scalability.

Abstract

Stars are fundamental components of the universe, and their classification provides critical insights into stellar properties and evolution. The traditional Harvard spectral classification system, though accurate, is computationally demanding and ill-suited to the scale of modern astronomical data. With the advent of extensive surveys such as the Sloan Digital Sky Survey (SDSS), machine learning offers a scalable and efficient alternative. Unlike previous studies, which typically apply a single model, this study conducts a comparative analysis of the Random Forest (RF) and LightGBM (LGBM) algorithms for automated star classification using SDSS photometric data. Both models achieved strong classification results, with a micro-average AUC of 0.96. RF performed better on specific spectral classes, while LGBM achieved comparable accuracy with significantly faster training, at the cost of higher memory use. Scalability analysis showed that LGBM handles larger datasets better. These findings suggest that model selection should reflect application-specific priorities: RF for real-time inference and LGBM for large-scale, high-throughput classification. Future work will explore advanced feature engineering, hyperparameter optimization, and deep learning approaches to further improve classification performance. This study underscores the potential of machine learning in astrophysics and offers guidance for model selection in automated star classification tasks.

International Journal of Statistical Sciences, Vol. 25(2), November 2025, pp. 159–172


Published

2025-12-17

How to Cite

Arafat, Y., Begum, R., Rahman, M. S., & Kibria, M. K. (2025). Star Classification Using Machine Learning: A Comparative Analysis of Random Forest and LightGBM on SDSS Data. International Journal of Statistical Sciences, 25(2), 159–172. https://doi.org/10.3329/ijss.v25i2.85778

Issue

Vol. 25(2), November 2025

Section

Original Articles