Optimizing LightGBM and XGBoost Algorithms for Estimating Compressive Strength in High-Performance Concrete


DEMİRTÜRK D., MİNTEMUR Ö., ARSLAN A.

Arabian Journal for Science and Engineering, 2025 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Publication Date: 2025
  • DOI: 10.1007/s13369-025-10217-7
  • Journal Name: Arabian Journal for Science and Engineering
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Communication Abstracts, Metadex, Pollution Abstracts, zbMATH, Civil Engineering Abstracts
  • Keywords: Compressive strength, High-performance concrete, LightGBM, Machine learning, SHapley additive explanations, XGBoost
  • Gazi University Affiliated: Yes

Abstract

This study employed a regression approach to predict the compressive strength of high-performance concrete. We proposed the use of two machine learning algorithms: LightGBM and XGBoost. The models were enhanced by integrating them into the Optuna optimization framework and employing the tree-structured Parzen estimator as the optimization algorithm. The models were first trained and validated on the original dataset. To reduce possible overfitting, the dataset was then augmented with SMOGN, and K-fold cross-validation was used as a further safeguard. The performance of the models was evaluated using RMSE, MAE, and MAPE. The best-performing LightGBM and XGBoost models, selected on these loss metrics, were then analyzed with the SHAP technique to determine the influence of the individual mixture components on compressive strength. On the original dataset, XGBoost achieved an RMSE of 11.44, an MAE of 9.32, and a MAPE of 6.89, while LightGBM achieved an RMSE of 11.20, an MAE of 9.04, and a MAPE of 6.98. On the augmented dataset, XGBoost outperformed LightGBM on all metrics, with an RMSE of 5.67, an MAE of 2.83, and a MAPE of 2.06, while LightGBM achieved an RMSE of 5.82, an MAE of 3.35, and a MAPE of 2.48. Both models identified micro steel fiber as the most important component in the original dataset, but disagreed on the second most important component, alternating between water and superplasticizer. However, with the augmented dataset, the SHAP analysis showed that both models consistently ranked micro steel fiber and water as the two most influential components.
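
For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of RMSE, MAE, and MAPE together with a simple K-fold index split, written with the Python standard library only. It is not the authors' code: the function names, the toy strength values, the unshuffled contiguous folds, and the choice to report MAPE as a percentage are all assumptions made for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumption).

    Assumes no target value is zero, which holds for strength data.
    """
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


def kfold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, unshuffled folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


# Hypothetical measured vs. predicted compressive strengths (toy values).
y_true = [50.0, 100.0, 80.0, 40.0]
y_pred = [45.0, 110.0, 80.0, 44.0]

print("RMSE:", rmse(y_true, y_pred))
print("MAE: ", mae(y_true, y_pred))
print("MAPE:", mape(y_true, y_pred))

for train_idx, val_idx in kfold_indices(5, 2):
    print("train:", train_idx, "val:", val_idx)
```

In a tuning loop of the kind the abstract describes, each candidate hyperparameter set would typically be scored by averaging one of these metrics over the validation folds; how the authors combined the folds and metrics is not stated in the abstract.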