A Comparative Study of Advanced Transformer Learning Frameworks for Water Potability Analysis Using Physicochemical Parameters

Algül, Enes; OYUCU, SAADİN; Polat, Onur; Çelik, Hüseyin; Ekşi, Süleyman; Kurker, Faruk; AKSÖZ, AHMET

doi:10.3390/app15137262

Applied Sciences (Switzerland), vol. 15, no. 13, 2025 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 15 Issue: 13
Publication Date: 2025
DOI: 10.3390/app15137262
Journal Name: Applied Sciences (Switzerland)
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Agricultural & Environmental Science Database, Applied Science & Technology Source, Communication Abstracts, INSPEC, Metadex, Directory of Open Access Journals, Civil Engineering Abstracts
Keywords: attention mechanism, deep learning, environmental monitoring, physicochemical features, tabular data classification, transformer models, water potability
Affiliated with Gazi University: Yes

Abstract

Keeping drinking water safe is a critical aspect of protecting public health. Traditional laboratory-based methods for evaluating water potability are often time-consuming, costly, and labour-intensive. This paper presents a comparative analysis of four transformer-based deep learning models for the automatic classification of water potability from physicochemical attributes. The models examined are the enhanced tabular transformer (ETT), feature tokenizer transformer (FTTransformer), self-attention and inter-sample network (SAINT), and tabular autoencoder pretraining enhancement (TAPE). The study used an open-access water quality dataset with nine key attributes: pH, hardness, total dissolved solids (TDS), chloramines, sulphate, conductivity, organic carbon, trihalomethanes, and turbidity. The models were evaluated under a unified protocol involving 70–15–15 data partitioning, five-fold cross-validation, a fixed random seed, and consistent hyperparameter settings. Among the evaluated models, the enhanced tabular transformer outperformed the others, with an accuracy of 95.04% and an F1 score of 0.94. ETT's strength lies in its ability to efficiently model high-order feature interactions through multi-head attention and deep hierarchical encoding. Feature importance analysis consistently highlighted chloramines, conductivity, and trihalomethanes as key predictive features across all models. SAINT demonstrated robust generalization through its dual-attention mechanism, while TAPE provided competitive results with reduced computational overhead due to unsupervised pretraining. Conversely, FTTransformer showed limitations, likely due to sensitivity to class imbalance and hyperparameter tuning. The results underscore the potential of transformer-based models, especially ETT, in enabling efficient, accurate, and scalable water quality monitoring. These findings support their integration into real-time environmental health systems and suggest directions for future research in explainability, domain adaptation, and multimodal fusion.
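
The evaluation protocol named in the abstract (70–15–15 partitioning, five-fold cross-validation, fixed random seed) can be illustrated with a minimal Python sketch. This is not the authors' code. The seed value, the sample count of 1000, the function names `split_70_15_15` and `k_fold`, and the choice to run cross-validation on the 70% training portion are all assumptions made for illustration; the abstract does not specify them.

```python
import random

SEED = 42  # hypothetical value; the abstract only says the seed was fixed


def split_70_15_15(n_samples, seed=SEED):
    """Shuffle indices once and cut them into train/validation/test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    n_train = n_samples * 70 // 100
    n_val = n_samples * 15 // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder, roughly 15%
    return train, val, test


def k_fold(indices, k=5):
    """Yield (fit, held_out) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i in range(k):
        held_out = folds[i]
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, held_out


# 1000 is an illustrative sample count, not the size of the study's dataset.
train, val, test = split_70_15_15(1000)
# Assumption: cross-validation is applied to the training portion only.
folds = list(k_fold(train, k=5))
print(len(train), len(val), len(test), len(folds))  # prints: 700 150 150 5
```

Reusing the same seed and the same index lists for every model is what would make a comparison across ETT, FTTransformer, SAINT, and TAPE "unified" in the sense the abstract describes: each model sees identical training, validation, and test rows.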