Dual-view attention and hierarchical transformer framework for adult age estimation from sternum MDCT images


Türk F.

EXPERT SYSTEMS WITH APPLICATIONS, vol.1, no.307, pp.1-14, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 1 Issue: 307
  • Publication Date: 2026
  • DOI: 10.1016/j.eswa.2026.131144
  • Journal Name: EXPERT SYSTEMS WITH APPLICATIONS
  • Indexes Covering the Journal: Scopus, Science Citation Index Expanded (SCI-EXPANDED), Compendex, INSPEC, Public Affairs Index
  • Page Numbers: pp.1-14
  • Affiliated with Gazi University: Yes

Abstract

Accurate estimation of adult age from skeletal remains plays a critical role in forensic identification and medical diagnostics. However, after skeletal maturity, conventional anthropometric and radiological indicators lose discriminative power. To address this limitation, we introduce two complementary deep learning architectures designed specifically for age estimation from Multi-Detector Computed Tomography (MDCT) images of the sternum: the Dual-Attention Sternal Estimator (D-ASE) and Swin Transformer – extended Large (medical variant), denoted Swin-XL(med)+. D-ASE integrates fuzzy-enhanced image preprocessing, HU-aware dual-channel attention, and bi-branch feature fusion, enabling simultaneous learning of morphological and textural cues. Swin-XL(med)+ extends the Swin Transformer backbone with Focal-Deformable attention, Cartilage-Aware gating, and a BiFPN-lite fusion head for efficient multi-scale representation. Experiments on a balanced dataset of 600 coronal and sagittal sternum images (300 patients; four age groups: 20–35, 36–50, 51–65, 65+) demonstrate that Swin-XL(med)+ achieves 98.14% test accuracy, outperforming baseline CNN and standard Swin Transformer models, while D-ASE achieves 96.72% with superior generalization on small samples. Grad-CAM and Layer-CAM analyses confirm that both models emphasize clinically relevant anatomical structures, particularly the manubriosternal junction, costosternal cartilage, and xiphoid process. These results reveal that hierarchical Transformer-based and attention-fusion models can produce explainable, high-accuracy predictions for adult forensic age estimation, paving the way for robust, population-agnostic radiological assessment systems.
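
As a rough illustration of the four-class setup described in the abstract, the sketch below maps an age in years to one of the listed age groups and works out the per-class counts that an exactly even split would imply. It is not the authors' code. The names `age_to_group` and `per_group_counts` are hypothetical, and two points are assumptions: that "balanced" means an identical number of patients per group, and that age 65 belongs to the 51–65 group (the abstract lists both "51–65" and "65+").

```python
# Age-group bins as listed in the abstract. Assigning age 65 to the
# 51-65 group is an assumption, since the abstract also lists "65+".
AGE_GROUPS = [(20, 35, "20-35"), (36, 50, "36-50"), (51, 65, "51-65")]
OLDEST_LABEL = "65+"


def age_to_group(age: int) -> str:
    """Map an age in years to one of the four class labels (hypothetical helper)."""
    if age < 20:
        raise ValueError("age is below the youngest listed group (20-35)")
    for low, high, label in AGE_GROUPS:
        if low <= age <= high:
            return label
    return OLDEST_LABEL


def per_group_counts(n_patients: int, views_per_patient: int, n_groups: int):
    """Patients and images per class, assuming an exactly even split."""
    patients, remainder = divmod(n_patients, n_groups)
    if remainder:
        raise ValueError("patients do not divide evenly across groups")
    return patients, patients * views_per_patient


if __name__ == "__main__":
    print(age_to_group(42))              # 36-50
    print(per_group_counts(300, 2, 4))   # (75, 150)
```

With 300 patients, two views per patient (coronal and sagittal), and four groups, an even split would give 75 patients and 150 images per class. The abstract does not state the per-class counts, so these figures follow only from the even-split assumption.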