A new model based on multi-axis vision transformer for chondromalacia patella diagnosis in magnetic resonance scans

Demirel, Semih; Demirtaş, Okan; Ordu, Sümeyra; Kazcı, Ömer; Akkaya, Habip; YILDIZ, OKTAY

doi:10.1007/s13246-026-01707-5

Demirel S., Demirtaş O., Ordu S. K., Kazcı Ö., Akkaya H. E., YILDIZ O.

Physical and Engineering Sciences in Medicine, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Publication Date: 2026
DOI Number: 10.1007/s13246-026-01707-5
Journal Name: Physical and Engineering Sciences in Medicine
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Keywords: Chondromalacia patella, Convolutional neural networks, Deep learning, Medical image classification, Multi-axis vision transformer, Swin transformer, Vision transformer
Affiliated with Gazi University: Yes

Abstract

Chondromalacia patella (CMP), a degenerative disease of the patellofemoral joint cartilage, often results in anterior knee discomfort and functional disability. Choosing the best course of therapy and halting the progression of the disease depend on an accurate and timely diagnosis. In this work, we present a transformer-based deep learning architecture for classifying CMP from magnetic resonance imaging (MRI) data. We assessed the transformer-based designs Multi-Axis Vision Transformer (MaxViT), Vision Transformer (ViT), and Swin Transformer alongside convolutional neural network (CNN) based models: GoogLeNet, Residual Network 18 (ResNet18), and MobileNetV2. We evaluated each model's ability to differentiate CMP cases from normal cases. On the testing dataset, MaxViT outperformed all other models, with an accuracy of 0.9817, precision of 0.9821, recall of 0.9817, and F1-score of 0.9818.
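
The four reported metrics can all be derived from a confusion matrix. The sketch below is illustrative only and is not the authors' evaluation code. It assumes support-weighted averaging over the two classes, which the abstract does not state; that assumption is at least consistent with the reported recall matching the reported accuracy, since support-weighted recall always equals accuracy. The function name `weighted_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def weighted_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """Compute accuracy and support-weighted precision, recall, and F1.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))

    precision = recall = f1 = 0.0
    for c in range(n_classes):
        tp = confusion[c][c]
        support = sum(confusion[c])  # samples whose true class is c
        predicted = sum(confusion[r][c] for r in range(n_classes))  # samples predicted as c
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total  # each class weighted by its share of samples
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {
        "accuracy": correct / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts (NOT from the paper): rows are true [normal, CMP],
# columns are predicted [normal, CMP].
example = [[48, 2], [1, 49]]
metrics = weighted_metrics(example)
```

With support weighting, each class contributes in proportion to its sample count, so the weighted recall reduces to the fraction of correctly classified samples, which is the accuracy.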