Neural Computing and Applications, vol. 37, no. 22, pp. 17825-17857, 2025 (Scopus)
The transformer is a type of deep neural network that relies on the self-attention mechanism and was initially used in the field of natural language processing. Researchers have adopted transformers for computer vision (CV) applications because of their strong data representation capabilities. Transformer-based models match or surpass other network architectures, including convolutional and recurrent neural networks, on a variety of visual benchmarks. In this work, we survey recent methods for video anomaly detection (VAD) that use vision transformer models. The main topics we explore comprise vision transformers in CV applications, with a special focus on VAD methods that leverage the transformer architecture. We also briefly present anomaly detection methods based on transformers. Additionally, we address the advantages, challenges, and current limitations of the transformer architecture, as well as potential solutions to these technical challenges. In the concluding section of this study, we offer avenues for further investigation into the use of vision transformers in VAD tasks.
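For reference, below is a minimal sketch of the scaled dot-product self-attention operation that the abstract identifies as the core of the transformer. The single-head setup, matrix names, and dimensions are illustrative assumptions, not the architecture of any specific model from the surveyed literature.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X:          (n_tokens, d_model) input token embeddings
    Wq, Wk, Wv: (d_model, d_k) learned projection matrices
    Returns:    (n_tokens, d_k) attention output.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # pairwise token affinities
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted mix of value vectors

# Toy usage: 4 tokens (e.g., patch embeddings of a video frame), model dim 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Because every token attends to every other token, this operation captures global dependencies across image patches or video frames, which is the representational property the surveyed VAD methods build on.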