An overview of transformers for video anomaly detection


Dilek E., Dener M.

Neural Computing and Applications, vol.37, no.22, pp.17825-17857, 2025 (Scopus)

  • Publication Type: Article / Review
  • Volume: 37 Issue: 22
  • Publication Date: 2025
  • Doi Number: 10.1007/s00521-025-11218-1
  • Journal Name: Neural Computing and Applications
  • Journal Indexes: Scopus, Compendex, Index Islamicus, INSPEC, zbMATH
  • Page Numbers: pp.17825-17857
  • Keywords: Anomaly detection, Computer vision, Transformer, Video anomaly detection, Vision transformer
  • Gazi University Affiliated: Yes

Abstract

The transformer is a type of deep neural network that relies on self-attention and was initially used in natural language processing. Researchers apply transformers to computer vision (CV) applications because of their strong data representation capabilities. Transformer-based models match or surpass the performance of other network architectures, including convolutional and recurrent neural networks, on a variety of visual benchmarks. In this work, we investigate methods for video anomaly detection (VAD) that use vision transformer models in the recent literature. The main topics we explore are vision transformers used in CV applications, with a special focus on VAD methods that leverage the transformer architecture. We also briefly present anomaly detection methods based on transformers. Additionally, we discuss the advantages, challenges and current limitations of the transformer architecture, as well as potential solutions to these technical challenges. In the concluding section of this study, we suggest avenues for further research on the use of vision transformers in VAD tasks.