Visual Question Answering for Intelligent Communication Systems: A Systematic Review


Gullu M., BARIŞÇI N.

IEEE Access, vol.14, pp.11607-11630, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Review
  • Volume: 14
  • Publication Date: 2026
  • DOI: 10.1109/access.2026.3654676
  • Journal Name: IEEE Access
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Pages: pp.11607-11630
  • Keywords: 5G networks, 6G networks, intelligent communication, visual question answering, VQA
  • Gazi University Affiliated: Yes

Abstract

Visual Question Answering (VQA) has emerged as a transformative multimodal artificial intelligence paradigm that integrates computer vision and natural language processing to enable intelligent systems to comprehend visual content and respond to natural language queries. While VQA applications have been extensively explored in domains such as healthcare, education, and autonomous systems, the integration of VQA into intelligent communication systems remains largely underexplored. This systematic review addresses this gap by examining VQA’s role within next-generation communication infrastructures, particularly 5G and 6G networks. We conducted a comprehensive analysis of 111 peer-reviewed publications following the PRISMA methodology, focusing on three primary research questions: (1) current applications of VQA in communication systems, (2) technical and operational challenges encountered during integration, and (3) future research opportunities and proposed solutions. Our findings reveal that VQA serves as a cornerstone technology for semantic communication paradigms in 6G networks, enabling task-oriented information transmission in applications ranging from smart city infrastructures and IoT-based environmental monitoring to autonomous transportation systems and remote sensing. However, significant challenges persist, including computational intensity and latency constraints on edge devices, multimodal data fusion complexities under channel impairments, privacy and security vulnerabilities in distributed architectures, and the need for domain-specific evaluation metrics beyond traditional accuracy measures.
To address these challenges, we propose a comprehensive research agenda emphasizing edge-cloud hybrid architectures, semantic communication protocols integrated with joint source-channel coding, AI-native cross-layer management frameworks, federated learning with differential privacy mechanisms, and standardization efforts for Quality of Information (QoI) metrics. This review establishes VQA as a critical enabler of meaning-centric communication in future intelligent networks, providing a roadmap for researchers and practitioners working at the intersection of artificial intelligence and telecommunications.