Deep Learning Based, a New Model for Video Captioning


Ozer E. G., Karapinar I. N., Busbug S., Turan S., Utku A., AKCAYOL M. A.

INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, vol.11, no.3, pp.514-519, 2020 (ESCI)

  • Publication Type: Article / Full Article
  • Volume: 11 Issue: 3
  • Publication Date: 2020
  • Journal Name: INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS
  • Indexed In: Emerging Sources Citation Index (ESCI), Scopus, Index Islamicus, INSPEC
  • Page Numbers: pp.514-519
  • Keywords: Video captioning, CNN, LSTM
  • Gazi University Affiliated: Yes

Abstract

Visually impaired individuals face many difficulties in their daily lives. In this study, a video captioning system has been developed for visually impaired individuals that analyzes events in real-time images and expresses them in meaningful sentences. To better understand the problems that visually impaired individuals experience in daily life, the opinions and suggestions of members of the Altınokta Blind Association (a Turkish organization of blind people) have been collected, so that more realistic solutions to their problems can be produced. In this study, the MSVD dataset, which consists of 1,970 YouTube clips, has been used as the training dataset. First, all clips were muted so that the audio of the clips was not used in the sentence-generation process. CNN and LSTM architectures have been used to generate sentences, and experimental results have been compared using the BLEU-4, METEOR, ROUGE-L, and CIDEr metrics.
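To illustrate the evaluation side, the sketch below shows a simplified sentence-level BLEU-4 score in plain Python: the geometric mean of modified 1- to 4-gram precisions multiplied by a brevity penalty. This is not the authors' evaluation code; published results normally use corpus-level BLEU with multiple references (e.g., the COCO caption evaluation toolkit), and the add-one smoothing here is only to keep a single missing n-gram order from zeroing the score.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu4(candidate, reference):
    """Simplified sentence-level BLEU-4 against a single reference.

    Geometric mean of modified 1- to 4-gram precisions, with add-one
    smoothing (an illustrative choice, not the standard smoothing),
    times the brevity penalty for short candidates.
    """
    cand, ref = candidate.split(), reference.split()
    log_prec_sum = 0.0
    for n in range(1, 5):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        log_prec_sum += math.log((overlap + 1) / (total + 1))
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec_sum / 4)
```

With this smoothed definition, an exact match scores 1.0 and a partial overlap such as `bleu4("a man plays guitar", "a man is playing a guitar")` falls strictly between 0 and 1.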