Automatic Surgical Caption Generation in Nephrectomy Surgery Videos


Kütük S., Bombieri M., Dall'Alba D., Fiorini P., Sarikaya D.

31st IEEE Conference on Signal Processing and Communications Applications, SIU 2023, İstanbul, Türkiye, 5–8 July 2023

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1109/siu59756.2023.10223890
  • City of Publication: İstanbul
  • Country of Publication: Türkiye
  • Keywords: attention mechanisms, automatic caption generation in surgical images, deep learning
  • Gazi University Affiliated: Yes

Abstract

Captioning surgical images is useful for computer-aided diagnosis, intervention, and surgical training; however, it is a challenging task that requires expertise. With automatic surgical captioning, the time-consuming and error-prone reporting process can be carried out automatically and quickly. In addition to assisting doctors in making more precise and timely diagnoses, this can shorten intra- and post-operative reporting, allowing doctors to provide patients with better care. Recently, several deep learning approaches have been proposed for recognizing the activities performed in surgical videos; however, there are still few studies on captioning surgical images with natural language. In this study, we automatically generate captions for nephrectomy surgery images, using an Inception-v3 encoder to extract visual features and a Gated Recurrent Unit (GRU) decoder with an attention mechanism. Our model uses the Bahdanau attention mechanism, which learns the attention weights directly from the data with a neural network and takes into account the previous attention state and the current decoder state when computing these weights. We tested our model on the Robotic Scene Segmentation Challenge dataset using the BLEU-N, ROUGE-N, and ROUGE-L metrics and compared it to an otherwise identical model using the Luong attention mechanism. Our model with Bahdanau attention outperformed the Luong-attention model, with average scores of 0.654 BLEU-N, 0.737 ROUGE-N, and 0.802 ROUGE-L.
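To make the attention component concrete, below is a minimal PyTorch sketch of standard additive (Bahdanau) attention over grid features, such as those produced by an Inception-v3 encoder. This is an illustration, not the paper's code: the dimension names (enc_dim, dec_dim, attn_dim) and sizes are assumptions, and the paper's exact variant, which per the abstract also conditions on the previous attention state, may differ from this classic formulation.

```python
import torch
import torch.nn as nn

class BahdanauAttention(nn.Module):
    """Additive (Bahdanau) attention: a small feed-forward network scores
    each encoder feature location against the decoder's hidden state."""

    def __init__(self, enc_dim: int, dec_dim: int, attn_dim: int):
        super().__init__()
        self.W_enc = nn.Linear(enc_dim, attn_dim)  # projects encoder features
        self.W_dec = nn.Linear(dec_dim, attn_dim)  # projects decoder state
        self.v = nn.Linear(attn_dim, 1)            # scalar score per location

    def forward(self, features: torch.Tensor, hidden: torch.Tensor):
        # features: (batch, num_regions, enc_dim) -- e.g. flattened CNN grid
        # hidden:   (batch, dec_dim)              -- GRU hidden state
        scores = self.v(torch.tanh(
            self.W_enc(features) + self.W_dec(hidden).unsqueeze(1)
        ))                                          # (batch, num_regions, 1)
        weights = torch.softmax(scores, dim=1)      # attention weights
        context = (weights * features).sum(dim=1)   # (batch, enc_dim)
        return context, weights.squeeze(-1)

# Hypothetical usage: Inception-v3 yields an 8x8x2048 feature map for a
# 299x299 input, flattened here to 64 regions of 2048 features each.
attn = BahdanauAttention(enc_dim=2048, dec_dim=512, attn_dim=256)
feats = torch.randn(4, 64, 2048)
h = torch.randn(4, 512)
context, weights = attn(feats, h)  # context: (4, 2048), weights: (4, 64)
```

At each decoding step, the context vector would be concatenated with the previous word embedding and fed to the GRU; the learned scoring network is what distinguishes this additive mechanism from Luong-style multiplicative attention, which scores via a dot product between the (possibly projected) decoder state and the encoder features.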