Generating Automatic Surgical Captions Using a Contrastive Language-Image Pre-Training Model for Nephrectomy Surgery Images
(Turkish title: Karşılaştırmalı Dil-Görüntü Ön Eğitim Modeli ile Nefrektomi Ameliyatı Görüntülerinde Otomatik Cerrahi Alt Yazıların Oluşturulması)


Kütük S., Çağlıkantar T., Sarıkaya D.

32nd IEEE Conference on Signal Processing and Communications Applications, SIU 2024, Mersin, Türkiye, 15-18 May 2024

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1109/siu61531.2024.10601051
  • City of Publication: Mersin
  • Country of Publication: Türkiye
  • Keywords: automatic caption generation on surgical images, contrastive language-image pre-training, image-to-text matching
  • Affiliated with Gazi University: Yes

Abstract

Surgical reporting plays an important role in surgical feedback and medical training, as well as in postoperative care and the diagnosis of complications after surgery. However, it remains a complex task that requires clinical expertise. Surgeons and surgical teams can use automatically generated surgical captions to report effectively and efficiently. Automated surgical report generation has the potential to decrease surgeons' workload and thereby improve surgical outcomes. In this research, we used a customized Contrastive Language-Image Pre-Training (CLIP) model, with VGG-19 as the image encoder and ClinicalBERT as the text encoder, to produce automated surgical descriptions for nephrectomy surgery images. The model, which we named SurgicalClip, achieved the following average scores: 0.702 BLEU-1, 0.51 BLEU-4, 3.615 CIDEr, 0.382 METEOR, and 0.657 ROUGE. It performed comparably to the benchmark models while offering a lightweight solution.
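
The abstract does not give the training objective or the inference procedure, so the following is only a minimal sketch of the generic CLIP-style idea it refers to: image and text embeddings are compared by cosine similarity, a symmetric contrastive loss rewards matching pairs, and a caption is chosen by image-to-text matching. The hand-written toy vectors stand in for the outputs of the image encoder (VGG-19 in the paper) and the text encoder (ClinicalBERT in the paper). The function names, the temperature value, and the placeholder captions are assumptions made for illustration and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_logits(image_embs, text_embs, temperature):
    """Cosine similarity between every image and every text, divided by temperature."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
            for i in imgs]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric loss: average of image-to-text and text-to-image cross-entropy.

    The temperature default is an illustrative assumption, not the paper's value.
    """
    logits = similarity_logits(image_embs, text_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits)
                  + diagonal_cross_entropy(transposed))


def best_caption(image_emb, text_embs, captions):
    """Image-to-text matching: return the caption whose embedding is most similar."""
    scores = similarity_logits([image_emb], text_embs, temperature=1.0)[0]
    best_index = max(range(len(scores)), key=lambda k: scores[k])
    return captions[best_index]


# Toy 2-D embeddings (hypothetical stand-ins for real encoder outputs).
image_embs = [[1.0, 0.0], [0.0, 1.0]]
text_embs = [[0.9, 0.1], [0.2, 0.8]]
captions = ["caption A", "caption B"]  # placeholder strings, not real data

aligned_loss = contrastive_loss(image_embs, text_embs)
shuffled_loss = contrastive_loss(image_embs, text_embs[::-1])
print(aligned_loss < shuffled_loss)
print(best_caption(image_embs[0], text_embs, captions))
```

In this toy setup the correctly paired batch yields a lower loss than the batch with its captions swapped, which is the behaviour a contrastive objective is meant to reward.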