Generating Automatic Surgical Captions Using a Contrastive Language-Image Pre-Training Model for Nephrectomy Surgery Images

Kütük S., ÇAĞLIKANTAR T., Sarıkaya D.

32nd IEEE Conference on Signal Processing and Communications Applications, SIU 2024, Mersin, Türkiye, 15-18 May 2024 (Full-Text Paper)

Publication Type: Conference Paper / Full-Text Paper
DOI: 10.1109/siu61531.2024.10601051
City of Publication: Mersin
Country of Publication: Türkiye
Keywords: automatic caption generation on surgical images, contrastive language-image pre-training, image-to-text matching
Gazi University Affiliation: Yes

Abstract

Surgical reporting plays an important role in providing surgical feedback and medical training; however, it remains a complex task that requires clinical expertise. It is also important for postoperative care and for diagnosing complications after surgery. Surgeons and surgical teams can use automatically generated surgical captions to make surgical reporting more effective and efficient. Automated surgical report generation has the potential to reduce surgeons' workload and, therefore, improve surgical outcomes. In this research, we used a customized Contrastive Language-Image Pre-Training (CLIP) model, with VGG-19 as the image encoder and ClinicalBERT as the text encoder, to produce automated surgical descriptions for nephrectomy surgery images. The resulting model, which we named SurgicalClip, achieved the following average scores: 0.702 BLEU-1, 0.51 BLEU-4, 3.615 CIDEr, 0.382 METEOR, and 0.657 ROUGE. It performed comparably to the benchmark models while offering a lightweight solution.
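For readers unfamiliar with CLIP-style training, the following minimal sketch illustrates the general idea in plain Python. It is not the SurgicalClip implementation. It assumes that the image encoder (VGG-19 in the paper) and the text encoder (ClinicalBERT) have already been projected into a shared embedding space, and it replaces their outputs with hypothetical toy vectors. It also assumes that captions are produced by image-to-text matching, that is, by retrieving the highest-scoring caption from a candidate set. The temperature of 0.07 is the initial value used in the original CLIP work and is an assumption here. The function names (`clip_loss`, `retrieve_caption`, and so on) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length; CLIP compares L2-normalized embeddings."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_matrix(images: List[Vector], texts: List[Vector],
                      temperature: float = 0.07) -> List[List[float]]:
    """Temperature-scaled cosine similarities: row i = image i, column j = text j."""
    img = [l2_normalize(v) for v in images]
    txt = [l2_normalize(v) for v in texts]
    return [[sum(a * b for a, b in zip(i, t)) / temperature for t in txt] for i in img]


def cross_entropy(logits: List[float], target: int) -> float:
    """Numerically stable -log(softmax(logits)[target])."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]


def clip_loss(images: List[Vector], texts: List[Vector],
              temperature: float = 0.07) -> float:
    """Symmetric contrastive (InfoNCE) loss: the i-th image and i-th text form the positive pair."""
    logits = similarity_matrix(images, texts, temperature)
    n = len(logits)
    # Image-to-text direction: classify the correct caption for each image (rows).
    image_to_text = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image direction: classify the correct image for each caption (columns).
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_image = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return (image_to_text + text_to_image) / 2


def retrieve_caption(image: Vector, caption_embeddings: List[Vector],
                     captions: List[str]) -> str:
    """Image-to-text matching: return the candidate caption with the highest similarity."""
    scores = similarity_matrix([image], caption_embeddings)[0]
    best = max(range(len(scores)), key=scores.__getitem__)
    return captions[best]


# Hypothetical 3-d embeddings standing in for projected VGG-19 / ClinicalBERT outputs.
image_embs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2]]
text_embs = [[0.9, 0.2, 0.0], [0.1, 0.8, 0.3]]
captions = ["hypothetical caption A", "hypothetical caption B"]

print(clip_loss(image_embs, text_embs))
print(retrieve_caption(image_embs[1], text_embs, captions))
```

Training minimizes `clip_loss` over batches of matched image-caption pairs, which pulls matching pairs together and pushes mismatched pairs apart in the shared space. At inference time, `retrieve_caption` scores a new image against every candidate caption embedding. A full model would compute these steps on tensors with learned projection heads rather than with Python lists.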