32nd IEEE Conference on Signal Processing and Communications Applications, SIU 2024, Mersin, Türkiye, 15-18 May 2024
Surgical reporting plays an important role in surgical feedback, medical training, postoperative care, and the diagnosis of complications after surgery; however, it remains a complex task that requires clinical expertise. Surgeons and surgical teams can use automatically generated surgical captions for effective and efficient surgical reporting. Automated surgical report generation has the potential to decrease surgeons' workload and, in turn, improve surgical outcomes. In this research, we used a customized Contrastive Language-Image Pre-Training (CLIP) model, with VGG-19 as the image encoder and ClinicalBERT as the text encoder, to produce automated surgical descriptions for nephrectomy surgery images. The model, which we named SurgicalClip, achieved the following average scores: 0.702 BLEU-1, 0.51 BLEU-4, 3.615 CIDEr, 0.382 METEOR, and 0.657 ROUGE. It performed comparably to the benchmark models while offering a lightweight solution.
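As a rough illustration of how a CLIP-style model aligns image and text embeddings, the sketch below computes the symmetric contrastive loss used in the original CLIP formulation. The abstract does not state SurgicalClip's training objective, so this is an assumption. The function names, the temperature value of 0.07, and the toy vectors are all hypothetical, and plain Python lists stand in for the VGG-19 and ClinicalBERT outputs.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    peak = max(row)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_sum for x in row]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy over an image-text similarity matrix.

    Assumption: row k of image_embs and row k of text_embs form a matched
    pair; every other pairing in the batch is treated as a negative.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    n = len(images)

    # logits[i][j] = cosine similarity of image i and text j, scaled by temperature
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]

    # Image-to-text direction: each row should peak on its diagonal entry.
    loss_i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n

    # Text-to-image direction: the same check applied to the columns.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = -sum(log_softmax(columns[k])[k] for k in range(n)) / n

    return 0.5 * (loss_i2t + loss_t2i)


# Toy stand-ins for encoder outputs (hypothetical values).
aligned = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

With these toy inputs, the loss for correctly paired embeddings (`aligned`) is close to zero, while pairing each image with the wrong caption (`swapped`) yields a much larger loss. In a real pipeline, the lists would be replaced by batched encoder outputs projected into a shared embedding space.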