Prompt-guided synthetic data generation to mitigate domain shift in periapical lesion detection


Özüdoğru S., Coşkun F. Ö., Uysal F., Küçüktaş Ü. T., Hardalaç F., Kaya D. I.

PeerJ Computer Science, vol. 12, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 12
  • Publication Date: 2026
  • DOI: 10.7717/peerj-cs.3577
  • Journal Name: PeerJ Computer Science
  • Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, Directory of Open Access Journals
  • Keywords: Data drift, Deep learning, Domain shift, Periapical lesion detection, Synthetic data generation
  • Gazi University Affiliated: Yes

Abstract

Objectives: Domain shift, the mismatch between training and deployment distributions, degrades deep learning performance at inference. Vision-language foundation models (VLMs) such as ChatGPT-4o generalize out-of-distribution and enable controllable image-to-image synthesis, but direct clinical use is costly and, for closed-source models, raises privacy concerns. We instead use a VLM to guide synthetic data generation, strengthening smaller, deployable detectors without task-specific retraining; this offers a controllable substitute for generative adversarial network (GAN)/diffusion augmentation, which typically requires large domain-specific datasets and suffers from unstable training. Using panoramic radiographs, we compare a prompt-guided approach with an expert-guided classical image-processing baseline under morphology/size shift.

Methods: A dataset of 196 panoramic radiographs (145 annotated, 51 healthy) was used. We implemented two transparent augmentation pipelines: (i) Expert-Guided Synthesis (EGS), in which clinicians delineated lesion masks that were procedurally rendered (noise-based texture, intensity modulation, edge blending); and (ii) Prompt-Guided Lesion Synthesis (PGLS), in which clinicians specified lesion attributes via text prompts (size, shape, margin sharpness, contrast, location) and ChatGPT-4o produced parameterized image-to-image edits. A YOLOv10 (You Only Look Once) detector was trained under three regimes (Real-only, Real+EGS, Real+PGLS) and evaluated with five-fold cross-validation and size-stratified reporting (small/medium/large).

Results: Baseline (Real-only): mean Average Precision (mAP)@0.5 0.47, mAP@[0.5:0.95] 0.22, Recall 0.47. Size-stratified baselines: small 0.30/0.085/0.27; medium 0.58/0.26/0.50; large 0.46/0.20/0.49. Both synthetic strategies improved robustness, and PGLS consistently exceeded EGS. Overall: EGS 0.50/0.25/0.52 (+6.4%/+13.6%/+10.6%); PGLS 0.51/0.26/0.53 (+8.5%/+18.2%/+12.8%). The largest gains were for small and large lesions: small 0.36/0.120/0.33 (EGS) vs. 0.39/0.130/0.35 (PGLS), i.e., +20.0%/+41.2%/+22.2% and +30.0%/+52.9%/+29.6%; large 0.51/0.235/0.56 vs. 0.52/0.240/0.57, i.e., +10.9%/+17.5%/+14.3% and +13.0%/+20.0%/+16.3%.

Conclusions: Both prompt-guided and expert-guided synthesis improved resilience to morphology shift. PGLS yielded greater gains, reflecting flexible natural-language control while avoiding the data and training-stability burdens of GAN/diffusion models. Clinically, higher small-lesion recall reduces missed early apical periodontitis, and higher mAP@[0.5:0.95] tightens localization, curbing false positives and unnecessary follow-ups. Because PGLS uses auditable prompts/edits, it extends to other shifts (e.g., device or artifact shifts) and strengthens smaller, deployable detectors for more consistent accuracy across sites.
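To make the EGS idea concrete, the three procedural steps named in the abstract (noise-based texture, intensity modulation, edge blending) can be sketched as below. This is a minimal illustration under assumed parameter values, not the authors' implementation; the function name, darkening factor, noise level, and blur depth are all hypothetical.

```python
# Hedged sketch of Expert-Guided Synthesis (EGS): render a synthetic
# radiolucent (darker) lesion into a grayscale patch from a binary mask.
# All parameter values are illustrative assumptions, not the paper's.
import numpy as np


def render_lesion(image, mask, darkening=0.35, noise_sigma=0.05, blur_iters=3):
    """Blend a procedurally textured lesion into `image` where `mask` is 1.

    image: float array in [0, 1]; mask: binary array of the same shape.
    """
    rng = np.random.default_rng(0)
    # Edge blending: soften the mask boundary by repeated 4-neighbour averaging.
    soft = mask.astype(float)
    for _ in range(blur_iters):
        soft = (soft
                + np.roll(soft, 1, 0) + np.roll(soft, -1, 0)
                + np.roll(soft, 1, 1) + np.roll(soft, -1, 1)) / 5.0
    # Noise-based texture inside the lesion region.
    texture = rng.normal(0.0, noise_sigma, image.shape)
    # Intensity modulation: periapical lesions appear radiolucent (darker).
    lesion = np.clip(image * (1.0 - darkening) + texture, 0.0, 1.0)
    # Composite the rendered lesion over the original via the softened mask.
    return image * (1.0 - soft) + lesion * soft


# Toy usage: a uniform 32x32 patch with a square "lesion" mask.
img = np.full((32, 32), 0.8)
msk = np.zeros((32, 32))
msk[10:20, 10:20] = 1
out = render_lesion(img, msk)
print(out.shape, float(out[15, 15]))  # lesion interior darker than 0.8
```

In a real pipeline the mask would come from a clinician's delineation and the blending would typically use a proper Gaussian filter, but the compositing structure is the same.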