PeerJ Computer Science, vol. 12, 2026 (SCI-Expanded, Scopus)
Objectives: Domain shift, the mismatch between training and deployment distributions, degrades deep learning models at inference. Vision-language foundation models (VLMs) such as ChatGPT-4o generalize out of distribution and enable controllable image-to-image synthesis, but direct clinical use is costly and, when the model is closed-source, raises privacy concerns. We instead use a VLM to guide synthetic data generation, strengthening smaller deployable detectors without task-specific retraining. This offers a controllable substitute for generative adversarial network (GAN) or diffusion-based augmentation, which typically requires large domain-specific datasets and suffers from unstable training. Using panoramic radiographs, we compare a prompt-guided approach with an expert-guided classical image-processing baseline under morphology/size shift.

Methods: A dataset of 196 panoramic radiographs (145 annotated, 51 healthy) was used. We implemented two transparent augmentation pipelines: (i) Expert-Guided Synthesis (EGS), in which clinicians delineated lesion masks that were procedurally rendered (noise-based texture, intensity modulation, edge blending); and (ii) Prompt-Guided Lesion Synthesis (PGLS), in which clinicians specified lesion attributes via text prompts (size, shape, margin sharpness, contrast, location) and ChatGPT-4o produced parameterized image-to-image edits. A You Only Look Once version 10 (YOLOv10) detector was trained under three regimes (Real-only, Real+EGS, Real+PGLS) and evaluated with five-fold cross-validation and size-stratified reporting (small/medium/large).

Results: The Real-only baseline reached a mean Average Precision (mAP)@0.5 of 0.47, mAP@[0.5:0.95] of 0.22, and Recall of 0.47. Size-stratified baselines, reported in the same order (mAP@0.5 / mAP@[0.5:0.95] / Recall), were 0.30/0.085/0.27 for small, 0.58/0.26/0.50 for medium, and 0.46/0.20/0.49 for large lesions. Both synthetic strategies improved robustness, and PGLS consistently exceeded EGS. Overall, EGS reached 0.50/0.25/0.52 (+6.4%/+13.6%/+10.6% relative to baseline) and PGLS reached 0.51/0.26/0.53 (+8.5%/+18.2%/+12.8%). The largest gains were for small and large lesions. Small: EGS 0.36/0.120/0.33 (+20.0%/+41.2%/+22.2%) vs. PGLS 0.39/0.130/0.35 (+30.0%/+52.9%/+29.6%). Large: EGS 0.51/0.235/0.56 (+10.9%/+17.5%/+14.3%) vs. PGLS 0.52/0.240/0.57 (+13.0%/+20.0%/+16.3%).

Conclusions: Both prompt-guided and expert-guided synthesis improved resilience to morphology shift. PGLS yielded greater gains, reflecting flexible natural-language control while avoiding the data and stability burdens of GAN/diffusion training. Clinically, higher small-lesion recall reduces missed cases of early apical periodontitis, and higher mAP@[0.5:0.95] tightens localization, curbing false positives and unnecessary follow-ups. Because PGLS uses auditable prompts and edits, it extends to other shifts (e.g., device or artifact shift) and strengthens smaller, deployable detectors for more consistent accuracy across sites.
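
The percentage gains quoted in the Results can be reproduced from the absolute scores, assuming each percentage is the change relative to the Real-only baseline, i.e. (value - baseline) / baseline. The sketch below is illustrative only and is not the authors' evaluation code; the names `relative_gains`, `BASELINE`, `EGS`, and `PGLS` are hypothetical, and the medium stratum is omitted because the abstract reports only its baseline.

```python
from typing import Dict, Tuple

# Each triple is ordered as (mAP@0.5, mAP@[0.5:0.95], Recall),
# with values copied from the abstract.
Triple = Tuple[float, float, float]

BASELINE: Dict[str, Triple] = {
    "overall": (0.47, 0.22, 0.47),
    "small": (0.30, 0.085, 0.27),
    "large": (0.46, 0.20, 0.49),
}
EGS: Dict[str, Triple] = {
    "overall": (0.50, 0.25, 0.52),
    "small": (0.36, 0.120, 0.33),
    "large": (0.51, 0.235, 0.56),
}
PGLS: Dict[str, Triple] = {
    "overall": (0.51, 0.26, 0.53),
    "small": (0.39, 0.130, 0.35),
    "large": (0.52, 0.240, 0.57),
}


def relative_gains(baseline: Triple, result: Triple) -> Tuple[float, ...]:
    """Percent change of each metric over its baseline, rounded to one decimal."""
    return tuple(round((r - b) / b * 100.0, 1) for b, r in zip(baseline, result))


if __name__ == "__main__":
    # Print the relative gains for every stratum and both synthesis strategies.
    for stratum in BASELINE:
        for name, scores in (("EGS", EGS), ("PGLS", PGLS)):
            gains = relative_gains(BASELINE[stratum], scores[stratum])
            print(f"{stratum:>7} {name:<4} {gains}")
```

Under this assumption the computed values match the reported figures, for example +41.2% for small-lesion mAP@[0.5:0.95] with EGS (0.085 to 0.120) and +52.9% with PGLS (0.085 to 0.130).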