Implementation of a Whisper Architecture-Based Turkish Automatic Speech Recognition (ASR) System and Evaluation of the Effect of Fine-Tuning with a Low-Rank Adaptation (LoRA) Adapter on Its Performance


Polat H., Turan A. K., Koçak C., Ulaş H. B.

ELECTRONICS, cilt.13, sa.21, 2024 (SCI-Expanded) identifier identifier

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 13 Sayı: 21
  • Basım Tarihi: 2024
  • Doi Numarası: 10.3390/electronics13214227
  • Dergi Adı: ELECTRONICS
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Communication Abstracts, INSPEC, Metadex, Directory of Open Access Journals, Civil Engineering Abstracts
  • Gazi Üniversitesi Adresli: Evet

Özet

This paper focuses on the implementation of the Whisper architecture to create an automatic speech recognition (ASR) system optimized for the Turkish language, which is considered a low-resource language in terms of speech recognition technologies. Whisper is a transformer-based model known for its high performance across numerous languages. However, its performance in Turkish, a language with unique linguistic features and limited labeled data, has yet to be fully explored. To address this, we conducted a series of experiments using five different Turkish speech datasets to assess the model's baseline performance. Initial evaluations revealed a range of word error rates (WERs) between 4.3% and 14.2%, reflecting the challenges posed by Turkish. To improve these results, we applied the low-rank adaptation (LoRA) technique, which is designed to fine-tune large-scale models efficiently by introducing a reduced set of trainable parameters. After fine-tuning, significant performance improvements were observed, with WER reductions of up to 52.38%. This study demonstrates that fine-tuned Whisper models can be successfully adapted for Turkish, resulting in a robust and accurate end-to-end ASR system. This research highlights the applicability of Whisper in low-resource languages and provides insights into the challenges of and strategies for improving speech recognition performance in Turkish.