A Language Model Optimization Method for Turkish Automatic Speech Recognition System


Oyucu S., Polat H.

JOURNAL OF POLYTECHNIC-POLITEKNIK DERGISI, vol.26, no.3, pp.1167-1178, 2023 (ESCI)

  • Publication Type: Article / Full Article
  • Volume: 26 Issue: 3
  • Publication Date: 2023
  • DOI Number: 10.2339/politeknik.1085512
  • Journal Name: JOURNAL OF POLYTECHNIC-POLITEKNIK DERGISI
  • Journal Indexes: Emerging Sources Citation Index (ESCI), TR DİZİN (ULAKBİM)
  • Page Numbers: pp.1167-1178
  • Keywords: Turkish automatic speech recognition, Turkish language model, Turkish language model score optimization, Turkish corpus, neural networks
  • Gazi University Affiliated: Yes

Abstract

The current Automatic Speech Recognition (ASR) modeling strategy still suffers severe performance degradation when faced with low-resource languages such as Turkish. In particular, when the Language Model (LM) does not sufficiently support the Acoustic Model (AM), the Word Error Rate (WER) increases. A robust LM therefore contributes strongly to ASR performance by capturing word relations in the available corpus. However, developing a robust language model is a challenging task due to the agglutinative nature of Turkish. Within the scope of this study, a sentence-level LM optimization method is therefore proposed to improve the WER performance of Turkish ASR. In the proposed method, instead of the fixed word history imposed by the Markov assumption, the probability of the whole word sequence forming a sentence is calculated. A method combining n-gram and skip-gram properties is presented to obtain this word-sequence probability. The proposed method was tested on both statistical and Artificial Neural Network (ANN) based LMs. The experiments were carried out at both the word and sub-word level on two Turkish corpora (METU and Bogazici) shared via the Linguistic Data Consortium (LDC) and on a separate corpus, named HS, that we created specifically for this study. According to the experimental results for the statistical LM, a 0.5% WER increase was observed for the METU corpus, a 1.6% WER decrease for the Bogazici corpus, and a 2.5% WER decrease for the HS corpus. For the Feedforward Neural Network (FNN) based LM, WER decreases of 0.2% for the METU corpus, 0.8% for the Bogazici corpus, and 1.6% for the HS corpus were observed. Likewise, for the Recurrent Neural Network (RNN)-Long Short-Term Memory (LSTM) based LM, WER decreases of 0.6% for the METU corpus, 1.1% for the Bogazici corpus, and 1.5% for the HS corpus were observed. As a result, applying the proposed method to the LMs required for ASR reduced the WER and increased the overall performance of the ASR system.
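The abstract's core idea, scoring a whole word sequence as a sentence by combining adjacent (n-gram) and gapped (skip-gram) contexts instead of a single fixed Markov history, can be illustrated with a minimal Python sketch. This is only an interpretation of that idea under stated assumptions: the toy corpus, the add-alpha smoothing, the skip distance of one word, and the interpolation weight lam are all choices made for this sketch and are not taken from the paper.

    import math
    from collections import defaultdict

    # Toy corpus for illustration only; the paper's corpora (METU and
    # Bogazici via LDC, plus the authors' HS corpus) are not reproduced here.
    corpus = [
        ["<s>", "bugün", "hava", "çok", "güzel", "</s>"],
        ["<s>", "bugün", "hava", "soğuk", "</s>"],
    ]

    unigram = defaultdict(int)
    bigram = defaultdict(int)    # adjacent word pairs (n-gram context)
    skipgram = defaultdict(int)  # pairs separated by one word (skip-gram context)

    for sent in corpus:
        for i, w in enumerate(sent):
            unigram[w] += 1
            if i + 1 < len(sent):
                bigram[(w, sent[i + 1])] += 1
            if i + 2 < len(sent):
                skipgram[(w, sent[i + 2])] += 1

    def cond_prob(pair_counts, context, word, vocab_size, alpha=1.0):
        # Add-alpha smoothed P(word | context); the smoothing scheme is an
        # assumption of this sketch, not a detail from the paper.
        return (pair_counts[(context, word)] + alpha) / (unigram[context] + alpha * vocab_size)

    def sentence_logprob(sent, lam=0.7):
        # Score the whole sequence as a sentence: at each position the
        # adjacent context is interpolated with the one-word-skip context,
        # rather than relying on a single fixed Markov history. The weight
        # lam and the skip distance are illustrative choices.
        vocab_size = len(unigram)
        logp = 0.0
        for i in range(1, len(sent)):
            p = cond_prob(bigram, sent[i - 1], sent[i], vocab_size)
            if i >= 2:
                p_skip = cond_prob(skipgram, sent[i - 2], sent[i], vocab_size)
                p = lam * p + (1 - lam) * p_skip
            logp += math.log(p)
        return logp

    # A higher (less negative) log-probability means the LM prefers the sentence.
    print(sentence_logprob(["<s>", "bugün", "hava", "güzel", "</s>"]))

In an ASR pipeline, a sentence-level score of this kind would typically be used when rescoring candidate hypotheses from the decoder, which is where the WER differences reported in the abstract would arise.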