The Effect of Different Optimization Techniques on End-to-End Turkish Speech Recognition Systems that use Connectionist Temporal Classification


Arslan R. S., BARIŞÇI N.

2nd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), Kizilcahamam, Türkiye, 19-21 October 2018, pp. 604-609

  • Publication Type: Conference Paper / Full-Text Paper
  • Volume Number:
  • DOI Number: 10.1109/ismsit.2018.8567240
  • City of Publication: Kizilcahamam
  • Country of Publication: Türkiye
  • Page Numbers: pp. 604-609
  • Keywords: Acoustic Model (AM), Long Short-Term Memory (LSTM), Connectionist Temporal Classification (CTC), Recurrent Neural Network (RNN), Optimization Techniques
  • Affiliated with Gazi University: Yes

Abstract

In building acoustic models for speech recognition applications, Long Short-Term Memory (LSTM) based Recurrent Neural Networks (RNNs) have begun to yield better results than Gaussian Mixture Models (GMMs). Creating GMM-based acoustic models prolongs the deep learning process because it requires an aligned Hidden Markov Model (HMM). As a solution to this problem, another method of generating acoustic models has been proposed, based on Connectionist Temporal Classification (CTC). In this study, a CTC-based model is created and the effect of different optimization techniques on classification performance is compared. The tests were run on Turkish speech datasets to determine the best optimization techniques for speech recognition applications. Our evaluation results showed that GradientDescent, ProximalGradientDescent and RMSProp produce better results than the other algorithms.