The Effect of Different Optimization Techniques on End-to-End Turkish Speech Recognition Systems that use Connectionist Temporal Classification

Arslan R. S., BARIŞÇI N.

2nd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), Kizilcahamam, Türkiye, 19-21 October 2018, pp. 604-609 (Full-Text Paper)

Publication Type: Conference Paper / Full-Text Paper
Volume:
DOI: 10.1109/ismsit.2018.8567240
City of Publication: Kizilcahamam
Country of Publication: Türkiye
Pages: pp. 604-609
Keywords: Acoustic Model (AM), Long Short-Term Memory (LSTM), Connectionist Temporal Classification (CTC), Recurrent Neural Network (RNN), Optimization Techniques
Affiliated with Gazi Üniversitesi: Yes

Abstract

In building acoustic models for speech recognition applications, Recurrent Neural Networks (RNNs) based on Long Short-Term Memory (LSTM) have begun to yield better results than Gaussian Mixture Models (GMMs). Creating GMM-based acoustic models prolongs the deep learning process because it requires an aligned Hidden Markov Model (HMM). To address this problem, an alternative method for generating acoustic models, based on Connectionist Temporal Classification (CTC), has been proposed. In this study, a CTC-based model is created and the effect of different optimization techniques on its classification performance is compared. The tests were run on Turkish speech datasets to determine the best optimization techniques for speech recognition applications. Our evaluation showed that GradientDescent, ProximalGradientDescent, and RMSProp produce better results than the other algorithms tested.
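
As an illustration of why a CTC-based model does not need frame-level alignments, the following minimal Python sketch shows the standard CTC collapsing rule (merge consecutive repeated labels, then drop the blank symbol) together with a simple greedy decoder. It is not taken from the paper: the blank symbol `_`, the function names `ctc_collapse` and `greedy_decode`, and the toy scores are assumptions chosen only for this example.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol chosen for this sketch


def ctc_collapse(path, blank=BLANK):
    """Map a frame-level label path to an output label sequence.

    Consecutive repeated labels are merged first, then blanks are dropped,
    so many different frame-level paths map to the same transcript.
    """
    merged = [label for label, _ in groupby(path)]
    return [label for label in merged if label != blank]


def greedy_decode(frame_scores, labels, blank=BLANK):
    """Pick the highest-scoring label in each frame, then collapse the path."""
    best_path = []
    for row in frame_scores:
        best_index = max(range(len(row)), key=row.__getitem__)
        best_path.append(labels[best_index])
    return ctc_collapse(best_path, blank)


if __name__ == "__main__":
    # Two different frame-level paths that collapse to the same word.
    print("".join(ctc_collapse(list("_mm_ee_r_hh_a_b_aa_"))))
    print("".join(ctc_collapse(list("m_e_rr_h_a_bb_a"))))

    # Toy per-frame scores over the labels ["_", "a", "b"] (made-up numbers).
    toy_scores = [
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.1, 0.8],
    ]
    print(greedy_decode(toy_scores, ["_", "a", "b"]))
```

Because every path that collapses to the target transcript counts toward its probability, training can sum over those paths instead of relying on a single precomputed alignment. A blank between two identical labels keeps them distinct, which is how doubled letters survive the merge step.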