Turkish sign language recognition based on multistream data fusion

Gunduz, C.; POLAT, HÜSEYİN

doi:10.3906/elk-2005-156

Gunduz C., POLAT H.

TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, vol.29, no.2, pp.1171-1186, 2021 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 29 Issue: 2
Publication Date: 2021
DOI Number: 10.3906/elk-2005-156
Journal Name: TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC, TR DİZİN (ULAKBİM)
Pages: pp.1171-1186
Keywords: Deep learning, sign language recognition, 3D convolutional neural networks, long short-term memory, recurrent neural networks
Affiliated with Gazi University: Yes

Abstract

Sign languages are nonverbal, visual languages that hearing- or speech-impaired people use to communicate. Besides the hands, other communication channels such as body posture and facial expressions also carry meaning in sign languages. Because gestures vary across countries, the significance of each communication channel also differs from one sign language to another. In this study, a total of 8 data streams (4 RGB, 3 pose, and 1 optical flow) representing the communication channels used in Turkish sign language were analyzed. Inception 3D was used for the RGB and optical flow streams, and LSTM-RNN was used for the pose streams. Experiments were conducted by merging the data streams in different combinations, and a sign language recognition system that merges the most suitable streams through a multistream late fusion mechanism was then proposed. Taken individually, the RGB streams reached accuracies between 28% and 79%, the pose streams between 9% and 50%, and the optical flow stream 78.5%. When the data streams were combined, recognition performance was higher than with any single stream alone. The proposed system achieves an accuracy of 89.3% on the BosphorusSign General dataset. Multistream data fusion mechanisms therefore have great potential for improving sign language recognition results.
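
The abstract does not say which rule the late fusion mechanism uses to combine the streams. As a rough illustration of the general idea only, the sketch below fuses per-stream class probabilities by weighted averaging, one common form of late fusion. The function names, the stream names, the weights, and the probability values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional


def late_fusion(stream_probs: Dict[str, List[float]],
                weights: Optional[Dict[str, float]] = None) -> List[float]:
    """Fuse per-stream class probabilities by weighted averaging.

    Assumption: each stream's classifier has already produced a probability
    vector over the same set of sign classes. Weighted averaging is an
    illustrative choice, not necessarily the rule used in the paper.
    """
    if not stream_probs:
        raise ValueError("at least one stream is required")
    if weights is None:
        # Default to equal weights for every stream.
        weights = {name: 1.0 for name in stream_probs}
    total = sum(weights[name] for name in stream_probs)
    if total <= 0:
        raise ValueError("stream weights must sum to a positive value")

    n_classes = len(next(iter(stream_probs.values())))
    fused = [0.0] * n_classes
    for name, probs in stream_probs.items():
        if len(probs) != n_classes:
            raise ValueError(f"stream {name!r} has a different number of classes")
        w = weights[name] / total  # normalise so the fused vector sums to 1
        for i, p in enumerate(probs):
            fused[i] += w * p
    return fused


def predict(fused: List[float]) -> int:
    """Return the index of the class with the highest fused probability."""
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical outputs for three streams over three sign classes.
example_streams = {
    "rgb_full_frame": [0.5, 0.4, 0.1],
    "optical_flow": [0.2, 0.7, 0.1],
    "pose_body": [0.3, 0.3, 0.4],
}
fused_scores = late_fusion(example_streams)
fused_class = predict(fused_scores)
```

In this made-up example the RGB stream alone favours class 0 and the pose stream alone favours class 2, while the fused scores favour class 1. This shows how combining streams can change the decision. Passing a `weights` dictionary lets stronger streams count for more than weaker ones.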