Kurdish Dialect Recognition using 1D CNN
Dialect recognition is one of the most actively studied topics in speech analysis, and machine learning algorithms have been widely used to identify dialects. In this paper, a model based on three different 1D Convolutional Neural Network (CNN) structures is developed for Kurdish dialect recognition. The model is evaluated, and the CNN structures are compared with each other. The results show that the proposed model outperforms the state of the art. The model is evaluated on experimental data collected by the staff of the Department of Computer Science at the University of Halabja. The dataset covers three dialects, as the Kurdish language consists of three major dialects, namely Northern Kurdish (Badini variant), Central Kurdish (Sorani variant), and Hawrami. An advantage of the CNN model is that it does not require handcrafted features, since it learns directly from the input signal. According to the results, the 1D CNN method achieves an average accuracy of 95.53% on Kurdish dialect classification. This study also proposes a new method to interpret the closeness of the Kurdish dialects using a confusion matrix and a non-metric multidimensional scaling visualization technique. The outcome demonstrates that the given Kurdish dialects are straightforward to cluster and are linearly separable from the neighboring dialects.
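The abstract does not spell out how the confusion matrix is turned into a measure of dialect closeness, so the following is only a minimal sketch under stated assumptions. It assumes that two dialects are "closer" when they are confused with each other more often, and it builds a symmetric dissimilarity matrix of the kind a non-metric multidimensional scaling routine would take as input. The confusion counts, the names `CONFUSION`, `row_normalize`, `dissimilarity`, and `accuracy`, and the averaging recipe are all hypothetical illustrations, not values or methods taken from the paper.

```python
# Hypothetical confusion counts (rows: true dialect, columns: predicted dialect).
# These numbers are illustrative only and are NOT results from the paper.
DIALECTS = ["Badini", "Sorani", "Hawrami"]
CONFUSION = [
    [95, 3, 2],
    [4, 94, 2],
    [1, 2, 97],
]


def row_normalize(matrix):
    """Convert raw counts to per-class rates so that each row sums to 1."""
    return [[count / sum(row) for count in row] for row in matrix]


def dissimilarity(matrix):
    """Assumed recipe: average the two confusion rates of each dialect pair
    and subtract from 1, so frequently confused dialects get a small value.
    The diagonal is set to 0 (a dialect is identical to itself)."""
    rates = row_normalize(matrix)
    n = len(rates)
    return [
        [0.0 if i == j else 1.0 - 0.5 * (rates[i][j] + rates[j][i])
         for j in range(n)]
        for i in range(n)
    ]


def accuracy(matrix):
    """Overall accuracy: correctly classified samples over all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


D = dissimilarity(CONFUSION)
for name, row in zip(DIALECTS, D):
    print(name, [round(value, 3) for value in row])
print("accuracy:", round(accuracy(CONFUSION), 4))
```

A matrix such as `D` could then be passed as precomputed dissimilarities to a non-metric multidimensional scaling implementation to obtain a 2D layout of the dialects. Whether the authors symmetrize the confusion matrix in this way is not stated in the abstract.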
Copyright (c) 2021 Karzan J. Ghafoor, Karwan M. Hama Rawf, Ayub O. Abdulrahman, Sarkhel H. Taher
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.