Kurdish Dialect Recognition using 1D CNN
Dialect recognition is one of the most actively studied topics in speech analysis, and machine learning algorithms have been widely used to identify dialects. In this paper, a model based on three different 1D Convolutional Neural Network (CNN) structures is developed for Kurdish dialect recognition. The model is evaluated and the CNN structures are compared with one another; the results show that the proposed model outperforms the state of the art. The model is evaluated on experimental data collected by the staff of the Department of Computer Science at the University of Halabja. The dataset covers the three major dialects of the Kurdish language: Northern Kurdish (Badini variant), Central Kurdish (Sorani variant), and Hawrami. An advantage of the CNN model is that it requires no handcrafted features, as the network learns features directly from the raw signal. According to the results, the 1D CNN method predicts the Kurdish dialect with an average accuracy of 95.53%. In addition, a new method is proposed to interpret the closeness of the Kurdish dialects by combining the confusion matrix with a non-metric multi-dimensional scaling visualization. The outcome demonstrates that the given Kurdish dialects form clear clusters and are linearly separable from the neighboring dialects.
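The 1D CNN operates directly on the raw one-dimensional input, which is why no handcrafted features are needed. The paper's implementation is not published; as a minimal illustration of the core building block, the numpy sketch below shows a single 1D convolution followed by ReLU and max-pooling, using a toy signal and a hypothetical filter (not the paper's learned weights).

```python
import numpy as np

def conv1d(x, kernel, stride=1):
    """Valid 1D convolution (cross-correlation) of signal x with a kernel."""
    k = len(kernel)
    out_len = (len(x) - k) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + k], kernel)
                     for i in range(out_len)])

def relu(x):
    """Element-wise rectified linear activation."""
    return np.maximum(x, 0.0)

def max_pool1d(x, size=2):
    """Non-overlapping max-pooling; any trailing remainder is dropped."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

# Toy "raw audio" frame and a hypothetical edge-detecting filter.
signal = np.array([0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5])
kernel = np.array([1.0, 0.0, -1.0])

features = max_pool1d(relu(conv1d(signal, kernel)))
print(features)  # [0. 1. 1.]
```

In the full model this conv/ReLU/pool pattern is stacked several times before a dense classification layer; the sketch only shows one such stage.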
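The dialect-closeness visualization converts the classifier's confusion matrix into pairwise dissimilarities and embeds them in two dimensions: dialects that are frequently confused are treated as close. The paper uses non-metric MDS; to keep the sketch self-contained, the version below swaps in classical (metric) MDS via eigendecomposition, and the 3×3 confusion matrix is illustrative, not the paper's actual results.

```python
import numpy as np

# Illustrative confusion matrix (rows = true dialect, cols = predicted);
# these counts are made up, not the paper's reported figures.
labels = ["Badini", "Sorani", "Hawrami"]
C = np.array([[95,  3,  2],
              [ 4, 96,  0],
              [ 1,  2, 97]], dtype=float)

# Row-normalize, symmetrize, and turn mutual confusion into dissimilarity.
P = C / C.sum(axis=1, keepdims=True)
S = (P + P.T) / 2.0
D = 1.0 - S
np.fill_diagonal(D, 0.0)

# Classical MDS: double-center the squared dissimilarities, eigendecompose,
# and keep the two leading dimensions.
n = len(D)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1][:2]  # two largest eigenvalues
coords = eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

for name, (x, y) in zip(labels, coords):
    print(f"{name}: ({x:+.3f}, {y:+.3f})")
```

Plotting `coords` gives the 2D map of dialect closeness; with only three dialects the embedding reproduces the dissimilarities exactly, so well-separated clusters appear as widely spaced points.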
Copyright (c) 2021 Karzan J. Ghafoor, Karwan M. Hama Rawf, Ayub O. Abdulrahman, Sarkhel H. Taher
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.