Bridging the Gap

Enhancing Kurdish News Classification with RFA-CNN Hybrid Model

  • Soran S. Badawi, Charmo Center for Scientific Research and Consulting – Language and Linguistic Center, Charmo University Chamchamal, Sulaimani, Kurdistan region - F.R. Iraq. ORCID: https://orcid.org/0000-0001-9117-3078
Keywords: News Classification, Kurdish Language, Red fox optimization-Convolutional neural network, Bidirectional long short-term memory, Bidirectional encoder representations from transformers

Abstract

Effective organization and retrieval of news content rely heavily on accurate news classification. While extensive research has been conducted on well-resourced languages such as English and Chinese, research on under-resourced languages such as Kurdish is severely lacking. To address this gap, we introduce a hybrid approach called RFO-CNN. The proposed method combines an improved version of the red fox optimization (RFO) algorithm with a convolutional neural network (CNN), using RFO to fine-tune the CNN’s parameters. We tested the model on two widely used Kurdish news datasets, KNDH and KDC-4007, both of which contain news articles classified into various categories. We compared the performance of RFO-CNN with other cutting-edge deep learning models such as bidirectional long short-term memory (BiLSTM) networks and bidirectional encoder representations from transformers (BERT), as well as classical machine learning approaches such as multinomial naive Bayes, support vector machine, and K-nearest neighbors. We trained and tested the models under four train-test split scenarios: 60:40, 70:30, 80:20, and 90:10. Our experimental results show that RFO-CNN outperforms the benchmark BERT model and the other machine learning models in accuracy and F1-score across all four scenarios.
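
The evaluation protocol described in the abstract (four split scenarios, scored by accuracy and F1-score) can be illustrated with a minimal sketch. This is not the author's code. It assumes the ratios are train:test, that splits are stratified by class, and that F1 is macro-averaged, none of which the abstract states. The names `stratified_split`, `macro_f1`, `SPLITS`, and `split_sizes` are hypothetical.

```python
import random
from collections import defaultdict

# The four scenarios named in the abstract, read here as train:test ratios
# (an assumption; the abstract does not say which side is which).
SPLITS = {"60:40": 0.6, "70:30": 0.7, "80:20": 0.8, "90:10": 0.9}


def stratified_split(labels, train_fraction, seed=0):
    """Return (train_idx, test_idx), splitting each class at train_fraction."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return train_idx, test_idx


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def split_sizes(labels):
    """Map each scenario name to its (train size, test size)."""
    return {
        name: tuple(len(part) for part in stratified_split(labels, frac))
        for name, frac in SPLITS.items()
    }
```

In use, each classifier would be fitted on the training indices of a scenario and scored on the test indices with `accuracy` and `macro_f1`, so that all models are compared on identical partitions.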

Author Biography

Soran S. Badawi, Charmo Center for Scientific Research and Consulting – Language and Linguistic Center, Charmo University Chamchamal, Sulaimani, Kurdistan region - F.R. Iraq

Soran S. Badawi is an Assistant Lecturer at the Language Center, Charmo Researcher Center for research, training and consultancy, Charmo University. He received a B.Sc. degree in English Language and Literature from the University of Sulaimani, Iraq, and an M.Sc. degree in Computational Linguistics from Isfahan University, Iran. His research interests include natural language processing (NLP), machine translation, and sentiment analysis.

Published
2024-04-03
How to Cite
Badawi, S. S. (2024) “Bridging the Gap: Enhancing Kurdish News Classification with RFA-CNN Hybrid Model”, ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY, 12(1), pp. 100-107. doi: 10.14500/aro.11519.