Enhanced Category-Feature Association Measure

A Robust Approach for Text Classification through Feature Selection

Authors

Soran S. Badawi, Ari M. Saeed, Sara A. Ahmed, Diyari A. Hassan

DOI:

https://doi.org/10.14500/aro.12034

Keywords:

Dimension reduction, Feature selection, Long short-term memory, Multinomial Naive Bayes, Support vector machines, Text classification

Abstract

Categorizing large, high-dimensional text data accurately and efficiently is one of the major challenges in text classification. Many irrelevant or redundant features confuse the classification process, so feature selection (FS) strategies are needed to deal with high dimensionality. This paper proposes a novel FS technique based on an enhanced category-feature association measure (ECFAM). ECFAM exploits both the presence and the absence of terms, as well as the complex relationships among terms across different sections. This approach emphasizes the key role of ancillary terms in classifying and distinguishing categories. ECFAM is compared with other FS methods on two widely used benchmark datasets, Reuters-21578 and 20-Newsgroups, using two widely employed supervised machine learning classifiers (multinomial naive Bayes and support vector machines) and one deep learning algorithm (long short-term memory). Throughout the experiments, we investigate nine feature-set sizes ranging from 50 to 4000 features. The experimental results show that ECFAM consistently outperforms the other methods in both accuracy and computational cost.
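The abstract does not give the ECFAM formula, so the following is only a generic sketch of the filter-style pipeline it belongs to, not ECFAM itself. Each term is scored from document-level presence/absence counts against each category, and the k best-scoring terms are kept. The chi-square statistic is used purely as a stand-in scoring function; the function names `contingency`, `chi_square`, and `select_top_k` and the toy corpus are hypothetical.

```python
"""Generic filter-style feature selection from term presence/absence counts.

Stand-in only: chi-square is used as the scoring function; this is NOT ECFAM.
"""


def contingency(docs, labels, term, category):
    """Count documents for one term/category pair.

    Returns (a, b, c, d):
      a: in category, term present    b: outside category, term present
      c: in category, term absent     d: outside category, term absent
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        inside = label == category
        if present and inside:
            a += 1
        elif present:
            b += 1
        elif inside:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(a, b, c, d):
    """Chi-square statistic of the 2x2 table; 0.0 if any marginal is empty."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_top_k(docs, labels, k, score=chi_square):
    """Keep the k terms with the highest best-category score (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    categories = sorted(set(labels))
    best = {
        t: max(score(*contingency(docs, labels, t, c)) for c in categories)
        for t in vocab
    }
    return sorted(vocab, key=lambda t: (-best[t], t))[:k]


# Toy corpus (hypothetical): each document is a set of tokens.
docs = [
    {"oil", "price", "crude"},
    {"oil", "barrel"},
    {"game", "team", "score"},
    {"team", "win"},
]
labels = ["energy", "energy", "sport", "sport"]
print(select_top_k(docs, labels, 2))  # ['oil', 'team']
```

In an experiment like the one the abstract describes, `k` would range over the feature-set sizes between 50 and 4000, and the selected vocabulary would then feed each classifier. A category-feature association measure such as ECFAM, whose definition is not given here, would presumably take the place of `chi_square` through the `score` parameter.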


Author Biographies

Soran S. Badawi, Language Center, Charmo University, Chamchamal, Kurdistan Region – F.R. Iraq

Soran S. Badawi is a Lecturer at the Language Center and the Charmo Center for Research, Training, and Consultancy, Charmo University. He received his B.Sc. degree in English Language and Literature from the University of Sulaimani, Iraq, and his M.Sc. degree in Computational Linguistics from Isfahan University, Iran. His research interests are in natural language processing (NLP), machine translation, and sentiment analysis.

Ari M. Saeed, Department of Computer Science, University of Halabja, Halabja, Kurdistan Region – F.R. Iraq

Ari M. Saeed is an Assistant Professor at the Department of Computer Science, College of Science, University of Halabja. He received his B.Sc. degree in Computer Science and his M.Sc. degree in Computer Engineering. His research interests are in machine learning, natural language processing (NLP), and text classification.

Sara A. Ahmed, Department of Computer Engineering, Komar University of Science and Technology, Sulaimaniyah, Kurdistan Region – F.R. Iraq

Sara A. Ahmed is a Lecturer at the Department of Computer Engineering, Faculty of Engineering, Komar University of Science and Technology. She received her B.Sc. degree in Computer Science and her M.Sc. degree in Computer Systems Engineering. Her research interests are in text classification, robotics, and artificial intelligence.

Diyari A. Hassan, Department of Biomedical Engineering, Faculty of Engineering and Computer Science, Qaiwan International University, Sulaimaniyah, Kurdistan Region – F.R. Iraq

Diyari A. Hassan is an Assistant Professor at the Department of Biomedical Engineering, Faculty of Engineering and Computer Science, Qaiwan International University. He received his B.Sc. degree in Telecommunication, his M.Sc. degree in Electrical and Electronic Engineering, and his Ph.D. degree in Computer Engineering. His research interests are in signal processing, polynomial matrix decomposition, and artificial intelligence.


Published

2025-08-21

How to Cite

Badawi, S. S. (2025) “Enhanced Category-Feature Association Measure: A Robust Approach for Text Classification through Feature Selection”, ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY, 13(2), pp. 114–123. doi: 10.14500/aro.12034.

Received 2025-02-02
Accepted 2025-07-29
