Enhanced Category-Feature Association Measure
A Robust Approach for Text Classification through Feature Selection
DOI:
https://doi.org/10.14500/aro.12034
Keywords:
Dimension reduction, Feature selection, Long short-term memory, Multinomial Naive Bayes, Support vector machines, Text classification
Abstract
Categorizing large, high-dimensional text data accurately and efficiently is a major challenge in text classification. A large number of features can confuse the classification process, so feature selection (FS) strategies are needed to address high dimensionality. This paper proposes a novel FS technique based on the enhanced category-feature association measure (ECFAM). ECFAM uses both the existence and the elimination of terms, together with the complex relationships among terms across different sections. This approach emphasizes the key role of ancillary terms in classifying and differentiating categories. ECFAM is compared with other methods on two widely used datasets, Reuters-21578 and 20-Newsgroups, using two widely employed supervised machine learning classifiers and one deep learning algorithm. The experiments cover nine feature-set sizes ranging from 50 to 4000. The experimental results show that ECFAM consistently outperforms the other methods in terms of accuracy and computational cost.
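The abstract does not give the ECFAM formula, so the following is only a minimal sketch of the general filter-style FS workflow it describes: score every term against the categories, then keep the top k terms. The scoring function here is a stand-in chosen for illustration (the largest gap between a term's document frequency inside a category and outside it), not ECFAM itself; the function names `presence_absence_scores` and `select_top_k` and the toy documents are likewise assumptions.

```python
from collections import Counter, defaultdict


def presence_absence_scores(docs, labels):
    """Score each term by how unevenly it is present across categories.

    Stand-in score (NOT the ECFAM formula): for each term t and category c,
    |P(t present | c) - P(t present | not c)|, maximised over categories.
    """
    n_docs = len(docs)
    docs_per_cat = Counter(labels)
    df_total = Counter()              # number of documents containing the term
    df_by_cat = defaultdict(Counter)  # the same count, split by category
    for tokens, label in zip(docs, labels):
        for term in set(tokens):      # presence only, not term frequency
            df_total[term] += 1
            df_by_cat[label][term] += 1

    scores = {}
    for term, df in df_total.items():
        best = 0.0
        for cat, n_cat in docs_per_cat.items():
            n_rest = n_docs - n_cat
            if n_rest == 0:           # only one category: nothing to contrast
                continue
            p_in = df_by_cat[cat][term] / n_cat
            p_out = (df - df_by_cat[cat][term]) / n_rest
            best = max(best, abs(p_in - p_out))
        scores[term] = best
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Toy corpus (hypothetical), already tokenised.
docs = [
    "oil prices rise on supply fears".split(),
    "crude oil output cut".split(),
    "team wins final game".split(),
    "coach praises team".split(),
]
labels = ["energy", "energy", "sport", "sport"]

scores = presence_absence_scores(docs, labels)
selected = select_top_k(scores, k=2)
print(selected)
```

In an experiment of the kind the abstract outlines, `k` would be varied over the chosen feature-set sizes and the reduced vocabulary passed to each classifier; the scoring function is the only part a different FS measure would replace.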
License
Copyright (c) 2025 Soran S. Badawi, Ari M. Saeed, Sara A. Ahmed, Diyari A. Hassan

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Accepted 2025-07-29
Published 2025-08-21