Improving Breast Cancer Classification with Adaptive Synthetic Sampling, Feature Selection, and Hyperparameter Optimization

Authors

Hayder N. Jasim, Wesam M. Jasim, Mohammed S. Ibrahim

DOI:

https://doi.org/10.14500/aro.12386

Keywords:

Adaptive Synthetic Sampling, Arctic Puffin Optimization, Breast Cancer Detection, Feature Selection, Hyperparameter Optimization, Machine Learning

Abstract

Breast cancer is a major global health concern, and persistent issues with detection accuracy highlight the need for accurate and efficient diagnostic solutions. This study presents an enhanced machine learning framework that improves breast cancer classification by addressing three key limitations: class imbalance, irrelevant features, and suboptimal hyperparameters. Adaptive synthetic sampling (ADASYN) was used to balance the class distribution; feature selection techniques, namely univariate selection and recursive feature elimination, improved feature relevance; and arctic puffin optimization (APO) was applied for hyperparameter tuning. Multiple classifiers were evaluated on the Wisconsin Diagnostic Breast Cancer dataset, which contains 569 samples. The random forest (RF) with ADASYN approach, optimized using APO, achieved outstanding results: 99.53% accuracy, 100% precision, 99.07% recall, and a 99.53% F1-score, with only one misclassification. This framework, while not modifying the ADASYN or RF algorithms themselves, significantly enhances diagnostic performance and serves as a robust foundation for clinical decision support systems.
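To make the balancing step concrete, the following is a minimal NumPy sketch of the ADASYN procedure as described by He et al. (2008). It is not the authors' code and not the imbalanced-learn API: the function name `adasyn` and its parameters `minority_label`, `beta`, `k`, and `seed` are hypothetical, and it assumes a binary label vector, Euclidean distance, brute-force neighbour search, and rounding of the per-sample counts to integers. The key idea it shows is that minority points with more majority-class neighbours (harder to learn) receive proportionally more synthetic samples.

```python
import numpy as np


def adasyn(X, y, minority_label, beta=1.0, k=5, seed=0):
    """Minimal ADASYN sketch (after He et al., 2008) for a binary label vector.

    Hypothetical helper for illustration only. Returns the original samples
    with synthetic minority samples appended.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority_label]
    m_s = len(X_min)              # minority-class size
    m_l = len(X) - m_s            # majority-class size

    # Step 1: total number of synthetic samples; beta = 1 asks for full balance.
    G = int(round((m_l - m_s) * beta))
    if G <= 0 or m_s < 2:
        return X, y

    # Step 2: k nearest neighbours of each minority point in the whole data set.
    # Column 0 of the argsort is the point itself (assumes no exact duplicates).
    d_all = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    nn_all = np.argsort(d_all, axis=1)[:, 1:k + 1]

    # r_i = share of majority-class neighbours, i.e. how hard x_i is to learn.
    r = (y[nn_all] != minority_label).sum(axis=1) / k
    if r.sum() == 0:
        r = np.ones(m_s)          # no hard points: fall back to uniform weights
    r_hat = r / r.sum()           # normalise into a density distribution

    # Step 3: number of synthetic samples to generate around each x_i.
    g = np.rint(r_hat * G).astype(int)

    # Minority-only neighbours, used as interpolation partners.
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    k_min = min(k, m_s - 1)
    nn_min = np.argsort(d_min, axis=1)[:, 1:k_min + 1]

    # Step 4: s = x_i + lambda * (x_zi - x_i), with lambda drawn from U[0, 1).
    synthetic = []
    for i in range(m_s):
        for _ in range(g[i]):
            x_zi = X_min[rng.choice(nn_min[i])]
            lam = rng.random()
            synthetic.append(X_min[i] + lam * (x_zi - X_min[i]))

    if not synthetic:
        return X, y
    X_new = np.vstack([X, np.array(synthetic)])
    y_new = np.concatenate([y, np.full(len(synthetic), minority_label, dtype=y.dtype)])
    return X_new, y_new
```

On the Wisconsin data (30 numeric features; 357 benign and 212 malignant cases), one would pass the feature matrix and labels with the malignant class as `minority_label`. Resampling is normally applied to the training split only, so that synthetic points derived from test samples do not leak into evaluation. For practical use, imbalanced-learn provides an `ADASYN` class in `imblearn.over_sampling`, whose neighbour handling and rounding details may differ from this sketch.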

Author Biographies

Hayder N. Jasim, College of Computer Science and Information Technology, University of Anbar, Anbar, Iraq

Hayder Nsaif Jasim is an M.Sc. student in Computer Science at the College of Computer Science and Information Technology, University of Anbar. He received the B.Sc. degree in Computer Science from the University of Anbar, Ramadi, Iraq. His research interests are AI, deep learning, and metaheuristic optimisation.

Wesam M. Jasim, College of Computer Science and Information Technology, University of Anbar, Ramadi, Iraq

Wesam M. Jasim is a professor in the Computer Science Department at the University of Anbar, Iraq. He received his Ph.D. in Electrical and Electronic Engineering from the University of Essex, UK, in 2016. His research interests include robot control, system identification, deep learning, and metaheuristics.

Mohammed S. Ibrahim, College of Computer Science and Information Technology, University of Anbar, Anbar, Iraq

Mohammed S. Ibrahim is a lecturer in the Computer Science Department at the University of Anbar, Iraq. He received his Ph.D. in Artificial Intelligence from the University of Arkansas, USA, in 2021. His research interests include NLP, information retrieval, and machine learning.

References

Ahamed, M.R.H., 2024. Early detection: Machine learning for breast cancer prediction. In: Early Detection: Machine Learning for Breast Cancer Prediction, Galle, Sri Lanka.

Aiyeniko, O., 2023. Performance evaluation of metaheuristic algorithms for feature selection in breast cancer predictive model. American Journal of Computer Sciences and Applications, 11(2), pp.1-10.

Allam, M., and Nandhini, M., 2022. Optimal feature selection using binary teaching learning based optimization algorithm. Journal of King Saud University - Computer and Information Sciences, 34, pp.329-341.

Assegie, T.A., Salau, A.O., Sampath, K., Govindarajan, R., Murugan, S., and Lakshmi, B., 2024. Evaluation of adaptive synthetic resampling technique for imbalanced breast cancer identification. Procedia Computer Science, 235, pp.1000-1007.

Breiman, L., 2001. Random forests. Machine Learning, 45, pp.5-32.

Cortes, C., and Vapnik, V., 1995. Support-vector networks. Machine Learning, 20, pp.273-297.

Cover, T.M., and Hart, P.E., 1967. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), pp.21-27.

Cox, D.R., 1958. The regression analysis of binary sequences. Journal of the Royal Statistical Society: Series B (Methodological), 20, pp.215-232.

Gárate-Escamila, A.K., Hajjam El Hassani, A., and Andrès, E., 2020. Classification models for heart disease prediction using feature selection and PCA. Informatics in Medicine Unlocked, 19, p.100330.

Gopal, V.N., Al-Turjman, F., Kumar, R., Anand, L., and Rajesh, M., 2021. Feature selection and classification in breast cancer prediction using IoT and machine learning. Measurement, 178, p.109442.

Gupta, P., and Garg, S., 2020. Breast cancer prediction using varying parameters of machine learning models. Procedia Computer Science, 171, pp.593-601.

He, H., Bai, Y., Garcia, E.A., and Li, S., 2008. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In: Proceedings of the International Joint Conference on Neural Networks, IEEE. pp.1322-1328.

Krawczyk, B., 2016. Learning from imbalanced data: Open challenges and future directions. Progress in Artificial Intelligence, 5, pp.221-232.

Kulkarni, A., Chong, D., and Batarseh, F.A., 2020. Foundations of data imbalance and solutions for a data democracy. In: Data Democracy: At the Nexus of Artificial Intelligence, Software Development, and Knowledge Engineering, Academic Press, United States, pp.83-106.

Lukong, K.E., 2017. Understanding breast cancer – The long and winding road. BBA Clinical, 7, pp.64-77.

Magboo, V.P.C., and Magboo, M.S., 2021. Machine learning classifiers on breast cancer recurrences. Procedia Computer Science, 192, pp.2742-2752.

Mangasarian, O.L., Street, W.N., and Wolberg, W.H., 1994. Breast cancer diagnosis and prognosis via linear programming. AAAI Spring Symposium – Technical Report, SS-94-01, pp.83-86.

Minnoor, M., and Baths, V., 2022. Diagnosis of breast cancer using random forests. Procedia Computer Science, 218, pp.429-437.

Mowri, R.A., Siddula, M., and Roy, K., 2023. Is iterative feature selection technique efficient enough? A comparative performance analysis of RFECV feature selection technique in ransomware classification using SHAP. Discover Internet of Things, 3, p.21.

Naji, M.A., Filali, S.E., Aarika, K., Benlahmar, E.H., Abdelouhahid, R.A., and Debauche, O., 2021. Machine learning algorithms for breast cancer prediction and diagnosis. Procedia Computer Science, 191, pp.487-492.

Nemade, V., and Fegade, V., 2022. Machine learning techniques for breast cancer prediction. Procedia Computer Science, 218, pp.1314-1320.

Obaid, O.I., Mohammed, M.A., Abd Ghani, M.K., Mostafa, S.A., and AlDhief, F.T., 2018. Evaluating the performance of machine learning techniques in the classification of Wisconsin Breast Cancer. International Journal of Engineering and Technology, 7, pp.160-166.

Quinlan, J.R., 1986. Induction of decision trees. Machine Learning, 1(1), pp.81-106.

Uddin, K.M.M., Biswas, N., Rikta, S.T., and Dey, S.K., 2023. Machine learning-based diagnosis of breast cancer utilizing feature optimization technique. Computer Methods and Programs in Biomedicine Updates, 3, p.100098.

Wang, W.C., Tian, W.C., Xu, D.M., and Zang, H.F., 2024. Arctic puffin optimization: A bio-inspired metaheuristic algorithm for solving engineering design optimization. Advances in Engineering Software, 195, p.103694.

Wolberg, W., Mangasarian, O., Street, N., and Street, W., 1993. Breast cancer Wisconsin (Diagnostic) [Dataset]. UCI Machine Learning Repository.

Yadav, R.K., Singh, P., and Kashtriya, P., 2022. Diagnosis of breast cancer using machine learning techniques – A survey. Procedia Computer Science, 218, pp.1434-1443.

Published

2026-03-15

How to Cite

Jasim, H. N., Jasim, W. M. and Ibrahim, M. S. (2026) “Improving Breast Cancer Classification with Adaptive Synthetic Sampling, Feature Selection, and Hyperparameter Optimization”, ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY, 14(1), pp. 107–116. doi: 10.14500/aro.12386.
Received 2025-06-27
Accepted 2026-02-01
