Improving Breast Cancer Classification with Adaptive Synthetic Sampling, Feature Selection, and Hyperparameter Optimization
DOI:
https://doi.org/10.14500/aro.12386
Keywords:
Adaptive Synthetic Sampling, Arctic Puffin Optimisation, Breast Cancer Detection, Feature Selection, Hyperparameter Optimization, Machine Learning
Abstract
Breast cancer is a major global health concern, and persistent problems with detection accuracy highlight the need for accurate and efficient diagnostic solutions. This study presents an enhanced machine learning framework that improves breast cancer classification by addressing three key limitations: class imbalance, irrelevant features, and suboptimal hyperparameters. Adaptive synthetic sampling (ADASYN) was used to balance the class distribution; feature selection techniques, including univariate selection and recursive feature elimination, were used to improve feature relevance; and arctic puffin optimization (APO) was applied for hyperparameter tuning. Multiple classifiers were evaluated on the Wisconsin Diagnostic Breast Cancer dataset. The random forest (RF) classifier combined with ADASYN and optimized using APO achieved outstanding results (99.53% accuracy, 100% precision, 99.07% recall, and a 99.53% F1-score), with only one misclassification out of 569 samples. Although this framework does not modify the ADASYN or RF algorithms themselves, it significantly enhances diagnostic performance and serves as a robust foundation for clinical decision support systems.
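The class-balancing step named in the abstract can be illustrated with a minimal, standard-library-only sketch of the general ADASYN idea: minority points whose nearest neighbours are mostly majority points receive more synthetic samples. This is not the authors' implementation. The function name `adasyn_sketch`, its parameters (`k`, `beta`, `seed`), and the toy 2-D data are assumptions made purely for illustration; practical work would more likely rely on an existing library such as imbalanced-learn.

```python
import math
import random


def adasyn_sketch(minority, majority, k=3, beta=1.0, seed=0):
    """Generate synthetic minority points, more of them near majority points."""
    rng = random.Random(seed)
    # Number of synthetic points needed to reach the requested balance level.
    total_needed = int((len(majority) - len(minority)) * beta)

    # Step 1: difficulty ratio r_i = share of majority points among the
    # k nearest neighbours of each minority point (the point itself excluded).
    ratios = []
    for i, x in enumerate(minority):
        pool = [(p, "maj") for p in majority]
        pool += [(p, "min") for j, p in enumerate(minority) if j != i]
        pool.sort(key=lambda item: math.dist(x, item[0]))
        ratios.append(sum(label == "maj" for _, label in pool[:k]) / k)

    # Step 2: normalise the ratios into a distribution over minority points.
    ratio_sum = sum(ratios)
    if ratio_sum == 0:  # no minority point borders the majority class
        weights = [1 / len(minority)] * len(minority)
    else:
        weights = [r / ratio_sum for r in ratios]

    # Step 3: interpolate between each minority point and a random one of
    # its k nearest minority neighbours, round(weight_i * total) times.
    synthetic = []
    for i, x in enumerate(minority):
        mates = [p for j, p in enumerate(minority) if j != i]
        mates.sort(key=lambda p: math.dist(x, p))
        for _ in range(round(weights[i] * total_needed)):
            mate = rng.choice(mates[:k])
            lam = rng.random()
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, mate)))
    return synthetic, ratios


# Hypothetical toy data: the first minority point sits beside the majority cluster.
majority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.4, 0.0),
            (0.0, 0.4), (0.5, 0.3), (0.3, 0.5)]
minority = [(0.6, 0.4), (3.0, 3.0), (3.2, 3.1), (3.1, 3.3)]
synthetic, ratios = adasyn_sketch(minority, majority)
print(len(synthetic), ratios)
```

In this toy run only the first minority point has majority neighbours, so all synthetic samples are generated from it. That is the adaptive behaviour that distinguishes ADASYN from uniform oversampling. Because per-point counts are rounded, the total number of synthetic points can differ slightly from the target on other data.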
License
Copyright (c) 2026 Hayder N. Jasim, Wesam M. Jasim, Mohammed S. Ibrahim

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Authors who choose to publish their work with Aro agree to the following terms:
- Authors retain the copyright to their work and grant the journal the right of first publication. The work is simultaneously licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). This license allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are free to enter into separate agreements for the non-exclusive distribution of the journal's published version of the work, such as posting it to an institutional repository or publishing it in a book, as long as proper acknowledgement is given to its initial publication in this journal.
- Authors are encouraged to share and post their work online, including in institutional repositories or on their personal websites, both prior to and during the submission process. This practice can lead to productive exchanges and increase the visibility and citation of the published work.
By agreeing to these terms, authors acknowledge the importance of open access and the benefits it brings to the scholarly community.
Accepted 2026-02-01
Published 2026-03-15

ARO Journal is a scientific, peer-reviewed periodical and a diamond open-access journal (OAJ) with no article processing charges (APC) or article submission charges (ASC).