A Study of Large Language Models in Detecting Python Code Violations
DOI: https://doi.org/10.14500/aro.12395
Keywords: Large language models, Software metrics, Software quality, Static code analysis
Abstract
Adhering to good coding practices is critical for enhancing software readability, maintainability, and reliability. Common static code analysis tools for Python, such as Pylint and Flake8, are widely used to enforce code quality by detecting coding violations without executing the code. Yet they often lack the deeper semantic understanding and contextual reasoning that some violations require. This study investigates the effectiveness of large language models (LLMs) compared to traditional static code analysis tools in detecting Python coding violations. Six state-of-the-art LLMs (ChatGPT, Gemini, Claude Sonnet, DeepSeek, Kimi, and Qwen) are evaluated against Pylint and Flake8 on a curated dataset of 75 Python code snippets annotated with 27 common code violations. In addition, three common prompting strategies (structural, chain-of-thought, and role-based) are used to instruct the selected LLMs. The experimental results reveal that Claude Sonnet achieved the highest F1-score (0.81), outperforming Flake8 (0.79) and demonstrating strong precision (0.99) and recall (0.69). However, the LLMs differ notably in performance, with Qwen and DeepSeek underperforming relative to the others. Moreover, the LLMs detect documentation and design violations (such as missing type hints and nested method structures) better than violations requiring stylistic consistency or complex semantic reasoning. The results are also heavily influenced by the prompting approach, with structural prompts yielding the most balanced performance in the majority of cases. This research contributes empirical evidence on employing LLMs for code quality assurance and demonstrates their potential role as complementary static code analysis tools for Python, with a methodology that may extend to other languages.
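The abstract reports precision, recall, and F1-score for each tool and model. As a minimal illustrative sketch (not taken from the paper), the following Python snippet shows how such scores could be computed by comparing the violations a tool or LLM reports against annotated ground truth; the snippet identifiers and violation codes below are hypothetical.

# Minimal sketch: micro-averaged precision/recall/F1 over per-snippet violation sets.
def score(ground_truth: dict[str, set[str]], predictions: dict[str, set[str]]) -> dict[str, float]:
    tp = fp = fn = 0
    for snippet, truth in ground_truth.items():
        found = predictions.get(snippet, set())
        tp += len(truth & found)   # violations correctly reported
        fp += len(found - truth)   # reported but not annotated
        fn += len(truth - found)   # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical example with Flake8/Pylint-style violation codes.
truth = {"snippet_01": {"E501", "C0114"}, "snippet_02": {"W605"}}
pred  = {"snippet_01": {"E501"}, "snippet_02": {"W605", "E231"}}
print(score(truth, pred))  # {'precision': 0.666..., 'recall': 0.666..., 'f1': 0.666...}

The same scoring could be applied per violation category to reproduce the kind of breakdown the abstract describes (documentation and design violations versus stylistic and semantic ones).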
License
Copyright (c) 2025 Hekar A. Mohammed Salih, Qusay I. Sarhan

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Authors who choose to publish their work with Aro agree to the following terms:
- Authors retain the copyright to their work and grant the journal the right of first publication. The work is simultaneously licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0), which allows others to share the work with an acknowledgement of the work's authorship and its initial publication in this journal.
- Authors have the freedom to enter into separate agreements for the non-exclusive distribution of the journal's published version of the work, such as posting it to an institutional repository or publishing it in a book, as long as proper acknowledgement is given to its initial publication in this journal.
- Authors are encouraged to share and post their work online, including in institutional repositories or on their personal websites, both prior to and during the submission process. This practice can lead to productive exchanges and increase the visibility and citation of the published work.
By agreeing to these terms, authors acknowledge the importance of open access and the benefits it brings to the scholarly community.
Accepted 2025-09-14
Published 2025-10-01