Evaluating Large Language Models for Arduino Code Generation
DOI: https://doi.org/10.14500/aro.12344

Keywords: Large Language Models, Arduino, Code Generation, Internet of Things, Code Performance

Abstract
Large language models (LLMs), also known as generative AI, have transformed code generation by translating natural language prompts into executable code. Yet, their capabilities in generating code for resource-constrained devices such as Arduino, which are used in the Internet of Things and embedded systems, remain underexplored. This study evaluates six state-of-the-art LLMs for generating correct, efficient, and high-quality Arduino code. The evaluation was performed across five dimensions, namely functional correctness, runtime efficiency, memory usage, code quality, and similarity to human-written code, complemented by a multi-round error-correction analysis. The results reveal that ChatGPT-4o achieves the highest zero-shot functional correctness and aligns closely with human code in readability and similarity. On the other hand, Gemini 2.0 Flash generates faster-executing code but at the cost of higher code complexity and lower similarity. DeepSeek-V3 balances correctness with superior flash memory optimization, whereas Claude 3.5 Sonnet struggles with prompt adherence. Finally, multi-round error correction improves correctness across all six models. Overall, the findings underscore that none of the evaluated LLMs consistently leads on all evaluation criteria. Hence, model choice must align with project priorities; as shown, ChatGPT-4o excels in functional correctness, whereas Gemini 2.0 Flash excels in execution time, and DeepSeek-V3 in memory efficiency. This study provides a systematic evaluation of code generated with LLMs for Arduino, which, to the best of our knowledge, has not been previously studied across multiple models and performance metrics, thereby establishing a foundation for future research and contributing to enhancing the trustworthiness and effectiveness of LLM-generated code.
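To give a concrete sense of the "similarity to human-written code" dimension, the sketch below computes a crude token-level similarity between a hypothetical LLM-generated Arduino sketch and a human-written reference. Note that this is only a stdlib stand-in based on sequence matching, not the metric used in the study itself (work such as CodeBLEU, cited below, defines the kind of code-aware similarity such evaluations typically rely on); the two blink sketches are invented examples.

```python
import difflib

def surface_similarity(generated: str, reference: str) -> float:
    """Token-level similarity in [0, 1] between two code snippets.

    A simple illustration only: whitespace tokenization plus
    difflib's longest-matching-subsequence ratio. Real code-similarity
    metrics (e.g., CodeBLEU) also weigh syntax and data flow.
    """
    gen_tokens = generated.split()
    ref_tokens = reference.split()
    return difflib.SequenceMatcher(None, gen_tokens, ref_tokens).ratio()

# Hypothetical generated vs. human-written Arduino blink sketches,
# identical except for the delay constants.
generated = ("void setup() { pinMode(13, OUTPUT); } "
             "void loop() { digitalWrite(13, HIGH); delay(500); "
             "digitalWrite(13, LOW); delay(500); }")
reference = ("void setup() { pinMode(13, OUTPUT); } "
             "void loop() { digitalWrite(13, HIGH); delay(1000); "
             "digitalWrite(13, LOW); delay(1000); }")

score = surface_similarity(generated, reference)
print(round(score, 2))
```

Under this toy measure, the two sketches score high because only the two delay tokens differ; a syntax-aware metric would likewise treat them as near-identical in structure.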
References
Abdullah, A.A., Mohammed, N.S., Khanzadi, M., Asaad, S.M., Abdul, Z.K., and Maghdid, H.S., 2025. In-depth analysis on machine learning approaches: Techniques, applications, and trends. The Scientific Journal of Koya University, 13(1), pp.190-202.
Beurer-Kellner, L., Vechev, M., and Fischer, M., 2023. Prompting is programming: A query language for large language models. Proceedings of the ACM on Programming Languages, 7, pp.1946-1969.
Bucaioni, A., Ekedahl, H., Helander, V., and Nguyen, P.T., 2024. Programming with ChatGPT: How far can we go? Machine Learning with Applications, 15, p.100526.
Clark, A., Igbokwe, D., Ross, S., and Zibran, M.F., 2024. A Quantitative Analysis of Quality and Consistency in AI-Generated Code. In: Proceedings - 2024 7th International Conference on Software and System Engineering, ICoSSE 2024. Institute of Electrical and Electronics Engineers Inc., pp.37-41.
Coello, C.E.A., Alimam, M.N., and Kouatly, R., 2024. Effectiveness of ChatGPT in coding: A comparative analysis of popular large language models. Digital, 4(1), pp.114-125.
DeLorenzo, M., Gohil, V., and Rajendran, J., 2024. CreativEval: Evaluating Creativity of LLM-Based Hardware Code Generation. In: Conference: 2024 IEEE LLM Aided Design Workshop (LAD). pp.1-5.
Ebert, C., Cain, J., Antoniol, G., Counsell, S., and Laplante, P., 2016. Cyclomatic complexity. IEEE Software, 33(6), pp.27-29.
Evtikhiev, M., Bogomolov, E., Sokolov, Y., and Bryksin, T., 2023. Out of the BLEU: How should we assess quality of the Code Generation models? Journal of Systems and Software, 203, p.111741.
Hou, X., Zhao, Y., Liu, Y., Yang, Z., Wang, K., Li, L., Luo, X., Lo, D., Grundy, J., and Wang, H., 2024. Large language models for software engineering: A systematic literature review. ACM Transactions on Software Engineering and Methodology, 33(8), pp.1-79.
Jiang, J., Wang, F., Shen, J., Kim, S., and Kim, S., 2024. A survey on large language models for code generation. arXiv, arXiv:2406.00515. [Last accessed on 2025 Apr 25].
Kim, S.M., Choi, Y., and Suh, J., 2020. Applications of the open-source hardware Arduino platform in the mining industry: A review. Applied Sciences, 10, 5018.
Kok, I., Demirci, O., and Ozdemir, S., 2024. When IoT Meet LLMs: Applications and Challenges. In: 2024 IEEE International Conference on Big Data (BigData). Los Alamitos, CA, USA: IEEE Computer Society. pp.7075-7084.
Koubaa, A., Qureshi, B., Ammar, A., Khan, Z., Boulila, W., and Ghouti, L., 2023. Humans are still better than ChatGPT: Case of the IEEEXtreme competition. Heliyon, 9(11), p.e21624.
Li, J., Li, G., Li, Y., and Jin, Z., 2024. Structured Chain-of-Thought Prompting for Code Generation. ACM Transactions on Software Engineering and Methodology, 34, pp.1-23.
Liu, C., Bao, X., Zhang, H., Zhang, N., Hu, H., Zhang, X., and Yan, M., 2024. Guiding ChatGPT for Better Code Generation: An Empirical Study. In: Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024. Institute of Electrical and Electronics Engineers Inc. pp.102-113.
Lu, S., Guo, D., Ren, S., Huang, J., Svyatkovskiy, A., Blanco, A., Clement, C., Drain, D., Jiang, D., Tang, D., and Li, G., 2021. CodeXGLUE: A machine learning benchmark dataset for code understanding and generation. arXiv, arXiv:2102.04664. [Last accessed on 2025 Apr 25].
Miah, T., and Zhu, H., 2024. User Centric Evaluation of Code Generation Tools (Invited Paper). In: 2024 IEEE International Conference on Artificial Intelligence Testing (AITest). Los Alamitos, CA, USA: IEEE Computer Society. pp.109-119.
Mirjalili, S., Abdulla, A.A., Hassan, B.A., and Rashid, T.A., 2025. LLaMAAdapter + MRP: Integrating Meta-Reasoning Prompting with LLaMA-Adapter for Efficient Multi-Modal and Task-Adaptive Reasoning. TechRxiv, June 18.
Moradi Dakhel, A., Majdinasab, V., Nikanjam, A., Khomh, F., Desmarais, M.C., and Jiang, Z.M. (Jack), 2023. GitHub Copilot AI pair programmer: Asset or Liability? The Journal of Systems and Software, 203(C), p.111734.
Nayyar, A., and Puri, V., 2016. A review of Arduino board’s, Lilypad’s & Arduino shields. In: 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom). pp.1485-1492.
Nazir, A., and Wang, Z., 2023. A comprehensive survey of ChatGPT: Advancements, applications, prospects, and challenges. Meta-Radiology, 1, p.100022.
Niu, C., Zhang, T., Li, C., Luo, B., and Ng, V., 2024. On Evaluating the Efficiency of Source Code Generated by LLMs. In: Proceedings - 2024 IEEE/ACM 1st International Conference on AI Foundation Models and Software Engineering, FORGE 2024. Association for Computing Machinery, Inc. pp.103-107.
Nuñez-Varela, A.S., Pérez-Gonzalez, H.G., Martínez-Perez, F.E., and Soubervielle-Montalvo, C., 2017. Source code metrics: A systematic mapping study. Journal of Systems and Software, 128, pp.164-197.
Palla, D., and Slaby, A., 2025. Evaluation of generative AI models in python code generation: A comparative study. IEEE Access, 13, pp.65334-65347.
Paul, D.G., Zhu, H., and Bayley, I., 2024. ScenEval: A Benchmark for Scenario-Based Evaluation of Code Generation. In: 2024 IEEE International Conference on Artificial Intelligence Testing (AITest). IEEE. pp.55-63.
Petrovic, N., Konicanin, S., and Suljovic, S., 2023. ChatGPT in IoT Systems: Arduino Case Studies. In: 2023 IEEE 33rd International Conference on Microelectronics, MIEL 2023. Institute of Electrical and Electronics Engineers Inc., pp.1-4.
Rai, L., Khatiwada, S., Deng, C., and Liu, F., 2024. Cross-Language Code Development with Generative AI: A Source-to-Source Translation Perspective. In: 2024 IEEE 7th International Conference on Electronic Information and Communication Technology, ICEICT 2024. Institute of Electrical and Electronics Engineers Inc., pp.562-565.
Ren, S., Guo, D., Lu, S., Zhou, L., Liu, S., Tang, D., Sundaresan, N., Zhou, M., Blanco, A., and Ma, S., 2020. CodeBLEU: A method for automatic evaluation of code synthesis. arXiv, arXiv:2009.10297. [Last accessed on 2025 Apr 25].
Sharma, T., 2024. LLMs for Code: The Potential, Prospects, and Problems. In: Proceedings - IEEE 21st International Conference on Software Architecture Companion, ICSA-C 2024. Institute of Electrical and Electronics Engineers Inc. pp.373-374.
Shuvo, U.A., Dip, S.A., Vaskar, N.R., and Al Islam, A.B.M.A., 2025. Assessing ChatGPT’s Code Generation Capabilities with Short vs Long Context Programming Problems. In: Proceedings of the 2024 11th International Conference on Networking, Systems and Security, NSysS 2024. Association for Computing Machinery, Inc. pp.32-40.
Su, H., Ai, J., Yu, D., and Zhang, H., 2023. An Evaluation Method for Large Language Models’ Code Generation Capability. In: Proceedings - 2023 10th International Conference on Dependable Systems and Their Applications, DSA 2023. Institute of Electrical and Electronics Engineers Inc. pp.831-838.
Tashtoush, Y., Abu-El-Rub, N., Darwish, O., Al-Eidi, S., Darweesh, D., and Karajeh, O., 2023. A notional understanding of the relationship between code readability and software complexity. Information (Switzerland), 14(2), 81.
Yin, T., 2024. Lizard: A Simple Code Complexity Analyser without Caring about the C/C++ Header Files or Java Imports, Supports Most of the Popular Languages. Available from: https://github.com/terryyin/lizard [Last accessed on 2025 Apr 25].
Yusro, M., Guntoro, N., and Rikawarastuti, R., 2021. Utilization of microcontroller technology using Arduino board for Internet of Things (a systematic review). AIP Conference Proceedings, 2331, p.060004.
License
Copyright (c) 2025 Sardar K. Jabrw, Qusay I. Sarhan

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Authors who choose to publish their work with Aro agree to the following terms:
- Authors retain the copyright to their work and grant the journal the right of first publication. The work is simultaneously licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License (CC BY-NC-SA 4.0). This license allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors have the freedom to enter into separate agreements for the non-exclusive distribution of the journal's published version of the work. This includes options such as posting it to an institutional repository or publishing it in a book, as long as proper acknowledgement is given to its initial publication in this journal.
- Authors are encouraged to share and post their work online, including in institutional repositories or on their personal websites, both prior to and during the submission process. This practice can lead to productive exchanges and increase the visibility and citation of the published work.
By agreeing to these terms, authors acknowledge the importance of open access and the benefits it brings to the scholarly community.
Accepted 2025-11-16
Published 2025-01-05
ARO Journal is a scientific, peer-reviewed, periodical, diamond open-access journal (OAJ) that charges no article processing or submission fees (APC or ASC).