THE IMPACT OF PROMPT ENGINEERING TECHNIQUES ON THE ACCURACY OF LARGE LANGUAGE MODEL RESPONSES

Authors

DOI:

https://doi.org/10.5281/zenodo.19201896

Keywords:

Prompt Engineering, large language models, response accuracy

License

The metadata of this article are distributed under the CC BY 4.0 license.

Abstract

The article addresses the problem of improving the accuracy of large language model (LLM) responses in chatbots. The problem arises because LLMs are sensitive to the structure of prompts and can produce incorrect answers. To improve performance without fine-tuning, the following Prompt Engineering techniques are applied: zero-shot, one-shot, few-shot, role prompting, zero-shot chain-of-thought (zero-CoT), Re2, and RaR. Experiments on the GSM8K, MMLU, and BIG-Bench Hard datasets showed a domain-dependent accuracy gain of up to 15% for combined approaches. The authors conclude that adaptive prompting is an effective way to improve the quality of LLM responses.
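As a minimal sketch, assuming plain-text prompts sent to a single-turn chat model, the techniques listed above can be expressed as prompt templates. The function names and the `Example` type are hypothetical; the trigger phrases follow the original papers (Kojima et al. for zero-shot CoT, Xu et al. for Re2, Deng et al. for RaR), and the wording actually used in the article's experiments may differ.

```python
"""Hypothetical prompt templates for the techniques named in the abstract.

The exact prompts used in the article's experiments are not given; the
trigger phrases below follow the original papers and are assumptions.
"""

from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """A solved demonstration used by one-shot / few-shot prompting."""
    question: str
    answer: str


def zero_shot(question: str) -> str:
    # The task alone, with no demonstrations.
    return f"Q: {question}\nA:"


def few_shot(question: str, examples: list[Example]) -> str:
    # k solved demonstrations precede the task; k == 1 gives one-shot prompting.
    demos = "".join(f"Q: {e.question}\nA: {e.answer}\n\n" for e in examples)
    return demos + zero_shot(question)


def role_prompt(question: str, role: str) -> str:
    # A persona line is prepended to steer domain knowledge and style.
    return f"You are {role}.\n" + zero_shot(question)


def zero_cot(question: str) -> str:
    # Zero-shot CoT trigger phrase (Kojima et al.).
    return f"Q: {question}\nA: Let's think step by step."


def re2(question: str) -> str:
    # Re2 (Xu et al.): the question is read twice, here combined with the CoT trigger.
    return (
        f"Q: {question}\n"
        f"Read the question again: {question}\n"
        "A: Let's think step by step."
    )


def rar(question: str) -> str:
    # One-step RaR (Deng et al.): the model rephrases the question, then answers.
    return f"{question}\nRephrase and expand the question, and respond."


if __name__ == "__main__":
    q = "A pen costs 3 dollars. How much do 4 pens cost?"
    demos = [Example("A book costs 5 dollars. How much do 2 books cost?", "10 dollars")]
    prompts = {
        "zero-shot": zero_shot(q),
        "one-shot": few_shot(q, demos),
        "role": role_prompt(q, "an experienced math teacher"),
        "zero-CoT": zero_cot(q),
        "Re2": re2(q),
        "RaR": rar(q),
    }
    for name, prompt in prompts.items():
        print(f"--- {name} ---\n{prompt}\n")
```

Passing a single `Example` to `few_shot` yields one-shot prompting, and an empty list reduces it to zero-shot. Combined approaches can be formed by composing templates, for instance by prepending a role line to the output of `re2`.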

Published

27.02.2026

Issue

Section

Information Technology and Telecommunications

How to Cite

[1] 2026. THE IMPACT OF PROMPT ENGINEERING TECHNIQUES ON THE ACCURACY OF LARGE LANGUAGE MODEL RESPONSES. Bulletin of Donetsk University. Series 04. Technical Sciences. 1 (Feb. 2026), 141–150. DOI: https://doi.org/10.5281/zenodo.19201896.