THE EFFECT OF PROMPT ENGINEERING TECHNIQUES ON THE ACCURACY OF LARGE LANGUAGE MODEL RESPONSES
DOI:
https://doi.org/10.5281/zenodo.19201896
Keywords:
Prompt Engineering, large language models, response accuracy
Abstract
The article addresses the task of improving the accuracy of large language model (LLM) responses in chatbots. The problem stems from the sensitivity of LLMs to prompt structure and the resulting risk of incorrect answers. To improve performance without fine-tuning, Prompt Engineering techniques are applied: zero-shot, one-shot, few-shot, role prompting, zero-shot chain-of-thought (zero-CoT), Re2 (re-reading), and RaR (Rephrase and Respond). Experiments on the GSM8K, MMLU, and BIG-Bench Hard datasets showed a domain-dependent accuracy gain of up to 15% with combined approaches. The authors conclude that adaptive prompting is effective for improving the quality of LLM responses.
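The techniques listed in the abstract differ mainly in how the prompt text is assembled around the user's question. The following is a minimal sketch of that assembly, not the prompts used in the article, which are not reproduced on this page. All function names are hypothetical, and the template wording is an assumption: the phrases "Let's think step by step.", "Read the question again:" and "Rephrase and expand the question, and respond." follow the wording commonly associated with zero-shot CoT, Re2 and RaR respectively. The `role_re2_cot` combination is only one illustrative example of a combined approach.

```python
from typing import List, Tuple

Example = Tuple[str, str]  # (question, answer) demonstration pair

# Trigger phrase commonly used for zero-shot chain-of-thought (assumed wording).
COT_TRIGGER = "Let's think step by step."


def zero_shot(question: str) -> str:
    """Bare question with no demonstrations."""
    return f"Q: {question}\nA:"


def few_shot(question: str, examples: List[Example]) -> str:
    """Prepend solved demonstrations; fall back to zero-shot if none are given."""
    if not examples:
        return zero_shot(question)
    demos = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{demos}\n\n{zero_shot(question)}"


def one_shot(question: str, example: Example) -> str:
    """Few-shot with exactly one demonstration."""
    return few_shot(question, [example])


def role_prompt(question: str, role: str) -> str:
    """Assign the model a persona before asking the question."""
    return f"You are {role}.\n{zero_shot(question)}"


def zero_cot(question: str) -> str:
    """Zero-shot chain-of-thought: start the answer with a reasoning trigger."""
    return f"{zero_shot(question)} {COT_TRIGGER}"


def re2(question: str) -> str:
    """Re-reading: state the question twice before the answer slot."""
    return f"Q: {question}\nRead the question again: {question}\nA:"


def rar(question: str) -> str:
    """Rephrase and Respond: ask the model to restate the question first."""
    return f"{question}\nRephrase and expand the question, and respond."


def role_re2_cot(question: str, role: str) -> str:
    """One hypothetical combination: persona + re-reading + reasoning trigger."""
    return f"You are {role}.\n{re2(question)} {COT_TRIGGER}"


if __name__ == "__main__":
    q = "A pen costs 3 dollars. How much do 4 pens cost?"
    print(role_re2_cot(q, "a careful math tutor"))
```

Keeping each technique as a small pure function makes it easy to run the same question through several prompt variants and compare accuracy per dataset, which is the kind of comparison the abstract describes.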
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license.
Articles in the journal «Вестник Донецкого университета. Серия 04. Технические науки» are open access and are distributed under the terms of the License Agreement with Donetsk State University, which grants authors, free of charge, the right to unrestricted distribution and self-archiving.





