Aim: Chat Generative Pre-trained Transformer (ChatGPT) is a large multimodal language model created by OpenAI. Although ChatGPT has performed well in the medical field, especially in answering medical questions in English, few studies have evaluated its performance in Turkish. This study aimed to investigate the success of ChatGPT in a theoretical exam measuring professional knowledge in medical education and to explore the existing research evidence on the performance of ChatGPT in exams.
Materials and Methods: This descriptive study used the free version of ChatGPT (1.2023.256). A total of 150 questions from the 2022 second-semester final exam were used. Six questions that the AI could not fully recognise and that consisted of abbreviations were removed. Each question was entered into ChatGPT, and its written response was recorded and compared with the correct answer. The Enterprise Learning Management Planning System (KEYPS) and Excel were used to calculate accuracy rates for each question type.
Results: Of the 150 questions, ChatGPT answered 104 correctly and 46 incorrectly. The item difficulty index of the incorrectly answered questions was 0.68, whereas this index should normally be around 0.50. Unanswered questions were categorised as difficult questions.
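The arithmetic behind the figures in the Results can be illustrated with a minimal Python sketch. It assumes the conventional definition of the item difficulty index (the proportion of examinees who answer an item correctly); the abstract does not state how KEYPS computes it. The function names and the sample response list are hypothetical and are not taken from the study.

```python
def accuracy(correct: int, total: int) -> float:
    """Proportion of questions answered correctly."""
    return correct / total


def item_difficulty(responses: list[bool]) -> float:
    """Item difficulty index for one question, assuming the conventional
    definition: the share of examinees who answered it correctly.
    Values near 1 indicate an easy item, values near 0 a hard one."""
    return sum(responses) / len(responses)


# Counts reported in the Results: 104 correct out of 150 questions.
print(f"Accuracy: {accuracy(104, 150):.1%}")  # Accuracy: 69.3%

# Hypothetical responses of four examinees to a single item (not study data).
print(item_difficulty([True, True, False, True]))  # 0.75
```

Under this definition, an index of 0.68 would mean that about two thirds of examinees answered those items correctly.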
Conclusion: ChatGPT's performance in answering the questions was not good enough. However, it performed well on negatively worded questions and case questions. ChatGPT can be a useful tool for learning.
Key words: Artificial intelligence, educational measurement, medicine, exam