Authors
Birkan Eyup Yilmaz, Busra Nur Gokkurt Yilmaz, Furkan Ozbey
Published in
BMC oral health. Volume 25. Issue 1. Pages 573. Apr 15, 2025. Epub Apr 15, 2025.
Abstract
Artificial intelligence (AI) has rapidly advanced in healthcare and dental education, significantly impacting diagnostic processes, treatment planning, and academic training. This study aims to compare the performance of different large language models (LLMs) by analyzing their accuracy on multiple-choice oral pathology questions.
This study evaluates the performance of eight LLMs (Gemini 1.5, Gemini 2, ChatGPT 4o, ChatGPT 4, ChatGPT o1, Copilot, Claude 3.5, Deepseek) in answering multiple-choice oral pathology questions from the Turkish Dental Specialization Examination (DUS). A total of 100 questions from 2012 to 2021 were analyzed. Questions were classified as "case-based" or "knowledge-based". The responses were classified as "correct" or "incorrect" based on official answer keys. To prevent learning biases, no follow-up questions or feedback were provided after the LLMs' responses.
Significant performance differences were observed among the models (p < 0.001). ChatGPT o1 achieved the highest accuracy (96 correct, 4 incorrect), followed by Claude (84 correct) and by Gemini 2 and Deepseek (82 correct each). Copilot had the lowest performance (61 correct). Performance also varied significantly on case-based questions (p = 0.034), where ChatGPT o1 and Claude excelled. For knowledge-based questions, ChatGPT o1 and Deepseek demonstrated the highest accuracy (p < 0.001). Post-hoc analysis revealed that ChatGPT o1 performed significantly better than most other models across both case-based and knowledge-based questions (p < 0.0031).
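As a rough illustration of the arithmetic behind these figures, the sketch below converts the reported correct-answer counts into accuracy rates and compares the best and worst models with a Pearson chi-square test on a 2x2 table. The abstract does not state which statistical test the authors used, so the test here is an assumption chosen for simplicity. It also treats the two models' answers as independent samples, although both models answered the same 100 questions; a paired test such as McNemar's would need per-question data that the abstract does not provide. The helper names `accuracy` and `chi_square_2x2` are hypothetical.

```python
import math

# Correct-answer counts out of 100 questions, as reported in the abstract.
# Only models whose counts are stated in the abstract are included.
TOTAL_QUESTIONS = 100
correct = {
    "ChatGPT o1": 96,
    "Claude": 84,
    "Gemini 2": 82,
    "Deepseek": 82,
    "Copilot": 61,
}


def accuracy(model):
    """Fraction of the 100 questions the model answered correctly."""
    return correct[model] / TOTAL_QUESTIONS


def chi_square_2x2(correct_a, correct_b, n=TOTAL_QUESTIONS):
    """Pearson chi-square test (no continuity correction) on a 2x2 table.

    ASSUMPTION: the two models are treated as independent samples of n
    questions each. This is a simplification, not the authors' method.
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    observed = [
        [correct_a, n - correct_a],
        [correct_b, n - correct_b],
    ]
    col_totals = [correct_a + correct_b, 2 * n - correct_a - correct_b]
    grand_total = 2 * n

    stat = 0.0
    for row in observed:
        for j, obs in enumerate(row):
            expected = n * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected

    # For 1 degree of freedom, the upper-tail probability of a chi-square
    # variable x equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


for name in correct:
    print(f"{name}: {accuracy(name):.0%}")

stat, p_value = chi_square_2x2(correct["ChatGPT o1"], correct["Copilot"])
print(f"ChatGPT o1 vs Copilot: chi2 = {stat:.2f}, p = {p_value:.2e}")
```

Under these assumptions, the gap between ChatGPT o1 and Copilot falls well below the 0.001 level, which agrees in direction with the overall result reported above. It does not reproduce the authors' analysis.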
LLMs demonstrated variable proficiency in oral pathology questions, with ChatGPT o1 showing the highest accuracy. LLMs show promise as a supplementary educational tool, though further validation is required.
PMID:
40234873
Bibliographic data and abstract were imported from PubMed on 16 Apr 2025.