Hiring in life sciences? Share your open positions with our professional community. Read more Close

Advertisement

Artificial intelligence performance in answering multiple-choice oral pathology questions: a comparative analysis.

Created on 16 Apr 2025

Authors

Birkan Eyup Yilmaz, Busra Nur Gokkurt Yilmaz, Furkan Ozbey

Published in

BMC oral health. Volume 25. Issue 1. Pages 573. Apr 15, 2025. Epub Apr 15, 2025.

Abstract

Artificial intelligence (AI) has rapidly advanced in healthcare and dental education, significantly impacting diagnostic processes, treatment planning, and academic training. The aim of this study is to evaluate the performance differences between different large language models (LLMs) by analyzing their accuracy rates in answers to multiple choice oral pathology questions.
This study evaluates the performance of eight LLMs (Gemini 1.5, Gemini 2, ChatGPT 4o, ChatGPT 4, ChatGPT o1, Copilot, Claude 3.5, Deepseek) in answering multiple-choice oral pathology questions from the Turkish Dental Specialization Examination (DUS). A total of 100 questions from 2012 to 2021 were analyzed. Questions were classified as "case-based" or "knowledge-based". The responses were classified as "correct" or "incorrect" based on official answer keys. To prevent learning biases, no follow-up questions or feedback were provided after the LLMs' responses.
Significant performance differences were observed among the models (p < 0.001). ChatGPT o1 achieved the highest accuracy (96 correct, 4 incorrect), followed by Claude (84 correct), Gemini 2 and Deepseek (82 correct each). Copilot had the lowest performance (61 correct). Case-based questions showed notable performance variations (p = 0.034), where ChatGPT o1 and Claude excelled. For knowledge-based questions, ChatGPT o1 and Deepseek demonstrated the highest accuracy (p < 0.001). Post-hoc analysis revealed that ChatGPT o1 performed significantly better than most other models across both case-based and knowledge-based questions (p < 0.0031).
LLMs demonstrated variable proficiency in oral pathology questions, with ChatGPT o1 showing higher accuracy. LLMs shows promise as a supplementary educational tool, though further validation is required.

PMID:
40234873
Bibliographic data and abstract were imported from PubMed on 16 Apr 2025.

Read full publication at:
Please sign in to see all details.

Advertisement

Stats

  • Community rating n/a 0 votes
  • Reviewers' rating n/a 0 votes
  • Your rating

1-terrible, 9-excellent. How would you rate this publication? Sign in in to submit your rating.

  • Recommendations n/a n/a positive of 0 vote(s)
  • Views 19
  • Comments 0

Recommended by

  • No recommendations yet.

Post a comment

You need to be signed in to post comments. You can sign in here.

Comments

There are no comments yet.

Advertisement