Hiring in life sciences? Share your open positions with our professional community. Read more Close

Advertisement

Assessing ChatGPT-4's performance on the US prosthodontic exam: impact of fine-tuning and contextual prompting vs. base knowledge, a cross-sectional study.

Created on 24 May 2025

Authors

Mahmood Dashti, Farshad Khosraviani, Tara Azimi, Delband Hefzi, Shohreh Ghasemi, Amir Fahimipour, Niusha Zare, Zohaib Khurshid, Syed Rashid Habib

Published in

BMC medical education. Volume 25. Issue 1. Pages 761. May 23, 2025. Epub May 23, 2025.

Abstract

Artificial intelligence (AI), such as ChatGPT-4 from OpenAI, has the potential to transform medical education and assessment. However, its effectiveness in specialized fields like prosthodontics, especially when comparing base to fine-tuned models, remains underexplored. This study evaluates the performance of ChatGPT-4 on the US National Prosthodontic Resident Mock Exam in its base form and after fine-tuning. The aim is to determine whether fine-tuning improves the AI's accuracy in answering specialized questions.
An official sample questions from the 2021 US National Prosthodontic Resident Mock Exam was used, obtained from the American College of Prosthodontists. A total of 150 questions were initially considered, and resources were available for 106 questions. Both the base and fine-tuned models of ChatGPT-4 were tested under simulated exam conditions. Performance was assessed by comparing correct and incorrect responses. The Chi-square test was used to analyze accuracy, with significance set at p < 0.05. The Kappa coefficient was calculated to measure agreement between the models' responses.
The base model of ChatGPT-4 correctly answered 62.7% of the 150 questions. For the 106 questions with resources, the fine-tuned model answered 73.6% correctly. The Chi-square test showed a significant improvement in performance after fine-tuning (p < 0.001). The Kappa coefficient was 0.39, indicating moderate agreement between the models (p < 0.001). Performance varied by topic, with lower accuracy in areas such as Implant Prosthodontics, Removable Prosthodontics, and Occlusion, though the fine-tuned model consistently outperformed the base model.
Fine-tuning ChatGPT-4 with specific resources significantly enhances its accuracy in answering specialized prosthodontic exam questions. While the base model provides a solid baseline, fine-tuning is essential for improving AI performance in specialized fields. However, certain topics may require more targeted training to achieve higher accuracy.

PMID:
40410713
Bibliographic data and abstract were imported from PubMed on 24 May 2025.

Read full publication at:
Please sign in to see all details.

Advertisement

Stats

  • Community rating n/a 0 votes
  • Reviewers' rating n/a 0 votes
  • Your rating

1-terrible, 9-excellent. How would you rate this publication? Sign in in to submit your rating.

  • Recommendations n/a n/a positive of 0 vote(s)
  • Views 44
  • Comments 0

Recommended by

  • No recommendations yet.

Post a comment

You need to be signed in to post comments. You can sign in here.

Comments

There are no comments yet.

Advertisement