A Comparative Analysis of the Accuracy and Readability of Popular Artificial Intelligence-Chat Bots for Inguinal Hernia Management.

Authors

Thisun Udagedara, Ashley Tran, Sumaya Bokhari, Sharon Shiraga, Stuart Abel, Caitlin Houghton, Katie Galvin, Kamran Samakar, Luke R Putnam

Published in

The American surgeon. Pages 31348251353065. Jun 25, 2025. Epub Jun 25, 2025.

Abstract

Background: Artificial intelligence (AI), particularly large language models (LLMs), has gained attention for its clinical applications. While LLMs have shown utility in various medical fields, their performance in inguinal hernia repair (IHR) remains understudied. This study seeks to evaluate the accuracy and readability of LLM-generated responses to IHR-related questions, as well as their performance across distinct clinical categories.

Methods: Thirty questions were developed based on clinical guidelines for IHR and categorized into four subgroups: diagnosis, perioperative care, surgical management, and other. Questions were entered into Microsoft Copilot®, Google Gemini®, and OpenAI ChatGPT-4®. Responses were anonymized and evaluated by six fellowship-trained, minimally invasive surgeons using a validated 5-point Likert scale. Readability was assessed with six validated formulae.

Results: GPT-4 and Gemini outperformed Copilot in overall mean scores for response accuracy (Copilot: 3.75 ± 0.99, Gemini: 4.35 ± 0.82, and GPT-4: 4.30 ± 0.89; P < 0.001). Subgroup analysis revealed significantly higher scores for Gemini and GPT-4 in perioperative care (P = 0.025) and surgical management (P < 0.001). Readability scores were comparable across models, with all responses at college to college-graduate reading levels.

Discussion: This study highlights the variability in LLM performance, with GPT-4 and Gemini producing higher-quality responses than Copilot for IHR-related questions. However, the consistently high reading level of responses may limit accessibility for patients. These findings underscore the potential of LLMs to serve as valuable adjunct tools in surgical practice, with ongoing advancements expected to further enhance their accuracy, readability, and applicability.

PMID: 40557497
Bibliographic data and abstract were imported from PubMed on 25 Jun 2025.


