Evaluating expert decision-pattern alignment in endovascular planning for intracranial aneurysms using a multimodal large language model.

Created on 25 Jun 2026

Authors

Mustafa Demir, Yunus Yasar, Yusuf Agackaya, Yigit Serdar Simsek, Yilmaz Onal

Published in

Neuroradiology. Jun 25, 2026. Epub Jun 25, 2026.

Abstract

Endovascular treatment planning for intracranial aneurysms requires integrating vascular geometry, branch preservation, device feasibility, and patient-specific risk tolerance. Because multiple strategies may be clinically acceptable for the same anatomy, treatment selection reflects expert heuristics rather than a single deterministic solution. Whether multimodal large language models (LLMs) can reproduce such expert decision patterns from angiographic inputs remains incompletely characterized.
In this retrospective single-center study, 59 patients with unruptured intracranial aneurysms treated with endovascular therapy (EVT) were analyzed. Four decision agents reviewed each case: (i) the multidisciplinary neurovascular board that performed the index procedure (contextual clinical reference), (ii) an independent interventional neuroradiologist (> 10 years of EVT experience), (iii) a multimodal LLM (GPT-5), and (iv) a first-year neurointerventional fellow included as a trainee-level comparator. All evaluators received identical clinical summaries and 6-10 standardized 3D-DSA projections. The blinded expert scored all non-reference proposals on a 4-point ordinal scale that reflects procedural feasibility and alignment with heuristic preferences rather than clinical correctness. Agreement with board decisions on modality-level strategy (stent-assisted coiling [SAC], balloon-assisted coiling [BAC], or flow diversion [FD]) was quantified using Cohen's kappa (κ).
GPT-5 demonstrated moderate concordance with board-selected treatment modalities (κ = 0.64), within the range of expert-level variability reported in intracranial aneurysm management and higher than agreement observed for the trainee-level comparator (κ = 0.15). Expert preference-based scoring indicated that most GPT-5 proposals were judged as either preferred strategies or clinically acceptable alternatives (median score 4 vs 3 for the fellow; p < 0.001). Divergence between evaluators occurred primarily at the strategic planning level, whereas similar performance was observed for rule-based subtasks such as device sizing and landing-zone estimation.
In this retrospective case series, a multimodal LLM generated endovascular treatment strategies that frequently aligned with expert heuristic decision-making patterns when evaluated in a real-world clinical decision-making context. Rather than representing autonomous clinical reasoning, the model appears to reproduce codified expert heuristics embedded in neurointerventional practice. Multimodal LLMs may therefore serve as adjunct tools for standardizing decision patterns and supporting trainees. Prospective multicenter validation incorporating full DICOM workflows and human-in-the-loop oversight is required before clinical integration.
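
For readers unfamiliar with the agreement statistic named in the abstract, the following is a minimal sketch of unweighted Cohen's kappa for two raters assigning one of the three modality labels per case. The function name `cohens_kappa` and the eight toy labels are hypothetical illustrations, not the study's data or the authors' code, and the abstract does not state which software was used.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length sequences of labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of cases")

    n = len(rater_a)

    # Observed agreement: fraction of cases where both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over all labels.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[label] * count_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return float("nan")

    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels (NOT the study's data), one entry per case.
board = ["SAC", "SAC", "FD", "BAC", "FD", "SAC", "FD", "BAC"]
model = ["SAC", "FD", "FD", "BAC", "FD", "SAC", "SAC", "BAC"]

kappa = cohens_kappa(board, model)
print(round(kappa, 3))  # 0.619 for these toy labels
```

In this toy example the raters agree on 6 of 8 cases (observed agreement 0.75) and chance agreement is 22/64, giving κ = 13/21 ≈ 0.619. The statistic corrects raw agreement for the agreement expected from the raters' label frequencies alone, which is why it is lower than the raw 75%.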

PMID: 42347983
Bibliographic data and abstract were imported from PubMed on 25 Jun 2026.
