Authors
Tomomi Shimazaki, Masanori Tachikawa
Published in
Physical chemistry chemical physics : PCCP. Jul 01, 2026. Epub Jul 01, 2026.
Abstract
In this study, we combined an interpretable machine learning (ML) framework with a large language model (LLM) to investigate structure-reactivity trends in an acrylate/methacrylate radical reaction dataset constructed from density functional theory calculations. For the ML component, we employed modified convex clustering (regression) with direct representative selection (DRS) and direct representative prediction (DRP). Within this framework, the model selected representative samples from the training set (DRS) and formed predictions as weighted sums over these representatives (DRP). Consequently, this DRS/DRP design yielded instance-level interpretability and facilitated the extraction of chemically meaningful insights. In prior studies, these patterns were interpreted by human experts. In the present study, we introduced the LLM as an assistive interpreter and demonstrated that both chemical framing (prompt design) and model size systematically shaped the depth of mechanistic insight. Notably, the LLM is not intended to uncover entirely new mechanisms, but rather to assist human interpretation by providing alternative perspectives, which may help reveal implicit cognitive biases and support more balanced mechanistic reasoning. Specifically, stronger framing and larger models elicited more mechanism-oriented reasoning, whereas weaker framing or smaller models produced concise but more surface-level summaries. Altogether, DRS/DRP enabled a two-layer interpretability framework that linked quantitative attribution from the interpretable ML layer (modified convex regression) with the LLM's linguistic, mechanism-oriented analysis, thereby enabling structured extraction of physicochemical insights from datasets. Within this framework, mechanistic interpretations are systematically structured and accumulated with LLM assistance, providing a pathway toward future knowledge discovery.
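The DRS/DRP idea described in the abstract (choose representative training samples, then predict as a weighted sum over them) can be illustrated with a minimal sketch. This is not the authors' modified convex clustering: as stand-ins, it assumes greedy farthest-point selection for the DRS step and normalized Gaussian-kernel weights for the DRP step. The function names, the `gamma` parameter, and the synthetic data are all hypothetical.

```python
import numpy as np

def select_representatives(X, k):
    """Stand-in for DRS (assumption): greedy farthest-point selection of k training rows."""
    chosen = [0]  # start from the first sample (arbitrary choice)
    dist = np.linalg.norm(X - X[0], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(dist))  # sample farthest from all chosen so far
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(chosen)

def representative_weights(x, X_rep, gamma=1.0):
    """Non-negative weights over the representatives that sum to 1 (Gaussian kernel)."""
    d2 = np.sum((X_rep - x) ** 2, axis=1)
    w = np.exp(-gamma * (d2 - d2.min()))  # shift exponent for numerical stability
    return w / w.sum()

def predict(x, X_rep, y_rep, gamma=1.0):
    """Stand-in for DRP (assumption): prediction is a weighted sum of representative targets."""
    w = representative_weights(x, X_rep, gamma)
    return float(w @ y_rep), w

# Synthetic, hypothetical data: 50 samples with 3 descriptors each.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5])  # hypothetical target values

idx = select_representatives(X, k=5)
y_hat, w = predict(X[10], X[idx], y[idx])

# Instance-level attribution: each representative's contribution to y_hat.
contributions = w * y[idx]
```

Because the prediction is an explicit sum over a handful of training samples, the per-representative terms in `contributions` can be read off directly, which is the kind of instance-level attribution the abstract passes to the LLM layer for interpretation.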
PMID:
42383338
Bibliographic data and abstract were imported from PubMed on 01 Jul 2026.