Bridging interpretable machine learning and large language models through direct representative selection and prediction: a two-layer framework for quantitative-linguistic insight.

Created on 01 Jul 2026

Authors

Tomomi Shimazaki, Masanori Tachikawa

Published in

Physical chemistry chemical physics : PCCP. Jul 01, 2026. Epub Jul 01, 2026.

Abstract

In this study, we combined an interpretable machine learning (ML) framework with a large language model (LLM) to investigate structure-reactivity trends in an acrylate/methacrylate radical reaction dataset constructed from density functional theory calculations. For the ML component, we employed modified convex clustering (regression) with direct representative selection (DRS) and direct representative prediction (DRP). Within this framework, the model selected representative samples from the training set (DRS) and formed predictions as weighted sums over these representatives (DRP). Consequently, this DRS/DRP design yielded instance-level interpretability and facilitated the extraction of chemically meaningful insights. In prior studies, these patterns were interpreted by human experts. In the present study, we introduced the LLM as an assistive interpreter and demonstrated that both chemical framing (prompt design) and model size systematically shaped the depth of mechanistic insight. Notably, the LLM is not intended to uncover entirely new mechanisms, but rather to assist human interpretation by providing alternative perspectives, which may help reveal implicit cognitive biases and support more balanced mechanistic reasoning. Specifically, stronger framing and larger models elicited more mechanism-oriented reasoning, whereas weaker framing or smaller models produced concise but more surface-level summaries. Altogether, DRS/DRP enabled a two-layer interpretability framework that linked quantitative attribution from the interpretable ML layer (modified convex regression) with the LLM's linguistic, mechanism-oriented analysis, thereby enabling structured extraction of physicochemical insights from datasets. Within this framework, mechanistic interpretations are systematically structured and accumulated with LLM assistance, providing a pathway toward future knowledge discovery.
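The DRP step described above, where a prediction is formed as a weighted sum over selected representatives, can be illustrated with a minimal sketch. This is not the authors' modified convex clustering method: the function name `drp_predict`, the toy data, and the softmax-over-distance weighting are all assumptions chosen only to show how the weights themselves serve as instance-level attributions.

```python
import math


def drp_predict(x, representatives, rep_targets, bandwidth=1.0):
    """Predict a target for x as a weighted sum over representative samples.

    The weights are a softmax over negative squared distances. This kernel is
    an illustrative assumption, not the weighting scheme used in the paper.
    Returns (prediction, weights) so each representative's share is visible.
    """
    sq_dists = [sum((a - b) ** 2 for a, b in zip(x, r)) for r in representatives]
    scores = [-d / (2.0 * bandwidth ** 2) for d in sq_dists]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    prediction = sum(w * y for w, y in zip(weights, rep_targets))
    return prediction, weights


# Hypothetical toy data: three representatives in a 2-D descriptor space,
# each with a made-up target value (for example, a barrier height).
representatives = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
rep_targets = [10.0, 20.0, 30.0]

prediction, weights = drp_predict([1.0, 1.0], representatives, rep_targets)
for rep, w in zip(representatives, weights):
    print(f"representative {rep}: weight {w:.3f}")
print(f"prediction: {prediction:.3f}")
```

Because the prediction is an explicit sum over a small set of training samples, the per-representative weights can be passed directly to a human expert or an LLM prompt as the quantitative layer of the interpretation.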

PMID: 42383338
Bibliographic data and abstract were imported from PubMed on 01 Jul 2026.
