Fine-Tuning Large Language Models for Motivational Interviewing in Health Behavior Change: Development and Evaluation Study.

Authors

Runze Hu, Yang Yang, Yihang Yang, Jingqi Kong, Jiahui Luo, Wenyu Yang, Jing Chen, Jingyao Liu, Huiqun Zeng, Lei Zhang, Zheng Liu

Published in

JMIR formative research. Volume 10. Pages e89077. Jun 24, 2026. Epub Jun 24, 2026.

Abstract

Motivational interviewing (MI) is an effective counseling approach for promoting health behavior change, but its scalability is constrained by the need for highly trained human counselors. Large language models (LLMs) may provide a scalable way to support MI counseling, but evidence remains limited, especially for Chinese MI resources and evaluations based on standardized MI fidelity frameworks.
This study aimed to develop Chinese large language models for motivational interviewing (MI-LLMs) and evaluate whether MI-focused fine-tuning could improve their ability to generate counseling responses consistent with MI principles.
We first curated 5 publicly available Chinese psychological counseling datasets and assessed sampled conversations in terms of comprehensiveness, professionalism, authenticity, and safety. The 2 highest-scoring datasets, CPsyCounD and PsyDTCorpus, were selected for MI-style data construction. Using GPT-4 with a structured MI-informed prompt, we transformed 2040 multiturn counseling conversations into MI-style dialogs. Among these, 2000 dialogs were used for training and 40 for testing. Three Chinese-capable open-source LLMs (Baichuan2-7B-Chat, ChatGLM-4-9B-Chat, and Llama-3-8B-Chinese-Chat-v2) were fine-tuned with low-rank adaptation on the training dataset and were referred to as MI-LLMs. Automatic evaluation was conducted on the testing dataset using Bilingual Evaluation Understudy-4 (BLEU-4) and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. Manual evaluation was conducted using the Motivational Interviewing Treatment Integrity Coding Manual 4.2.1. Thirty simulated counseling dialogs generated by the MI-LLMs were compared with 30 real MI dialogs sampled from AnnoMI and translated into Chinese. Two trained graduate student raters coded global scores and behavior counts, from which summary scores were subsequently calculated.
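To make the automatic evaluation step concrete, the following is a minimal sketch of a sentence-level BLEU-4 score. It is not the study's evaluation code: the abstract does not name the tooling, tokenization, or smoothing used. The function name `bleu4`, the character-level tokenization (a common choice for Chinese text), the single-reference setup, and the small `eps` floor for zero n-gram overlap are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference, eps=1e-9):
    """Sentence-level BLEU-4 against a single reference (illustrative only).

    Assumptions: character-level tokens, uniform weights over 1- to 4-grams,
    and an eps floor in place of a proper smoothing method.
    """
    cand = list(candidate)
    ref = list(reference)
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped overlap: each candidate n-gram counts at most as often
        # as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        log_precisions.append(math.log(max(overlap, eps) / total))

    # Brevity penalty discourages candidates shorter than the reference.
    if len(cand) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))

    return bp * math.exp(sum(log_precisions) / 4)


if __name__ == "__main__":
    # Hypothetical model response vs. hypothetical reference response.
    print(bleu4("我想戒烟但是很难", "我想戒烟可是很难"))
```

Because BLEU and ROUGE reward surface overlap with the reference text, higher scores after fine-tuning indicate closer wording to the MI-style test dialogs, not MI fidelity as such, which is why the study pairs them with manual coding.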
In automatic evaluation, fine-tuning substantially improved BLEU-4 and ROUGE scores across all 3 models compared with the base models. In manual evaluation, the MI-LLMs achieved technical and relational global scores, as well as total MI-adherent ratios, that approached those of real MI dialogs. The MI-LLM based on ChatGLM-4-9B-Chat showed the strongest overall global performance. However, MI-LLMs produced fewer complex reflections and had lower reflection-to-question ratios than real MI dialogs.
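The global and reflection measures mentioned above are summary scores derived from rater-coded global ratings and behavior counts. The sketch below shows how such scores are commonly computed under the MITI 4 summary-score definitions as generally described (technical global as the mean of Cultivating Change Talk and Softening Sustain Talk, relational global as the mean of Partnership and Empathy, percentage of complex reflections, and reflection-to-question ratio). The abstract does not give these formulas, so treat them as an assumption; the class name `MitiCoding`, the field names, and the example numbers are hypothetical. The "total MI-adherent ratio" is left out because the abstract does not define it.

```python
from dataclasses import dataclass


@dataclass
class MitiCoding:
    """Rater codes for one dialog (hypothetical container, illustrative only)."""
    cultivating_change_talk: float  # global rating, 1-5
    softening_sustain_talk: float   # global rating, 1-5
    partnership: float              # global rating, 1-5
    empathy: float                  # global rating, 1-5
    questions: int                  # behavior count
    simple_reflections: int         # behavior count
    complex_reflections: int        # behavior count


def summary_scores(c):
    """Derive summary scores from one dialog's codes.

    Ratios are None when the denominator is zero, so a missing value
    is not confused with a true score of 0.
    """
    total_reflections = c.simple_reflections + c.complex_reflections
    return {
        "technical_global": (c.cultivating_change_talk + c.softening_sustain_talk) / 2,
        "relational_global": (c.partnership + c.empathy) / 2,
        "pct_complex_reflections": (
            c.complex_reflections / total_reflections if total_reflections else None
        ),
        "reflection_to_question": (
            total_reflections / c.questions if c.questions else None
        ),
    }


if __name__ == "__main__":
    # Made-up codes for a single dialog, not data from the study.
    example = MitiCoding(4, 3, 4, 5, questions=10,
                         simple_reflections=6, complex_reflections=4)
    print(summary_scores(example))
```

Under these definitions, a model that asks many questions and mostly restates the client's words scores low on both reflection measures even when its global ratings are good, which matches the pattern reported for the MI-LLMs.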
This study provides preliminary evidence that MI-focused fine-tuning can help Chinese LLMs acquire core counseling behaviors consistent with MI principles. It also offers a scalable approach for constructing MI-style dialog resources in Chinese. Nevertheless, current MI-LLMs should be regarded as early-stage tools for supporting, rather than replacing, human counselors. Future work should expand real MI training data and strengthen the complex reflective skills of MI-LLMs. Further studies are needed to evaluate their effectiveness, acceptability, and safety in real-world health behavior change settings.

PMID: 42341298
Bibliographic data and abstract were imported from PubMed on 25 Jun 2026.


