Automating standardization of prostate cancer biopsy and histopathology reports with privacy-preserving local large language models.

Authors

Rohit Malyala, Takeshi Namekawa, Anna Black, Martin Gleave, Miles Mannas

Published in

Urologic oncology. Volume 44. Issue 9. Pages 223-232. Jun 20, 2026. Epub Jun 20, 2026.

Abstract

Large-scale biomedical analysis in prostate cancer requires structured, tabular datasets, yet most clinical documentation remains in free-text format. Manual data abstraction, the current standard, is time-consuming, error-prone, non-reproducible, and costly. We hypothesized that locally deployed, privacy-preserving large language models (LLMs) combined with traditional natural language processing (NLP) methods could automatically extract structured data from prostate biopsy procedure and pathology reports.
We deployed Mistral 7B locally to process 150 transrectal ultrasound-guided biopsy and histopathology reports: 50 for development and 100 for validation. Procedure reports were analyzed using either a single-stage prompt or a multistage, mixed LLM-NLP workflow with iterative error correction. Longer histopathology reports were structured solely using a multistage prompting strategy.
LLM-structured outputs demonstrated high concordance with human-extracted data. Single-stage analysis of procedure reports achieved 95.3% accuracy (991/1040 discrete data points) across extracted data fields. The multistage LLM-NLP pipeline reached 98.0% accuracy (1314/1341) for ultrasound procedure reports. Applied to histopathology reports, the multistage prompting approach achieved 99.6% accuracy (9110/9150) across diagnosis, grade, key histologic features, and per-core location mapping. Errors clustered in ambiguous cases involving vague descriptors or uncommon reporting structures differing from institutional documentation culture.
A locally deployed, privacy-preserving LLM can accurately and efficiently transform unstructured radiology and pathology prostate biopsy reports into structured, tabular datasets. With minor adaptation, this approach generalizes to other report types and supports scalable data engineering for clinical research, quality assurance, and machine learning model development.

PMID: 42322812
Bibliographic data and abstract were imported from PubMed on 22 Jun 2026.
