Large Language Model-Supported Data Collation: Addressing Accuracy and Reproducibility

Quant Med. 2026;1(1):16-26. Online ahead of print.

Sellen S, Sjögren E.

Automation, Immunology, Journal, Methodology

Can we fully trust AI for absolute accuracy in scientific data extraction?

Manual data collation from regulatory documents is a time-consuming bottleneck in drug development. In this publication in the inaugural issue of Quantitative Medicine, the authors explore whether large language models (LLMs), specifically Google’s Gemini-2.5-flash model, can automate the collation of complex clinical data. Using immunogenicity data (incidence of anti-drug antibodies, ADA) for 50 monoclonal antibodies as a case study, they tested distinct workflows in the R programming environment to determine whether LLMs offer sufficient precision for life science research.

Highlights:

The workflow matters: Direct PDF analysis and grounded web searches fell short due to data misassociation and dynamic search variability. Preprocessing documents into concise Markdown reports successfully resolved data mix-ups and achieved highly accurate core data extraction.
The “reproducibility” hurdle: Even with fixed model parameters and identical, highly structured inputs, the LLM exhibited day-to-day variability in presenting contextual details.
Factual vs. reproducible: The study highlights a crucial distinction in AI tools: a model can be remarkably accurate (factually correct) without being perfectly reproducible (consistent over time).
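
The difference between accuracy and reproducibility can be made concrete with a small sketch. The example below is illustrative only and is not the authors' method: the study was run in R, Python is used here for brevity, and every field name and value is hypothetical. It scores each repeated extraction run against a hand-verified reference (accuracy) and separately checks how many fields are identical across all runs (reproducibility).

```python
from itertools import combinations

def accuracy(run, reference):
    """Fraction of reference fields that a single run extracted correctly."""
    return sum(run.get(k) == v for k, v in reference.items()) / len(reference)

def reproducibility(runs):
    """Fraction of fields on which every pair of runs agrees exactly."""
    keys = set().union(*runs)
    stable = [
        k for k in keys
        if all(a.get(k) == b.get(k) for a, b in combinations(runs, 2))
    ]
    return len(stable) / len(keys)

# Hypothetical hand-verified core data for one antibody (not from the paper).
reference = {"ada_incidence_pct": 5.0, "n_patients": 200}

# Hypothetical outputs of the same prompt on three different days.
runs = [
    {"ada_incidence_pct": 5.0, "n_patients": 200, "context": "ADA measured at week 24"},
    {"ada_incidence_pct": 5.0, "n_patients": 200, "context": "Measured at week 24 by ELISA"},
    {"ada_incidence_pct": 5.0, "n_patients": 200, "context": "ADA measured at week 24"},
]

per_run_accuracy = [accuracy(r, reference) for r in runs]
overall_reproducibility = reproducibility(runs)
print(per_run_accuracy, round(overall_reproducibility, 2))
```

In this made-up case every run matches the reference on the core fields, yet the runs disagree on the contextual field, so accuracy is perfect while reproducibility is not. Exact string matching is a deliberately strict choice; a real comparison might normalize wording before comparing.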

The bottom line: While generative AI can be a valuable tool to accelerate initial data gathering, it is not yet a substitute for careful human verification. For research workflows that require consistent results, maintaining a “human-in-the-loop” remains advisable.

Pharmetheus Affiliates

Principal Director, PBPK & PBBM Scientific Lead

Erik Sjögren
