Large Language Model-Supported Data Collation: Addressing Accuracy and Reproducibility
Can we trust AI to extract scientific data with complete accuracy?
Manual data collation from regulatory documents is a time-consuming bottleneck in drug development. In this recent publication in the inaugural issue of Quantitative Medicine, the authors explore whether Large Language Models (LLMs), specifically Google’s Gemini-2.5-flash model, can automate the collation of complex clinical data. Using immunogenicity data (anti-drug antibody, or ADA, incidence) for 50 monoclonal antibodies as a case study, they tested distinct workflows in the R programming environment to determine whether LLMs offer sufficient precision for life science research.
Highlights:
- The workflow matters: Direct PDF analysis and grounded web searches fell short due to data misassociation and dynamic search variability. Preprocessing documents into concise Markdown reports successfully resolved data mix-ups and achieved highly accurate core data extraction.
- The “reproducibility” hurdle: Even with fixed model parameters and identical, highly structured inputs, the LLM exhibited day-to-day variability in presenting contextual details.
- Factual vs. reproducible: The study highlights a crucial distinction in AI tools: a model can be remarkably accurate (factually correct) without being perfectly reproducible (consistent over time).
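To make the accuracy-versus-reproducibility distinction concrete, here is a minimal sketch of how the two could be scored separately across repeated extraction runs. It is written in Python purely for illustration (the study itself worked in R) and is not the authors' method. The function names, the record layout, the antibody labels, and every value are hypothetical placeholders, not data from the publication.

```python
# Illustrative only: all names and values below are invented, not taken from the study.

def accuracy(run, reference, field):
    """Fraction of reference records whose `field` the run matched exactly."""
    hits = sum(1 for key, ref in reference.items()
               if run.get(key, {}).get(field) == ref[field])
    return hits / len(reference)

def reproducibility(runs, field):
    """Fraction of records for which every run returned the same `field` value."""
    keys = set().union(*(run.keys() for run in runs))
    stable = sum(1 for key in keys
                 if len({run.get(key, {}).get(field) for run in runs}) == 1)
    return stable / len(keys)

# Hypothetical hand-verified reference values.
reference = {
    "mAb-A": {"incidence": "5%"},
    "mAb-B": {"incidence": "12%"},
}

# Hypothetical outputs from the same prompt and inputs on two different days.
run_day1 = {
    "mAb-A": {"incidence": "5%", "context": "pooled phase 3 studies"},
    "mAb-B": {"incidence": "12%", "context": "week 52"},
}
run_day2 = {
    "mAb-A": {"incidence": "5%", "context": "phase 3 studies, pooled analysis"},
    "mAb-B": {"incidence": "12%", "context": "week 52"},
}
runs = [run_day1, run_day2]

core_accuracy = [accuracy(run, reference, "incidence") for run in runs]
core_repro = reproducibility(runs, "incidence")
context_repro = reproducibility(runs, "context")
print(core_accuracy, core_repro, context_repro)
```

In this toy case the core values match the reference in both runs, yet the contextual wording for one record differs between days, so the context field scores lower on reproducibility even though nothing is factually wrong. Exact string matching is a deliberately strict choice here; a real comparison would likely need some normalization of free-text fields.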
The bottom line: While generative AI can be a valuable tool to accelerate initial data gathering, it is not yet a substitute for careful human verification. For research workflows that require consistent results, maintaining a “human-in-the-loop” remains advisable.