The rapid development of Large Language Models (LLMs) has opened up new possibilities for their role in supporting research. This study assesses whether LLMs can generate “thoughtful” research plans in the domain of Medical Informatics and whether LLM-generated critiques can improve such plans. Using an LLM pipeline, we prompt four LLMs to generate primary research plans. These plans are then mutually critiqued, and each LLM is prompted to refine its output based on the critiques it receives. Both the primary and the refined responses are reviewed by human evaluators for errors, hallucinations, and other shortcomings. We employ ROUGE scores, cosine similarity, and length differences to quantify similarities across responses. Our findings reveal variations in output among the four LLMs, the impact of the critiques, and differences between primary and secondary outputs. All LLMs produce cogent outputs and critiques and integrate feedback when generating improved outputs. Human evaluators can distinguish between primary and secondary responses in most cases.
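To make the quantitative comparison concrete, the sketch below shows one way the three reported measures (ROUGE, cosine similarity, and length difference) could be computed for a primary/refined response pair. The use of TF-IDF vectors for cosine similarity, the ROUGE-1/ROUGE-L variants, and the token-based length count are illustrative assumptions, not necessarily the configuration used in this study.

```python
# Minimal sketch of the pairwise response-comparison metrics (assumed setup).
# Requires the `rouge_score` and `scikit-learn` packages.
from rouge_score import rouge_scorer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def compare_responses(primary: str, refined: str) -> dict:
    """Quantify similarity between a primary response and its post-critique refinement."""
    # ROUGE: n-gram and longest-common-subsequence overlap (F-measure reported here).
    scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
    rouge = scorer.score(primary, refined)

    # Cosine similarity over TF-IDF vectors (an embedding model could be substituted).
    tfidf = TfidfVectorizer().fit_transform([primary, refined])
    cos_sim = float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])

    # Length difference, here measured in whitespace-delimited tokens.
    length_diff = len(refined.split()) - len(primary.split())

    return {
        "rouge1_f": rouge["rouge1"].fmeasure,
        "rougeL_f": rouge["rougeL"].fmeasure,
        "cosine_similarity": cos_sim,
        "length_difference": length_diff,
    }
```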