To What Degree Can LLMs Support Medical Informatics Research? Examining the Interplay of Research Support LLMs with LLM Critics

Abstract

The rapid development of Large Language Models (LLMs) has opened up new possibilities for their role in supporting research. This study assesses whether LLMs can generate “thoughtful” research plans in the domain of Medical Informatics and whether LLM-generated critiques can improve such plans. Using an LLM pipeline, we prompt four LLMs to generate primary research plans. These plans are then mutually critiqued, and each LLM is prompted to refine its output based on the critiques it receives. Human evaluators review both the original and improved responses for errors, hallucinations, and related issues. We employ ROUGE scores, cosine similarity, and length differences to quantify similarities across responses. Our findings reveal variations in outputs among the four LLMs, the impact of critiques, and differences between primary and secondary outputs. All LLMs produce cogent outputs and critiques, integrating feedback when generating improved outputs. Human evaluators can distinguish between primary and secondary responses in most cases.

Publication
In American Medical Informatics Association Annual Symposium, 2025
Naren Khatwani
Ph.D. Student and Researcher