Zonghai Yao
I am a Ph.D. candidate in Computer Science at the University of Massachusetts Amherst (expected May 2026), advised by Prof. Hong Yu in the UMass BioNLP Lab. My research sits at the intersection of AI, NLP, and health. My long-term goal is an “AI Hospital”: a set of reliable AI agents that can both think and communicate across real clinical workflows. In this vision, agents are not standalone chatbots; they are tool-using teammates that reason over evolving patient context, coordinate across roles, and stay verifiably grounded in clinical evidence. I am especially excited about directions such as long-horizon agent learning in realistic simulations, multimodal clinical world models, and evaluation that ties model behavior to safety- and outcome-relevant signals. Previously, I received my B.S. in Computer Science from Nankai University.
My work focuses on three connected problems:
- A central focus is agents that think in high-stakes settings. I build LLM agents for evidence-based clinical reasoning, where models plan, retrieve guidelines or knowledge graphs, and justify decisions with verifiable support rather than fluent guesses. I am also deeply interested in multimodal medical agents in “multiple images” settings, where “multiple” can mean time (longitudinal imaging across a patient’s care journey, or video such as surgical procedures), space (3D volumes), and modality (e.g., X-ray, CT, MRI, polysomnography). My goal is to develop agents that can reason over these heterogeneous streams in a unified, step-by-step way.
- I also study agents that communicate with patients and clinicians over long horizons. I build multi-agent simulations for discharge education, chronic disease management, and recovery support, with a focus on multi-session interactions where goals unfold over days, weeks, or months. I evaluate whether agents can maintain state, adapt explanations to health literacy and emotion, and remain safe and helpful under realistic conversational drift and social pressure.
- I develop methods to optimize, train, and stress-test these agents. On the system side, I design test-time orchestration loops (planning, retrieval, tool use, self-check) that make reasoning more structured and auditable. On the learning side, I use data-centric pipelines (expert annotation, human edits, synthetic data, hard-case mining) and learning from feedback (preference signals, outcome-verifiable objectives, and RL-style training in simulation) to improve faithfulness, robustness, and long-horizon behavior. Across projects, I pair these with fine-grained evaluation that diagnoses failures at the step level and under distribution shift, so we can iterate on both models and agent policies.
1) Multi-agent interaction and long-horizon dialogue
I use multi-agent simulation to study long-horizon behavior, social pressure, and strategy.
- ChatCLIDS: Simulating Persuasive AI Dialogues to Promote Closed-Loop Insulin Adoption in Type 1 Diabetes Care (AAAI 2026, AI for Social Impact)
- ChatThero: A Language Agent for Recovery Support (Preprint, 2025)
- A Survey on LLM-based Multi-Agent AI Hospital (Preprint, 2025)
2) Agentic reasoning with retrieval, tools, and structured inference (including multimodal “multiple” settings)
I build structured reasoning loops that combine planning, retrieval, memory, and multimodal evidence spanning multiple images, volumes, and modalities.
- Medical Thinking with Multiple Images (ICLR 2026)
- MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback (NAACL 2025, Oral)
- PRIME: Planning and Retrieval-Integrated Memory for Enhanced Reasoning (AAAI 2026)
- RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models (ACL 2025)
- JMLR: Joint Medical LLM and Retrieval Training for Enhancing Reasoning and Professional QA (Preprint, 2024)
3) Patient-facing NLP and personalization for understanding
I build patient-facing systems and test whether they improve patient understanding, not just text quality.
- DischargeSim: A Simulation Benchmark for Educational Doctor–Patient Communication at Discharge (EMNLP 2025)
- PaniniQA: Enhancing Patient Education Through Interactive Question Answering (TACL 2023)
- Chatbot To Help Patients Understand Their Health (EMNLP Findings 2025)
- README: Bridging Medical Jargon and Lay Understanding for Patient Education through Data-Centric NLP (EMNLP Findings 2024)
- MedReadCtrl: Personalizing Medical Text Generation with Readability-Controlled Instruction Learning (Preprint, 2025)
4) Optimization, feedback, and fine-grained evaluation for trustworthy LLMs
I design training and evaluation that show where models fail, so fixes are specific and measurable.
- Improving Summarization with Human Edits (EMNLP 2023)
- SYNFAC-EDIT: Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization (EMNLP 2024)
- Unveiling GPT-4V’s Hidden Challenges Behind High Accuracy on USMLE Questions: Observational Study (JMIR 2025)
- From Scores to Steps: Diagnosing and Improving LLM Performance in Evidence-Based Medical Calculations (EMNLP 2025, Oral)
- Exploiting Tree Structure for Credit Assignment in RL Training of LLMs (Preprint, 2025)
5) Data, benchmarks, and deployment-oriented clinical NLP
I build datasets and pipelines that support scaling, testing, and real-world use in clinical NLP.
- MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework (EACL 2026)
- NoteChat: A Dataset of Synthetic Doctor-Patient Conversations Conditioned on Clinical Notes (ACL Findings 2024)
- BioInstruct: Instruction Tuning of Large Language Models for Biomedical Natural Language Processing (JAMIA 2024)
- Automated Identification of Eviction Status from Electronic Health Record Notes (JAMIA 2023)
- SynthEHR-Eviction: Enhancing Eviction SDoH Detection with LLM-Augmented Synthetic EHR Data (Preprint, 2025)
- Development of a Surveillance System to Identify Incidence of Evictions Among Patients in Veterans Affairs Medical Centers Across the United States (Journal of Community Health 2025)
Service
AI Conference Area Chair (2025–present)
NAACL ARR (2025), ACL ARR (2025), EMNLP ARR (2025), AACL ARR (2025), EACL ARR (2025)
AI Conference Reviewer (2023–present)
ICLR, NeurIPS, ICML, AAAI, ACL, EMNLP, NAACL, COLING, EACL (and related workshops)
Medical Journal Reviewer (2022–present)
- npj Digital Medicine, npj Health Systems, Scientific Reports
- Journal of Medical Internet Research (JMIR), Journal of the American Medical Informatics Association (JAMIA)
- Bioinformatics, Expert Systems with Applications
- BMC Medical Informatics and Decision Making, BMC Medical Research Methodology
- European Heart Journal – Digital Health
AI Conference Session Chair (2025–present)
AACL (2025)
News
| Date | Update |
|---|---|
| Jan 26, 2026 | 🎉 Medical Thinking with Multiple Images was accepted to ICLR 2026! |
| Jan 03, 2026 | 📚 MedQA-CS and MedAbstain were accepted to EACL 2026. |
| Nov 08, 2025 | 📚 ChatCLIDS and PRIME were accepted to AAAI 2026 in Main Technical and AI for Social Impact! |
| Nov 05, 2025 | 📚 Four papers were accepted to EMNLP 2025! |
| Aug 22, 2025 | 🎤 I gave an invited talk at VA Bedford Health Care (Veterans Affairs): Chatbot for Behavior Change and Therapeutic Support in Addiction Recovery. |
| Aug 01, 2025 | 🏆 Received the Outstanding Research Award (Center for Biomedical and Health Research in Data Science, UMass Lowell). |
| Jul 29, 2025 | ✍️ RARE was accepted to ACL 2025. |
| May 01, 2025 | ✍️ MCQG-SRefine was presented as an oral at NAACL 2025. |
| Feb 07, 2025 | 🎉 Unveiling GPT-4V’s Hidden Challenges Behind High Accuracy on USMLE Questions: Observational Study was accepted for publication in the Journal of Medical Internet Research! |
| Feb 07, 2025 | 🎉 BioInstruct was accepted for publication in Journal of the American Medical Informatics Association! |
| Dec 09, 2024 | 🏆 Received the UMass Amherst Outstanding PhD Portfolio Award and became a PhD Candidate. |
| Nov 12, 2024 | 📚 Three papers were accepted to EMNLP 2024! |
| Aug 11, 2024 | 🌴 I presented NoteChat at ACL 2024 and had a wonderful trip to Bangkok! |
| Jul 28, 2024 | 🎤 I gave an invited talk at Zhongshan Ophthalmic Center of Sun Yat-sen University: Bridging Health Literacy Gaps: Improving Patient Understanding of Electronic Health Records. |
| May 31, 2024 | 🎤 I gave an invited talk at VA Bedford Health Care (Veterans Affairs): Helping Veterans Understand Clinical Notes with Advanced AI Systems. |
| Dec 14, 2023 | 🎉 PaniniQA was accepted to Transactions of the Association for Computational Linguistics! |
| Dec 06, 2023 | 🌟 I will present two of my recent works in Singapore: Improving Summarization with Human Edits (EMNLP) and PaniniQA (TACL). |
| May 02, 2023 | 🎉 Automated Identification of Eviction Status from Electronic Health Record Notes was accepted for publication in the Journal of the American Medical Informatics Association! |