Evaluating Artificial Intelligence and Large Language Models in Infectious Disease and Critical Care: Performance, Safety, and Responsible Clinical Use
Artificial intelligence tools and large language models are being deployed in infectious disease and critical care at a rate that outpaces the evidence base for their safety. Clinicians in these settings encounter AI-generated outputs (diagnostic suggestions, antimicrobial recommendations, sepsis risk scores) from systems whose performance has rarely been evaluated against the guidelines those clinicians are required to apply. Where clinical decisions are time-critical and consequential, this gap has direct implications for patient safety.
Infectious disease and critical care share characteristics that render AI evaluation both uniquely important and uniquely challenging: presentations are frequently complex, atypical, and evolving; guidelines require integration with local resistance patterns and patient-specific factors; and the consequences of an incorrect or overconfident recommendation — a missed resistant organism, an inappropriate antimicrobial selection, a delayed escalation — are severe. These represent precisely the conditions under which AI failure modes are both most consequential and least studied.
Although a substantial body of work applies AI and LLMs in these settings, rigorous evaluation of whether these tools perform in accordance with clinical standards — benchmarked against current guidelines, tested across diverse populations, and transparent regarding failure modes — remains underrepresented. This Research Topic addresses that gap, focusing on the systematic evaluation of AI and LLM performance, reliability, and safety in clinical practice, and on the frameworks required to determine when AI outputs can and cannot be acted upon.
This Research Topic is not a venue for clinician training in AI tool use, for model development, or for algorithm benchmarking in isolation. It is directed at the practising clinician and clinical researcher who require rigorous methods to evaluate whether a tool performs safely in their setting, and appropriate frameworks for clinical response when it does not.
We invite contributions including but not limited to:
1) Evaluation of LLM and AI performance in infectious disease: diagnostic accuracy, guideline concordance, hallucination, overconfidence, and failure mode analysis
2) AI and LLM evaluation in critical care and sepsis: performance in time-critical decision-making, prognostic scoring, and escalation support
3) Antimicrobial stewardship and AI: performance, limitations, and reliability of tools supporting prescribing, de-escalation, and resistance management
4) AI and genomic approaches to antimicrobial resistance: capabilities, limitations, and evaluation frameworks for clinical use
5) Critical appraisal of AI evidence: study design, data leakage, external validity, and generalisability across populations and settings
6) Model trustworthiness at the point of care: calibration, uncertainty quantification, and frameworks for determining when AI outputs should be acted upon
7) Real-world implementation: workflow integration, clinician override patterns, human-AI disagreement, and the limits of AI in high-acuity settings
8) Legal, ethical, and professional accountability: governance, liability, and institutional responsibility when AI informs clinical decisions
9) Case-based analysis: scenarios in which AI outputs conflict with clinical judgment, local guidelines, or patient-specific context
Article types and fees
Articles that are accepted for publication by our external editors following rigorous peer review incur a publishing fee charged to authors, institutions, or funders.
Article types
This Research Topic accepts the following article types, unless otherwise specified in the Research Topic description:
Brief Research Report
Case Report
Classification
Clinical Trial
Community Case Study
Curriculum, Instruction, and Pedagogy
Data Report
Editorial
FAIR² Data
FAIR² DATA Direct Submission
General Commentary
Hypothesis and Theory
Methods
Mini Review
Opinion
Original Research
Perspective
Policy and Practice Reviews
Policy Brief
Review
Study Protocol
Systematic Review
Technology and Code
Keywords: Artificial Intelligence, Large Language Models, Critical Care Medicine, Antimicrobial Stewardship, Clinical Validation, Sepsis, Generative AI, Real-world Evidence, Hallucination, Guideline Concordance, LLM Evaluation, Clinical Decision Support, AI Safety, Infectious Diseases, Antimicrobial Resistance
Important note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.