Frontiers | Evaluating Artificial Intelligence and Large Language Models in Infectious Disease and Critical Care: Performance, Safety, and Responsible Clinical Use

About this Research Topic

Submission deadlines

Manuscript Submission Deadline 8 December 2026
This Research Topic is currently accepting articles

Background

Artificial intelligence tools and large language models are being deployed in infectious disease and critical care at a rate that outpaces the evidence base for their safety. Clinicians in these settings encounter AI-generated outputs (diagnostic suggestions, antimicrobial recommendations, sepsis risk scores) from systems whose performance has rarely been evaluated against the guidelines they are required to apply. Where clinical decisions are time-critical and consequential, this gap has direct implications for patient safety.

Infectious disease and critical care share characteristics that render AI evaluation both uniquely important and uniquely challenging: presentations are frequently complex, atypical, and evolving; guidelines require integration with local resistance patterns and patient-specific factors; and the consequences of an incorrect or overconfident recommendation — a missed resistant organism, an inappropriate antimicrobial selection, a delayed escalation — are severe. These represent precisely the conditions under which AI failure modes are both most consequential and least studied.

Although a substantial body of work applies AI and LLMs in these settings, rigorous evaluation of whether these tools perform in accordance with clinical standards — benchmarked against current guidelines, tested across diverse populations, and transparent regarding failure modes — remains underrepresented. This Research Topic addresses that gap, focusing on the systematic evaluation of AI and LLM performance, reliability, and safety in clinical practice, and on the frameworks required to determine when AI outputs can and cannot be acted upon.

This Research Topic is not a venue for clinician training in AI tool use, for model development, or for algorithm benchmarking in isolation. It is directed at practising clinicians and clinical researchers who need rigorous methods to evaluate whether a tool performs safely in their setting, and appropriate frameworks for clinical response when it does not.

We invite contributions including but not limited to:

1) Evaluation of LLM and AI performance in infectious disease: diagnostic accuracy, guideline concordance, hallucination, overconfidence, and failure mode analysis
2) AI and LLM evaluation in critical care and sepsis: performance in time-critical decision-making, prognostic scoring, and escalation support
3) Antimicrobial stewardship and AI: performance, limitations, and reliability of tools supporting prescribing, de-escalation, and resistance management
4) AI and genomic approaches to antimicrobial resistance: capabilities, limitations, and evaluation frameworks for clinical use
5) Critical appraisal of AI evidence: study design, data leakage, external validity, and generalisability across populations and settings
6) Model trustworthiness at the point of care: calibration, uncertainty quantification, and frameworks for determining when AI outputs should be acted upon
7) Real-world implementation: workflow integration, clinician override patterns, human-AI disagreement, and the limits of AI in high-acuity settings
8) Legal, ethical, and professional accountability: governance, liability, and institutional responsibility when AI informs clinical decisions
9) Case-based analysis: scenarios in which AI outputs conflict with clinical judgment, local guidelines, or patient-specific context

Article types and fees

This Research Topic accepts the following article types, unless otherwise specified in the Research Topic description:

Brief Research Report
Case Report
Classification
Clinical Trial
Community Case Study
Curriculum, Instruction, and Pedagogy
Data Report
Editorial
FAIR² Data

Articles that are accepted for publication by our external editors following rigorous peer review incur a publishing fee charged to authors, institutions, or funders.

Keywords: Artificial Intelligence, Large Language Models, Critical Care Medicine, Antimicrobial Stewardship, Clinical Validation, Sepsis, Generative AI, Real-world Evidence, Hallucination, Guideline Concordance, LLM Evaluation, Clinical Decision Support, AI Safety, Infectious Diseases, Antimicrobial Resistance

Important note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.

Topic coordinators

Giovanni Scaglione
Luigi Sacco Hospital
Milan, Italy

Frontiers in Digital Health

Health Informatics
Ethical Digital Health
Health Technology Implementation

Manuscripts can be submitted to this Research Topic via the main journal or any other participating journal.

Impact

139 Topic views

Topic editors

Fabio Borgonovo
Giorgia Carra
