Emergency triage of Covid-19 patients using chest X-rays: A Nightingale Open Science dataset

Authors: Ari Robicsek1, Angelique Russell1, Todd Czartoski1, George Diaz1, JB Minogue1, Per E. Danielsson2, Michael Pirri2, Kristen Manning2, Katie Lin3, William Lane3, Josh Risley3, Katy Haynes3, Ziad Obermeyer3,4

1 Providence St. Joseph
2 Swedish Health Services
3 Nightingale Open Science
4 University of California, Berkeley

Lead Nightingale analyst: William Lane

When using this resource, please cite:
Ari Robicsek, Angelique Russell, Todd Czartoski, George Diaz, JB Minogue, Per E. Danielsson, Michael Pirri, Kristen Manning, Katie Lin, William Lane, Josh Risley, Katy Haynes, and Ziad Obermeyer. 2021. Emergency Triage of Covid-19 Patients Using Chest X-rays: A Nightingale Open Science Dataset. DOI:

Additionally, please cite:
Sendhil Mullainathan and Ziad Obermeyer. 2022. Solving medicine’s data bottleneck: Nightingale Open Science. Nature Medicine 28, 5 (May 2022), 897–899. DOI:

The problem

In emergency rooms across the world, doctors facing hospital bed shortages must make a difficult judgment call: is a patient with respiratory infection safe to go home? Or is close monitoring in the hospital, or even the ICU, needed? Getting this right is critical not just to save lives, but also to optimize scarce hospital resources.

Reports from the front lines of the Covid-19 pandemic indicate that the current state of medical knowledge is failing here. Empirically, many patients are admitted to the hospital, but ultimately do not require advanced care—a waste of beds. Other patients look well enough to be sent home, only to deteriorate rapidly, returning to the ER in profound respiratory distress—or not returning at all.

The key to solving this problem could lie in the chest x-ray, a rapid, cheap diagnostic that nearly all patients with respiratory complaints get in the ER. It’s clear to front-line doctors that there is a signal in the x-ray image for predicting impending pulmonary collapse. But this signal can be devilishly hard to find. Indeed, some health systems explicitly require senior physicians to personally review x-rays before a patient is sent home from the ER, in the hope that expending their scarcest resource—doctors’ time—can help catch high-risk patients in time.

Dataset overview

This dataset links chest x-rays to pulmonary outcomes to fill an urgent need identified by clinicians: an algorithm that helps physicians make good triage decisions by predicting pulmonary collapse from x-rays done in the ER. The work is directly motivated by the Covid-19 pandemic, but if it succeeds, it could help a range of other patients whose illness progresses via the same “final common pathway,” acute respiratory distress syndrome (ARDS). These include patients with influenza, pneumonia, sepsis, and non-infectious inflammatory conditions.

The dataset consists of patients who were seen in the Emergency Department (ED) of a participating hospital, received a chest x-ray, and had either a positive Covid-19 diagnosis (made by a physician) or a positive test (rapid, antibody, or PCR) within fourteen days of their ED visit date. We begin by identifying chest x-rays performed in the ER across the 51 hospitals at Providence St. Joseph in patients diagnosed with Covid-19. We then determine whether the patient was admitted to the hospital, reflecting the emergency physician’s triage decision: did the patient need close monitoring as an inpatient, or were they safe to go home? Finally, we obtain two critical outcomes: whether the patient required mechanical ventilation in the 14 days after the initial visit, and whether they died over the same period.
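The fourteen-day inclusion window and the 14-day outcome window described above can be sketched roughly as follows. This is an illustrative sketch only, not the pipeline used to build the dataset: the function names are hypothetical, and the inclusion window is assumed to be symmetric (a diagnosis or test up to fourteen days before or after the ED visit), which the text does not state.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=14)

def meets_inclusion(ed_visit, covid_dates):
    """Return True if any Covid-19 diagnosis or positive-test date
    falls within 14 days of the ED visit date.

    Assumption: the window is symmetric around the visit date.
    """
    return any(abs(d - ed_visit) <= WINDOW for d in covid_dates)

def outcome_within_window(ed_visit, event_date):
    """Return True if an outcome event (e.g. mechanical ventilation)
    occurred in the 14 days after the ED visit. None means no event."""
    if event_date is None:
        return False
    return timedelta(0) <= event_date - ed_visit <= WINDOW
```

For example, a positive test ten days after the visit would satisfy `meets_inclusion`, while a ventilation event fifteen days after the visit would fall outside `outcome_within_window`.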

Of note, we will only observe mechanical ventilation if the patient returns to the same hospital (or another hospital within the Providence St. Joseph system). However, because we obtain mortality data from linkage to Social Security records, the label is not dependent on seeking care.

Our partner

Providence St. Joseph Health is a not-for-profit health care system operating in seven states and serves as the parent organization for 100,000 caregivers. The combined system includes 51 hospitals, 829 physician clinics, and other health, education and social services across Washington, Oregon, California, Alaska, Montana, New Mexico, and Texas. This dataset was conceived of and created by Ari Robicsek, Chief Medical Analytics Officer, along with colleagues in infectious disease and radiology. These clinicians are true heroes: in addition to working long hours in the hospital at the height of the pandemic, they also devoted time to in-depth discussions with our team on how algorithms could be used to improve their performance in trying times.


Dataset versions

v2 (current): The current dataset contains 41,333 chest x-rays linked to the triage decision, mechanical ventilation, ICU decision, and mortality. Each observation in the dataset corresponds to an x-ray from a Covid-19 patient. We begin by querying the Ambra picture archiving and communication system (PACS) for chest x-rays performed in ERs. Because this involves a large volume of studies, we then take a 50% random sample of these studies and link them to diagnoses of Covid-19 from the Providence Covid-19 internal registry. Of note, this registry contains both patients diagnosed early in the pandemic on the basis of symptomatology, geography, and time, and patients diagnosed on the basis of PCR or antigen tests. We then link these x-rays to ICD-9 procedure codes for ventilation, as well as Social Security data on mortality.
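The construction steps above (sample half of the studies, keep those matching the Covid-19 registry, then attach outcomes) can be sketched as below. This is a simplified illustration under stated assumptions, not the actual build code: the field names (`study_id`, `patient_id`), the use of plain ID sets for the registry and outcome sources, and the fixed seed are all hypothetical, and the real linkage also depends on dates, which are omitted here.

```python
import random

def sample_and_link(studies, registry_ids, ventilated_ids, deceased_ids, seed=0):
    """Build one row per sampled x-ray study of a registry patient.

    studies: list of dicts with hypothetical keys 'study_id' and 'patient_id'.
    registry_ids, ventilated_ids, deceased_ids: sets of patient IDs.
    """
    rng = random.Random(seed)  # fixed seed so the 50% sample is reproducible
    sampled = rng.sample(studies, len(studies) // 2)

    rows = []
    for study in sampled:
        pid = study["patient_id"]
        if pid not in registry_ids:
            continue  # keep only studies from patients in the Covid-19 registry
        rows.append({
            "study_id": study["study_id"],
            "patient_id": pid,
            "ventilated": pid in ventilated_ids,
            "died": pid in deceased_ids,
        })
    return rows
```

Sampling before linking, as the text describes, keeps the expensive registry and outcome lookups to half of the candidate studies.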

v1.1: 15,000
v1: 7,533

What’s next for v2 (target release date: August 2022): We plan to augment the dataset with additional data elements from the electronic health record, including other diagnoses, labs, and vital signs from the ER visit. We will also add data on the source of the diagnosis for Covid-19 patients (symptoms, PCR, antigen testing), as well as additional observations on Covid-19 negative patients.

Dataset schema

[Figure: Dataset observations and their connection to key outcomes]

We summarize the dataset construction and key variables in the diagram above, which shows (i) where the data come from and (ii) the key outcomes (labels) relevant to the medical problem we are trying to solve. A note on color choices: burnt sienna (orange) marks the node that corresponds to the observations (rows) in the dataset, and grape (purple) marks the key patient outcomes.

             Admit      Discharge
N            11,946     21,745
Intubate     19.86%     3.86%
ICU          28.05%     3.63%
Mortality    20.99%     6.50%
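A summary like the one above can be reproduced from row-level data by grouping on the triage decision and computing the share of each outcome. The sketch below assumes hypothetical boolean fields named `admitted`, `intubated`, `icu`, and `died`; the released dataset may name or encode these differently.

```python
def outcome_rates(rows, outcomes=("intubated", "icu", "died")):
    """Return per-group counts and outcome percentages.

    rows: iterable of dicts with a boolean 'admitted' field and one
    boolean field per outcome (field names are assumptions).
    """
    groups = {}
    for row in rows:
        key = "Admit" if row["admitted"] else "Discharge"
        counts = groups.setdefault(key, {"N": 0, **{o: 0 for o in outcomes}})
        counts["N"] += 1
        for o in outcomes:
            counts[o] += int(bool(row[o]))

    # Convert raw counts into percentages of each group's size.
    return {
        key: {"N": c["N"], **{o: 100.0 * c[o] / c["N"] for o in outcomes}}
        for key, c in groups.items()
    }
```

Each percentage is relative to its own column, so the Admit and Discharge rates are not expected to sum to anything meaningful across groups.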

Admission: Indicates admission to the hospital from the ED. We coded this variable from the ‘Encounters’ table, where each ‘ED episode ID’ has a ‘Patient Class’ that contains the flag ‘Admitted’.

Intubation: Procedures for Covid-19 patients were entered in the line, drain, airway (LDA) table, drawn from the hospital electronic health record. We queried this table and coded intubation if the patient had a matching record in which the procedure was marked as ‘airway’.

Mortality: Indicates patient death at any time following the ED visit. This variable is merged in from Social Security data via the electronic health record.
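The admission and intubation coding rules above can be illustrated with a small sketch. The record layouts here are hypothetical: the keys `ed_episode_id`, `patient_class`, `patient_id`, and `procedure_type` are stand-ins for whatever the underlying Encounters and LDA tables actually use, and matching the ‘airway’ label case-insensitively is an assumption.

```python
def code_admission(encounters):
    """Return the set of ED episode IDs whose patient class carries
    the 'Admitted' flag (key names are assumptions)."""
    return {
        e["ed_episode_id"]
        for e in encounters
        if "Admitted" in e["patient_class"]
    }

def code_intubation(lda_records):
    """Return the set of patient IDs with at least one LDA record
    marked as 'airway' (case-insensitive match is an assumption)."""
    return {
        r["patient_id"]
        for r in lda_records
        if r["procedure_type"].strip().lower() == "airway"
    }
```

Returning sets of IDs makes it straightforward to derive a boolean flag per row with a simple membership test.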

Copyright © 2021-2023 Nightingale Open Science. All rights reserved.