Subtyping cardiac arrest with ECG waveforms: A Nightingale Open Science dataset

Authors: Chien-Hua Huang1, Randy Su1, Hui-Chun Huang1, Katie Lin2, Nick Foster2, Nathan Juergens2, Josh Risley2, Katy Haynes2, Ziad Obermeyer2,3

1 National Taiwan University Hospital
2 Nightingale Open Science
3 University of California, Berkeley

Lead Nightingale analysts: Nick Foster, Nathan Juergens

When using this resource, please cite:
Chien Hua Huang, Randy Su, Hui Chun Huang, Katie Lin, Nick Foster, Nathan Juergens, Josh Risley, Katy Haynes, and Ziad Obermeyer. 2021. Subtyping Cardiac Arrest with ECG Waveforms: A Nightingale Open Science Dataset. DOI:

Additionally, please cite:
Sendhil Mullainathan and Ziad Obermeyer. 2022. Solving medicine’s data bottleneck: Nightingale Open Science. Nature Medicine 28, 5 (May 2022), 897–899. DOI:

The problem

A patient is rushed into the ER, unconscious and in cardiac arrest. As the physician begins the resuscitation, she knows only that the patient’s heart has stopped—but nothing else. What happened to cause the arrest? What immediate actions need to be taken? And what will happen to the patient?

One of the only pieces of data available to the emergency physician in this situation is the electrocardiogram (ECG), which measures the electrical activity of the heart. Physicians use this to determine which immediate actions are needed: most importantly, whether the patient needs to be shocked (cardioverted) or needs a critical medication to restart the heart.

This rich signal might also contain other clues: about why the heart stopped, what physicians can do in the ER to give the patient the best possible chance of surviving, and the likelihood that a patient who survives will have a normal life, without profound physical or neurological impairments.

Dataset overview

Over the past 10 years, the National Taiwan University Hospital (NTUH) has captured and stored ECG waveforms from all adult (over 20 years old) Emergency Department (ED) patients in cardiac arrest: those brought in by ambulance in cardiac arrest, and those experiencing cardiac arrest in the ED or the waiting room. Research staff would identify patients in the ED with cardiac arrest, and physicians would then review the medical records and enter the patient into a ‘registry’ dataset, using an Utstein-style reporting template. The team would also monitor the patient’s course in the ED and the hospital, to enter a variety of other outcome data: the patient’s survival and neurological function, and what the eventual cause of the arrest was determined to be, by the doctors who cared for the patient in the hospital.

Our partners

Established in 1895, National Taiwan University Hospital (NTUH) is a massive, 2,300-bed hospital located in Taipei. It’s the largest hospital in Taiwan, with 1,300 full-time physicians. Its busy emergency department sees approximately 100,000 patients every year. It’s the leading hospital for cardiovascular treatment in Taiwan, treating around 3,400 cardiac catheterization cases every year.

This dataset was conceived of and created by Dr. Chien-Hua Huang, Chairman and clinical professor in the Department of Emergency Medicine at NTUH, with invaluable assistance from Dr. Hui-Chun Huang and Dr. Randy Su. The emergency department sees a large number of cardiac arrest cases, and began to gather these data, along with ECGs, 10 years ago in order to improve the quality of care and patient outcomes. This dataset is part of their team’s commitment to improvement: we believe it holds the promise of providing more insight into why cardiac arrests happen, and what physicians can do to increase both the quantity and quality of life for patients suffering cardiac arrest.

Dataset details


The dataset v1: This dataset contains the ECGs and outcomes of all patients appearing in the registry from 2011 to 2019. (Because patients who do not survive the arrest do not tend to have ECGs stored, the dataset only contains patients whose hearts did restart [return of spontaneous circulation, ROSC] in the ED.) It also includes the ECGs of a set of control patients who visited the ED on the same day as the cases, and had an ECG, but did not have a cardiac arrest. (There is currently no available clinical outcome data for controls, such as death or unfavorable neurologic outcome.)
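The control definition above (same-day ED visit, ECG recorded, no cardiac arrest) can be sketched as a simple filter. This is only an illustration on synthetic records: the field names `patient_id`, `visit_date`, `had_ecg`, and `cardiac_arrest` are assumptions, not the dataset's actual column names.

```python
from datetime import date


def select_controls(visits, case_dates):
    """Keep visits on a case date that had an ECG and no cardiac arrest.

    `visits` is a list of dicts with hypothetical keys:
    patient_id, visit_date, had_ecg, cardiac_arrest.
    """
    case_dates = set(case_dates)
    return [
        v for v in visits
        if v["visit_date"] in case_dates
        and v["had_ecg"]
        and not v["cardiac_arrest"]
    ]


# Synthetic example records, for illustration only.
visits = [
    {"patient_id": "A", "visit_date": date(2017, 5, 1), "had_ecg": True, "cardiac_arrest": True},
    {"patient_id": "B", "visit_date": date(2017, 5, 1), "had_ecg": True, "cardiac_arrest": False},
    {"patient_id": "C", "visit_date": date(2017, 5, 1), "had_ecg": False, "cardiac_arrest": False},
    {"patient_id": "D", "visit_date": date(2017, 5, 2), "had_ecg": True, "cardiac_arrest": False},
]
case_dates = [v["visit_date"] for v in visits if v["cardiac_arrest"]]
controls = select_controls(visits, case_dates)
print([v["patient_id"] for v in controls])
```

Only patient B qualifies here: C had no ECG, and D visited on a date with no case.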

Each observation in the dataset corresponds to a single 12-lead ECG. All ECGs in the NTUH ECG database were identified for patients who appeared in the cardiac arrest registry. ECGs were then added for control patients, defined as patients without cardiac arrest who presented to the ED on the same date as cases and received an ECG. ECG waveforms were shared with us as an XML file, which we parsed into an array of 5,500 points for each one of twelve leads.
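To make the 12 x 5,500 layout concrete, here is a minimal parsing sketch. The XML layout it reads (an `<ecg>` root with one `<lead name="...">` element per lead, holding comma-separated integers) is a made-up stand-in, since the actual NTUH XML schema is not described here; only the lead count and the 5,500 points per lead come from the text.

```python
import xml.etree.ElementTree as ET

# Standard 12-lead names. The XML layout below is an assumption, not the NTUH schema.
LEAD_NAMES = ["I", "II", "III", "aVR", "aVL", "aVF",
              "V1", "V2", "V3", "V4", "V5", "V6"]
N_SAMPLES = 5500  # points per lead, as stated in the text


def parse_ecg_xml(xml_text):
    """Return a 12 x 5,500 nested list, one row per lead in LEAD_NAMES order.

    Hypothetical layout: <ecg><lead name="I">v0,v1,...</lead>...</ecg>
    """
    root = ET.fromstring(xml_text)
    by_name = {}
    for lead in root.iter("lead"):
        by_name[lead.get("name")] = [int(v) for v in lead.text.split(",")]

    missing = [name for name in LEAD_NAMES if name not in by_name]
    if missing:
        raise ValueError(f"missing leads: {missing}")

    waveform = [by_name[name] for name in LEAD_NAMES]
    for name, samples in zip(LEAD_NAMES, waveform):
        if len(samples) != N_SAMPLES:
            raise ValueError(
                f"lead {name} has {len(samples)} points, expected {N_SAMPLES}"
            )
    return waveform


def make_toy_xml():
    """Build a synthetic document in the assumed layout (all-zero signal)."""
    flat = ",".join(["0"] * N_SAMPLES)
    leads = "".join(f'<lead name="{name}">{flat}</lead>' for name in LEAD_NAMES)
    return f"<ecg>{leads}</ecg>"


waveform = parse_ecg_xml(make_toy_xml())
print(len(waveform), len(waveform[0]))
```

The shape checks are the useful part of the sketch: whatever the real schema looks like, each parsed observation should end up with twelve leads of 5,500 points.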

After identifying both cardiac arrest patients and controls during their ED visit, we also obtained a prior ECG in the NTUH system that was temporally closest to the index cardiac arrest ED visit, as well as an ECG 24 hours following ROSC for those same patients.
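Picking the prior ECG closest in time to the index visit amounts to taking the latest ECG recorded before that visit. The sketch below shows the idea on synthetic timestamps; the function name and inputs are illustrative assumptions, not part of the dataset tooling.

```python
from datetime import datetime


def closest_prior_ecg(ecg_times, index_time):
    """Return the latest ECG timestamp strictly before index_time, or None."""
    prior = [t for t in ecg_times if t < index_time]
    return max(prior) if prior else None


# Synthetic timestamps, for illustration only.
index_visit = datetime(2018, 3, 10, 14, 30)
history = [
    datetime(2016, 1, 5, 9, 0),
    datetime(2017, 11, 20, 16, 45),
    datetime(2018, 3, 11, 15, 0),  # after the index visit, so not a prior ECG
]
print(closest_prior_ecg(history, index_visit))
```

A patient with no earlier ECG in the system yields `None`, which a downstream analysis would need to handle explicitly.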

For the patients on the registry, there are 1,686 ECGs from 974 unique patients. For the corresponding control patients, there are 16,386 ECGs from 9,976 unique patients. The ECGs for this control group are from 2015 to 2019.

What’s next for v1.1: In the next version of this dataset, we’ll expand the set of control patients, with a view to creating a dataset that might one day help predict which patients will eventually sustain a cardiac arrest. We’ll also add more prior ECGs from the NTUH system for each patient, and refresh the dataset with new cases from 2020 and 2021.

Dataset schema

[Figure: dataset observations and their connection to key outcomes]
Dataset construction and key outcome variables are shown in the diagram above. A note on color choices: burnt sienna (orange) indicates the node that corresponds to the observations (rows) in the dataset, and grape (purple) indicates key patient outcomes.

Key variables

Cause of arrest:

Patients who experienced cardiac arrest and ROSC in the ED were identified each day by a clinical research nurse. The appropriateness of their inclusion was confirmed with electronic medical record chart review by project physicians. The cause of cardiac arrest was also determined by chart review.

v1 Cause of cardiac arrest for ROSC patients

Cause of Cardiac Arrest    % of Total Cases
Total Cases                977
Cardiac Event              46%
Respiratory Event          30%
Infectious Event           13%
Neurology                  3.8%
GI Bleeding                3.6%
Unknown                    3.1%
Others                     0.92%

Neurological outcome:

Each patient’s cerebral performance category (CPC) is recorded upon discharge from the ICU and discharge from the hospital. There are five categories as listed below. A score of 3 or greater is considered a poor neurological outcome.

v1 CPC at ICU and hospital discharge

Cerebral Performance Category       At ICU Discharge    At Hospital Discharge
Cases with CPC                      370                 330
[1] Good cerebral capability        51%                 60%
[2] Moderate cerebral disability    13%                 12%
[3] Severe cerebral disability      15%                 13%
[4] Coma                            21%                 15%
[5] Brain Death                     0%                  0%
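The "3 or greater" rule for a poor neurological outcome can be written as a small helper. This is a sketch for illustration: the function and dictionary names are assumptions, while the category labels and the threshold come from the text above.

```python
# CPC categories as listed in the table above.
CPC_LABELS = {
    1: "Good cerebral capability",
    2: "Moderate cerebral disability",
    3: "Severe cerebral disability",
    4: "Coma",
    5: "Brain death",
}

POOR_OUTCOME_THRESHOLD = 3  # a score of 3 or greater is a poor outcome


def is_poor_neurological_outcome(cpc):
    """Return True when the CPC score indicates a poor neurological outcome."""
    if cpc not in CPC_LABELS:
        raise ValueError(f"CPC must be an integer from 1 to 5, got {cpc!r}")
    return cpc >= POOR_OUTCOME_THRESHOLD


for score, label in CPC_LABELS.items():
    outcome = "poor" if is_poor_neurological_outcome(score) else "favorable"
    print(f"CPC {score} ({label}): {outcome}")
```

Rejecting values outside 1 to 5 keeps missing or miscoded scores from being silently counted as favorable.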


Similarly, the patient’s survival is recorded upon discharge from the ICU and discharge from the hospital.

v1 ROSC patient survival

            Survival at ICU Discharge    Survival at Hospital Discharge
Died        607                          639
Survived    370                          338
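As a quick worked check, the counts in the survival table can be turned into survival proportions and compared against the 977 total cases reported in the cause-of-arrest table. The variable and function names below are illustrative only.

```python
TOTAL_CASES = 977  # from the cause-of-arrest table

# Counts copied from the survival table above.
survival = {
    "ICU discharge": {"died": 607, "survived": 370},
    "hospital discharge": {"died": 639, "survived": 338},
}


def survival_rate(counts):
    """Fraction of patients who survived at a given time point."""
    total = counts["died"] + counts["survived"]
    return counts["survived"] / total


for timepoint, counts in survival.items():
    # Both columns should account for every case.
    assert counts["died"] + counts["survived"] == TOTAL_CASES
    print(f"{timepoint}: {100 * survival_rate(counts):.1f}% survived")
```

This works out to roughly 37.9% of ROSC patients surviving to ICU discharge and 34.6% surviving to hospital discharge.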


Copyright © 2021-2023 Nightingale Open Science. All rights reserved.