Assessing Heart Attack Risk in the Emergency Department Using ECG Waveforms: A Nightingale Open Science Dataset

Authors: Philip D. Anderson1, Zoey Li1, Nick Foster2, Senthil Nachimuthu2, Josh Risley2, Ziad Obermeyer2,3

1 Brigham and Women’s Hospital
2 Nightingale Open Science
3 University of California, Berkeley

Lead Nightingale analyst: Nick Foster

When using this resource, please cite:
Philip D. Anderson, Zoey Li, Nick Foster, Senthil Nachimuthu, Josh Risley, and Ziad Obermeyer. 2023. Assessing Heart Attack Risk in the Emergency Department Using ECG Waveforms: A Nightingale Open Science Dataset. DOI:

Additionally, please cite:
Sendhil Mullainathan and Ziad Obermeyer. 2022. Solving medicine’s data bottleneck: Nightingale Open Science. Nature Medicine 28, 5 (May 2022), 897–899. DOI:

The Problem

A patient arrives in the ER complaining of chest pain and nausea. Should she be tested for a heart attack (technically, a new blockage in the coronary arteries)? A missed heart attack can have catastrophic consequences, but testing for it is costly and invasive. So the choice is not easy, particularly since many benign conditions (like acid reflux) share symptoms with heart attack. To make the choice, the physician must integrate a diverse set of data to predict the risk a patient is having a heart attack.

To help with this difficult decision, emergency physicians routinely use the electrocardiogram (ECG), a low-cost screening test obtained on any patient in whom the physician has the slightest suspicion of heart attack. Physicians carefully scrutinize the ECG for signs of heart attack—but the signs can be subtle. If algorithms could extract more signal from this high-dimensional waveform, they could help physicians decide which patients to send for invasive testing and which can be safely left untested—and potentially diagnose more heart attacks at lower cost and risk to patients than current testing practice.

Note: This dataset, and the medical context underlying it, are described in more detail in “Diagnosing Physician Error: A Machine Learning Approach to Low-Value Health Care”, in the Quarterly Journal of Economics[1].

Dataset Overview

To construct this dataset, we start with a sample of all visits over 2010–2015 to the Emergency Department of Brigham and Women’s Hospital, a large, top-ranked hospital. After excluding patients who may not be eligible for testing (e.g., those over 80 years old, those with a prior serious illness like cancer, etc.), we restrict to the 43,451 visits by 30,933 patients who had an ECG (29% of all non-excluded visits; as noted in the paper above, this captures the majority of patients at highest risk for heart attack).

Ultimately, we would like to measure for each of these visits whether the patient was having a blockage in the coronary arteries supplying the heart at the time of the visit (an acute coronary syndrome, ACS—this is sometimes colloquially called ‘heart attack,’ but we use ‘blockage’ to refer to ACS for precision). Unfortunately, this is complicated by two problems.

  1. We do not observe blockage itself—only the result of tests for blockage. In patients who received testing we do see the test result. Note, however, that if physicians over-treat (e.g., because of incentives), or if tests are imperfect (i.e., they have false positives or negatives) the test result is not a perfect measure of ground truth.

  2. We only observe test results in patients who were tested, not in the untested. This is an example of the ‘selective labels’ problem, where the physician’s decision determines whether or not we observe a label for a given observation. For untested patients, we rely on adverse event data as another source of ground truth information: while all patients have some base rate of 30-day adverse events, untested patients with blockages will have a much higher rate. (This is why physicians test for blockage to begin with: to prevent adverse events.) Adverse event rates in the untested allow us to draw some conclusions about their level of risk, in the absence of test results. In addition, we can compare the adverse event rates to objective thresholds for levels of risk that would mandate consideration of testing for blockage, based on widely implemented decision rules (e.g., the HEART score[2]) supported by recommendations from professional societies: 2% over the 30 days after the visit. (Note we do not need to assume such thresholds are socially optimal; rather we can assume that physicians believe them to be optimal, and thus would not knowingly leave high-risk patients untested.)
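The reasoning above can be sketched in code: group untested visits (by any stratification of interest) and compare their observed 30-day adverse event rates to the 2% testing threshold. All field names and values here are illustrative, not the dataset's actual schema.

```python
from collections import defaultdict

THRESHOLD = 0.02  # 2% 30-day adverse-event rate from decision rules like the HEART score

# Toy visits: (tested, adverse_30d, risk_bin) -- illustrative fields only
visits = [
    (0, 0, "lo"), (0, 1, "hi"), (0, 0, "lo"), (0, 0, "hi"),
    (1, 1, "hi"), (1, 0, "lo"),
]

# Collect 30-day adverse-event outcomes for untested visits, by risk bin
events = defaultdict(list)
for tested, adverse, risk_bin in visits:
    if not tested:
        events[risk_bin].append(adverse)

# Observed adverse-event rate per bin, and bins exceeding the threshold
rates = {b: sum(v) / len(v) for b, v in events.items()}
high_risk_untested = {b for b, r in rates.items() if r > THRESHOLD}
```

Bins of untested patients whose adverse-event rate exceeds the threshold are the ones a physician, believing the decision rule, would not knowingly have left untested.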

As a result, for each of these visits, we track whether or not the patient received testing (i.e., stress tests or catheterization); whether the test was positive, as measured by a physician deploying treatment for blockage (via stenting or open-heart surgery); and subsequent health outcomes, including adverse events.

Ideally, we could use these data to predict which patients were having a blockage on the basis of the ECG. Each observation in the dataset corresponds to a single 12-lead ECG, performed during an ED visit. We identified all ECGs done during ED visits using the GE MUSE system; where multiple ECGs were present for a visit, we included all of them. Frustratingly, ECGs were stored as PDFs over that time period, which required us to manually extract the waveforms from the three ‘long’ leads for which the full 10-second tracings were available (V1, II, V5). We extracted the waveforms at 500 samples per second, which visually corresponded to the appearance of the original ECGs on inspection by multiple physicians. The signal samples were then smoothed with a rolling average and subsampled to a final rate of 100 Hz.
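The stated preprocessing (rolling-average smoothing of a 500 Hz tracing, then subsampling to 100 Hz) can be sketched as follows. The window length of 5 samples is an assumption; the document does not specify it.

```python
import math

SRC_HZ, DST_HZ = 500, 100
STEP = SRC_HZ // DST_HZ  # keep every 5th sample: 500 Hz -> 100 Hz
WINDOW = 5               # assumed rolling-average window (not specified above)

def downsample(signal):
    """Smooth with a trailing rolling average, then take every STEP-th sample."""
    smoothed = []
    for i in range(len(signal)):
        window = signal[max(0, i - WINDOW + 1): i + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed[::STEP]

# A 10-second tracing at 500 Hz -> 5,000 samples in, 1,000 samples out
ten_seconds = [math.sin(2 * math.pi * t / SRC_HZ) for t in range(10 * SRC_HZ)]
print(len(downsample(ten_seconds)))  # 1000
```

Smoothing before subsampling reduces (though does not eliminate) aliasing of high-frequency noise into the 100 Hz signal.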

Our Partners

Brigham and Women’s Hospital is a world-class academic medical center based in Boston, Massachusetts. The Brigham serves patients from New England, across the United States and from 120 countries around the world. A major teaching hospital of Harvard Medical School, Brigham and Women’s Hospital has a legacy of clinical excellence that continues to grow year after year[3].

The Brigham network includes 1,200 doctors throughout New England working across 150 outpatient practices. An international leader in virtually every area of medicine, the Brigham has led numerous medical and scientific breakthroughs that have improved lives around the world[3].

We are grateful for our partnership with Brigham and Women’s Hospital and specifically for the efforts of Philip D. Anderson, MD, Director, International Collaborations, Division of International Emergency Medicine and Humanitarian Programs; and Zoey Li, the project IS Analyst.

Dataset Details


This dataset (v1) contains all Emergency Department visits between 2010 and 2015 in which a patient received an ECG. The data were split into a 75% training dataset and a 25% hold-out dataset. The training dataset is made available on our platform; the hold-out dataset is reserved for future validation purposes.
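A minimal sketch of a reproducible 75/25 split. Splitting at the patient level (so that one patient's visits never straddle both sets) is an assumption here; the document states only the proportions.

```python
import random

def split_patients(patient_ids, train_frac=0.75, seed=0):
    """Deterministically partition unique patient IDs into train and hold-out sets."""
    ids = sorted(set(patient_ids))      # de-duplicate and fix ordering
    rng = random.Random(seed)           # seeded RNG for reproducibility
    rng.shuffle(ids)
    cut = int(len(ids) * train_frac)
    return set(ids[:cut]), set(ids[cut:])

train, holdout = split_patients(range(1000))
```

Splitting by patient rather than by visit prevents leakage of a patient's outcomes from the training set into the hold-out set.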

Dataset Summary

                           All        Included   Tested    Untested
All Emergency Visits 2010-2015
  Visits                   326,126
  Patients                 157,871
Available on Platform
  Visits                   71,460     43,451     4,092     39,359
  Patients                 44,713     30,933     3,534     28,592
  ECGs                     103,952    61,680     7,210     54,470
  Age Mean (years)         58         50         58        49
  Age Std (years)          19         16         12        17
  Female (fraction)        0.557      0.566      0.448     0.578
  Black (fraction)         0.208      0.245      0.206     0.249
  Hispanic (fraction)      0.144      0.185      0.138     0.189
  White (fraction)         0.571      0.491      0.584     0.481
  Other (fraction)         0.076      0.079      0.070     0.080
Key Variables
  Positive Test (rate)     0.0112     0.0142     0.151     0.0
  Adverse Event (rate)     0.140      0.0515     0.268     0.0289

Dataset Schema

[Diagram: dataset observations and their connection to key outcomes.]

Dataset construction and key outcome variables are shown in the diagram above. A note on color choices: burnt sienna (orange) indicates the node that corresponds to the observations (rows) in the dataset, and grape (purple) indicates key patient outcomes.

Key Variables


Testing:

We define testing as the presence of procedure codes for either stress testing or catheterization in the 10-day window (inclusive) following the visit. We collapse these two tests into one for simplicity, as both are intended to diagnose blockages in the coronary arteries. See the paper[1] for further rationale on this and other data definitions below.

Positive test:

To identify a positive test, we rely on the principle that a positive test implies stenting: a cardiologist should not subject a patient to the risks of emergency catheterization unless she has already decided the patient would benefit from a stent if a blockage is detected. So we identify whether there is a procedure code for stenting or open-heart surgery (CABG) in the 10-day window following the visit, and if so, define this as a positive test in the tested.
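Putting the testing and positive-test definitions together, the labeling logic might be sketched as below. The procedure-code labels are illustrative placeholders, not actual billing codes.

```python
from datetime import date, timedelta

TEST_CODES = {"STRESS_TEST", "CATH"}   # placeholder codes for the two tests
TREAT_CODES = {"STENT", "CABG"}        # placeholder codes for treatment of blockage
WINDOW = timedelta(days=10)            # 10-day window (inclusive) after the visit

def label_visit(visit_date, procedures):
    """procedures: list of (date, code) pairs for the patient.
    Returns (tested, positive_test) for this visit."""
    in_window = {c for d, c in procedures if visit_date <= d <= visit_date + WINDOW}
    tested = bool(in_window & TEST_CODES)
    # A positive test is inferred from treatment (stent/CABG) among the tested
    positive = tested and bool(in_window & TREAT_CODES)
    return tested, positive

v = date(2014, 3, 1)
procs = [(date(2014, 3, 3), "CATH"), (date(2014, 3, 4), "STENT")]
print(label_visit(v, procs))  # (True, True)
```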

Adverse events:

We identify major adverse cardiac events (MACE) after visits. The intuition is that blockages have consequences—indeed, this is why we test and treat blockages—that manifest shortly after onset. We draw on clinical literature that defines these events using the EHR, in a way that shows good agreement with expert judgment after chart review[4]. These events fall into three categories: (i) delayed diagnosis and treatment of blockage and diagnosed damage to heart muscle, which we confirm with laboratory biomarkers (positive troponin); (ii) malignant arrhythmia, which we measure using diagnosis codes and cardiopulmonary resuscitation procedures; and (iii) mortality, which we obtain via linkage to Social Security Death Index data, and thus are able to capture death both in and out of the hospital. Importantly, apart from mortality, adverse events are only measured if the patient returns to the same health system we study for care. (Note that relying on ST-segment elevation (STE) alone would miss many acute coronary syndromes (ACS)[5], one motivation for confirming heart-muscle damage with troponin biomarkers.)
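The composite outcome can be sketched as an any-of over the three categories; the category names below are illustrative shorthand for the EHR-based definitions cited above.

```python
# Placeholder names for the three MACE categories described in the text
CATEGORIES = ("delayed_blockage_with_troponin", "malignant_arrhythmia", "death_30d")

def mace(flags):
    """flags: dict mapping category name -> bool; MACE fires if any category does."""
    return any(flags.get(c, False) for c in CATEGORIES)

print(mace({"death_30d": True}))  # True
print(mace({}))                   # False
```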


Contraindications:

To flag patients with contraindications—reasons they might not benefit from invasive treatment, which might mean a physician’s decision not to test is justified, even if the patient is high-risk—we first observe whether they show evidence of poor health prior to the ED visit (as described above). Second, we observe whether they show evidence of damage to heart muscle by the end of the visit: physicians can note such diagnoses, which is financially incentivized; or we can observe a positive troponin laboratory test suggestive of such problems. If either is present, we assume the physician was aware of possible blockage, but decided not to pursue it further because of a contraindication. Note that this assumes all contraindications are measured in our data.
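A minimal sketch of this flag, with illustrative field names:

```python
def contraindicated(poor_health_prior, damage_dx_coded, troponin_positive):
    """Assumed contraindication: poor prior health, or evidence of
    heart-muscle damage by the end of the visit (coded diagnosis or
    positive troponin). Field names are placeholders, not the schema."""
    end_of_visit_damage = damage_dx_coded or troponin_positive
    return poor_health_prior or end_of_visit_damage

print(contraindicated(False, False, True))  # True
```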


  1. Sendhil Mullainathan and Ziad Obermeyer. 2021. Diagnosing Physician Error: A Machine Learning Approach to Low-Value Health Care. The Quarterly Journal of Economics 137, 2 (December 2021), 679–727. DOI:
  2. Barbra E Backus, A Jacob Six, Johannes C Kelder, Thomas P Mast, Frederieke van den Akker, E Gijs Mast, Stefan HJ Monnink, Rob M van Tooren, and Pieter AFM Doevendans. 2010. Chest pain in the emergency room: a multicenter validation of the HEART Score. Critical Pathways in Cardiology 9, 3 (2010), 164–169.
  3. About The Brigham - Brigham and Women's Hospital —
  4. Wei-Qi Wei, Qiping Feng, Peter Weeke, William Bush, Magarya S Waitara, Otito F Iwuchukwu, Dan M Roden, Russell A Wilke, Charles M Stein, and Joshua C Denny. 2014. Creation and validation of an EMR-based algorithm for identifying major adverse cardiac events while on statins. AMIA Summits on Translational Science Proceedings 2014, (2014), 112.
  5. H. Pendell Meyers, Alexander Bracey, Daniel Lee, Andrew Lichtenheld, Wei J. Li, Daniel D. Singer, Zach Rollins, Jesse A. Kane, Kenneth W. Dodd, Kristen E. Meyers, Gautam R. Shroff, Adam J. Singer, and Stephen W. Smith. 2021. Ischemic ST-Segment Depression Maximal in V1–V4 (Versus V5–V6) of Any Amplitude Is Specific for Occlusion Myocardial Infarction (Versus Nonocclusive Ischemia). Journal of the American Heart Association 10, 23 (December 2021). DOI:


Copyright © 2021-2023 Nightingale Open Science. All rights reserved.