Diagnosing ‘silent’ heart attack using ECG waveforms: A Nightingale Open Science dataset

Authors: Rajiv Pramanik (1), Bhumil Shah (1), Anna Roth (1), Honga Wei (1), Ted Castillo (1), Katie Lin (2), Sachin Shah (2), Stelios Serghiou (2), Nick Foster (2), Josh Risley (2), Katy Haynes (2), Ziad Obermeyer (2, 3)

(1) Contra Costa Health Services
(2) Nightingale Open Science
(3) University of California, Berkeley

Lead Nightingale analyst: Nick Foster

When using this resource, please cite:
Rajiv Pramanik, Bhumil Shah, Anna Roth, Honga Wei, Ted Castillo, Katie Lin, Sachin Shah, Stelios Serghiou, Nick Foster, Josh Risley, Katy Haynes, and Ziad Obermeyer. 2021. Diagnosing ‘Silent’ Heart Attack Using ECG Waveforms: A Nightingale Open Science Dataset. DOI:https://doi.org/10.48815/N54W2V

Additionally, please cite:
Sendhil Mullainathan and Ziad Obermeyer. 2022. Solving medicine’s data bottleneck: Nightingale Open Science. Nature Medicine 28, 5 (May 2022), 897–899. DOI:https://doi.org/10.1038/s41591-022-01804-4

The problem

Every year, millions of heart attacks happen around the world, but up to 78% of them are undiagnosed, or “silent”. This means a large fraction of people who have had a heart attack never get the cocktail of drugs known to save lives by preventing future heart attacks and sudden death.

Today, doctors can order tests (like MRIs or ultrasounds) to diagnose patients when they suspect a prior heart attack. But the reason so many heart attacks remain silent is precisely that doctors and patients don’t even suspect a heart attack has happened.

Finding new ways to diagnose these undiagnosed heart attacks at scale could dramatically expand access to life-saving medications. And because of our close partnership with the county health system that sourced these data, algorithms developed on the platform, once validated, have a clear pathway for making it into clinical use and helping real patients.

Electrocardiograms (ECGs) are a cheap, widespread test done everywhere in the health care system: during annual checkups, ER visits, before surgical procedures, etc. Doctors have learned to diagnose some limited signs of prior heart attack on ECGs (like ‘Q waves’), but these coarse findings still miss about 80% of prior heart attacks. We know that algorithms can match human performance on ECG interpretation—but could they do better, by systematically mining ECG waveforms for signals that might identify prior heart attacks? We don’t know, because there have not historically been datasets linking ECGs to high-quality labels on prior heart attack.

Dataset overview

This dataset links 43,700 ECG waveforms from 20,160 patients to EHR data. Each observation in the dataset corresponds to a single 12-lead ECG. We identified all ECGs done as an inpatient or outpatient by the Contra Costa Health Services (CCHS) county health system between January 1st, 2013 and December 31st, 2020, using the Philips TraceMasterVue ECG Management System (now known as IntelliSpace ECG), which stores ECG data from all Philips cardiographs and bedside monitors. ECG waveforms were shared with us as XML files, which we parsed into an array of 5,500 points for each of the twelve leads. Note that not all patients have ECGs: during the process of extracting the ECG data, about two-thirds of the ECGs were not extracted.
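As an illustration of the array layout described above, here is a minimal sketch that checks a parsed waveform has twelve leads of 5,500 points each and indexes it by lead name. The function name `validate_waveform`, the list-of-lists representation, and the lead ordering are assumptions made for this example; they are not taken from the dataset's actual files or tooling, and the data used is synthetic.

```python
# Standard 12-lead ECG names. The order in which leads appear in the
# dataset's arrays is an assumption for this sketch.
LEAD_NAMES = ["I", "II", "III", "aVR", "aVL", "aVF",
              "V1", "V2", "V3", "V4", "V5", "V6"]
N_SAMPLES = 5500  # points per lead, as stated in the dataset overview


def validate_waveform(waveform):
    """Return a dict mapping lead name -> samples, or raise ValueError.

    `waveform` is assumed to be a sequence of 12 sequences, one per lead.
    """
    if len(waveform) != len(LEAD_NAMES):
        raise ValueError(
            f"expected {len(LEAD_NAMES)} leads, got {len(waveform)}"
        )
    for name, samples in zip(LEAD_NAMES, waveform):
        if len(samples) != N_SAMPLES:
            raise ValueError(
                f"lead {name}: expected {N_SAMPLES} samples, got {len(samples)}"
            )
    return dict(zip(LEAD_NAMES, waveform))


# Synthetic placeholder (all zeros), not real ECG data.
fake_ecg = [[0.0] * N_SAMPLES for _ in LEAD_NAMES]
leads = validate_waveform(fake_ecg)
print(len(leads), len(leads["V1"]))
```

A check like this may be useful before feeding waveforms to a model, since a truncated or partially parsed record would otherwise fail later and less legibly.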

Our partners

CCHS is a public, county health system that serves 190,000 people in Contra Costa County, California. It comprises a federally-qualified Health Maintenance Organization (HMO) health insurer, one regional medical center, and eight health centers and clinics.

This dataset was conceived of and created by Rajiv Pramanik, CCHS Chief Medical Informatics Officer, and Bhumil Shah, CCHS Chief Analytics Officer, and was made possible by the leadership of Anna Roth, the Chief Executive Officer of CCHS. We think this dataset is unique because it comes from a kind of health system typically under-represented in machine learning: CCHS is not a well-resourced academic health center or a private health system, but rather a public, county system that cares for a variety of under-served patient populations. For this reason, this dataset (and the many like it we plan to release over the coming months) holds the promise of expanding access to high-quality medical diagnostics for traditionally under-served patients.

We are deeply grateful to the Gordon and Betty Moore Foundation, who supported this work with a grant from their Diagnostic Excellence Initiative.

Dataset details

Versions

v1 (this dataset): The v1 dataset links 43,700 ECG waveforms from 20,160 patients to EHR data. An additional 25,314 patients are included in the dataset without ECG waveforms. We were unable to obtain the ECGs for these patients because of a technical error in the storage of the ECGs.
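The counts above can be cross-checked with a few lines of arithmetic. This is only an illustrative sketch using the figures quoted in this section; the variable names are made up for the example.

```python
# Counts quoted in the v1 description.
ecgs = 43_700
patients_with_ecg = 20_160
patients_missing_ecg = 25_314

# Total unique patients (should match the Dataset Summary table).
total_patients = patients_with_ecg + patients_missing_ecg
# Share of patients who have at least one ECG waveform.
share_with_ecg = patients_with_ecg / total_patients
# Average number of ECGs per patient among those with any ECG.
ecgs_per_patient = ecgs / patients_with_ecg

print(total_patients)             # 45474
print(f"{share_with_ecg:.1%}")    # 44.3%
print(f"{ecgs_per_patient:.2f}")  # 2.17
```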

Dataset Summary

                          All               With ECG          Missing ECG
ECG                       43,700            43,700
Patient Demographics
Unique Patients           45,474            20,160            25,314
Female (%)                24,119 (53.0%)    10,456 (51.9%)    13,663 (54.0%)
Black (%)                 7,392 (16.3%)     2,702 (13.4%)     4,690 (18.5%)
Hispanic (%)              3,598 (7.9%)      1,739 (8.6%)      1,859 (7.3%)
White (%)                 15,512 (34.1%)    6,155 (30.5%)     9,357 (37.0%)
Other (%)                 18,972 (41.7%)    9,564 (47.4%)     9,408 (37.2%)
Risk Factors (Any time)
Past Heart Disease (%)    7,242 (15.9%)     2,703 (13.4%)     4,539 (17.9%)
Diabetes (%)              11,067 (24.3%)    4,163 (20.7%)     6,904 (27.3%)
Hypertension (%)          21,490 (47.3%)    8,194 (40.7%)     13,296 (52.5%)
Cholesterol (%)           5,961 (13.1%)     2,175 (10.8%)     3,786 (15.0%)
Risk Factors (1 Year Prior)
Past Heart Disease (%) 1,122 (9.0%)
Diabetes (%) 2,546 (20.4%)
Hypertension (%) 4,904 (39.3%)
Cholesterol (%) 951 (7.6%)

Copyright © 2021-2023 Nightingale Open Science. All rights reserved.