Discover how to access the datasets through Globus. Follow the instructions here.
Welcome to Nightingale Open Science
The Datasets
Waveforms
- ed-bwh-ecg: Assessing Heart Attack Risk (104,000 ECG waveforms)
- silent-cchs-ecg: Diagnosing ‘Silent’ Heart Attack (48,000 ECG waveforms)
- arrest-ntuh-ecg: Subtyping Cardiac Arrest (24,106 ECG waveforms)
- mcmed-stanford-multi: Multimodal Clinical Monitoring in the ED (1,000,000 Waveforms)
Microscopy Images
- brca-psj-path: Identifying High-Risk Breast Cancer (175,000 Biopsy Slides)
- tb-wellgen-smear: Detecting Active Tuberculosis (75,000 TB Smear Images)
X-ray Images
- mrkr-emory-xray: Emory Knee Radiograph (500,000 Knee X-ray Images)
- fracture-aimi-xray: Predicting Fractures (224,000 Chest X-ray Images)
- covid-psj-xray: Emergency Triage of Covid-19 Patients (27,500 Chest X-ray Images)
Multiple Diagnostics
- tamil-jpal-multi: Tamil Nadu J-PAL Data Dictionary (82,000 Diagnostic Images)
What makes these datasets special
Our datasets are curated around medical mysteries—heart attack, cancer metastasis, cardiac arrest, bone aging, Covid-19—where machine learning can be transformative. We designed these datasets with four key principles in mind:
- The core of each dataset is a large collection of medical images: x-rays, ECG waveforms, digital pathology (and more to come). These rich, high-dimensional signals are too complex for humans to fully see or process, so machine vision can add huge value.
- Each image is linked to at least one ground truth outcome: data on what happened to the patient, not a doctor’s interpretation of the image. This allows researchers to build algorithms that learn from nature, not from humans.
- The data are diverse: we work with health systems across the US and the world, including under-resourced ones whose data aren’t usually represented in machine learning. This lets the resulting algorithms speak to the needs of diverse populations.
- Access is secure and ethical: all data are completely deidentified, and as an extra precaution, no download is allowed. Only non-commercial use is allowed, so the knowledge generated from the data benefits everyone.