Data Dictionary v1

Table of contents
  1. File Tree
  2. Entity Relationship Diagram
  3. Patient
  4. ECG Waveforms
  5. ECG Metadata
  6. Emergency Department Encounters

File Tree

.
└── ed-bwh-ecg
    └── v1
        ├── ecg-ed-enc.csv
        ├── ecg-metadata.csv
        ├── ecg-npy-index.csv
        ├── ed-encounter.csv
        ├── ecg-waveform.h5
        ├── ecg-waveform.npy
        └── patient.csv

Entity Relationship Diagram

erDiagram
    PATIENT {
        string patient_ngsci_id PK
        string demographics
    }
    ECG-METADATA {
        string ecg_id PK
        string patient_ngsci_id FK
        string data
    }
    ECG-NPY-INDEX {
        string ecg_id PK,FK
        int npy_index
    }
    ED-ENCOUNTER {
        string enc_id PK
        string patient_ngsci_id FK
        string data
    }
    ECG-ED-ENC {
        string ecg_id PK,FK
        string enc_id FK
    }
    ECG-WAVEFORM {}

    PATIENT ||--|{ ECG-METADATA : patient_ngsci_id
    PATIENT ||--|{ ED-ENCOUNTER : patient_ngsci_id
    ECG-METADATA ||--|| ECG-NPY-INDEX : ecg_id
    ECG-METADATA ||--o| ECG-ED-ENC : ecg_id
    ED-ENCOUNTER ||--|{ ECG-ED-ENC : enc_id
    ECG-NPY-INDEX ||--|| ECG-WAVEFORM : npy_index

Patient

patient.csv contains patient demographics.

Column Name Description Data Type Example
patient_ngsci_id Unique patient identifier (pattern: pat{8 digit hex}) string pat089c033f
sex Patient Sex string Female
race_black Patient race; each patient has a 1 in exactly one of the race_* columns int 1
race_hispanic int 0
race_white int 0
race_other int 0
agi_under_25k Adjusted gross income (AGI) distribution from block-level census data based on patient address (applies to all agi_* columns) float 0.36921
agi_25k_to_50k float 0.28902
agi_50k_to_75k float 0.15070
agi_75k_to_100k float 0.07804
agi_100k_to_200k float 0.09849
agi_above_200k float 0.01453
  • Rows: 44,713
  • Columns: 12
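
The constraints stated in this table can be checked row by row. The sketch below is one way to do that with the standard library. The function name `check_patient_row` and the tolerance are illustrative, and two points are assumptions rather than documented facts: that the hex digits are lowercase (as in the example), and that the six agi_* shares sum to roughly 1 because they are described as a distribution.

```python
import re

# Pattern from the table: "pat" followed by 8 hex digits.
# Assumption: hex digits are lowercase, as in the example value.
PATIENT_ID_RE = re.compile(r"pat[0-9a-f]{8}")

RACE_COLUMNS = ["race_black", "race_hispanic", "race_white", "race_other"]
AGI_COLUMNS = [
    "agi_under_25k", "agi_25k_to_50k", "agi_50k_to_75k",
    "agi_75k_to_100k", "agi_100k_to_200k", "agi_above_200k",
]


def check_patient_row(row, agi_tolerance=1e-3):
    """Return a list of problems found in one patient.csv row (dict of strings)."""
    problems = []
    if not PATIENT_ID_RE.fullmatch(row["patient_ngsci_id"]):
        problems.append("patient_ngsci_id does not match pat{8 digit hex}")
    # Exactly one of the race indicators should be 1.
    if sum(int(row[c]) for c in RACE_COLUMNS) != 1:
        problems.append("race_* columns are not one-hot")
    # Assumption: the AGI shares form a distribution summing to about 1.
    if abs(sum(float(row[c]) for c in AGI_COLUMNS) - 1.0) > agi_tolerance:
        problems.append("agi_* columns do not sum to 1")
    return problems


# Row assembled from the Example column of the table.
example_row = {
    "patient_ngsci_id": "pat089c033f", "sex": "Female",
    "race_black": "1", "race_hispanic": "0", "race_white": "0", "race_other": "0",
    "agi_under_25k": "0.36921", "agi_25k_to_50k": "0.28902",
    "agi_50k_to_75k": "0.15070", "agi_75k_to_100k": "0.07804",
    "agi_100k_to_200k": "0.09849", "agi_above_200k": "0.01453",
}
print(check_patient_row(example_row))  # []
```

Rows read with `csv.DictReader` arrive as dictionaries of strings, which is why the sketch converts values with `int` and `float` before checking them.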

ECG Waveforms

ecg-npy-index.csv

Column Name Description Data Type Example
ecg_id Unique ECG identifier (pattern: ecg{10 digit hex}) string ecg3df45120a4
npy_index Index of this ECG in the NumPy array (ecg-waveform.npy) int 523
  • Rows: 112,900
  • Columns: 2

ecg-waveform.npy

Shape: (112900, 3, 1000)
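
Looking up a waveform therefore takes two steps: map the ecg_id to its npy_index, then read that position from the array. The sketch below shows the lookup with the standard library only. It assumes that npy_index is a zero-based position along the first axis of the array, which the shape suggests but the dictionary does not state. The helper names, the second ECG identifier, and the tiny stand-in data are made up for illustration.

```python
import csv
import io


def load_npy_index(csv_text):
    """Map ecg_id -> npy_index from the text of ecg-npy-index.csv."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["ecg_id"]: int(row["npy_index"]) for row in reader}


def waveform_for(ecg_id, index, waveforms):
    """Return the waveform stored at the position that ecg_id points to.

    `waveforms` can be any sequence indexable by an integer, for example a
    NumPy array of shape (112900, 3, 1000) or, as here, a nested list.
    """
    position = index[ecg_id]
    if not 0 <= position < len(waveforms):
        raise IndexError(f"npy_index {position} is outside the array")
    return waveforms[position]


# Stand-in data: 2 ECGs, 3 channels, 4 samples each (the real array is far larger).
sample_csv = "ecg_id,npy_index\necg3df45120a4,1\necg0000000000,0\n"
sample_waveforms = [
    [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
]

index = load_npy_index(sample_csv)
ecg = waveform_for("ecg3df45120a4", index, sample_waveforms)
print(len(ecg), len(ecg[0]))  # 3 4
```

With the real file, the same function should work on an array opened with NumPy, for example `numpy.load("ecg-waveform.npy", mmap_mode="r")`, which avoids reading the whole array into memory.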

ecg-waveform.h5

ECG Metadata

Date Shift

Dates in this dataset have been shifted by a random amount for each patient. This is done to create anonymity while preserving the temporal relationship between events for patients.

ecg-metadata.csv

Column Name Description Data Type Example
patient_ngsci_id Unique patient identifier (pattern: pat{8 digit hex}) string pat089c033f
ecg_id Unique ECG identifier (pattern: ecg{10 digit hex}) string ecg3df45120a4
date Shifted date and time of the ECG string 2110-07-29T11:27:56Z
p-r-t_axes P-R-T axes string 52 9 27
p_axes P axes int 52
r_axes R axes int 9
t_axes T axes int 27
pr_interval PR interval int 176
pr_interval_units PR interval units string ms
qrs_duration QRS duration int 74
qrs_duration_units QRS duration units string ms
qtqtc QT/QTc intervals string 432/413 ms
qt_interval QT interval int 432
qt_interval_units QT interval units string ms
qtc_interval QTc interval int 413
qtc_interval_units QTc interval units string ms
vent_rate Ventricular rate int 55
vent_rate_units Ventricular rate units string BPM
has_bbb Flag for whether the search terms were present in the cardiology remarks (applies to all has_* columns) int 0
has_afib int 0
has_st int 0
has_pacemaker int 0
has_lvh int 0
has_normal int 1
has_normal_ecg int 1
has_normal_sinus int 0
has_depress int 0
has_st_eleva int 0
has_twave int 0
has_aberran_bbb int 0
has_jpoint_repol int 0
has_jpoint_eleva int 0
has_twave_inver int 0
has_twave_abnormal int 0
has_nonspecific int 0
has_rhythm_disturbance int 0
has_prolonged_qt int 0
has_lead_reversal int 0
has_poor_or_quality int 0
  • Rows: 112,900
  • Columns: 39
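
Two columns, p-r-t_axes and qtqtc, pack several measurements into one string, and the table also lists each measurement in its own column. The sketch below shows how the combined strings relate to the split columns. It assumes every value follows the layout of the examples in the table, and it does not handle missing or malformed values. The function names are illustrative.

```python
def parse_prt_axes(value):
    """Split a p-r-t_axes string such as "52 9 27" into three integers."""
    p_axis, r_axis, t_axis = (int(part) for part in value.split())
    return p_axis, r_axis, t_axis


def parse_qtqtc(value):
    """Split a qtqtc string such as "432/413 ms" into (qt, qtc, units)."""
    intervals, units = value.split()
    qt, qtc = (int(part) for part in intervals.split("/"))
    return qt, qtc, units


print(parse_prt_axes("52 9 27"))  # (52, 9, 27)
print(parse_qtqtc("432/413 ms"))  # (432, 413, 'ms')
```

The results match the example values of p_axes, r_axes, t_axes, qt_interval, and qtc_interval in the table.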

Columns with names that start with has_ indicate whether certain search terms were present in the cardiology remarks. The search terms for each flag are listed below.

Column Name Regex Search Terms
has_bbb bbb or bundle\s+branch\s+block
has_afib atrial\s+flutter or atrial\s+fibrillation
has_st st\s+
has_pacemaker pacemaker or paced
has_lvh lvh or ventricular\s+hypertrophy
has_normal (normal\s+sinus\s+rhythm and not abnormal\s+sinus\s+rhythm) or (normal\s+ecg and not abnormal\s+ecg)
has_normal_ecg normal\s+ecg and not abnormal\s+ecg
has_normal_sinus normal\s+sinus\s+rhythm and not abnormal\s+sinus\s+rhythm
has_depress st\s*\w*\s*depress
has_st_eleva st\s*\w*\s*eleva
has_twave t.wave
has_aberran_bbb bbb or bundle\s+branch\s+block or aberran
has_jpoint_repol j\s+point or early repol
has_jpoint_eleva st\s*\w*\s*eleva or j\s+point or early repol
has_twave_inver t.wave and inter
has_twave_abnormal t.wave.abnormal
has_nonspecific nonspecific
has_rhythm_disturbance premature (atrial|ventricular)|PAC|PVC or aberran or intraventricular conduction or ectop or arrythmia or junctional or fusion complex or a-v|atrioventricular
has_prolonged_qt prolonged qt
has_lead_reversal lead reversal
has_poor_or_quality poor or quality
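
As an illustration of how these terms combine, the sketch below recomputes a handful of the flags from a remark string with Python's `re` module. It is not the code used to build the dataset. It assumes that matching is case-insensitive and that "or" and "and not" combine separate searches over the same remark text, neither of which the table states. The example remarks are invented.

```python
import re


def has(pattern, remarks):
    """True if the pattern occurs anywhere in the remark text."""
    # Assumption: matching ignores case; the table does not say.
    return re.search(pattern, remarks, flags=re.IGNORECASE) is not None


def remark_flags(remarks):
    """Compute a few has_* flags (as 0/1) from one cardiology remark string."""
    normal_ecg = has(r"normal\s+ecg", remarks) and not has(r"abnormal\s+ecg", remarks)
    normal_sinus = (
        has(r"normal\s+sinus\s+rhythm", remarks)
        and not has(r"abnormal\s+sinus\s+rhythm", remarks)
    )
    return {
        "has_bbb": int(has(r"bbb|bundle\s+branch\s+block", remarks)),
        "has_afib": int(has(r"atrial\s+flutter|atrial\s+fibrillation", remarks)),
        "has_depress": int(has(r"st\s*\w*\s*depress", remarks)),
        "has_normal_ecg": int(normal_ecg),
        "has_normal_sinus": int(normal_sinus),
        "has_normal": int(normal_ecg or normal_sinus),
    }


# Invented remark text, for illustration only.
print(remark_flags("Normal sinus rhythm. Normal ECG"))
print(remark_flags("Sinus rhythm with right bundle branch block. Abnormal ECG"))
```

The "and not" clauses matter because `normal\s+ecg` also matches inside "Abnormal ECG"; the second search is what keeps has_normal_ecg at 0 for such remarks.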

Emergency Department Encounters

Date Shift

Dates in this dataset have been shifted by a random amount for each patient. This is done to create anonymity while preserving the temporal relationship between events for patients.
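
In practice this means absolute dates carry no meaning, while differences between two timestamps of the same patient do. The sketch below parses the example start_datetime and end_datetime values from the ed-encounter.csv table and shows that adding one offset to both leaves the interval unchanged. The offset is an arbitrary illustrative value, not the shift used in the dataset, and the format string is inferred from the example values.

```python
from datetime import datetime, timedelta

# Format inferred from example values such as "2110-07-29T11:06:00Z".
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(value):
    """Parse a shifted timestamp string into a naive datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT)


start = parse_timestamp("2110-07-29T11:06:00Z")
end = parse_timestamp("2110-07-29T12:31:00Z")
length_of_stay = end - start
print(length_of_stay)  # 1:25:00

# Arbitrary offset for illustration: shifting both timestamps by the same
# amount does not change the time between them.
offset = timedelta(days=1234)
shifted_length = (end + offset) - (start + offset)
print(shifted_length == length_of_stay)  # True
```

Because the shift differs per patient, comparing timestamps across different patients is not meaningful.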

ed-encounter.csv

Column Name Description Data Type Example
patient_ngsci_id Unique patient identifier (pattern: pat{8 digit hex}) string pat089c033f
ed_enc_id Unique ED encounter identifier (pattern: enc{8 digit hex}) string enc5ba023af
start_datetime Shifted start date and time of the ED encounter string 2110-07-29T11:06:00Z
end_datetime Shifted end date and time of the ED encounter string 2110-07-29T12:31:00Z
age_at_admit Patient age int 75
macetrop_030_pos Major adverse cardiovascular events (MACE) & pos troponin in 30 days after visit bool FALSE
death_030_day Death in 30 days after visit - This variable comes from Social Security Death Index data, so it captures both death in and out of the hospital. bool FALSE
macetrop_pos_or_death_030 Adverse Events (30days) bool FALSE
stent_010_day Stent within 10 days after visit bool FALSE
cabg_010_day Coronary artery bypass graft surgery (CABG) within 10 days after visit bool FALSE
stent_or_cabg_010_day Stent or CABG within 10 days bool FALSE
ami_day_of Acute myocardial infarction (AMI, "heart attack") on the day of visit (days_to_ami == 0) bool FALSE
days_to_ami Number of days to the earliest AMI; missing if no AMI int 5
maxtrop_sameday Max troponin lab results on day of visit float 0.25
tn_group_sameday maxtrop_sameday categorized into the following bins: missing, 0, (0-0.05], (0.05-0.1], (0.1-0.5], >0.5 string (0.1,0.5]
disch_disp Discharge code string a
disch_obs Flag for whether the patient was discharged to observation (disch_disp == e | disch_disp == edobs) bool FALSE
test_010_day Stress test or cath test within 10 days after visit bool FALSE
stress_010_day Stress testing (10days) bool FALSE
cath_010_day Catheterization (10days) bool FALSE
days_to_stress Days to earliest stress test int 1
days_to_cath Days to earliest catheterization test int 2
first_test Whether earliest test is stress or cath; if they have both we generally assume first test is stress even if this doesn’t seem true by timestamps string cath
excl_flag_c_int Flag for cardiac intervention in previous 30d bool FALSE
excl_flag_chronic Flag for chronic illness bool FALSE
excl_flag_death Flag for discharge = death bool FALSE
exclude_modeling Exclusion flag for training models = (excl_flag_c_int | excl_flag_chronic | excl_flag_death | (ami_day_of & !test_010_day)) bool FALSE
exclude Exclusion flag for analysis = (exclude_modeling | age_at_admit >= 80 | (!test_010_day & maxtrop_sameday > 0)) bool FALSE
  • Rows: 71,460
  • Columns: 29
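
Several columns in this table are derived from others by the formulas in their descriptions. The sketch below restates the rules for tn_group_sameday, ami_day_of, exclude_modeling, and exclude as Python. It makes three assumptions: missing values are represented as `None`, a missing troponin value does not count as `maxtrop_sameday > 0`, and bin labels use the comma form shown in the Example column. The sample encounter is hypothetical and is not a row from the dataset.

```python
def tn_group(maxtrop_sameday):
    """Bin a same-day maximum troponin value; None stands for a missing lab."""
    if maxtrop_sameday is None:
        return "missing"
    if maxtrop_sameday == 0:
        return "0"
    if maxtrop_sameday <= 0.05:
        return "(0,0.05]"
    if maxtrop_sameday <= 0.1:
        return "(0.05,0.1]"
    if maxtrop_sameday <= 0.5:
        return "(0.1,0.5]"
    return ">0.5"


def exclusion_flags(row):
    """Recompute ami_day_of, exclude_modeling and exclude for one encounter."""
    # days_to_ami is None when there was no AMI, which compares unequal to 0.
    ami_day_of = row["days_to_ami"] == 0
    # Assumption: a missing troponin value does not count as > 0.
    troponin = row["maxtrop_sameday"]
    troponin_positive = troponin is not None and troponin > 0
    exclude_modeling = bool(
        row["excl_flag_c_int"]
        or row["excl_flag_chronic"]
        or row["excl_flag_death"]
        or (ami_day_of and not row["test_010_day"])
    )
    exclude = bool(
        exclude_modeling
        or row["age_at_admit"] >= 80
        or (not row["test_010_day"] and troponin_positive)
    )
    return {
        "ami_day_of": ami_day_of,
        "exclude_modeling": exclude_modeling,
        "exclude": exclude,
    }


# Hypothetical encounter, for illustration only.
encounter = {
    "age_at_admit": 75, "days_to_ami": 5, "maxtrop_sameday": 0.25,
    "test_010_day": True, "excl_flag_c_int": False,
    "excl_flag_chronic": False, "excl_flag_death": False,
}
print(tn_group(encounter["maxtrop_sameday"]))  # (0.1,0.5]
print(exclusion_flags(encounter))
```

Recomputing the flags this way and comparing them with the stored columns is one way to confirm how missing values were treated in the released file.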

ecg-ed-enc.csv

Column Name Description Data Type Example
ecg_id Unique ECG identifier (pattern: ecg{10 digit hex}) string ecg3df45120a4
ed_enc_id Unique ED encounter identifier (pattern: enc{8 digit hex}) string enc5ba023af
  • Rows: 103,952
  • Columns: 2
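
This table is the link between ECGs and ED encounters. It has fewer rows (103,952) than there are ECGs (112,900), so some ECGs have no linked encounter. The sketch below groups ECG identifiers by encounter using the column names from the table above. The function name is illustrative, and all sample rows except the first are made up.

```python
import csv
import io
from collections import defaultdict


def ecgs_by_encounter(link_csv_text):
    """Group ecg_id values under their ed_enc_id, from ecg-ed-enc.csv text."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(link_csv_text)):
        grouped[row["ed_enc_id"]].append(row["ecg_id"])
    return dict(grouped)


# Stand-in rows; only the first pair comes from the examples in the table.
sample_links = (
    "ecg_id,ed_enc_id\n"
    "ecg3df45120a4,enc5ba023af\n"
    "ecg00000000aa,enc5ba023af\n"
    "ecg00000000bb,enc00000001\n"
)

links = ecgs_by_encounter(sample_links)
print(links["enc5ba023af"])  # ['ecg3df45120a4', 'ecg00000000aa']
```

From an ed_enc_id, the matching row in ed-encounter.csv gives the outcomes, and each ecg_id leads to its metadata in ecg-metadata.csv and its waveform through ecg-npy-index.csv.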

Copyright © 2021-2023 Nightingale Open Science. All rights reserved.