Data dictionary v1
File tree
.
└── silent-cchs-ecg
    ├── v1
    │   ├── ecg-waveforms
    │   │   ├── 00
    │   │   │   ├── 00123...npz
    │   │   │   ├── 00124...npz
    │   │   │   ...
    │   │   ...
    │   ├── ecg-tables
    │   │   ├── ecg-metadata.csv
    │   │   ├── ecg-statement.csv
    │   │   ├── lead-availability.csv
    │   │   └── lead-measurement.csv
    │   └── ehr-tables
    │       ├── cardiology-order.csv
    │       ├── diagnosis.csv
    │       ├── encounter-ed.csv
    │       ├── encounter-inpatient.csv
    │       ├── encounter-outpatient.csv
    │       ├── lab.csv
    │       ├── medication.csv
    │       ├── patient.csv
    │       ├── procedure.csv
    │       └── vital-measurement.csv
    │
    └── v0
        └── ...
ECG Waveforms
v1/ecg-waveforms/{first two digits of ecg_id}/{ecg_id}.npz
The ECG waveforms were shared with us as XML files. These XML files were parsed and stored as NumPy arrays. Each 12-lead ECG is stored as an array named waveform in a separate compressed NumPy file. The file name contains the ecg_id of the ECG.
The shape of each array is (12, 5500): 12 leads by 5,500 sample points. If a recording has fewer than 5,500 points, the array is padded with zeros.
Sample rate: 500 Hz
Lead amplitude units: Microvolts [uV]
To load the waveform array, use NumPy.
>>> import numpy as np
>>> data = np.load(ecg_filepath)
>>> waveform = data['waveform']
>>> waveform.shape
(12, 5500)
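The path pattern and units above can be combined into a small loader. This is a minimal sketch, assuming that "first two digits" means the first two characters of the ecg_id string and that `root` points at the silent-cchs-ecg directory. The helper names (`waveform_path`, `load_waveform_mv`, `time_axis_seconds`) are hypothetical and not part of the dataset.

```python
from pathlib import Path

import numpy as np

SAMPLE_RATE_HZ = 500   # sample rate stated in this data dictionary
N_SAMPLES = 5500       # sample points per lead


def waveform_path(root, ecg_id):
    """Build {root}/v1/ecg-waveforms/{first two characters}/{ecg_id}.npz.

    Assumption: the subdirectory name is the first two characters of ecg_id.
    """
    return Path(root) / "v1" / "ecg-waveforms" / ecg_id[:2] / f"{ecg_id}.npz"


def load_waveform_mv(root, ecg_id):
    """Load one ECG and convert amplitudes from microvolts to millivolts."""
    with np.load(waveform_path(root, ecg_id)) as data:
        waveform_uv = data["waveform"]  # expected shape (12, 5500), microvolts
    return waveform_uv.astype(np.float64) / 1000.0


def time_axis_seconds():
    """Time of each sample in seconds: 5,500 samples at 500 Hz cover 11 s."""
    return np.arange(N_SAMPLES) / SAMPLE_RATE_HZ
```

For example, `load_waveform_mv("silent-cchs-ecg", ecg_id)` would return a (12, 5500) float array in millivolts for an ecg_id taken from ecg-metadata.csv.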
ECG Metadata
ecg-metadata.csv
Column Name | Description | Data Type | Sample Data |
---|---|---|---|
patient_ngsci_id | Unique patient identifier | string | 123abc45 |
ecg_id | GUID uniquely identifying an ECG | string | 123abc-aaa-123b-c98765 |
acquired_datetime | Date and time when the ECG was acquired at the ECG machine | datetime | 2013-01-01 12:12:12 |
received_datetime | Date and time when the ECG was received | datetime | 2013-01-01 12:15:00 |
modified_datetime | Date and time when the ECG was modified | datetime | 2013-01-01 12:20:00 |
confirmed_datetime | Date and time when the ECG was confirmed | datetime | 2013-01-01 19:45:00 |
heart_rate | Heart rate at time of ECG acquisition | int | 70 |
pr_interval | PR interval | int | 190 |
qrs_duration | QRS duration | int | 110 |
qt_interval | QT interval | int | 410 |
qtcb | QT interval corrected with Bazett’s formula | int | 445 |
p_front_axis | P front axis | int | 40 |
i40_front_axis | i40 front axis | int | 45 |
t40_front_axis | t40 front axis | int | 92 |
qrs_front_axis | QRS front axis | int | 35 |
st_front_axis | ST front axis | int | 60 |
t_front_axis | T front axis | int | 105 |
p_horiz_axis | P horizontal axis | int | -15 |
i40_horiz_axis | i40 horizontal axis | int | 10 |
t40_horiz_axis | t40 horizontal axis | int | -45 |
qrs_horiz_axis | QRS horizontal axis | int | -35 |
st_horiz_axis | ST horizontal axis | int | 100 |
t_horiz_axis | T horizontal axis | int | 33 |
rr_interval | RR interval | int | 750 |
p_duration | P duration | int | 100 |
q_onset | Q onset | int | 512 |
qtcf | QT interval corrected with Fridericia’s formula | int | 400 |
- Rows: 43,700
- Columns: 27
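The two corrected QT columns refer to standard formulas: Bazett's correction divides the QT interval by the square root of the RR interval in seconds, and Fridericia's divides it by the cube root. The sketch below recomputes both, for example as a sanity check against qtcb and qtcf. It assumes qt_interval and rr_interval are in milliseconds, which the table does not state, and the function names are illustrative only.

```python
import math


def qtc_bazett(qt_ms, rr_ms):
    """Bazett: QTc = QT / sqrt(RR), with RR expressed in seconds."""
    return qt_ms / math.sqrt(rr_ms / 1000.0)


def qtc_fridericia(qt_ms, rr_ms):
    """Fridericia: QTc = QT / RR^(1/3), with RR expressed in seconds."""
    return qt_ms / (rr_ms / 1000.0) ** (1.0 / 3.0)
```

Stored values may differ slightly from a recomputation if the ECG machine rounds or uses a different RR estimate, so an exact match should not be expected.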
ECG Statement
ecg-statement.csv
Column Name | Description | Data Type | Sample Data |
---|---|---|---|
patient_ngsci_id | Unique patient identifier | string | 123abc45 |
ecg_id | GUID uniquely identifying an ECG | string | 123abc-aaa-123b-c98765 |
acquired_datetime | Date and time when the ECG was acquired at the ECG machine | datetime | 2013-01-01 12:12:12 |
interpretation_info | Statement in plain text | string | Sinus rhythm |
statement_code | Statement code | string | SR |
criteria_version | Version of the criteria used for ECG analysis | string | 0B |
statement_number | Statement number | int | 1 |
- Rows: 97,839
- Columns: 7
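The statement table has more rows (97,839) than there are ECGs in ecg-metadata.csv (43,700), so an ECG can carry several statement rows. A minimal sketch of collecting them per ECG with the standard csv module follows. It assumes statement_number gives the order of statements within an ECG, which the table does not say explicitly; `statements_by_ecg` and `read_statements` are hypothetical helpers.

```python
import csv
from collections import defaultdict


def statements_by_ecg(rows):
    """Map ecg_id to its interpretation_info strings, ordered by statement_number."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["ecg_id"]].append(
            (int(row["statement_number"]), row["interpretation_info"])
        )
    return {
        ecg_id: [text for _, text in sorted(items)]
        for ecg_id, items in grouped.items()
    }


def read_statements(csv_path):
    """Read ecg-statement.csv and group its rows by ecg_id."""
    with open(csv_path, newline="") as f:
        return statements_by_ecg(csv.DictReader(f))
```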
Lead Measurements
lead-measurement.csv
Column Name | Description | Data Type | Sample Data |
---|---|---|---|
patient_ngsci_id | Unique patient identifier | string | 123abc45 |
ecg_id | GUID uniquely identifying an ECG | string | 123abc-aaa-123b-c98765 |
lead_name | Name of ECG lead | string | V1 |
p_amp | P wave amplitude | int | 70 |
p_dur | P wave duration | int | 100 |
p_area | P wave area | int | 5 |
pp_amp | P’ wave amplitude | int | 0 |
pp_dur | P’ wave duration | int | 0 |
pppp_dur | P plus P’ duration | int | 100 |
pp_area | P’ wave area | int | 0 |
pppp_area | P plus P’ area | int | 5 |
q_amp | Q wave amplitude | int | 0 |
q_dur | Q wave duration | int | 0 |
r_amp | R wave amplitude | int | 1250 |
r_dur | R wave duration | int | 50 |
s_amp | S wave amplitude | int | 0 |
s_dur | S wave duration | int | 0 |
rp_amp | R’ wave amplitude | int | 0 |
rp_dur | R’ wave duration | int | 0 |
sp_amp | S’ wave amplitude | int | 0 |
sp_dur | S’ wave duration | int | 0 |
vat | Ventricular activation time. The interval from the onset of the QRS complex to the latest positive peak in the complex, or the latest substantial notch on the latest peak (whichever is later) | int | 40 |
qrs_ppk | Peak-to-peak QRS complex amplitude | int | 2000 |
qrs_dur | QRS complex duration | int | 50 |
qrs_area | QRS Area | int | 100 |
st_on | Elevation or depression at the onset (J point) of the ST segment | int | 40 |
st_mid | Elevation or depression at the midpoint of the ST segment | int | 80 |
st_80 | Elevation or depression of the ST segment 80 ms after the end of the QRS complex (J point) | int | 50 |
st_end | Elevation or depression at the end of the ST segment | int | 80 |
st_dur | ST segment duration | int | 120 |
st_slope | ST segment slope | int | 25 |
st_shape | ST segment shape | string | Straight |
t_amp | T wave amplitude | int | 300 |
t_dur | T wave duration | int | 190 |
t_area | T wave area | int | 70 |
tp_amp | T’ wave amplitude | int | 0 |
tptp_dur | T plus T’ duration | int | 0 |
tp_dur | T’ wave duration | int | 0 |
tp_area | T’ wave area | int | 0 |
tptp_area | T plus T’ area | int | 50 |
pr_seg | Interval from the end of the P wave to the onset of the QRS complex | int | 90 |
qt_int | Interval from the onset of the QRS complex to the end of the T wave | int | 400 |
pr_int | Interval from the onset of the P wave to the onset of the QRS complex | int | 170 |
- Rows: 524,376
- Columns: 43
Lead Availability
lead-availability.csv
Column Name | Description | Data Type | Sample Data |
---|---|---|---|
patient_ngsci_id | Unique patient identifier | string | 123abc45 |
ecg_id | GUID uniquely identifying an ECG | string | 123abc-aaa-123b-c98765 |
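lead_i | Whether the lead is present for the ECG | int | 1 |
lead_ii | Whether the lead is present for the ECG | int | 1 |
lead_iii | Whether the lead is present for the ECG | int | 1 |
lead_avf | Whether the lead is present for the ECG | int | 1 |
lead_avl | Whether the lead is present for the ECG | int | 1 |
lead_avr | Whether the lead is present for the ECG | int | 1 |
lead_v1 | Whether the lead is present for the ECG | int | 1 |
lead_v2 | Whether the lead is present for the ECG | int | 1 |
lead_v3 | Whether the lead is present for the ECG | int | 1 |
lead_v4 | Whether the lead is present for the ECG | int | 1 |
lead_v5 | Whether the lead is present for the ECG | int | 1 |
lead_v6 | Whether the lead is present for the ECG | int | 1 |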
available_lead_count | Number of leads available for the ECG | int | 12 |
- Rows: 43,700
- Columns: 15
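The lead flags can be used to find ECGs with missing leads before working with the waveforms. The sketch below assumes that 1 means present and 0 means absent (the table only shows 1) and that available_lead_count is intended to equal the sum of the twelve flags. The helper names are hypothetical.

```python
LEAD_COLUMNS = [
    "lead_i", "lead_ii", "lead_iii",
    "lead_avf", "lead_avl", "lead_avr",
    "lead_v1", "lead_v2", "lead_v3",
    "lead_v4", "lead_v5", "lead_v6",
]


def count_available_leads(row):
    """Sum the twelve per-lead flags of one lead-availability.csv row."""
    return sum(int(row[col]) for col in LEAD_COLUMNS)


def missing_leads(row):
    """Return the flag columns whose value is 0 (assumed to mean absent)."""
    return [col for col in LEAD_COLUMNS if int(row[col]) == 0]


def is_count_consistent(row):
    """Check the assumed relation between the flags and available_lead_count."""
    return count_available_leads(row) == int(row["available_lead_count"])
```

Rows can come from `csv.DictReader`, which yields every value as a string; the `int(...)` conversions handle that.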
Electronic Health Record Tables
Cardiology Order
cardiology-order.csv
Column Name | Description | Data Type | Sample Data |
---|---|---|---|
patient_ngsci_id | Unique patient identifier | string | 123abc45 |
prc_name | Description of procedure order | string | ECHOCARDIOGRAM 2D COMPLETE |
prc_date | Date of procedure | date | 2113-01-01 |
is_echo | Whether this cardiology study is an echocardiogram | int | 1 |
is_stress_test | Whether this cardiology study is a stress test | int | 0 |
is_cardiac_monitor | Whether this cardiology study is a cardiac monitor | int | 0 |
is_bubble | Whether this cardiology study is a bubble study | int | 0 |
- Rows: 28,243
- Columns: 7
Diagnosis
diagnosis.csv
Column Name | Description | Data Type | Sample Data |
---|---|---|---|
patient_ngsci_id | Unique patient identifier | string | 123abc45 |
disch_datetime | Datetime of discharge | datetime | 2114-01-01 12:10:00 |
dia_codetype | Standardized system of codes for clinical diagnoses | string | ICD9 |
dia_code | Code assigned to the diagnosis | string | 611.72 |
dia_name | Long description of diagnosis | string | Lump or mass in breast |
primary_flg | Whether this diagnosis is the condition responsible for patient encounter | int | 1 |
- Rows: 4,429,580
- Columns: 6
Encounter Emergency Department
encounter-ed.csv
Column Name | Description | Data Type | Sample Data |
---|---|---|---|
patient_ngsci_id | Unique patient identifier | string | 123abc45 |
hosp_admit_datetime | Date and time of hospital admission | datetime | 2114-01-01 12:10:00 |
hosp_disch_datetime | Date and time of hospital discharge | datetime | 2114-01-05 12:10:00 |
ed_dispo | ED discharge disposition | string | Discharge |
disch_dispo | Discharge disposition | string | HOME/SELF CARE |
chief_complaint | ED primary reason for encounter | string | Abdominal Pain |
longest_ed_attend_prov_id | Unique ID of the attending provider who had the most time assigned to the patient as an ED attending | string | 345abc67 |
first_ed_attend_prov_id | Unique ID of the ED attending provider who was first assigned to the patient | string | 345abc67 |
first_ed_dept | Name of emergency department the patient was roomed in | string | RMC EMERGENCY DEPARTMENT |
- Rows: 110,924
- Columns: 9
Encounter Inpatient
encounter-inpatient.csv
Column Name | Description | Data Type | Sample Data |
---|---|---|---|
patient_ngsci_id | Unique patient identifier | string | 123abc45 |
ip_disch_dept | Where the patient was released after inpatient stay | string | HOME/SELF F/U @CCRMC |
ip_admit_dept | Name of the department the patient was admitted to | string | RMC 4B MEDICAL/SURGICAL/ TELEMETRY |
ip_admit_datetime | Date of inpatient admission | date | 2117-01-01 |
hosp_disch_datetime | Date of hospital discharge | date | 2117-01-02 |
- Rows: 23,219
- Columns: 5
Encounter Outpatient
encounter-outpatient.csv
Column Name | Description | Data Type | Sample Data |
---|---|---|---|
patient_ngsci_id | Unique patient identifier | string | 123abc45 |
visit_date | Date of visit | date | 2111-07-05 |
visit_type | Type of visit | string | Office Visit |
visit_prov_id | Unique ID for the visit provider associated with this encounter | string | 345abc67 |
dept_name | Name of the department | string | FAMILY PRACTICE |
dept_specialty | Name of the medical specialty practiced in this department | string | Family Medicine |
reason_visit_name | Reason for encounter | string | FOLLOW-UP |
- Rows: 2,139,111
- Columns: 7
Lab
lab.csv
Column Name | Description | Data Type | Sample Data |
---|---|---|---|
patient_ngsci_id | Unique patient identifier | string | 123abc45 |
lab_date | Date of laboratory result | date | 2113-07-01 |
name | Name of lab | string | CREATININE, SERUM |
loinc | LOINC identifier for laboratory test | string | 2160-0 |
lab_value | Result of the lab that can be a value or interpretation | string | NEGATIVE |
- Rows: 9,388,071
- Columns: 5
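Because lab_value is a string that may hold either a number or an interpretation, numeric analyses need to separate the two. A minimal sketch, assuming plain decimal strings for numeric results; `parse_lab_value` is a hypothetical helper, and qualified values such as "<0.5" are deliberately left as text rather than guessed at.

```python
import math


def parse_lab_value(raw):
    """Split a lab_value string into (numeric_value, text_value).

    Exactly one element of the returned pair is not None.
    """
    text = raw.strip()
    try:
        value = float(text)
    except ValueError:
        return None, text   # interpretation such as "NEGATIVE"
    if not math.isfinite(value):
        return None, text   # keep strings like "nan" or "inf" as text
    return value, None
```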
Medication
medication.csv
Column Name | Description | Data Type | Sample Data |
---|---|---|---|
patient_ngsci_id | Unique patient identifier | string | 123abc45 |
admin_flg | Whether this medication was prescribed or administered | int | 0 |
med_date | Date of medication | date | 2113-07-01 |
med_name | Name of medication | string | ACETAMINOPHEN 325 MG TABLET |
med_codetype | Standardized system of codes for medications | string | NDC |
med_code | Code assigned to medication | string | 173068220 |
- Rows: 7,550,621
- Columns: 6
Patient
patient.csv
Patient age is not available.
Column Name | Description | Data Type | Sample Data |
---|---|---|---|
patient_ngsci_id | Unique patient identifier | string | 123abc45 |
sex | Patient sex | string | Female |
death_date | Patient date of death | date | 2119-04-08 |
race | Patient race | string | White |
- Rows: 45,474
- Columns: 4
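Every ECG and EHR table in this dictionary includes patient_ngsci_id, so patient attributes can be attached to rows of any other table through that key. The sketch below does this for ECG metadata rows using plain dictionaries, assuming patient.csv holds one row per patient; `attach_patient_fields` is a hypothetical helper.

```python
def attach_patient_fields(ecg_rows, patient_rows, fields=("sex", "race")):
    """Left-join selected patient columns onto ECG rows by patient_ngsci_id.

    ECG rows without a matching patient get None for each requested field.
    """
    patients = {p["patient_ngsci_id"]: p for p in patient_rows}
    merged = []
    for ecg in ecg_rows:
        patient = patients.get(ecg["patient_ngsci_id"], {})
        extra = {field: patient.get(field) for field in fields}
        merged.append({**ecg, **extra})
    return merged
```

Both inputs can be lists of rows read with `csv.DictReader` from ecg-metadata.csv and patient.csv.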
Procedure
procedure.csv
Procedures are limited to stress-test specific outcomes of interest (cath, stress, …)
Column Name | Description | Data Type | Sample Data |
---|---|---|---|
patient_ngsci_id | Unique patient identifier | string | 123abc45 |
visit_date | Date of visit | date | 2113-07-01 |
prc_date | Date of procedure | date | 2113-07-01 |
prc_name | Name of procedure | string | HC CARDIOVASCULAR STRESS TEST TRACING ONLY |
prc_code | Code assigned to procedure | string | 93017 |
prc_codetype | Standardized system of codes for clinical procedures | string | CPT |
- Rows: 74,947
- Columns: 6
Vital Measurement
vital-measurement.csv
Column Name | Description | Data Type | Sample Data |
---|---|---|---|
patient_ngsci_id | Unique patient identifier | string | 123abc45 |
meas_datetime | Date and time of vital sign recording | string | 2113-07-01 12:10:00 |
meas_name | Name of vital sign | string | Temp |
meas_value | Measured value | float | 98 |
- Rows: 16,089,189
- Columns: 4