Data dictionary v1

Table of contents
  1. File tree
  2. ECG Waveforms
  3. ECG Metadata
  4. ECG Statement
  5. Lead Measurements
  6. Lead Availability
  7. Electronic Health Record Tables
    1. Cardiology Order
    2. Diagnosis
    3. Encounter Emergency Department
    4. Encounter Inpatient
    5. Encounter Outpatient
    6. Lab
    7. Medication
    8. Patient
    9. Procedure
    10. Vital Measurement

File tree

└── silent-cchs-ecg
    ├── v1
    │   ├── ecg-waveforms
    │   │   ├── 00
    │   │   │   ├── 00123...npz
    │   │   │   ├── 00124...npz
    │   │   │   ...
    │   │   ...
    │   ├── ecg-tables
    │   │   ├── ecg-metadata.csv
    │   │   ├── ecg-statement.csv
    │   │   ├── lead-availability.csv
    │   │   └── lead-measurement.csv
    │   └── ehr-tables
    │       ├── cardiology-order.csv
    │       ├── diagnosis.csv
    │       ├── encounter-ed.csv
    │       ├── encounter-inpatient.csv
    │       ├── encounter-outpatient.csv
    │       ├── lab.csv
    │       ├── medication.csv
    │       ├── patient.csv
    │       ├── procedure.csv
    │       └── vital-measurement.csv
    └── v0
        └── ...

ECG Waveforms

v1/ecg-waveforms/{first two digits of ecg_id}/{ecg_id}.npz

The ECG waveforms were shared with us as XML files. These XML files were parsed and stored in NumPy arrays. Each 12-lead ECG is stored as an array named waveform in a separate compressed NumPy file. The file names contain the ecg_id of the ECG.
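The path pattern above can be turned into a small helper. This is a minimal sketch, not part of the dataset tooling: the function name `waveform_path`, the `dataset_root` argument, and the example ID are hypothetical, and it assumes the subdirectory name is simply the first two characters of the `ecg_id`.

```python
from pathlib import Path


def waveform_path(dataset_root, ecg_id):
    """Build the expected location of one ECG's .npz file under the v1 layout."""
    # Assumption: the subdirectory is the first two characters of ecg_id.
    subdir = ecg_id[:2]
    return Path(dataset_root) / "v1" / "ecg-waveforms" / subdir / f"{ecg_id}.npz"


# Hypothetical ID used only for illustration.
path = waveform_path("silent-cchs-ecg", "00123abc")
print(path.as_posix())
```

The returned path can be passed directly to `np.load`.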

Each array has shape (12, 5500): 12 leads by 5,500 sample points, which is 11 seconds at 500 Hz. If a recording has fewer than 5,500 points, the array is filled with zeros.

Sample rate: 500 Hz
Lead amplitude units: Microvolts [uV]

To load the waveform array, use NumPy.

>>> import numpy as np
>>> data = np.load(ecg_filepath)
>>> waveform = data['waveform']
>>> waveform.shape
(12, 5500)
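Once the array is loaded, the sample rate and units stated above are enough to build a time axis and convert amplitudes. The sketch below uses a synthetic array in place of `data['waveform']` so that it runs without the dataset. It assumes the zero fill sits at the end of the array, which this document does not state, so check that against real files before relying on it.

```python
import numpy as np

SAMPLE_RATE_HZ = 500           # from this data dictionary
N_LEADS, N_SAMPLES = 12, 5500  # documented array shape

# Synthetic stand-in for data['waveform']: 5,000 recorded samples per lead,
# followed by 500 zero-filled samples (assumed position of the fill).
rng = np.random.default_rng(0)
waveform = np.zeros((N_LEADS, N_SAMPLES), dtype=np.int64)
waveform[:, :5000] = rng.integers(1, 500, size=(N_LEADS, 5000))

# Time of each sample in seconds: 0.000, 0.002, ..., 10.998
time_s = np.arange(N_SAMPLES) / SAMPLE_RATE_HZ

# Microvolts -> millivolts
waveform_mv = waveform / 1000.0

# Estimate the recorded length as the last sample where any lead is non-zero.
nonzero_cols = np.flatnonzero(waveform.any(axis=0))
n_recorded = int(nonzero_cols[-1]) + 1 if nonzero_cols.size else 0
duration_s = n_recorded / SAMPLE_RATE_HZ

print(n_recorded, duration_s)
```

A genuine signal can also contain zero-valued samples, so treat `n_recorded` as an estimate rather than an exact count.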

ECG Metadata


Column Name | Description | Data Type | Sample Data
patient_ngsci_id | Unique patient identifier | string | 123abc45
ecg_id | GUID uniquely identifying an ECG | string | 123abc-aaa-123b-c98765
acquired_datetime | Date and time when the ECG was acquired at the ECG machine | datetime | 2013-01-01 12:12:12
received_datetime | Date and time when the ECG was received | datetime | 2013-01-01 12:15:00
modified_datetime | Date and time when the ECG was modified | datetime | 2013-01-01 12:20:00
confirmed_datetime | Date and time when the ECG was confirmed | datetime | 2013-01-01 19:45:00
heart_rate | Heart rate at time of ECG acquisition | int | 70
pr_interval | PR interval | int | 190
qrs_duration | QRS duration | int | 110
qt_interval | QT interval | int | 410
qtcb | QT interval corrected with Bazett’s formula | int | 445
p_front_axis | P front axis | int | 40
i40_front_axis | i40 front axis | int | 45
t40_front_axis | t40 front axis | int | 92
qrs_front_axis | QRS front axis | int | 35
st_front_axis | ST front axis | int | 60
t_front_axis | T front axis | int | 105
p_horiz_axis | P horizontal axis | int | -15
i40_horiz_axis | i40 horizontal axis | int | 10
t40_horiz_axis | t40 horizontal axis | int | -45
qrs_horiz_axis | QRS horizontal axis | int | -35
st_horiz_axis | ST horizontal axis | int | 100
t_horiz_axis | T horizontal axis | int | 33
rr_interval | RR interval | int | 750
p_duration | P duration | int | 100
q_onset | Q onset | int | 512
qtcf | QT interval corrected with Fridericia’s formula | int | 400
  • Rows: 43,700
  • Columns: 27
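The data types listed above map onto plain Python types when reading ecg-metadata.csv. The sketch below parses an in-memory sample instead of the real file. It assumes a comma-delimited file with a header row and the timestamp format shown in the sample data, and it does not handle empty cells, which real rows may contain.

```python
import csv
import io
from datetime import datetime

# In-memory stand-in for a few columns of ecg-metadata.csv (values from the sample data).
sample_csv = io.StringIO(
    "patient_ngsci_id,ecg_id,acquired_datetime,confirmed_datetime,heart_rate\n"
    "123abc45,123abc-aaa-123b-c98765,2013-01-01 12:12:12,2013-01-01 19:45:00,70\n"
)

# Assumption: timestamps are stored exactly as in the sample data.
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"

for row in csv.DictReader(sample_csv):
    acquired = datetime.strptime(row["acquired_datetime"], DATETIME_FORMAT)
    confirmed = datetime.strptime(row["confirmed_datetime"], DATETIME_FORMAT)
    heart_rate = int(row["heart_rate"])
    # Time between acquisition at the machine and confirmation.
    delay = confirmed - acquired
    print(row["ecg_id"], heart_rate, delay)
```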

ECG Statement


Column Name | Description | Data Type | Sample Data
patient_ngsci_id | Unique patient identifier | string | 123abc45
ecg_id | GUID uniquely identifying an ECG | string | 123abc-aaa-123b-c98765
acquired_datetime | Date and time when the ECG was acquired at the ECG machine | datetime | 2013-01-01 12:12:12
interpretation_info | Statement in plain text | string | Sinus rhythm
statement_code | Statement code | string | SR
criteria_version | Version of the criteria used for ECG analysis | string | 0B
statement_number | Statement number | int | 1
  • Rows: 97,839
  • Columns: 7
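With 97,839 rows for 43,700 ECGs, an ECG typically has more than one statement row. A minimal sketch of collecting the statements per ECG follows. It assumes that `statement_number` gives the order of statements within one ECG, which this document does not confirm, and the rows below are made-up illustrative values (only "Sinus rhythm" / "SR" comes from the sample data).

```python
import csv
import io
from collections import defaultdict

# Made-up rows standing in for ecg-statement.csv.
sample_csv = io.StringIO(
    "ecg_id,interpretation_info,statement_code,statement_number\n"
    "ecg-a,Normal ECG,NML,2\n"
    "ecg-a,Sinus rhythm,SR,1\n"
    "ecg-b,Sinus rhythm,SR,1\n"
)

# Collect (statement_number, text) pairs for each ECG.
statements = defaultdict(list)
for row in csv.DictReader(sample_csv):
    statements[row["ecg_id"]].append(
        (int(row["statement_number"]), row["interpretation_info"])
    )

# Sort by statement_number and keep only the text.
report = {
    ecg_id: [text for _, text in sorted(items)]
    for ecg_id, items in statements.items()
}
print(report)
```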

Lead Measurements


Column Name | Description | Data Type | Sample Data
patient_ngsci_id | Unique patient identifier | string | 123abc45
ecg_id | GUID uniquely identifying an ECG | string | 123abc-aaa-123b-c98765
lead_name | Name of ECG lead | string | V1
p_amp | P wave amplitude | int | 70
p_dur | P wave duration | int | 100
p_area | P wave area | int | 5
pp_amp | P’ wave amplitude | int | 0
pp_dur | P’ wave duration | int | 0
pppp_dur | P plus P’ duration | int | 100
pp_area | P’ wave area | int | 0
pppp_area | P plus P’ area | int | 5
q_amp | Q wave amplitude | int | 0
q_dur | Q wave duration | int | 0
r_amp | R wave amplitude | int | 1250
r_dur | R wave duration | int | 50
s_amp | S wave amplitude | int | 0
s_dur | S wave duration | int | 0
rp_amp | R’ wave amplitude | int | 0
rp_dur | R’ wave duration | int | 0
sp_amp | S’ wave amplitude | int | 0
sp_dur | S’ wave duration | int | 0
vat | Ventricular activation time: the interval from the onset of the QRS complex to the latest positive peak in the complex, or the latest substantial notch on the latest peak (whichever is later) | int | 40
qrs_ppk | Peak-to-peak QRS complex amplitude | int | 2000
qrs_dur | QRS complex duration | int | 50
qrs_area | QRS area | int | 100
st_on | Elevation or depression at the onset (J point) of the ST segment | int | 40
st_mid | Elevation or depression at the midpoint of the ST segment | int | 80
st_80 | Elevation or depression of the ST segment 80 ms after the end of the QRS complex (J point) | int | 50
st_end | Elevation or depression at the end of the ST segment | int | 80
st_dur | ST segment duration | int | 120
st_slope | ST segment slope | int | 25
st_shape | ST segment shape | string | Straight
t_amp | T wave amplitude | int | 300
t_dur | T wave duration | int | 190
t_area | T wave area | int | 70
tp_amp | T’ wave amplitude | int | 0
tptp_dur | T plus T’ duration | int | 0
tp_dur | T’ wave duration | int | 0
tp_area | T’ wave area | int | 0
tptp_area | T plus T’ area | int | 50
pr_seg | Interval from the end of the P wave to the onset of the QRS complex | int | 90
qt_int | Interval from the onset of the QRS complex to the end of the T wave | int | 400
pr_int | Interval from the onset of the P wave to the onset of the QRS complex | int | 170
  • Rows: 524,376
  • Columns: 43
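The `lead_name` column and the row count (roughly 12 rows per ECG) suggest one row per ECG and lead. Under that assumption, a nested lookup keyed by `ecg_id` and then `lead_name` makes individual measurements easy to reach. The rows below are made up and only two measurement columns are shown.

```python
import csv
import io

# Made-up rows standing in for lead-measurement.csv (two measurement columns only).
sample_csv = io.StringIO(
    "ecg_id,lead_name,r_amp,qrs_dur\n"
    "ecg-a,V1,1250,50\n"
    "ecg-a,V2,900,52\n"
    "ecg-b,V1,700,48\n"
)

# by_ecg[ecg_id][lead_name] -> dict of measurements for that lead
by_ecg = {}
for row in csv.DictReader(sample_csv):
    by_ecg.setdefault(row["ecg_id"], {})[row["lead_name"]] = {
        "r_amp": int(row["r_amp"]),
        "qrs_dur": int(row["qrs_dur"]),
    }

print(by_ecg["ecg-a"]["V1"]["r_amp"])
print(sorted(by_ecg["ecg-a"]))
```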

Lead Availability


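Column Name | Description | Data Type | Sample Data
patient_ngsci_id | Unique patient identifier | string | 123abc45
ecg_id | GUID uniquely identifying an ECG | string | 123abc-aaa-123b-c98765
lead_i | Whether the lead is present for the ECG | int | 1
lead_ii | Whether the lead is present for the ECG | int | 1
lead_iii | Whether the lead is present for the ECG | int | 1
lead_avf | Whether the lead is present for the ECG | int | 1
lead_avl | Whether the lead is present for the ECG | int | 1
lead_avr | Whether the lead is present for the ECG | int | 1
lead_v1 | Whether the lead is present for the ECG | int | 1
lead_v2 | Whether the lead is present for the ECG | int | 1
lead_v3 | Whether the lead is present for the ECG | int | 1
lead_v4 | Whether the lead is present for the ECG | int | 1
lead_v5 | Whether the lead is present for the ECG | int | 1
lead_v6 | Whether the lead is present for the ECG | int | 1
available_lead_count | Number of leads available | int | 12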
  • Rows: 43,700
  • Columns: 15
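A quick consistency check on this table is to compare `available_lead_count` with the sum of the twelve lead flags, and to list which leads are missing for an ECG. The sketch assumes the flags are 0 (absent) or 1 (present). Only the value 1 appears in the sample data, so the meaning of 0 is an assumption, and the rows below are made up.

```python
import csv
import io

LEAD_COLUMNS = [
    "lead_i", "lead_ii", "lead_iii", "lead_avf", "lead_avl", "lead_avr",
    "lead_v1", "lead_v2", "lead_v3", "lead_v4", "lead_v5", "lead_v6",
]

# Made-up rows standing in for lead-availability.csv.
header = ",".join(["ecg_id", *LEAD_COLUMNS, "available_lead_count"])
sample_csv = io.StringIO(
    header + "\n"
    "ecg-a,1,1,1,1,1,1,1,1,1,1,1,1,12\n"
    "ecg-b,1,1,1,1,1,1,1,1,1,1,1,0,11\n"
)


def missing_leads(row):
    """Return the lead columns flagged 0 for one row."""
    return [col for col in LEAD_COLUMNS if int(row[col]) == 0]


rows = list(csv.DictReader(sample_csv))

# Does the stored count match the sum of the flags on every row?
consistent = all(
    sum(int(row[col]) for col in LEAD_COLUMNS) == int(row["available_lead_count"])
    for row in rows
)

# ECGs with at least one missing lead.
incomplete = {row["ecg_id"]: missing_leads(row) for row in rows if missing_leads(row)}
print(consistent, incomplete)
```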

Electronic Health Record Tables

Cardiology Order


Column Name | Description | Data Type | Sample Data
patient_ngsci_id | Unique patient identifier | string | 123abc45
prc_name | Description of procedure order | string | ECHOCARDIOGRAM 2D COMPLETE
prc_date | Date of procedure | date | 2113-01-01
is_echo | Whether this cardiology study is an echocardiogram | int | 1
is_stress_test | Whether this cardiology study is a stress test | int | 0
is_cardiac_monitor | Whether this cardiology study is a cardiac monitor | int | 0
is_bubble | Whether this cardiology study is a bubble study | int | 0
  • Rows: 28,243
  • Columns: 7

Diagnosis


Column Name | Description | Data Type | Sample Data
patient_ngsci_id | Unique patient identifier | string | 123abc45
disch_datetime | Datetime of discharge | datetime | 2114-01-01 12:10:00
dia_codetype | Standardized system of codes for clinical diagnoses | string | ICD9
dia_code | Code assigned to the diagnosis | string | 611.72
dia_name | Long description of diagnosis | string | Lump or mass in breast
primary_flg | Whether this diagnosis is the condition responsible for patient encounter | int | 1
  • Rows: 4,429,580
  • Columns: 6

Encounter Emergency Department


Column Name | Description | Data Type | Sample Data
patient_ngsci_id | Unique patient identifier | string | 123abc45
hosp_admit_datetime | Date of hospital admission | datetime | 2114-01-01 12:10:00
hosp_disch_datetime | Date of hospital discharge | datetime | 2114-01-05 12:10:00
ed_dispo | ED discharge disposition | string | Discharge
disch_dispo | Discharge disposition | string | HOME/SELF CARE
chief_complaint | ED primary reason for encounter | string | Abdominal Pain
longest_ed_attend_prov_id | Unique ID of the attending provider who had the most time assigned to the patient as an ED attending | string | 345abc67
first_ed_attend_prov_id | Unique ID of the ED attending provider who was first assigned to the patient | string | 345abc67
first_ed_dept | Name of emergency department the patient was roomed in | string | RMC EMERGENCY DEPARTMENT
  • Rows: 110,924
  • Columns: 9

Encounter Inpatient


Column Name | Description | Data Type | Sample Data
patient_ngsci_id | Unique patient identifier | string | 123abc45
ip_disch_dept | Where the patient was released after inpatient stay | string | HOME/SELF F/U @CCRMC
ip_admit_dept | Name of the department the patient was admitted to | string | RMC 4B MEDICAL/SURGICAL/ TELEMETRY
ip_admit_datetime | Date of inpatient admission | date | 2117-01-01
hosp_disch_datetime | Date of hospital discharge | date | 2117-01-02
  • Rows: 23,219
  • Columns: 5

Encounter Outpatient


Column Name | Description | Data Type | Sample Data
patient_ngsci_id | Unique patient identifier | string | 123abc45
visit_date | Date of visit | date | 2111-07-05
visit_type | Type of visit | string | Office Visit
visit_prov_id | Unique ID for the visit provider associated with this encounter | string | 345abc67
dept_name | Name of the department | string | FAMILY PRACTICE
dept_specialty | Name of the medical specialty practiced in this department | string | Family Medicine
reason_visit_name | Reason for encounter | string | FOLLOW-UP
  • Rows: 2,139,111
  • Columns: 7

Lab


Column Name | Description | Data Type | Sample Data
patient_ngsci_id | Unique patient identifier | string | 123abc45
lab_date | Date of laboratory result | date | 2113-07-01
name | Name of lab | string | CREATININE, SERUM
loinc | LOINC identifier for laboratory test | string | 2160-0
lab_value | Result of the lab that can be a value or interpretation | string | NEGATIVE
  • Rows: 9,388,071
  • Columns: 5

Medication


Column Name | Description | Data Type | Sample Data
patient_ngsci_id | Unique patient identifier | string | 123abc45
admin_flg | Whether this medication was prescribed or administered | int | 0
med_date | Date of medication | date | 2113-07-01
med_name | Name of medication | string | ACETAMINOPHEN 325 MG TABLET
med_codetype | Standardized system of codes for medications | string | NDC
med_code | Code assigned to medication | string | 173068220
  • Rows: 7,550,621
  • Columns: 6

Patient

patient.csv: Patient age is not available.

Column Name | Description | Data Type | Sample Data
patient_ngsci_id | Unique patient identifier | string | 123abc45
sex | Patient sex | string | Female
death_date | Patient date of death | date | 2119-04-08
race | Patient race | string | White
  • Rows: 45,474
  • Columns: 4

Procedure

procedure.csv: Procedures are limited to stress-test-specific outcomes of interest (cath, stress, …).

Column Name | Description | Data Type | Sample Data
patient_ngsci_id | Unique patient identifier | string | 123abc45
visit_date | Date of visit | date | 2113-07-01
prc_date | Date of procedure | date | 2113-07-01
prc_name | Name of procedure | string | HC CARDIOVASCULAR STRESS TEST TRACING ONLY
prc_code | Code assigned to procedure | string | 93017
prc_codetype | Standardized system of codes for clinical procedures | string | CPT
  • Rows: 74,947
  • Columns: 6

Vital Measurement


Column Name | Description | Data Type | Sample Data
patient_ngsci_id | Unique patient identifier | string | 123abc45
meas_datetime | Date of vital sign recording | string | 2113-07-01 12:10:00
meas_name | Name of vital sign | string | Temp
meas_value | Measured value | float | 98
  • Rows: 16,089,189
  • Columns: 4
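This table stores one measurement per row (`meas_name`, `meas_value`), so vitals recorded together appear on separate rows. A minimal sketch of regrouping them per patient and timestamp follows. The rows are made up, and "Pulse" is a hypothetical `meas_name` used only for illustration ("Temp" is the only name shown in the sample data).

```python
import csv
import io
from collections import defaultdict

# Made-up rows standing in for vital-measurement.csv.
sample_csv = io.StringIO(
    "patient_ngsci_id,meas_datetime,meas_name,meas_value\n"
    "123abc45,2113-07-01 12:10:00,Temp,98\n"
    "123abc45,2113-07-01 12:10:00,Pulse,72\n"
    "123abc45,2113-07-02 08:00:00,Temp,99.1\n"
)

# vitals[(patient_ngsci_id, meas_datetime)] -> {meas_name: meas_value}
vitals = defaultdict(dict)
for row in csv.DictReader(sample_csv):
    key = (row["patient_ngsci_id"], row["meas_datetime"])
    vitals[key][row["meas_name"]] = float(row["meas_value"])

for key in sorted(vitals):
    print(key, vitals[key])
```

If the same vital is recorded twice at one timestamp, this sketch keeps only the last value read.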

Copyright © 2021-2023 Nightingale Open Science. All rights reserved.