Table of contents
  1. Patient Demographics
  2. X-ray Image Metadata
  3. Pain Scores
  4. Procedure Records
  5. Procedure Code Descriptions
  6. Diagnosis Records
  7. Diagnosis Code Descriptions

Data Dictionary v1

File Tree

datasets/
└── mrkr-emory-xray/
    ├── v1/
    │   ├── MRKR_CPT.csv
    │   ├── MRKR_CPT_dictionary.csv
    │   ├── MRKR_ICD.csv
    │   ├── MRKR_ICD_dictionary.csv
    │   ├── MRKR_pain.csv
    │   ├── MRKR_demographics.csv
    │   └── MRKR_image_metadata.csv
    └── xray/
        ├── ...
        ...
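
One way to confirm that a local copy matches this layout is to check for the seven v1 CSV files. The sketch below assumes the tree is rooted at a local `datasets/` directory; the helper name `missing_v1_files` and the root path are illustrative and not part of the dataset.

```python
from pathlib import Path

# File names copied from the tree above.
V1_FILES = [
    "MRKR_CPT.csv",
    "MRKR_CPT_dictionary.csv",
    "MRKR_ICD.csv",
    "MRKR_ICD_dictionary.csv",
    "MRKR_pain.csv",
    "MRKR_demographics.csv",
    "MRKR_image_metadata.csv",
]


def missing_v1_files(root):
    """Return the v1 CSV names that are absent under root/mrkr-emory-xray/v1.

    `root` is an assumed local path to the `datasets/` directory.
    """
    v1_dir = Path(root) / "mrkr-emory-xray" / "v1"
    return [name for name in V1_FILES if not (v1_dir / name).is_file()]
```

Calling `missing_v1_files("datasets")` returns an empty list when every file is present.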

Patient Demographics

  • Filename: MRKR_demographics.csv
  • Rows: 83,011
  • Columns: 4

This file contains patient demographics, indexed at the patient level.

Field Name | Data Type | Description
empi_anon | integer | Unique patient identification number (8 digits)
sex | nominal string | Patient sex. Values: male; female
race | nominal string | Patient self-reported race. Values: African American or Black; American Indian or Alaskan Native; Asian; Caucasian or White; Multiple; Native Hawaiian or Other Pacific Islander; Unknown
ethnicity | nominal string | Patient self-reported ethnicity. Values: Hispanic patients; Non-Hispanic patients; Unknown; Unreported
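
Because the categorical values above are free-text strings, it can help to tally what a column actually contains before filtering on it. The following is a minimal sketch using only the Python standard library; the three sample rows are hypothetical, and the column order in the real CSV is an assumption.

```python
import csv
import io
from collections import Counter


def tally_column(csv_file, column):
    """Count how often each value occurs in one column of a CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Hypothetical three-row sample in the documented column layout.
sample = io.StringIO(
    "empi_anon,sex,race,ethnicity\n"
    "10000001,female,Asian,Unknown\n"
    "10000002,male,Caucasian or White,Unreported\n"
    "10000003,female,Multiple,Unknown\n"
)
print(tally_column(sample, "sex"))
```

For the real file, pass an open handle instead, for example `open(path, newline="")` where `path` points at MRKR_demographics.csv.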

X-ray Image Metadata

  • Filename: MRKR_image_metadata.csv
  • Rows: 503,261
  • Columns: 19

This file contains public DICOM metadata tags that may help identify images. Patient and exam identifiers are replaced with de-identified versions in this table and within the DICOM files. Other metadata tags that do not contain PHI and are not listed in this table are left intact within the DICOM files. Fields containing PHI, such as patient name, addresses, or referring physician, are removed from both this table and the DICOM files. The fields below were modified or added during data curation.

Field Name | Data Type | Description
empi_anon | integer | Unique patient identification number (8 digits)
StudyInstanceUID_anon | string | De-identified Study UID, shared between all images in the same study
SeriesInstanceUID_anon | string | De-identified Series UID, shared between all images in the same series
SOPInstanceUID_anon | string | De-identified SOP Instance UID, which corresponds to a single DICOM image
img_height | integer | Image pixel height
img_width | integer | Image pixel width
laterality | nominal string | Laterality of the image, as inferred by a deep learning (DL) model. Values: R (Right); L (Left); B (Bilateral); -1 (Unknown or not present)
view_position | nominal string | Anatomical projection of the radiograph, as inferred by DL model. Values: F (Frontal); L (Lateral); S (Sunrise); I (Internal Oblique); E (External Oblique)
horizontal_flip | binary | Indicates if the patient’s left side was oriented to the left side of the image, which is opposite of typical radiographic orientation, as inferred by DL model.
weight_bearing | binary | Indicates if the radiograph was weight-bearing, as indicated by a marker and derived by DL model. Images within a single exam can be a mix of weight-bearing and non-weight-bearing.
inverted | binary | Indicates whether pixel intensity values are inverted from typical radiographic convention, as inferred by DL model.
arthroplasty | nominal string | Indicates if the image contains a knee arthroplasty and its laterality, as derived by DL model. Applies to unilateral and bilateral views. Values: R (Right); L (Left); B (Bilateral); NL (Unknown, no laterality marker); NaN (No arthroplasty)
L_KLG_inference | ordinal integer | Kellgren-Lawrence grade (KLG) of the left knee in a bilateral knee radiograph, inferred by DL model [0-4]
R_KLG_inference | ordinal integer | Kellgren-Lawrence grade (KLG) of the right knee in a bilateral knee radiograph, inferred by DL model [0-4]
SeriesDescription | string | DICOM metadata describing the series
StudyDescription | string | DICOM metadata describing the study
StudyDate_anon | date | De-identified date of radiograph
age_at_exam | integer | Age of the patient when the radiograph was performed
dicom_path | string | Path to DICOM file
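
The KLG columns are described only for bilateral knee radiographs, so a common first step is to select frontal bilateral images and read both grades. The sketch below is one possible approach, not the curators' code. It assumes that `view_position` and `laterality` hold the single-letter codes listed above, that missing grades appear as an empty cell or "NaN", and that grades may be serialized as floats such as "2.0". The helper names are hypothetical.

```python
import csv
import io


def parse_klg(value):
    """Convert a KLG cell to an int in 0-4, or None when the cell is empty or NaN.

    Assumes missing grades are stored as "" or "NaN" and that grades may be
    written as floats such as "2.0".
    """
    text = (value or "").strip()
    if text == "" or text.lower() == "nan":
        return None
    grade = int(float(text))
    if not 0 <= grade <= 4:
        raise ValueError(f"KLG grade out of range: {value!r}")
    return grade


def frontal_bilateral_klg(csv_file):
    """Yield (SOPInstanceUID_anon, left KLG, right KLG) for frontal bilateral images."""
    for row in csv.DictReader(csv_file):
        if row["view_position"] == "F" and row["laterality"] == "B":
            yield (
                row["SOPInstanceUID_anon"],
                parse_klg(row["L_KLG_inference"]),
                parse_klg(row["R_KLG_inference"]),
            )


# Hypothetical sample restricted to the columns used here.
sample = io.StringIO(
    "SOPInstanceUID_anon,laterality,view_position,L_KLG_inference,R_KLG_inference\n"
    "uid-1,B,F,2.0,3\n"
    "uid-2,L,L,,\n"
)
for record in frontal_bilateral_klg(sample):
    print(record)
```

Streaming rows with `csv.DictReader` avoids loading all 503,261 rows at once; a dataframe library would work equally well if memory allows.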

Pain Scores

  • Filename: MRKR_pain.csv
  • Rows: 4,975,933
  • Columns: 6

This file contains pain scores self-reported by patients during any encounter, including outpatient, emergency, and perioperative encounters. Pain scores related to knees are curated.

Field Name | Data Type | Description
empi_anon | integer | Unique patient identification number (8 digits)
pain_location | string | Raw, uncurated strings of pain locations entered by staff. Approximately 75% of entries are blank.
knee_pain | binary | Curated using regular expressions to identify whether pain_location is definitely knee related.
pain_score | integer | Pain score from 0 to 10
laterality | nominal string | Values: R (Right); L (Left); B (Bilateral); NaN (Unknown or not present). Value only present if knee_pain is 1.
date_anon | date | Date when the pain score was entered into the patient’s chart.
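
Since most rows are not knee related, analyses typically start by keeping only rows where knee_pain is set. The sketch below computes a mean knee pain score per patient with the standard library. It assumes a set flag is serialized as "1" or "1.0" and that missing scores appear as an empty cell or "NaN"; the helper names and sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def is_flag_set(value):
    """Interpret a binary cell; assumes 1 may be serialized as "1" or "1.0"."""
    try:
        return float(value) == 1.0
    except (TypeError, ValueError):
        return False


def mean_knee_pain(csv_file):
    """Return {empi_anon: mean pain_score} over rows flagged as knee pain."""
    scores = defaultdict(list)
    for row in csv.DictReader(csv_file):
        score = (row.get("pain_score") or "").strip()
        if not is_flag_set(row.get("knee_pain")):
            continue
        if score == "" or score.lower() == "nan":
            continue  # skip rows without a usable score
        scores[row["empi_anon"]].append(float(score))
    return {patient: sum(vals) / len(vals) for patient, vals in scores.items()}


# Hypothetical sample; the date format shown is an assumption.
sample = io.StringIO(
    "empi_anon,pain_location,knee_pain,pain_score,laterality,date_anon\n"
    "10000001,left knee,1,6,L,2015-01-01\n"
    "10000001,left knee,1,8,L,2015-02-01\n"
    "10000002,back,0,5,,2015-01-01\n"
)
print(mean_knee_pain(sample))
```

Patient identifiers are kept as strings here; convert them to integers if they need to match another table loaded that way.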

Procedure Records

  • Filename: MRKR_CPT.csv
  • Rows: 6,216,190
  • Columns: 5

This file contains procedural codes for patients with corresponding dates.

Field Name | Data Type | Description
empi_anon | integer | Unique patient identification number (8 digits)
cpt_code | string | Current Procedural Terminology code used in coding of medical services and procedures for billing (5 characters). 7,166 unique CPT codes
cpt_group_modifier | string | Provides further information about the service or procedure. Most CPT codes do not include modifier data. When modifier data is present, it often indicates the laterality of a procedure (left or right). A single CPT code entry can have multiple modifiers; this field combines all modifiers for that CPT code, for that encounter, for that patient.
date_anon | date | Date when the associated procedure or service occurred.
age_at_procedure | integer | Age when the procedure was performed

Procedure Code Descriptions

  • Filename: MRKR_CPT_dictionary.csv
  • Rows: 7,166
  • Columns: 2

This file contains descriptions of each Current Procedural Terminology (CPT) code found in the MRKR_CPT.csv table.

Field Name | Data Type | Description
cpt_code | string | Current Procedural Terminology code used in coding of medical services and procedures for billing (5 characters)
cpt_description | string | Description of the procedure. Some distinct CPT codes share the same description, so there are 128 fewer unique descriptions than unique CPT codes.
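
The dictionary file can be joined to the procedure records on cpt_code. The sketch below shows one way to do this with a plain lookup table; it is an illustration under the documented column names, and the helper names and sample descriptions are placeholders rather than real CPT text.

```python
import csv
import io


def load_cpt_descriptions(dictionary_file):
    """Build a {cpt_code: cpt_description} mapping from the dictionary CSV."""
    return {
        row["cpt_code"]: row["cpt_description"]
        for row in csv.DictReader(dictionary_file)
    }


def describe_procedures(records_file, descriptions):
    """Yield (empi_anon, cpt_code, description) for each procedure record.

    Codes absent from the mapping get None rather than raising.
    """
    for row in csv.DictReader(records_file):
        code = row["cpt_code"]
        yield row["empi_anon"], code, descriptions.get(code)


# Hypothetical samples; the descriptions are placeholders.
dictionary = io.StringIO(
    "cpt_code,cpt_description\n"
    "00100,placeholder description A\n"
    "27447,placeholder description B\n"
)
records = io.StringIO(
    "empi_anon,cpt_code,cpt_group_modifier,date_anon,age_at_procedure\n"
    "10000001,27447,RT,2015-01-01,60\n"
)
lookup = load_cpt_descriptions(dictionary)
for item in describe_procedures(records, lookup):
    print(item)
```

The `csv` module reads every cell as a string, which matches the documented data type and preserves any leading zeros in the codes. If a dataframe library is used instead, the code column should be read as a string for the same reason.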

Diagnosis Records

  • Filename: MRKR_ICD.csv
  • Rows: 21,956,056
  • Columns: 16

This file contains diagnosis codes for patients with corresponding dates. Certain diseases of interest are indicated by binary flags to ease data cleaning.

Field Name | Data Type | Description
empi_anon | integer | Unique patient identification number (8 digits)
ICD9 | string | International Classification of Diseases, 9th revision. 12,418 unique codes
ICD10 | string | International Classification of Diseases, 10th revision. 26,963 unique codes
date_anon | date | Date when the diagnosis code was entered.
age_at_dx | integer | Age when the diagnosis was recorded.
DX_LINE | string | Values: Primary, Secondary, Active, Not Recorded, Resolved, Canceled, Inactive
DX_ICD_SCOPE | string | Values: Billing Diagnosis, Discharge Diagnosis, Admitting Diagnosis, Referring Diagnosis, Not Recorded, Reason For Visit, Problem List, Working Diagnosis, Other Diagnosis, Final, Pre-Op Diagnosis, Post-Op Diagnosis, Principal Diagnosis, Suggested Billing
autoimmune | binary | Indicates whether the ICD code corresponds to an autoimmune disease such as rheumatoid arthritis, juvenile arthritis, gout, etc.
diabetes | binary | Indicates whether the ICD code corresponds to type I or type II diabetes
hypertension | binary | Indicates whether the ICD code corresponds to hypertension
joint_infection | binary | Indicates whether the ICD code corresponds to a knee joint infection
knee_osteoarthritis | binary | Indicates whether the ICD code corresponds to knee osteoarthritis
knee_osteomyelitis | binary | Indicates whether the ICD code corresponds to knee osteomyelitis
obesity | binary | Indicates whether the ICD code corresponds to obesity
nicotine_use | binary | Indicates whether the ICD code corresponds to nicotine dependence
trauma_lower_extremity | binary | Indicates whether the ICD code corresponds to lower extremity trauma
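
The binary flags make it possible to summarize diagnoses without parsing ICD codes. The sketch below streams the file once and reports, for each flag, the number of flagged entries and the number of distinct patients. It assumes a set flag is serialized as "1" or "1.0"; the function name and sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Flag column names as documented in the table above.
FLAG_COLUMNS = [
    "autoimmune", "diabetes", "hypertension", "joint_infection",
    "knee_osteoarthritis", "knee_osteomyelitis", "obesity",
    "nicotine_use", "trauma_lower_extremity",
]


def summarize_flags(csv_file):
    """Return {flag: (entry count, distinct patient count)} for the binary flags.

    Assumes a set flag is serialized as "1" or "1.0". Flag columns missing
    from the input are treated as never set.
    """
    entries = defaultdict(int)
    patients = defaultdict(set)
    for row in csv.DictReader(csv_file):
        for flag in FLAG_COLUMNS:
            if (row.get(flag) or "").strip() in ("1", "1.0"):
                entries[flag] += 1
                patients[flag].add(row["empi_anon"])
    return {flag: (entries[flag], len(patients[flag])) for flag in FLAG_COLUMNS}


# Hypothetical sample with only two of the flag columns present.
sample = io.StringIO(
    "empi_anon,ICD10,diabetes,hypertension\n"
    "10000001,E11.9,1,0\n"
    "10000001,E11.9,1,0\n"
    "10000002,I10,0,1\n"
)
print(summarize_flags(sample)["diabetes"])
```

Reading row by row keeps memory use low, which matters for a file with 21,956,056 rows.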

The table below describes the diseases of interest flagged in the diagnosis table. It lists the ICD9 and ICD10 prefixes used to flag each disease, along with how often each appears in the diagnosis codes.

DX Category | DX Entries in Category | Patients with DX in Category | ICD10 Prefixes | ICD9 Prefixes
autoimmune | 122,859 | 9,704 (11.7%) | M05, M06, M08, M10, M45, M1A | 274, 714, 720
diabetes | 552,208 | 18,655 (22.47%) | E08, E10, E11, E13 | 250
hypertension | 1,209,572 | 45,300 (54.57%) | I10-I16 | 401-405
joint_infection | 2,224 | 551 (0.66%) | M00.06, M00.16, M00.26, M00.86, M01.X6, M02.06, M02.16, M02.26, M02.36, M02.86 | 711.06, 711.16, 711.26, 711.36, 711.46, 711.56, 711.66, 711.76, 711.86, 711.96
knee_osteoarthritis | 373,186 | 51,468 (62.0%) | M17 | 715.16, 715.26, 715.36, 715.96
knee_osteomyelitis | 7,130 | 1,389 (1.7%) | M86 related to thigh and tibia/fibula | 730 related to lower leg
obesity | 197,094 | 27,576 (33.2%) | E66.01, E66.09, E66.1, E66.3, E66.8, E66.9 | 278.0, 278.00, 278.01, 278.02
nicotine_use | 148,095 | 20,882 (25.2%) | F17, Z57.31, Z71.6, Z72.0, Z77.22, Z87.891 | 305.1
trauma_lower_extremity | 152,377 | 34,230 (41.2%) | S82.0, S82.1, S83 | 959.7, 844, 823, 891, 822, 836
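
Prefix-based flagging of this kind can be sketched with simple string matching. The example below covers three of the categories using the prefixes listed in the table; it illustrates the idea and is not the curators' actual flagging code. It assumes codes are dotted strings such as "M17.11" and that an absent code is an empty string. The helper `percent_of_patients` also shows that the percentages in the table appear consistent with a denominator of 83,011, the row count of MRKR_demographics.csv; that denominator is an inference, not something the source states.

```python
# Prefixes copied from the table above for three of the categories.
ICD10_PREFIXES = {
    "diabetes": ("E08", "E10", "E11", "E13"),
    "knee_osteoarthritis": ("M17",),
    "nicotine_use": ("F17", "Z57.31", "Z71.6", "Z72.0", "Z77.22", "Z87.891"),
}
ICD9_PREFIXES = {
    "diabetes": ("250",),
    "knee_osteoarthritis": ("715.16", "715.26", "715.36", "715.96"),
    "nicotine_use": ("305.1",),
}


def categories_for(icd9="", icd10=""):
    """Return the set of categories whose prefixes match either code.

    Assumes dotted code strings and an empty string for an absent code.
    """
    matched = set()
    for category, prefixes in ICD10_PREFIXES.items():
        if icd10 and icd10.startswith(prefixes):
            matched.add(category)
    for category, prefixes in ICD9_PREFIXES.items():
        if icd9 and icd9.startswith(prefixes):
            matched.add(category)
    return matched


def percent_of_patients(patients_in_category, total_patients=83_011):
    """Share of patients in a category, as a percentage of all patients.

    The default total is the MRKR_demographics.csv row count (assumed denominator).
    """
    return 100 * patients_in_category / total_patients


print(categories_for(icd10="M17.11"))
print(round(percent_of_patients(9_704), 1))
```

Ranges such as I10-I16 and the free-text entries for knee_osteomyelitis would need to be expanded into explicit prefixes before they could be added to these mappings.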

Diagnosis Code Descriptions

  • Filename: MRKR_ICD_dictionary.csv
  • Rows: 25,209
  • Columns: 3

This file contains descriptions of the ICD9 and ICD10 codes found in the MRKR_ICD.csv table.

Field Name | Data Type | Description
ICD9 | string | ICD9 code
ICD10 | string | ICD10 code
DX_NAME | string | Diagnosis name or description

Copyright © 2021-2023 Nightingale Open Science. All rights reserved.