Data Dictionary v1
File Tree
datasets/
└── mrkr-emory-xray/
    ├── v1/
    │   ├── MRKR_CPT.csv
    │   ├── MRKR_CPT_dictionary.csv
    │   ├── MRKR_ICD.csv
    │   ├── MRKR_ICD_dictionary.csv
    │   ├── MRKR_pain.csv
    │   ├── MRKR_demographics.csv
    │   ├── MRKR_image_metadata.csv
    └── xray/
        ├── ...
        ...
Patient Demographics
- Filename: MRKR_demographics.csv
- Rows: 83,011
- Columns: 4
This file contains patient demographics, indexed at the patient level.
Field Name | Data Type | Description |
---|---|---|
empi_anon | integer | Unique patient identification number (8 digits) |
sex | nominal string | Patient sex: male; female |
race | nominal string | Patient self-reported race: African American or Black; American Indian or Alaskan Native; Asian; Caucasian or White; Multiple; Native Hawaiian or Other Pacific Islander; Unknown |
ethnicity | nominal string | Patient self-reported ethnicity: Hispanic patients; Non-Hispanic patients; Unknown; Unreported |
X-ray Image Metadata
- Filename: MRKR_image_metadata.csv
- Rows: 503,261
- Columns: 19
This file contains relevant public DICOM metadata tags that may be helpful for identifying images. Patient and exam identifiers are replaced with de-identified versions in this table and within the DICOM files. Other metadata tags that do not contain PHI and are not in this table are left intact within the DICOM files. Fields containing PHI, such as patient name, addresses, or referring physician, are removed from this table and from the DICOM files. For data curation, the fields below were modified or added.
Field Name | Data Type | Description |
---|---|---|
empi_anon | integer | Unique patient identification number (8 digits) |
StudyInstanceUID_anon | string | De-identified Study UID, shared between all images in the same study |
SeriesInstanceUID_anon | string | De-identified Series UID, shared between all images in the same series |
SOPInstanceUID_anon | string | De-identified SOP Instance UID which corresponds to a single DICOM image |
img_height | integer | Image pixel height |
img_width | integer | Image pixel width |
laterality | nominal string | Laterality of the image, as inferred by DL model. R: Right; L: Left; B: Bilateral; -1: Unknown or not present |
view_position | nominal string | Anatomical projection of radiograph, as inferred by DL model. F: Frontal; L: Lateral; S: Sunrise; I: Internal Oblique; E: External Oblique |
horizontal_flip | binary | Indicates if the patient’s left side was oriented to the left side of the image, which is opposite of typical radiographic orientation, as inferred by DL model. |
weight_bearing | binary | Indicates if the radiograph was weight-bearing, as indicated by a marker and derived by DL model. Images within a given exam are not necessarily all weight-bearing or all non-weight-bearing. |
inverted | binary | Indicates whether pixel intensity values are inverted from typical radiographic convention, as inferred by DL model. |
arthroplasty | nominal string | Indicates if image contains a knee arthroplasty and its laterality, as derived by DL model. Applies to unilateral and bilateral views. R: Right; L: Left; B: Bilateral; NL: Unknown (no laterality marker); NaN: No arthroplasty |
L_KLG_inference | ordinal integer | KLG score of left knee in a bilateral knee radiograph, inferred by DL model [0-4] |
R_KLG_inference | ordinal integer | KLG score of right knee in a bilateral knee radiograph, inferred by DL model [0-4] |
SeriesDescription | string | DICOM metadata describing the series |
StudyDescription | string | DICOM metadata describing the study |
StudyDate_anon | date | De-identified date of radiograph |
age_at_exam | integer | Age of the patient when the radiograph was performed |
dicom_path | string | Path to DICOM file |
Pain Scores
- Filename: MRKR_pain.csv
- Rows: 4,975,933
- Columns: 6
This file contains pain scores self-reported by patients during any encounter, including outpatient, emergency, and perioperative encounters. Pain scores related to knees are curated.
Field Name | Data Type | Description |
---|---|---|
empi_anon | integer | Unique patient identification number (8 digits) |
pain_location | string | Raw, uncurated strings of pain locations entered by staff. Approximately 75% of entries are blank. |
knee_pain | binary | Curated using regular expressions to identify if the pain_location is definitely knee related. |
pain_score | integer | 0 - 10 pain score |
laterality | nominal string | R: Right; L: Left; B: Bilateral; NaN: Unknown or not present. Value only present if knee_pain is 1. |
date_anon | date | Date when the pain score was entered into the patient’s chart. |
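The knee_pain and laterality fields are described as being derived from the free-text pain_location with regular expressions. The expressions actually used for curation are not given here, so the sketch below only illustrates the general idea with hypothetical patterns; the pattern names and the function `classify_pain_location` are assumptions and will not reproduce the curated values exactly.

```python
import re

# Hypothetical patterns: NOT the expressions used to build knee_pain.
KNEE_RE = re.compile(r"\bknees?\b", re.IGNORECASE)
LEFT_RE = re.compile(r"\b(left|lt|l)\b", re.IGNORECASE)
RIGHT_RE = re.compile(r"\b(right|rt|r)\b", re.IGNORECASE)
BILATERAL_RE = re.compile(r"\b(bilateral|bilat|both)\b", re.IGNORECASE)


def classify_pain_location(text):
    """Return (knee_pain, laterality) for a free-text pain location.

    knee_pain is 1 or 0; laterality is 'R', 'L', 'B', or None, where
    None stands in for the NaN used in the table.
    """
    if not text or not KNEE_RE.search(text):
        return 0, None
    has_left = bool(LEFT_RE.search(text))
    has_right = bool(RIGHT_RE.search(text))
    if BILATERAL_RE.search(text) or (has_left and has_right):
        return 1, "B"
    if has_left:
        return 1, "L"
    if has_right:
        return 1, "R"
    return 1, None
```

Because laterality is only assigned after the knee pattern matches, this sketch mirrors the rule that a laterality value is present only when knee_pain is 1.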
Procedure Records
- Filename: MRKR_CPT.csv
- Rows: 6,216,190
- Columns: 5
This file contains procedural codes for patients with corresponding dates.
Field Name | Data Type | Description |
---|---|---|
empi_anon | integer | Unique patient identification number (8 digits) |
cpt_code | string | Current Procedural Terminology code used in coding of medical services and procedures for billing (5 characters). 7,166 unique CPT codes. |
cpt_group_modifier | string | Used to provide further information regarding service or procedure. Most CPT codes do not include modifier data. If there is modifier data, it is often used to indicate the laterality of a procedure (left or right). There can be multiple modifiers for a single CPT code entry. This field combines all of the modifiers for that particular CPT code for that specific encounter for that patient. |
date_anon | date | Date when the associated procedure or service occurred. |
age_at_procedure | integer | Age when the procedure was performed |
Procedure Code Descriptions
- Filename: MRKR_CPT_dictionary.csv
- Rows: 7,166
- Columns: 2
This file contains descriptions of each Current Procedural Terminology (CPT) code found in the MRKR_CPT.csv table.
Field Name | Data Type | Description |
---|---|---|
cpt_code | string | Current Procedural Terminology code used in coding of medical services and procedures for billing (5 characters) |
cpt_description | string | Description of the procedure. Some distinct CPT codes share the same description, so there are 128 fewer unique descriptions than unique CPT codes. |
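A minimal sketch of how the dictionary might be joined to the procedure records on cpt_code is shown below, using only the Python standard library. All sample values are invented placeholders rather than real CPT codes or descriptions, the date format and modifier values are assumptions, and the helper names `load_cpt_lookup` and `attach_descriptions` are hypothetical.

```python
import csv
import io

# Placeholder rows only: codes, descriptions, dates, and modifiers are made up.
CPT_RECORDS = """empi_anon,cpt_code,cpt_group_modifier,date_anon,age_at_procedure
10000001,00001,LT,2015-03-02,61
10000001,00002,,2015-03-02,61
10000002,00003,RT,2016-07-19,58
10000002,99999,,2016-07-19,58
"""

CPT_DICTIONARY = """cpt_code,cpt_description
00001,Placeholder procedure A
00002,Placeholder procedure B
00003,Placeholder procedure A
"""


def load_cpt_lookup(handle):
    """Map cpt_code -> cpt_description, keeping codes as strings."""
    return {row["cpt_code"]: row["cpt_description"]
            for row in csv.DictReader(handle)}


def attach_descriptions(handle, lookup):
    """Return procedure rows with a cpt_description key (None if unmatched)."""
    annotated = []
    for row in csv.DictReader(handle):
        row = dict(row)
        row["cpt_description"] = lookup.get(row["cpt_code"])
        annotated.append(row)
    return annotated


lookup = load_cpt_lookup(io.StringIO(CPT_DICTIONARY))
annotated = attach_descriptions(io.StringIO(CPT_RECORDS), lookup)

# Codes that share a description make this difference positive.
shared = len(lookup) - len(set(lookup.values()))
```

Reading cpt_code as a string, which matches its declared data type, keeps any leading zeros intact; parsing it as a number would silently change the code.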
Diagnosis Records
- Filename: MRKR_ICD.csv
- Rows: 21,956,056
- Columns: 16
This file contains diagnosis codes for patients with corresponding dates. Certain diseases of interest are indicated by binary flags to ease data cleaning.
Field Name | Data Type | Description |
---|---|---|
empi_anon | integer | Unique patient identification number (8 digits) |
ICD9 | string | International Classification of Diseases - 9. 12,418 unique codes. |
ICD10 | string | International Classification of Diseases - 10. 26,963 unique codes. |
date_anon | date | Date when the diagnosis code was entered. |
age_at_dx | integer | Age when the diagnosis was recorded. |
DX_LINE | string | Primary, Secondary, Active, Not Recorded, Resolved, Canceled, Inactive |
DX_ICD_SCOPE | string | Billing Diagnosis, Discharge Diagnosis, Admitting Diagnosis, Referring Diagnosis, Not Recorded, Reason For Visit, Problem List, Working Diagnosis, Other Diagnosis, Final, Pre-Op Diagnosis, Post-Op Diagnosis, Principal Diagnosis, Suggested Billing |
autoimmune | binary | If ICD code corresponds to auto-immune disease such as rheumatoid arthritis, juvenile arthritis, gout, etc. |
diabetes | binary | If ICD code corresponds to type I or type II diabetes |
hypertension | binary | If ICD code corresponds to hypertension |
joint_infection | binary | If ICD code corresponds to a knee joint infection |
knee_osteoarthritis | binary | If ICD code corresponds to knee osteoarthritis |
knee_osteomyelitis | binary | If ICD code corresponds to knee osteomyelitis |
obesity | binary | If ICD code corresponds to obesity |
nicotine_use | binary | If ICD code corresponds to nicotine dependence |
trauma_lower_extremity | binary | If ICD code corresponds to lower extremity trauma |
Below is a description of the diseases of interest from the diagnosis table. This table shows the ICD9 and ICD10 prefixes used to flag these diseases, along with how frequently they appear in the diagnosis codes.
DX Category | DX Entries in Category | Patients with DX in Category | ICD10 Prefixes | ICD9 Prefixes |
---|---|---|---|---|
autoimmune | 122,859 | 9,704 (11.7%) | M05, M06, M08, M10, M45, M1A | 274, 714, 720 |
diabetes | 552,208 | 18,655 (22.47%) | E08, E10, E11, E13 | 250 |
hypertension | 1,209,572 | 45,300 (54.57%) | I10-I16 | 401-405 |
joint_infection | 2,224 | 551 (0.66%) | M00.06, M00.16, M00.26, M00.86, M01.X6, M02.06, M02.16, M02.26, M02.36, M02.86 | 711.06, 711.16, 711.26, 711.36, 711.46, 711.56, 711.66, 711.76, 711.86, 711.96 |
knee_osteoarthritis | 373,186 | 51,468 (62.0%) | M17 | 715.16, 715.26, 715.36, 715.96 |
knee_osteomyelitis | 7,130 | 1,389 (1.7%) | M86 related to thigh and tibia/fibula | 730 related to lower leg |
obesity | 197,094 | 27,576 (33.2%) | E66.01, E66.09, E66.1, E66.3, E66.8, E66.9 | 278.0, 278.00, 278.01, 278.02 |
nicotine_use | 148,095 | 20,882 (25.2%) | F17, Z57.31, Z71.6, Z72.0, Z77.22, Z87.891 | 305.1 |
trauma_lower_extremity | 152,377 | 34,230 (41.2%) | S82.0, S82.1, S83 | 959.7, 844, 823, 891, 822, 836 |
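To make the prefix-based flagging concrete, the sketch below applies a simple "starts with" match for three of the categories. The prefixes are copied from the table above, with the ranges I10-I16 and 401-405 expanded into individual prefixes. The function name `flag_icd`, the assumption that codes are stored with their decimal point, and the plain prefix matching are illustrative guesses and may differ from how the published flags were actually computed.

```python
# Prefixes for three categories, transcribed from the table above.
ICD10_PREFIXES = {
    "diabetes": ("E08", "E10", "E11", "E13"),
    "hypertension": ("I10", "I11", "I12", "I13", "I14", "I15", "I16"),
    "knee_osteoarthritis": ("M17",),
}
ICD9_PREFIXES = {
    "diabetes": ("250",),
    "hypertension": ("401", "402", "403", "404", "405"),
    "knee_osteoarthritis": ("715.16", "715.26", "715.36", "715.96"),
}


def flag_icd(icd9=None, icd10=None):
    """Return {category: 0 or 1} for one diagnosis row.

    A category is flagged when either code starts with one of its prefixes.
    Assumes codes keep their decimal point (for example '715.16').
    """
    code9 = (icd9 or "").strip().upper()
    code10 = (icd10 or "").strip().upper()
    flags = {}
    for category in ICD10_PREFIXES:
        # str.startswith accepts a tuple of candidate prefixes.
        hit10 = bool(code10) and code10.startswith(ICD10_PREFIXES[category])
        hit9 = bool(code9) and code9.startswith(ICD9_PREFIXES[category])
        flags[category] = int(hit10 or hit9)
    return flags
```

For example, `flag_icd(icd10="M17.11")` sets only knee_osteoarthritis, while `flag_icd(icd9="715.15")` sets nothing because 715.15 is not among the listed ICD9 prefixes.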
Diagnosis Code Descriptions
- Filename: MRKR_ICD_dictionary.csv
- Rows: 25,209
- Columns: 3
This file contains the descriptions of the ICD9 and ICD10 codes found in the MRKR_ICD.csv table.
Field Name | Data Type | Description |
---|---|---|
ICD9 | string | ICD9 code |
ICD10 | string | ICD10 code |
DX_NAME | string | Diagnosis name or description |