Predicting fractures and pain using chest x-rays: A Nightingale Open Science dataset

Authors: Matthew Lungren¹, Johanna Kim¹, Stephanie Bogdan¹, William Lane², Josh Risley², Katy Haynes², Ziad Obermeyer²,³

1 Stanford Center for Artificial Intelligence in Medicine and Imaging
2 Nightingale Open Science
3 University of California, Berkeley

Lead Nightingale analyst: William Lane

When using this resource, please cite:
Matthew Lungren, Johanna Kim, Stephanie Bogdan, William Lane, Josh Risley, Katy Haynes, and Ziad Obermeyer. 2021. Predicting Fractures and Pain Using Chest X-rays: A Nightingale Open Science Dataset. DOI:

Additionally, please cite:
Sendhil Mullainathan and Ziad Obermeyer. 2022. Solving medicine’s data bottleneck: Nightingale Open Science. Nature Medicine 28, 5 (May 2022), 897–899. DOI:

The problem

For many older patients, and some younger ones, a fracture marks the beginning of the end. The fracture itself is seldom fatal, but it sets off a downward spiral of pain, decreased mobility, physical deconditioning, debility, and ultimately death. This is why screening for osteoporosis, recommended today for women starting at age 65, is so critical: the appearance of bones on a special type of x-ray (a DEXA, or dual-energy x-ray absorptiometry, scan) shows us who is at high risk of fractures and lets us start treatments to prevent fractures before they happen.

Fractures carry massive costs, both for patients and for the health care system: a recent report put the cost at nearly $60 billion for fractures in US Medicare patients alone. Given these costs, it's clear that our current screening strategies are not adequate. For one thing, although established guidelines call for universal screening of women over age 65, the vast majority of these women don't get screened. Many fractures also occur in men and in younger people, for whom guidelines don't recommend screening at all. So it would be very useful to find another way to predict fractures at scale, using routinely available data.

The chest x-ray is, by far, the most commonly performed radiological study in the world. It is done when patients see their doctor for a cough or for chest or back pain, before surgery, in the ER, on admission to the hospital, and in a variety of other settings. An interesting fact about the ‘chest’ x-ray is that it also gets a very clear view of the spine, from the neck to the upper lumbar area. And the spine is an excellent place to assess the quality and quantity of bone, which may hold signal for predicting future fractures.

Dataset overview

This dataset starts with the CheXpert dataset from the Stanford Center for Artificial Intelligence in Medicine and Imaging (AIMI), which contains x-rays from across the Stanford Medicine system: from outpatient clinics, the ER, and the hospital. As part of that initial dataset, the chest x-rays were linked to the radiologist’s interpretation of the image, which we also provide here.

But this dataset goes further, adding labels on both health outcomes and patient experiences. First, we link each x-ray to the occurrence of past and future fractures, not just in the spine but all over the body, and to diagnoses of osteopenia and osteoporosis, so that researchers can compare algorithmic predictions to what doctors already know about patient risk. (Because the CheXpert labels are included, researchers can also see whether the radiologist reported a fracture on the chest x-ray itself.) We also link the x-rays to diagnoses of musculoskeletal problems (joints, tendons, pain, etc.), again past and future and all over the body, to test the hypothesis that subtle features of the chest x-ray might also yield insights into a range of musculoskeletal issues, as other recent articles have suggested. Finally, we add other relevant data elements describing the patients, including height, weight, and selected vital signs.

A few notes to keep in mind. All labels (on fractures, pain, etc.) are present only if the patient received care for that condition somewhere in the Stanford Medicine system. This biases who gets labeled: some patients with fractures never seek care, and others seek it elsewhere. Note also that many, but not all, x-ray studies contain two orthogonal images: the PA [posteroanterior] view, taken from back to front, and the lateral view, taken from the side. (Some patients, particularly those too frail or sick to stand, receive only the AP [anteroposterior] view, taken from front to back while lying in bed.) Finally, note that a patient can have multiple chest x-ray studies, on different days.
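
Because one patient can contribute several studies, splitting the data study by study can put the same patient in both the training and the evaluation set, which can inflate performance estimates. A minimal sketch of a patient-level split, assuming each study can be paired with a patient identifier; the function name patient_level_split and the (study_id, patient_id) input format are hypothetical, not part of the dataset:

```python
import random
from collections import defaultdict


def patient_level_split(studies, test_fraction=0.2, seed=0):
    """Split (study_id, patient_id) pairs so no patient appears in both sets.

    The input format is a hypothetical placeholder, not the dataset's schema.
    Returns (train_study_ids, test_study_ids).
    """
    # Group study IDs by patient.
    by_patient = defaultdict(list)
    for study_id, patient_id in studies:
        by_patient[patient_id].append(study_id)

    # Shuffle patients (not studies) reproducibly; sorting first makes the
    # result independent of input order.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_test = round(len(patients) * test_fraction)
    test_patients = set(patients[:n_test])

    # Every study of a patient goes to that patient's side of the split.
    train, test = [], []
    for patient_id, study_ids in by_patient.items():
        (test if patient_id in test_patients else train).extend(study_ids)
    return train, test


studies = [("s1", "p1"), ("s2", "p1"), ("s3", "p2"), ("s4", "p3"), ("s5", "p4")]
train, test = patient_level_split(studies, test_fraction=0.5, seed=1)
print(train, test)
```

The same idea applies to cross-validation folds: assign folds to patients first, then inherit the fold for each study.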

Our partners

The Stanford Center for Artificial Intelligence in Medicine and Imaging (AIMI) supports the development, evaluation, and dissemination of new artificial intelligence methods across the medical imaging life cycle, in order to solve clinically important problems in medicine. Its mission is to develop and support transformative medical AI applications and the latest in applied computational and biomedical imaging research to advance patient health. This Nightingale dataset builds on AIMI’s trailblazing work releasing imaging datasets like CheXpert. It holds the promise of predicting future fractures and frailty in patients, which could lead to tools for triage and diagnosis and help optimize care in over-burdened hospitals. This dataset was conceived and created by Dr. Matthew Lungren and Johanna Kim, Co-Directors of the Stanford AIMI Center, and Stephanie Bogdan, Project Manager for the Stanford AIMI Center. We are deeply grateful for their help, as well as their inspirational work to make data available as a public good.

Dataset details


This dataset (v1): Each observation in the dataset corresponds to one of 224,316 chest x-ray studies from 65,240 unique patients, performed between October 2002 and July 2017. The x-rays were linked to electronic health record data from the Stanford Medicine system using the patient’s medical record number (MRN). We queried ICD diagnosis tables to obtain codes for fractures and pain over the year before and the year after the date of the x-ray, and patient flowsheet data to obtain height, weight, and body temperature.
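
The one-year windows around each x-ray reduce to date comparisons. A minimal sketch of how before/after flags like those summarized in the tables below could be derived for a single study; the 365-day window and the choice to count a diagnosis dated on the x-ray day in neither window are our assumptions, not documented rules of the dataset, and window_flags is a hypothetical helper:

```python
from datetime import date, timedelta


def window_flags(xray_date, dx_dates, window_days=365):
    """Return (dx_year_before, dx_year_after, dx_year_before_or_after).

    Assumptions (not from the dataset documentation): each window is
    window_days long, and a diagnosis dated on the x-ray day itself is
    counted in neither window.
    """
    window = timedelta(days=window_days)
    before = any(xray_date - window <= d < xray_date for d in dx_dates)
    after = any(xray_date < d <= xray_date + window for d in dx_dates)
    return before, after, before or after


# A diagnosis six months before the x-ray, and another 19 months after it.
print(window_flags(date(2010, 6, 1), [date(2009, 12, 1), date(2012, 1, 1)]))  # (True, False, True)
```

Whether same-day diagnoses (for example, a fracture coded during the same ER visit as the x-ray) should count as "before" or "after" changes the labels noticeably, so it is worth checking against the dataset's own columns.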

What’s next (v2, target release date: March 2022): We will add diagnosis and procedure codes that capture pulmonary deterioration in the short term after the x-ray, as well as the setting of the x-ray (e.g., the ER, inpatient, or clinic). This will allow researchers to predict this important outcome and will align this dataset with other Nightingale Open Science datasets that also involve predicting pulmonary deterioration from chest x-rays.

Dataset schema

[Figure: Dataset schema, showing the dataset observations and their connection to key outcomes]
Dataset construction and key outcome variables are shown in the diagram above. A note on color choices: burnt sienna (orange) indicates the node that corresponds to the observations (rows) in the dataset, and grape (purple) indicates key patient outcomes.

Key variables

Fractures

We obtained data on ICD-9 codes 800–829 (fractures) over the year before and the year after the x-ray. In the summary table below, diagnoses are grouped by body region, but individual ICD-9 codes are available in the dataset.

fracture_location | icd9_code | total_with_dx | dx_year_before | dx_year_after | dx_year_before_or_after
skull and face | 800,801,802,803,804 | 1303 | 77.44% | 65.54% | 83.73%
spine and ribs | 805,806,807,809 | 4779 | 75.27% | 68.07% | 83.16%
pelvis and hip | 808,820 | 2075 | 66.36% | 57.64% | 76.14%
scapula and clavicle | 810,811 | 748 | 78.34% | 72.33% | 85.29%
arm | 812,813,818,819 | 1512 | 56.08% | 49.80% | 64.75%
hand | 814,815,816,817 | 927 | 46.60% | 46.17% | 61.06%
leg | 821,822,823,824,827 | 1983 | 60.87% | 57.19% | 74.48%
foot | 825,826 | 683 | 42.75% | 40.85% | 57.39%
other | 828,829 | 492 | 50.00% | 51.83% | 70.53%
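
The body-region grouping in the table above can be reproduced from the individual codes. A minimal sketch, assuming codes are stored as dotted strings such as "820.8"; that storage format and the names FRACTURE_REGIONS and fracture_region are our assumptions, while the region-to-category mapping is copied from the table:

```python
# Body-region groups copied from the fracture summary table
# (ICD-9 three-digit categories 800-829).
FRACTURE_REGIONS = {
    "skull and face": range(800, 805),
    "spine and ribs": (805, 806, 807, 809),
    "pelvis and hip": (808, 820),
    "scapula and clavicle": (810, 811),
    "arm": (812, 813, 818, 819),
    "hand": (814, 815, 816, 817),
    "leg": (821, 822, 823, 824, 827),
    "foot": (825, 826),
    "other": (828, 829),
}

# Invert into a lookup from three-digit category to region.
CATEGORY_TO_REGION = {
    category: region
    for region, categories in FRACTURE_REGIONS.items()
    for category in categories
}


def fracture_region(icd9_code):
    """Map a dotted ICD-9 code such as '820.8' to its body region, or None."""
    head = icd9_code.strip().split(".")[0]
    if not head.isdigit():  # E- and V-codes fall outside 800-829
        return None
    return CATEGORY_TO_REGION.get(int(head))


print(fracture_region("820.8"))  # pelvis and hip
```

Codes stored without the dot (e.g. "8208") would need the first three characters taken instead of splitting on ".".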

Note that fracture is one of the CheXpert labels derived from the radiologist’s interpretation of the chest x-ray, so for patients with a fracture ICD-9 code (in the electronic health/billing record) you can also see whether a fracture was visible and mentioned in the interpretation of the x-ray itself.
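
One way to use this is to tabulate, study by study, the report-derived CheXpert fracture label against whether a fracture code appears in the linked record. A minimal sketch, assuming the CheXpert label convention (1 positive, 0 negative, -1 uncertain, blank for not mentioned) with blanks already read as None; the (label, has_code) input pairs and the name crosstab_fracture are hypothetical:

```python
from collections import Counter

# CheXpert label convention; blanks ("not mentioned") are assumed to be
# converted to None before calling crosstab_fracture.
CHEXPERT_STATES = {1: "positive", 0: "negative", -1: "uncertain", None: "not mentioned"}


def crosstab_fracture(rows):
    """Count (CheXpert fracture state, has ICD-9 fracture code) combinations.

    `rows` yields (chexpert_fracture, has_icd9_fracture) pairs, where
    chexpert_fracture is 1, 0, -1 (ints or floats) or None, and
    has_icd9_fracture is a bool.
    """
    return Counter((CHEXPERT_STATES[label], has_code) for label, has_code in rows)


counts = crosstab_fracture([(1, True), (None, True), (0, False), (1.0, True), (-1, False)])
print(counts)
```

Float labels such as 1.0 work as dictionary keys here because they compare and hash equal to the integers, but a pandas NaN does not, which is why blanks must be mapped to None first.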

Osteoporosis and Osteopenia

We obtained data on ICD-9 codes 733.00–733.03 for osteoporosis, and on codes 733.09 or 733.90 for osteopenia, based on prior research. For osteopenia, we additionally required the text flag accompanying code 733.09 or 733.90 to mention osteopenia. (For example, some patients in this dataset had code 733.90 accompanied by a text flag for osteodynia; these would not meet our definition.)
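
The definition above combines a code check with a text check. A minimal sketch, assuming codes are dotted strings with two decimal places (e.g. "733.90") and that the text flag is available as a plain string; the name bone_density_dx and the case-insensitive substring match are our assumptions:

```python
OSTEOPOROSIS_CODES = {"733.00", "733.01", "733.02", "733.03"}
OSTEOPENIA_CODES = {"733.09", "733.90"}


def bone_density_dx(icd9_code, dx_text):
    """Return "osteoporosis", "osteopenia", or None under the definition above."""
    code = icd9_code.strip()
    if code in OSTEOPOROSIS_CODES:
        return "osteoporosis"
    # Osteopenia needs both the code and a text flag that mentions osteopenia,
    # so 733.90 flagged as "osteodynia" is excluded.
    if code in OSTEOPENIA_CODES and "osteopenia" in (dx_text or "").lower():
        return "osteopenia"
    return None


print(bone_density_dx("733.90", "Osteodynia"))  # None
```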

icd9_code | dx_name_corrected | total_with_dx | dx_year_before | dx_year_after | dx_year_before_or_after
733 | Osteoporosis | 3795 | 54.00% | 42.00% | 65.00%
733.9 | Osteopenia | 1422 | 26.00% | 25.00% | 41.00%

Musculoskeletal problems and pain

We obtained data on ICD-9 codes 710–739 (musculoskeletal diagnoses), many of which involve pain, over the year before and the year after the x-ray. Again, in the summary table below, we group these by clinical category, but individual ICD-9 codes are available in the dataset.

dx_name | icd9_code | total_with_dx | dx_year_before | dx_year_after | dx_year_before_or_after
Connective tissue disease | 710 | 988 | 67.91% | 52.02% | 78.04%
Infected joint | 711 | 510 | 67.84% | 58.63% | 82.35%
Gout | 712 | 375 | 53.87% | 54.93% | 74.40%
Rheumatoid arthritis | 714 | 1253 | 60.73% | 45.89% | 71.35%
Osteoarthritis | 715 | 8389 | 62.08% | 56.96% | 75.79%
Other joint problem | 716,713 | 2973 | 44.37% | 39.86% | 59.87%
Knee problem | 717 | 664 | 26.81% | 25.90% | 41.72%
Joint problem | 719,718 | 13486 | 52.44% | 53.28% | 69.75%
Ankylosing spondylitis | 720 | 289 | 47.06% | 32.18% | 57.79%
Spondylosis | 721 | 8221 | 65.14% | 61.31% | 80.46%
Intervertebral disc problem | 722 | 4700 | 56.15% | 49.98% | 73.04%
Neck problem | 723 | 4850 | 47.46% | 44.25% | 62.41%
Back problem | 724 | 10966 | 59.96% | 54.08% | 73.05%
Tendon and bursa problem | 726,727 | 6378 | 35.15% | 35.97% | 52.37%
Ligament problem | 728 | 5701 | 51.80% | 50.99% | 68.85%
Other disorders of soft tissues | 729 | 15664 | 60.60% | 60.25% | 76.60%
Bone infection | 730 | 1461 | 65.43% | 66.53% | 80.42%
Bone and cartilage problem | 733,732,731 | 10170 | 58.48% | 52.87% | 73.80%
Scoliosis | 737 | 2298 | 65.36% | 56.66% | 78.42%
Limb deformity | 738,735,736 | 4482 | 48.22% | 46.94% | 66.44%
Nonallopathic lesions not elsewhere classified | 739 | 207 | 54.59% | 52.66% | 77.29%

Physiological measurements

Finally, we obtained temperature, height, and weight from the flowsheet data collected in the course of medical encounters. Note that there are some outliers, which are most likely data entry errors.

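statistic | weight (kg) | height (m) | temperature (°F) | BMI (kg/m²)
count | 35450 | 34377 | 36500 | 34344
mean | 86.60 | 1.71 | 99.37 | 30.18
std | 24.62 | 0.11 | 0.84 | 50.40
min | 0.22 | 0.13 | 36.70 | 0.083
25% | 70.31 | 1.63 | 99.00 | 24.74
50% | 83.20 | 1.70 | 99.70 | 28.21
75% | 98.43 | 1.79 | 99.90 | 32.73
max | 282.81 | 2.44 | 107.40 | 4835.96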
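
Because of these outliers, it can help to compute BMI from height and weight yourself and flag implausible entries before analysis. A minimal sketch; the plausibility ranges and the rule that temperatures between 30 and 45 were probably entered in degrees Celsius are illustrative assumptions, not cleaning rules from the dataset, and all names here are hypothetical:

```python
# Illustrative plausibility ranges (assumptions, not dataset rules).
PLAUSIBLE = {
    "weight_kg": (20.0, 350.0),
    "height_m": (0.5, 2.5),
    "temp_f": (90.0, 110.0),
}


def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def normalize_temp_f(temp):
    """Convert values that look like degrees Celsius (30-45) to Fahrenheit."""
    if 30.0 <= temp <= 45.0:
        return temp * 9 / 5 + 32
    return temp


def is_plausible(name, value):
    """True if value lies inside the assumed range for that measurement."""
    low, high = PLAUSIBLE[name]
    return low <= value <= high


# A height of 0.13 m is flagged; it would also produce an extreme BMI.
print(is_plausible("height_m", 0.13), bmi(80.0, 0.13))
```

A single mis-entered height in centimeters or a near-zero height is enough to produce BMI values in the thousands, so dropping or correcting implausible heights before computing BMI is usually the first step.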

Copyright © 2021-2023 Nightingale Open Science. All rights reserved.