Access this dataset through Globus. Follow the instructions here.
Data Dictionary v1
Authors: Esther Duflo¹, Nikhil Kanakamedala², Cyrus Graham Reginald², Ankit Agarwal², Jenny Wang¹, Karyn Real¹, Nick Foster³, Senthil Nachimuthu³, Josh Risley³, Ziad Obermeyer³,⁴
¹ Massachusetts Institute of Technology
² Abdul Latif Jameel Poverty Action Lab
³ Nightingale Open Science
⁴ University of California, Berkeley
When using this resource, please cite:
Esther Duflo, Nikhil Kanakamedala, Cyrus Graham Reginald, Ankit Agarwal, Jenny Wang, Karyn Real, Nick Foster, Senthil Nachimuthu, Josh Risley, and Ziad Obermeyer. 2024. Multimodal Health Screening of Tamil Nadu Elders: A Nightingale Open Science Dataset. DOI: https://doi.org/10.48815/N50591
Additionally, please cite:
Sendhil Mullainathan and Ziad Obermeyer. 2022. Solving medicine’s data bottleneck: Nightingale Open Science. Nature Medicine 28, 5 (May 2022), 897–899. DOI:https://doi.org/10.1038/s41591-022-01804-4
File Tree
.
└── tamil-jpal-multi
    ├── echo
    │   ├── 00
    │   │   ├── 00ab...
    │   │   │   ├── 1.mpg
    │   ...   ...   ...
    │
    ├── ecg-12-lead
    │   └── waveform
    │       ├── 00
    │       │   ├── 00ab...npz
    │       ...   ...
    │
    ├── ecg-single-lead
    │   ├── 00
    │   │   ├── 00ab...npz
    │   ...   ...
    │
    ├── fundus
    │   ├── 00
    │   │   ├── 00ab...
    │   │   │   ├── 1.jpg
    │   ...   ...   ...
    │
    ├── xray
    │   ├── 00
    │   │   ├── 00ab...jpg
    │   ...   ...
    │
    └── v1
        ├── patient-metrics.csv
        ├── echo-annotation.csv
        ├── echo-filepath.csv
        ├── ecg-12-lead-filepath.csv
        ├── ecg-single-lead-filepath.csv
        ├── fundus-filepath.csv
        ├── xray-annotation.csv
        └── xray-filepath.csv
Participant Clinical Measurements and Survey Responses
patient-metrics.csv contains the following medical data.
- Vitals - Blood Pressure, Pulse rate, Respiratory rate, SPO2
- Physical measures - Height, Weight, Waist/Hip/Mid-arm Circumference
- Fitness measures - Endurance test, Grip test
- Eye Measures - Tonometry, Fundus observations
- Cognition Score (MMSE short-form)
- Smoking Status responses
- Blood test data
- PHQ 4, Direct loneliness responses
Column Name | Description | Data Type | Example |
---|---|---|---|
patient_ngsci_id | Unique identifier for each respondent | string | 24fc123… |
year | Year of survey | int | 2023 |
verbal_consent | Whether verbal consent was provided. Values: Yes, No | string | Yes |
age | Age of the respondent (value 999 represents age 90 and above) | int | 60 |
sex | Sex of the respondent. Values: Female, Male | string | Female |
bp | Whether blood pressure was measured. Values: Yes, No | string | Yes |
bp_systolic, bp_diastolic | Blood pressure in mmHg | int | 80 |
pulse | Whether pulse rate was measured. Values: Yes, No | string | Yes |
pulse_entry | Pulse rate in beats per min | int | 83 |
resp_rate | Whether respiratory rate was measured. Values: Yes, No | string | Yes |
resp_rate_entry | Respiratory rate in breaths per min | int | 22 |
spo2 | Whether SPO2 was measured. Values: Yes, No | string | Yes |
spo2_entry | SPO2 level in % | float | 98.0 |
rbs | Whether random blood sugar was measured. Values: Yes, No | string | Yes |
rbs_entry | Random blood sugar in mg/dL | float | 138.0 |
height | Whether height was measured. Values: Yes, No | string | Yes |
height_entry | Height in cm | float | 155.0 |
weight | Whether weight was measured. Values: Yes, No | string | Yes |
weight_entry | Weight in kg | float | 60.0 |
midarm_circum | Whether mid-upper arm circumference (MUAC) was measured. Values: Yes, No | string | Yes |
midarm_circum_entry | Mid-arm circumference in cm | float | 28.0 |
waist_circum | Whether waist circumference was measured. Values: Yes, No | string | Yes |
waist_circum_entry | Waist circumference in cm | float | 88.0 |
hip_circum | Whether hip circumference was measured. Values: Yes, No | string | Yes |
hip_circum_entry | Hip circumference in cm | float | 96.0 |
endurance_test | Whether endurance test was conducted. Values: Yes, No | string | Yes |
endurance_test_entry | Endurance test result as the number of unassisted stands | int | 11 |
grip_left, grip_right | Whether grip test was conducted, left and right hand. Values: Yes, No | string | Yes |
grip_left_entry, grip_right_entry | Grip test reading for left and right hand in kg | float | 16.2 |
tonometry_lefteye, tonometry_righteye | Whether tonometry was conducted, left and right eye. Values: Yes, No | string | Yes |
tonometry_lefteye_entry, tonometry_righteye_entry | Tonometry reading for left and right eye in mmHg | float | 17.3 |
fundus_lefteye, fundus_righteye | Whether the fundus examination was conducted, left and right eye. Values: Yes, No | string | Yes |
fundus_lefteye_obs, fundus_righteye_obs | Observations of the optometrist for left and right eye | string | … |
cognition_sf | Whether MMSE short-form was conducted. Values: Yes, No | string | Yes |
cognition_sf_score | Score obtained on the MMSE short-form out of 16 | int | 13 |
cognit_impaired | Whether the MMSE short-form indicated cognitive impairment. A score below 8 indicates cognitive impairment. Values: Yes, No | string | Yes |
Hb | Hemoglobin in g/dl | float | 12.3 |
HbA1c | HbA1c in % | float | 6.0 |
triglycerides_mg_dl | Triglycerides in mg/dl | int | 177 |
tot_cholesterol_mg_dl | Total Cholesterol in mg/dl | int | 201 |
HDL_mg_dl | HDL Cholesterol in mg/dl | int | 42 |
LDL_mg_dl | LDL Cholesterol in mg/dl | float | 117.0 |
VLDL_mg_dl | VLDL Cholesterol in mg/dl | float | 35.4 |
totchol_by_hdl_ratio | Total Cholesterol/HDL ratio | float | 4.8 |
ldl_by_hdl_ratio | LDL/HDL ratio | float | 2.77 |
creatinine_mg_dl | Serum Creatinine in mg/dl | float | 0.9 |
literate | Whether the respondent is literate. Values: Yes, No | string | Yes |
smoking_1 | Do you currently smoke tobacco on a daily basis, less than daily, or not at all? Values: Daily, Less than daily, Not at all, Refused, Don’t know. Source: GATS | string | Not at all |
smoking_2 | Have you smoked tobacco daily in the past? Values: Yes, No, Refused, Don’t know. Source: GATS | string | Yes |
smoking_3 | In the past, have you smoked tobacco on a daily basis, less than daily, or not at all? Values: Daily, Less than daily, Not at all, Refused, Don’t know. Source: GATS | string | Not at all |
The next four fields correspond to the responses of the 4-item Patient Health Questionnaire on anxiety and depression (source: PHQ-4). Prompt: "Over the last two weeks, how often have you been bothered by the following problems?" Values: Not at all, Several days, More than half the days, Nearly everyday, Refused, Don't know | | | |
phq_1 | Feeling nervous, anxious or on edge | string | Not at all |
phq_2 | Not being able to stop or control worrying | string | Not at all |
phq_3 | Feeling down, depressed or hopeless | string | Not at all |
phq_4 | Little interest or pleasure in doing things | string | Not at all |
direct_lonely | Over the last two weeks, how often have you been feeling lonely? Values: Not at all, Several days, More than half the days, Nearly everyday, Refused | string | Several days |
- Rows: 4,448
- Columns: 62
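The age sentinel and the impairment threshold described in the table can be handled explicitly when reading the file. The following is a minimal sketch using only the Python standard library. The helper names (`parse_age`, `is_cognitively_impaired`, `read_patient_metrics`) are hypothetical, the sample rows are made up, and the sketch assumes that unmeasured values appear as empty strings, which this dictionary does not state.

```python
import csv
import io

AGE_TOP_CODE = 999          # per the dictionary: 999 means age 90 and above
IMPAIRMENT_THRESHOLD = 8    # per the dictionary: a score below 8 indicates impairment


def parse_age(value):
    """Return (age, top_coded). Age is None when top-coded or missing."""
    if value is None or value.strip() == "":
        return None, False
    age = int(float(value))
    if age == AGE_TOP_CODE:
        return None, True
    return age, False


def is_cognitively_impaired(score):
    """Apply the documented rule: an MMSE short-form score below 8."""
    return score < IMPAIRMENT_THRESHOLD


def read_patient_metrics(handle):
    """Yield one dict per respondent with a cleaned age field added."""
    for row in csv.DictReader(handle):
        age, top_coded = parse_age(row.get("age"))
        row["age_clean"] = age
        row["age_top_coded"] = top_coded
        yield row


# Tiny in-memory stand-in for patient-metrics.csv (ids and values are made up).
sample = io.StringIO(
    "patient_ngsci_id,age,cognition_sf_score\n"
    "id-a,60,13\n"
    "id-b,999,7\n"
)
rows = list(read_patient_metrics(sample))
```

Keeping the top-coded flag separate from the numeric age avoids treating 999 as a real age in summary statistics.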
Electrocardiograms
This dataset contains electrocardiograms from two different devices. The first device produces a 12-lead ECG with a sample rate of 1000 Hz. The second device produces a single-lead ECG with a sample rate of 300 Hz.
12 Lead ECG Waveforms
ecg-12-lead-filepath.csv
Column Name | Description | Data Type | Example |
---|---|---|---|
patient_ngsci_id | Unique identifier for each respondent | string | 00ab… |
filepath | Path to file | string | path/to/file |
- Rows: 3,827
- Columns: 2
The ECG filepaths have the following structure:
ecg-12-lead/waveform/{first two digits of patient_ngsci_id}/{patient_ngsci_id}.npz
The waveform data for each ECG is stored in a separate compressed NumPy file (see the numpy.load documentation). These files contain two arrays.
- 'waveform': Raw waveform. Shape: (12, 10000)
- 'beat_waveform': Beat waveform derived from the raw waveform data. Shape: (12, N), where N varies by file
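The path structure and array names above can be combined into a short loader. This is a sketch, not an official reader: the function names and the `root` argument (the local directory that holds `ecg-12-lead`) are assumptions, and "first two digits" is interpreted as the first two characters of the identifier.

```python
from pathlib import Path

import numpy as np

SAMPLE_RATE_HZ = 1000  # 12-lead sample rate stated in this section


def ecg_12_lead_path(root, patient_ngsci_id):
    """Build the documented path: ecg-12-lead/waveform/{first two}/{id}.npz"""
    prefix = patient_ngsci_id[:2]
    return Path(root) / "ecg-12-lead" / "waveform" / prefix / f"{patient_ngsci_id}.npz"


def load_ecg_12_lead(path):
    """Load both arrays and report the raw recording length in seconds."""
    with np.load(path) as data:
        waveform = data["waveform"]            # documented shape: (12, 10000)
        beat_waveform = data["beat_waveform"]  # second dimension varies by file
    duration_s = waveform.shape[1] / SAMPLE_RATE_HZ
    return waveform, beat_waveform, duration_s
```

With the documented shape of (12, 10000) and a 1000 Hz sample rate, `duration_s` works out to 10 seconds.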
Single Lead ECG Waveforms
ecg-single-lead-filepath.csv
Column Name | Description | Data Type | Example |
---|---|---|---|
patient_ngsci_id | Unique identifier for each respondent | string | 00ab… |
filepath | Path to file | string | path/to/file |
- Rows: 4,224
- Columns: 2
The ECG filepaths have the following structure:
ecg-single-lead/{first two digits of patient_ngsci_id}/{patient_ngsci_id}.npz
The waveform data for each ECG is stored in a separate compressed NumPy file (see the numpy.load documentation). These files contain two arrays.
- 'raw_waveform': Raw waveform. Shape: (9000,)
- 'enhanced_waveform': Enhanced waveform derived from the raw waveform data. Shape: (9000,)
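Because both single-lead arrays share the same length, one time axis serves for plotting or aligning them. The sketch below assumes the 300 Hz sample rate stated for the single-lead device; the function name `load_single_lead` is hypothetical.

```python
import numpy as np

SAMPLE_RATE_HZ = 300  # single-lead sample rate stated in this section


def load_single_lead(path):
    """Return the raw and enhanced waveforms plus a shared time axis in seconds."""
    with np.load(path) as data:
        raw = data["raw_waveform"]            # documented shape: (9000,)
        enhanced = data["enhanced_waveform"]  # documented shape: (9000,)
    time_s = np.arange(raw.shape[0]) / SAMPLE_RATE_HZ
    return raw, enhanced, time_s
```

At 300 Hz, the documented 9000 samples correspond to a 30-second recording.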
Echocardiograms
Echo Videos
echo-filepath.csv
Column Name | Description | Data Type | Example |
---|---|---|---|
patient_ngsci_id | Unique identifier for each respondent | string | 00ab… |
echo_view | TTE view shown in the video (one of the 13 views listed below) | int | 1 |
video_format | The format of the video file | string | MPEG |
filepath | Path to file | string | path/to/file |
- Rows: 52,099
- Columns: 4
The echocardiogram videos are not uniformly formatted. There are three formats: AVI, MPEG, and MP4. The filename of each video corresponds to one of 13 views, listed below.
Transthoracic Echocardiography (TTE) Views
- PSLAX: 2D
- PSLAX: color (for assessing aortic and mitral regurgitation)
- PSAX at AOV level: 2D
- PSAX at AOV level: color
- PSAX at MV level (visualize MV orifice)
- PSAX at MV level: Color
- PSAX at PAP level
- Apical: Four-chamber: 2D (full visualization of atria and ventricles)
- Apical: Four-chamber: color mitral regurgitation
- Apical: Four-chamber: color tricuspid regurgitation
- Apical: Five-chamber: color aortic regurgitation
- Subcostal: 2D for interatrial septum
- IVC: volume and IVC collapse
Acronyms: AOV, Aortic valve; MV, mitral valve; PAP, papillary muscle; PSAX, parasternal short-axis view; PSLAX, parasternal long-axis view; 2D, two-dimensional
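Since echo-filepath.csv holds one row per video, it is often convenient to group the rows by patient and view before loading any video. The following is a minimal standard-library sketch. The function name `index_echo_videos` is hypothetical, and the sample ids and paths are made up. Filepaths are kept in a list per view because this page does not say whether a patient can have more than one video of the same view.

```python
import csv
import io
from collections import defaultdict


def index_echo_videos(handle):
    """Map patient_ngsci_id -> {echo_view: [filepath, ...]}."""
    index = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(handle):
        view = int(row["echo_view"])
        index[row["patient_ngsci_id"]][view].append(row["filepath"])
    return index


# In-memory stand-in for echo-filepath.csv; ids and paths are made up.
sample = io.StringIO(
    "patient_ngsci_id,echo_view,video_format,filepath\n"
    "id-a,1,MPEG,echo/id/id-a/1.mpg\n"
    "id-a,8,MP4,echo/id/id-a/8.mp4\n"
    "id-b,1,AVI,echo/id/id-b/1.avi\n"
)
index = index_echo_videos(sample)

# Patients that have at least one video for a given view number.
patients_with_view_8 = sorted(p for p, views in index.items() if 8 in views)
```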
Echo Annotations
echo-annotation.csv
The echocardiograms were reviewed by a cardiologist. The table contains the measurements performed during the echo and the evaluation by the cardiologist.
Column Name | Description | Data Type | Example |
---|---|---|---|
patient_ngsci_id | Unique identifier for each respondent | string | 00ab.. |
district_id | A number representing one of five districts | string | 01 |
camp_id | A number representing a camp within the district | string | 02 |
AO:cm | Aorta diameter [cm] | float | 3.1 |
LA:cm | Left atrium diameter [cm] | float | 3.9 |
PWD:cm | Left ventricular posterior wall thickness at end-diastole [cm] | float | 0.9 |
PWS:cm | Left ventricular posterior wall thickness at end-systole [cm] | float | 1.1 |
LVIDD:cm | Left ventricular internal dimension at end-diastole [cm] | float | 4.2 |
LVIDS:cm | Left ventricular internal dimension at end-systole [cm] | float | 2.6 |
IVSD:cm | Interventricular septum thickness at end-diastole [cm] | float | 0.9 |
IVSS:cm | Interventricular septum thickness at end-systole [cm] | float | 1.1 |
EF:% | Ejection fraction [%] | int | 63 |
FS:% | Fractional shortening [%] | int | 33 |
SV:ml | Stroke volume [mL] | int | 98 |
mitral_valve | An evaluation of the mitral valve | string | NORMAL |
aortic_valve | An evaluation of the aortic valve | string | NORMAL |
tricuspid_valve | An evaluation of the tricuspid valve | string | NORMAL |
pulmonary_valve | An evaluation of the pulmonary valve | string | NORMAL |
right_ventricle | An evaluation of the right ventricle | string | NORMAL |
inter_ventricle_septum | An evaluation of the interventricular septum | string | NORMAL |
inter_atrial_septum | An evaluation of the interatrial septum | string | NORMAL |
pulmonary_artery | An evaluation of the pulmonary artery | string | NORMAL |
right_atrium | An evaluation of the right atrium | string | NORMAL |
left_atrium | An evaluation of the left atrium | string | NORMAL |
left_ventricle | An evaluation of the left ventricle | string | NORMAL |
pericardium | An evaluation of the pericardium | string | NORMAL |
clot/vegetation | An assessment of the presence of a clot or vegetation | string | NORMAL |
impression | A general evaluation by the cardiologist | string | NO RWMA,NORMAL LV FUNCTION |
Fundus Images
fundus-filepath.csv
Column Name | Description | Data Type | Example |
---|---|---|---|
patient_ngsci_id | Unique identifier for each respondent | string | 00ab… |
image_number | The image number | int | 1 |
filepath | Path to file | string | path/to/file |
- Rows: 18,445
- Columns: 3
Most patients have four images. The technicians were instructed to take and save four images per person in the following order: Left disc view, Left macular view, Right disc view, Right macular view.
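The instructed capture order suggests a lookup from image number to eye and view. The sketch below is an assumption rather than a guarantee: it supposes that `image_number` starts at 1 and follows the instructed order, and since only most patients have four images, the mapping may not hold for patients with a different count. All names are hypothetical.

```python
# Order in which technicians were instructed to capture and save images.
INSTRUCTED_ORDER = {
    1: ("left", "disc"),
    2: ("left", "macular"),
    3: ("right", "disc"),
    4: ("right", "macular"),
}


def expected_fundus_view(image_number):
    """Return (eye, view) under the instructed order, or None if out of range."""
    return INSTRUCTED_ORDER.get(image_number)


def follows_protocol(image_numbers):
    """True when a patient has exactly images 1 through 4."""
    return sorted(image_numbers) == [1, 2, 3, 4]
```

Checking `follows_protocol` per patient before applying the mapping is one way to set aside cases where the label would be a guess.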
Chest X-rays
X-ray Images
xray-filepath.csv
Column Name | Description | Data Type | Example |
---|---|---|---|
patient_ngsci_id | Unique identifier for each respondent | string | 00ab… |
filepath | Path to file | string | path/to/file |
- Rows: 3,887
- Columns: 2
Each patient has at most one X-ray.
X-ray Annotation
xray-annotation.csv
Each patient’s X-ray has been reviewed by a radiologist. The radiologist evaluated the X-ray in six categories and included a general impression of the X-ray.
Column Name | Description | Data Type | Example |
---|---|---|---|
patient_ngsci_id | Unique identifier for each respondent | string | 00ab.. |
district_id | A number representing one of five districts | string | 01 |
camp_id | A number representing a camp within the district | string | 01 |
trachea | An evaluation of the trachea (windpipe) | string | Trachea appears normal |
cardiothoracic | An evaluation of the Cardio-Thoracic Ratio, which is used to describe heart size | string | Cardiothoracic ratio is within normal limits |
bilateral_lung | An evaluation of the lungs | string | Bilateral lung fields appear normal |
costo | An evaluation of the costophrenic angle, the angle between the diaphragm and the ribs | string | Costo and cardiophrenic angles appear normal |
visualised_bony | An evaluation of the bone structures | string | Visualised bony structures appear normal |
extra_thoracic | An evaluation of soft tissue outside the chest cavity, e.g. breast tissue | string | Extra thoracic soft tissues shadow grossly appears normal |
xray_impression | A general evaluation of the X-ray from the radiologist | string | Chest X-ray shows no significant abnormality |
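Both X-ray tables share `patient_ngsci_id`, so the image paths can be paired with the radiologist's annotations. This is a minimal standard-library sketch. The function name `join_xray_tables` is hypothetical, the sample rows are made up, and it assumes the annotation file also has at most one row per patient.

```python
import csv
import io


def join_xray_tables(filepath_handle, annotation_handle):
    """Left-join xray-filepath rows with xray-annotation rows on patient_ngsci_id."""
    annotations = {
        row["patient_ngsci_id"]: row for row in csv.DictReader(annotation_handle)
    }
    joined = []
    for row in csv.DictReader(filepath_handle):
        annotation = annotations.get(row["patient_ngsci_id"], {})
        # Filepath columns are merged last so they are never overwritten.
        joined.append({**annotation, **row})
    return joined


# In-memory stand-ins for the two CSV files; ids and paths are made up.
filepaths = io.StringIO(
    "patient_ngsci_id,filepath\n"
    "id-a,xray/id/id-a.jpg\n"
    "id-b,xray/id/id-b.jpg\n"
)
annotations = io.StringIO(
    "patient_ngsci_id,xray_impression\n"
    "id-a,Chest X-ray shows no significant abnormality\n"
)
joined = join_xray_tables(filepaths, annotations)
```

Rows without a matching annotation are kept with only the filepath columns, so images without an annotation row are not dropped silently.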