Synthetic Data Bridge

Modern AI/ML, empowered by the digitization of medicine and computing breakthroughs, has the potential to significantly boost health equity when it is designed and used appropriately. However, EHR data is difficult to access because of privacy and other legal concerns. Even when EHR data is available, AI/ML methods often need very large data sets for training. Synthetic versions that mimic the statistical distributions of the original data can provide a larger training set and can be used to balance sets with common biases, such as too few female patients or too few members of a particular racial/ethnic group.

The Synthetic Data Bridge provides synthetic versions of selected curated MedStar Health data sets in the AIM-AHEAD Data Bridge. Synthetic data is available to AIM-AHEAD consortium members who agree to follow MedStar’s Terms of Use. The synthetic data sets are produced by ICBI using Gretel.ai software and synthetic data models. A data dictionary is provided.

Available Now:

1) A synthetic version of MedStar’s Cardiometabolic Correlates and Maternal Health.

This synthetic version is derived from the data set available in the Data Bridge and has the same number of patients. It is designed to mimic the original data set and is available for research and training purposes only. To obtain the data, please click the link to the left, fill out the request form, and agree to the terms of use.

2) A synthetic version of MedStar’s Opioid-Use-and-Misuse

This synthetic version is derived from the data set available in the Data Bridge but covers only the years 2017-2019 and the 205,401 patients who had data for all 3 years.
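The cohort restriction described for the Opioid-Use-and-Misuse set (years 2017-2019, patients with data in all 3 years) can be illustrated with a short sketch. This is not the procedure ICBI used. The record layout of (patient ID, year) pairs, the function name, and the toy records are all assumptions made for illustration; consult the data dictionary for the actual fields.

```python
from collections import defaultdict

# Study window taken from the description above.
STUDY_YEARS = {2017, 2018, 2019}


def patients_with_all_years(records, years=STUDY_YEARS):
    """Return IDs of patients with at least one record in every study year.

    `records` is an iterable of (patient_id, year) pairs. This layout is a
    hypothetical one chosen for illustration, not the real schema.
    """
    years_seen = defaultdict(set)
    for patient_id, year in records:
        # Ignore records outside the study window.
        if year in years:
            years_seen[patient_id].add(year)
    # Keep only patients observed in every year of the window.
    return {pid for pid, seen in years_seen.items() if seen == set(years)}


# Toy records, made up for illustration only.
records = [
    ("A", 2017), ("A", 2018), ("A", 2019),
    ("B", 2017), ("B", 2019),
    ("C", 2016), ("C", 2017), ("C", 2018), ("C", 2019),
]
cohort = patients_with_all_years(records)
```

In this toy input, patient B lacks a 2018 record and is dropped, while patient C's 2016 record is ignored rather than disqualifying.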

More coming:

Request access to Data (it may take 1 business day for us to respond)