Researchers have only controlled access to Electronic Health Records (EHR) data because it contains personal identifiers and sensitive medical information. This model uses a new approach, the medical Generative Adversarial Network (medGAN), to generate realistic synthetic patient records. Given real patient records as input, medGAN generates high-dimensional discrete variables (e.g., binary and count features) by combining an autoencoder with a generative adversarial network. The synthetic patient records achieve performance comparable to real data in many respects, including distribution statistics, predictive modeling tasks, and a medical expert review. The algorithm also uses minibatch averaging to efficiently avoid mode collapse, and batch normalization and shortcut connections to increase learning efficiency. Empirically, medGAN has been observed to pose limited privacy risk in both identity and attribute disclosure.
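Minibatch averaging can be illustrated with a small NumPy sketch. It assumes the technique amounts to appending the minibatch mean to each sample before it reaches the discriminator, so that a batch with little diversity becomes easy to spot. The function name `minibatch_average` and the toy data are hypothetical and are not taken from the medGAN code base.

```python
import numpy as np


def minibatch_average(batch: np.ndarray) -> np.ndarray:
    """Append the minibatch mean to every sample (illustrative helper)."""
    batch_mean = batch.mean(axis=0, keepdims=True)              # shape (1, d)
    tiled_mean = np.repeat(batch_mean, batch.shape[0], axis=0)  # shape (n, d)
    return np.concatenate([batch, tiled_mean], axis=1)          # shape (n, 2d)


rng = np.random.default_rng(0)

# Hypothetical toy batch: 4 patients, 6 binary medical-code indicators.
real_batch = rng.integers(0, 2, size=(4, 6)).astype(float)
disc_input = minibatch_average(real_batch)

# A collapsed generator emits the same record repeatedly, so the appended
# mean is identical to each sample, a pattern a discriminator can learn.
collapsed_batch = np.tile(real_batch[:1], (4, 1))
collapsed_input = minibatch_average(collapsed_batch)

print(disc_input.shape)
```

In this sketch the discriminator would see a vector twice as wide as a single record; the second half carries batch-level statistics that are the same for every row of the batch.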
Input variables : EHR Data
Output Variables : Synthetic binary and count variables from EHRs (i.e., medical codes such as diagnosis codes, medication codes, or procedure codes)
Visit Model : github.com
Additional links : arxiv.org
Model Category | : | Public |
Date Published | : | January, 2018 |
Healthcare Domain | : | Payer, Provider |
Code | : | github.com |
Data Privacy |
Synthetic Data Generation |