GAIN (Generative Adversarial Imputation Nets) is a method for imputing missing data that adapts the Generative Adversarial Nets (GAN) framework. A GAN model consists of a generator and a discriminator. In GAIN, the generator observes some components of a real data vector, imputes the missing components conditioned on what is actually observed, and outputs a completed vector. The discriminator then takes the completed vector and attempts to determine which components were actually observed and which were imputed. To ensure that the discriminator forces the generator to learn the desired distribution, the discriminator is given additional information in the form of a hint vector. The hint vector reveals partial information about the missingness of the original sample, which the discriminator uses to focus its attention on the imputation quality of particular components. This hint ensures that the generator does in fact learn to generate according to the true data distribution. GAIN is reported to outperform existing imputation methods.
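The data flow described above can be illustrated with a minimal NumPy sketch. It is not the published implementation. The function names, the `hint_rate` parameter, and the use of 0.5 to mark hidden hint entries are assumptions made for illustration, and `placeholder_generator` is a hypothetical stand-in (column means) for the trained generator network.

```python
import numpy as np

def make_mask(x):
    """Return 1.0 where a value is observed and 0.0 where it is missing (NaN)."""
    return (~np.isnan(x)).astype(float)

def make_hint(mask, hint_rate, rng):
    """Reveal each mask entry with probability hint_rate (assumed parameter).

    Hidden entries are set to 0.5, an assumed convention meaning
    "no information about whether this component was observed".
    """
    b = (rng.random(mask.shape) < hint_rate).astype(float)
    return b * mask + 0.5 * (1.0 - b)

def placeholder_generator(x, mask):
    """Hypothetical stand-in for the trained generator: observed column means."""
    observed = mask * np.nan_to_num(x)
    counts = np.maximum(mask.sum(axis=0), 1.0)  # avoid division by zero
    col_means = observed.sum(axis=0) / counts
    return np.broadcast_to(col_means, x.shape).copy()

def complete(x, mask, generated):
    """Keep observed components and fill missing ones from the generator output."""
    return mask * np.nan_to_num(x) + (1.0 - mask) * generated

# Toy data: three samples, two features, NaN marks a missing value.
x = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 8.0]])

rng = np.random.default_rng(0)
mask = make_mask(x)
generated = placeholder_generator(x, mask)
completed = complete(x, mask, generated)   # what the discriminator would inspect
hint = make_hint(mask, hint_rate=0.9, rng=rng)  # extra input for the discriminator
```

In an actual GAIN model the generator and discriminator would be trained networks; the discriminator would receive `completed` together with `hint` and try to recover `mask`.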
Input Variables : Real data with missing values
Output Variables : Synthetically generated data with imputed missing values
Visit Model : github.com
Additional links : medianetlab.ee.ucla.edu
Model Category : Public
Date Published : January 2018
Healthcare Domain : Payer, Provider
Code : github.com
Data Privacy
Synthetic Data Generation