When collecting sensitive information from groups or organizations, it is critical to maintain individual privacy. A formalization of privacy called differential privacy has become the gold standard for protecting information from malicious agents. QUAIL is an ensemble-based model that generates synthetic data. Its simple modification to a differentially private data-synthesis architecture boosts synthetic data utility in machine learning scenarios without harming summary statistics or privacy guarantees. To assess the efficacy of differentially private data synthesis, four differentially private generative adversarial networks for data synthesis are surveyed. Each is evaluated at scale on five tabular datasets and in two applied industry scenarios. Results suggest that certain synthesizers are better suited to particular privacy budgets, complicating the domain-based tradeoffs involved in selecting an approach.
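As background for the differential privacy guarantee mentioned above, a minimal sketch of the classic Laplace mechanism for an epsilon-differentially private count query is shown below. This is a generic illustration, not QUAIL's or the surveyed GANs' actual mechanism; the function names and the count-query example are assumptions for illustration only.

```python
import math
import random


def laplace_noise(scale):
    # Sample from Laplace(0, scale) via inverse-CDF sampling.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def dp_count(values, predicate, epsilon):
    # A count query has sensitivity 1 (adding/removing one record changes
    # the count by at most 1), so Laplace noise with scale 1/epsilon
    # yields an epsilon-differentially private answer.
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon)
```

Smaller values of `epsilon` (a tighter privacy budget) inject more noise, which is the utility/privacy tradeoff the survey's privacy-budget comparisons explore.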
Input Variables | : | Evaluation Dataset |
Output Variables | : | Synthetic Dataset |
Statistical | : | Somers' D | Accuracy | Precision and Recall | Confusion Matrix | F1 Score | ROC and AUC | Prevalence | Detection Rate | Balanced Accuracy | Cohen's Kappa | Concordance | Gini Coefficient | KS Statistic | Youden's J Index |
Infrastructure | : | Log Bytes | Logging/User/IAMPolicy | Logging/User/VPN | CPU Utilization | Memory Usage | Error Count | Prediction Count | Prediction Latencies | Private Endpoint Prediction Latencies | Private Endpoint Response Count |
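Several of the statistical metrics listed above derive from the binary confusion matrix. A minimal sketch of computing accuracy, precision, recall, and F1 score from raw labels follows; the function name and example labels are assumptions for illustration, not part of the evaluation harness described here.

```python
def binary_classification_metrics(y_true, y_pred):
    # Confusion-matrix counts for a binary task (positive class = 1).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Comparing such metrics for a model trained on synthetic data against one trained on the real data is a standard way to measure the utility loss introduced by private synthesis.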
Visit Model | : | github.com |
Additional Links | : | arxiv.org |
Model Category | : | Public |
Date Published | : | March, 2020 |
Healthcare Domain | : | Payer | Provider |
Code | : | github.com |
Data Privacy | Synthetic Data Generation |