Synthetic data offers a promising answer to privacy concerns, provided that it has utility comparable to real data and that it preserves patient privacy. However, synthetic text on its own is of limited use for NLP because it lacks annotations. This model therefore uses neural language models (LSTM and GPT-2) to generate artificial EHR text jointly with in-line annotations for named-entity recognition (NER). The results show that neural language models can successfully generate artificial text with in-line annotations. Despite differences in syntactic and stylistic properties, as well as some topical incoherence, the generated text has sufficient utility to train downstream machine learning models. Synthetic data can therefore serve as a replacement for real data when real data are unavailable or cannot be shared, and as a special form of data augmentation that supplies additional training examples for ML models.
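Because the language model learns annotations as part of the text itself, annotated EHR data has to be serialized into an in-line form before training, and generated text has to be parsed back into token labels before it can train an NER model. The sketch below illustrates that round trip under stated assumptions: a hypothetical `<TAG>...</TAG>` markup and non-overlapping character-offset spans. The functions `to_inline` and `inline_to_bio` and the tag names are illustrative only and are not taken from the model's code.

```python
import re
from typing import List, Tuple

# Hypothetical in-line markup: each entity is wrapped as <TAG>surface text</TAG>.
# The published model may use a different scheme; adjust SPAN_RE if so.
SPAN_RE = re.compile(r"<(?P<tag>[A-Z_]+)>(?P<text>.*?)</(?P=tag)>")

Span = Tuple[int, int, str]  # (start, end, label) character offsets, non-overlapping


def to_inline(text: str, spans: List[Span]) -> str:
    """Embed standoff entity spans into the text so a language model can learn them."""
    out, pos = [], 0
    for start, end, label in sorted(spans):
        out.append(text[pos:start])
        out.append(f"<{label}>{text[start:end]}</{label}>")
        pos = end
    out.append(text[pos:])
    return "".join(out)


def inline_to_bio(annotated: str) -> Tuple[List[str], List[str]]:
    """Turn generated in-line annotated text into whitespace tokens and BIO labels."""
    tokens: List[str] = []
    labels: List[str] = []
    pos = 0
    for match in SPAN_RE.finditer(annotated):
        # Plain text before the entity gets the outside label "O".
        for tok in annotated[pos:match.start()].split():
            tokens.append(tok)
            labels.append("O")
        # Entity tokens: the first gets B-<TAG>, the rest I-<TAG>.
        tag = match.group("tag")
        for i, tok in enumerate(match.group("text").split()):
            tokens.append(tok)
            labels.append(("B-" if i == 0 else "I-") + tag)
        pos = match.end()
    # Trailing plain text after the last entity.
    for tok in annotated[pos:].split():
        tokens.append(tok)
        labels.append("O")
    return tokens, labels


if __name__ == "__main__":
    text = "Patient Jan de Vries was admitted on 12 May ."
    line = to_inline(text, [(8, 20, "NAME"), (37, 43, "DATE")])
    print(line)  # Patient <NAME>Jan de Vries</NAME> was admitted on <DATE>12 May</DATE> .
    print(inline_to_bio(line)[1])
    # ['O', 'B-NAME', 'I-NAME', 'I-NAME', 'O', 'O', 'O', 'B-DATE', 'I-DATE', 'O']
```

Parsing generated samples into BIO labels in this way is what lets them be mixed with, or substituted for, real annotated data when training the downstream tagger.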
Input Variables : Raw Textual EHR Data
Output Variables : Synthetic and Annotated Text
Statistical | : | Somers' D | Accuracy | Precision and Recall | Confusion Matrix | F1 Score | ROC and AUC | Prevalence | Detection Rate | Balanced Accuracy | Cohen's Kappa | Concordance | Gini Coefficient | KS Statistic | Youden's J Index |
Infrastructure | : | Log Bytes | Logging/User/IAMPolicy | Logging/User/VPN | CPU Utilization | Memory Usage | Error Count | Prediction Count | Prediction Latencies | Private Endpoint Prediction Latencies | Private Endpoint Response Count |
Visit Model : github.com
Additional links : mdpi.com
Model Category | : | Public |
Date Published | : | May, 2021 |
Healthcare Domain | : | Payer | Provider |
Code | : | github.com |
Data Privacy |
Synthetic Data Generation |
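The statistical metrics listed in this card are standard binary-classification measures. As a minimal sketch, assuming the downstream NER output is reduced to a binary token-level decision (for example, entity vs. non-entity) and that continuous scores are available for the rank-based measures, several of them can be computed as follows. The names `Confusion`, `confusion_metrics`, and `score_metrics` are hypothetical and not part of the published model; zero-count edge cases (such as no predicted positives) are not handled.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Confusion:
    """Binary confusion-matrix counts (positive class, e.g. an entity token, is an assumption)."""
    tp: int
    fp: int
    fn: int
    tn: int


def confusion_metrics(c: Confusion) -> Dict[str, float]:
    """Threshold-based metrics derived from a single confusion matrix."""
    n = c.tp + c.fp + c.fn + c.tn
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)  # sensitivity / true positive rate
    specificity = c.tn / (c.tn + c.fp)
    accuracy = (c.tp + c.tn) / n
    # Chance agreement for Cohen's kappa, from the predicted and actual marginals.
    p_expected = ((c.tp + c.fp) * (c.tp + c.fn) + (c.fn + c.tn) * (c.fp + c.tn)) / n**2
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "prevalence": (c.tp + c.fn) / n,
        "detection_rate": c.tp / n,
        "balanced_accuracy": (recall + specificity) / 2,
        "youden_j": recall + specificity - 1,
        "cohen_kappa": (accuracy - p_expected) / (1 - p_expected),
    }


def score_metrics(y_true: Sequence[int], scores: Sequence[float]) -> Dict[str, float]:
    """Rank-based metrics from continuous scores; y_true holds 0/1 labels."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # AUC (concordance): share of (positive, negative) pairs ranked correctly, ties count 1/2.
    pairs = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    auc = pairs / (len(pos) * len(neg))
    # Two-sample KS statistic: largest gap between the class-conditional score CDFs.
    ks = max(
        abs(sum(p <= t for p in pos) / len(pos) - sum(q <= t for q in neg) / len(neg))
        for t in scores
    )
    return {
        "auc": auc,
        "concordance": auc,    # c-statistic equals AUC for a binary outcome
        "gini": 2 * auc - 1,
        "somers_d": 2 * auc - 1,  # for a binary outcome, Somers' D equals 2*AUC - 1
        "ks": ks,
    }
```

Computing the pairwise AUC directly keeps the link to concordance explicit; for large datasets a sort-based implementation would be preferable to this quadratic loop.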