Fraud causes substantial costs and losses for companies and clients in the insurance industry, so fraud detection is a key function there. This work proposes a deep learning architecture for text embeddings that helps improve fraud detection for insurance claims. Most transaction and claims data are in unstructured format, which makes analysis challenging and influences the choice of classification approach. The model outperforms state-of-the-art methods and can make the claims management process more efficient. Embeddings are used to derive features from doctors' bills, and the classification task is performed with gradient boosting and simple word embedding-based models. Claims data is highly imbalanced in fraud detection, and the models perform well in such cases too.
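As a rough illustration of how a simple word embedding-based model can turn the text of a doctor's bill into fixed-length features, the sketch below concatenates average-pooled and max-pooled word vectors. The embedding table, its three dimensions, the token names, and the function name `swem_features` are all hypothetical; the description above does not specify the pooling variant or the embeddings actually used.

```python
from typing import Dict, List

# Hypothetical toy embedding table (assumption); real embeddings would be
# learned from billing text or taken from a pretrained model.
EMBEDDINGS: Dict[str, List[float]] = {
    "consultation": [0.2, -0.1, 0.5],
    "xray": [0.7, 0.3, -0.2],
    "injection": [-0.4, 0.6, 0.1],
}
DIM = 3  # embedding dimension of the toy table


def swem_features(tokens: List[str]) -> List[float]:
    """Concatenate average- and max-pooled word vectors for one bill."""
    # Tokens missing from the table are skipped (an assumption of this sketch).
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return [0.0] * (2 * DIM)
    avg = [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]
    mx = [max(v[i] for v in vectors) for i in range(DIM)]
    return avg + mx


bill_tokens = ["consultation", "xray", "unknown_item"]
features = swem_features(bill_tokens)
```

The resulting vector has the same length for every bill, so it could be joined with the tabular claim features and passed to a gradient boosting classifier.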
Input Variables : General static features such as age, gender, insurance type, and doctor's speciality, plus visit-specific features such as treatment type, number of each treatment, cost of each treatment, billing category, and performance type
Output Variables : Claim status (fraudulent / justified)
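One possible way to represent a claim with these inputs and the output label is sketched below. The class names, field names, and field types are assumptions made for illustration, as is the `total_cost` helper, which treats the cost of each treatment as a per-unit price.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Treatment:
    # Visit-specific features (types are assumptions)
    treatment_type: str
    count: int              # number of times this treatment was billed
    cost: float             # assumed to be the cost per single treatment
    billing_category: str
    performance_type: str


@dataclass
class Claim:
    # General static features (types are assumptions)
    age: int
    gender: str
    insurance_type: str
    doctor_speciality: str
    treatments: List[Treatment] = field(default_factory=list)
    fraudulent: bool = False  # output variable: True = fraudulent, False = justified

    def total_cost(self) -> float:
        """Hypothetical helper: sum of count * cost over all treatments."""
        return sum(t.count * t.cost for t in self.treatments)


claim = Claim(
    age=54,
    gender="F",
    insurance_type="private",
    doctor_speciality="radiology",
    treatments=[
        Treatment("xray", 2, 30.0, "outpatient", "diagnostic"),
        Treatment("consultation", 1, 45.5, "outpatient", "advisory"),
    ],
)
```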
Statistical | : | Somers' D | Accuracy | Precision and Recall | Confusion Matrix | F1 Score | ROC and AUC | Prevalence | Detection Rate | Balanced Accuracy | Cohen's Kappa | Concordance | Gini Coefficient | KS Statistic | Youden's J Index |
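Several of the statistical metrics listed above can be derived directly from the four cells of a binary confusion matrix. The sketch below uses standard textbook definitions, with detection rate taken as true positives over all cases; the function name and the example counts are made up for illustration and are not results from the model.

```python
from typing import Dict


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Derive common metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes are present and
    at least one positive prediction was made).
    """
    total = tp + fp + fn + tn
    recall = tp / (tp + fn)          # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / total
    # Chance agreement for Cohen's kappa, from the marginal totals
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (total * total)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "prevalence": (tp + fn) / total,
        "detection_rate": tp / total,
        "balanced_accuracy": (recall + specificity) / 2,
        "youden_j": recall + specificity - 1,
        "cohen_kappa": (accuracy - p_e) / (1 - p_e),
    }


# Made-up counts for an imbalanced case: 60 fraudulent claims out of 1000.
metrics = confusion_metrics(tp=40, fp=10, fn=20, tn=930)
```

With these made-up counts, accuracy is 0.97 while balanced accuracy is about 0.83, which shows why plain accuracy alone can look overly favourable on highly imbalanced claims data.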
Business | : | Claims Processed | $ Saved | FWA Rate | FWA by CPT | FWA by Provider |
Infrastructure | : | Log Bytes | Logging/User/IAMPolicy | Logging/User/VPN | CPU Utilization | Memory Usage | Error Count | Prediction Count | Prediction Latencies | Private Endpoint Prediction Latencies | Private Endpoint Response Count |
Visit Model : github.com
Additional links : arxiv.org
Model Category | : | Public |
Date Published | : | May, 2019 |
Healthcare Domain | : | Payer |
Code | : | github.com |
Claims Management |
Fraud Waste and Abuse |