Breast cancer risk prediction modeling allows researchers to identify high-risk patients and reduce unnecessary interventions. Current breast cancer risk prediction models tend to exhibit low discriminatory accuracy (0.53-0.64). Machine learning (ML) approaches have been employed to address these limitations and improve the accuracy of outcome forecasts. This study compares the discriminatory accuracy of ML-based estimates against two established methods: the Breast Cancer Risk Assessment Tool (BCRAT) and the Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm (BOADICEA). The significant improvement that ML algorithms achieve in classifying women with and without breast cancer is important for personalized medicine and may inform prevention strategies and individualized clinical management. Eight simulated datasets and two retrospective samples were used to compare the performance of the proposed ML methods against the state-of-the-art BCRAT and BOADICEA models. Predictive accuracy (area under the ROC curve, AU-ROC) reached 88.28% using ML-adaptive boosting and 88.89% using ML-random forest versus 62.40% with BCRAT for the U.S. population-based sample. Predictive accuracy reached 90.17% using ML-adaptive boosting and 89.32% using ML-Markov chain Monte Carlo generalized linear mixed model versus 59.31% with BOADICEA for the Swiss clinic-based sample.
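The discriminatory accuracy quoted above is the area under the ROC curve, which can be read as the probability that a randomly chosen woman with breast cancer receives a higher predicted risk than a randomly chosen woman without it. The following is a minimal sketch of that calculation in plain Python. The function name `auc_roc` and the toy labels and scores are invented for illustration; they are not the study's data or code.

```python
def auc_roc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (case, non-case) pairs in which the case
    has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not from the study):
# 1 = breast cancer, 0 = no breast cancer.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.1, 0.7]

auc = auc_roc(labels, scores)
gini = 2 * auc - 1  # for a binary outcome this also equals Somers' D

print(f"AUC  = {auc:.4f}")
print(f"Gini = {gini:.4f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why values in the 0.53-0.64 range are described as low. The pairwise loop is quadratic in the sample size; for large cohorts a rank-based formula would be the usual choice.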
Input Variables : Demographic variables, family history of breast cancer, medical history related to cancer, menstrual health
Output Variables : Probability of developing breast cancer [0,1]
Statistical | : | Somers' D | Accuracy | Precision and Recall | Confusion Matrix | F1 Score | ROC and AUC | Prevalence | Detection Rate | Balanced Accuracy | Cohen's Kappa | Concordance | Gini Coefficient | KS Statistic | Youden's J Index |
Business | : | Population at High Risk of Disease | Risk by Geography | Risk by Demographics | Risk by Clinical Parameters | Optimized Hospital Resource Utilization | Decreased Cost of Care | Decreased Patient Visits |
Infrastructure | : | Log Bytes | Logging/User/IAMPolicy | Logging/User/VPN | CPU Utilization | Memory Usage | Error Count | Prediction Count | Prediction Latencies | Private Endpoint Prediction Latencies | Private Endpoint Response Count |
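Several of the statistical metrics listed above are derived from a confusion matrix at a chosen risk threshold. The sketch below shows one way they could be computed from predicted probabilities in plain Python. The helper names, the 0.5 threshold, and the toy data are assumptions made for illustration only, and "detection rate" is taken here to mean true positives divided by all samples.

```python
def confusion_counts(labels, scores, threshold):
    """Return (tp, fp, tn, fn), predicting positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def threshold_metrics(labels, scores, threshold=0.5):
    """Confusion-matrix metrics at one threshold (needs both classes present)."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    n = tp + fp + tn + fn
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    accuracy = (tp + tn) / n
    # Agreement expected by chance, used by Cohen's kappa.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
        "prevalence": (tp + fn) / n,
        "detection_rate": tp / n,
        "balanced_accuracy": (recall + specificity) / 2,
        "youden_j": recall + specificity - 1,
        "cohen_kappa": (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0,
    }


def ks_statistic(labels, scores):
    """Largest gap between true and false positive rates over all thresholds."""
    best = 0.0
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion_counts(labels, scores, t)
        best = max(best, abs(tp / (tp + fn) - fp / (fp + tn)))
    return best


# Toy example (hypothetical values): 1 = breast cancer, 0 = no breast cancer.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.1, 0.7]

metrics = threshold_metrics(labels, scores, threshold=0.5)
ks = ks_statistic(labels, scores)
for name, value in metrics.items():
    print(f"{name:18s} {value:.4f}")
print(f"{'ks_statistic':18s} {ks:.4f}")
```

Unlike the area under the ROC curve, every metric here except the KS statistic depends on the threshold chosen, so the values shift if the cut-off for "high risk" moves.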
Visit Model : ccge.medschl.cam.ac.uk
Additional links : bcrisktool.cancer.gov | ccge.medschl.cam.ac.uk
Model Category | : | Public |
Date Published | : | April, 2020 |
Healthcare Domain | : | Provider |
Code | : | github.com |
Health Risk Management |
Health Risk Prediction |
Risk Progression |