Advanced machine learning models applied to large-scale genomics datasets promise to become major drivers of genome science. Once trained, such models can serve as tools to probe the relationships between data modalities, including the effect of genetic variants on phenotype.
The training data were ClinVar variants located within [-40 nt, 10 nt] of the splice acceptor or within [-10 nt, 10 nt] of the splice donor of a protein-coding gene. Only variants labelled 'Pathogenic' or 'Benign' were used, and data from all chromosomes were used for training. The meta model was built with logistic regression, as implemented in scikit-learn with default parameters, using different feature subsets. Models were evaluated by 10-fold cross-validation using the auROC metric.
Input variables : Combined Annotation-Dependent Depletion (CADD) score; CADD phred-like rank score based on whole-genome CADD raw scores; phyloP (phylogenetic p-values) conservation score based on the multiple alignments of 33 placental mammal genomes, including human; conservation score based on the multiple alignments of 10 primate genomes, including human; etc.
Output variables : Clinical significance: Benign/Pathogenic
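The training setup described above (logistic regression with scikit-learn defaults, different feature subsets, 10-fold cross-validation scored by auROC) can be sketched as follows. This is a minimal illustration, not the published pipeline: the data are synthetic random numbers standing in for the ClinVar variant table, and the names `features`, `labels`, and `feature_subsets`, as well as the two example subsets, are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the variant table (NOT real ClinVar data).
# labels: 1 = Pathogenic, 0 = Benign (hypothetical encoding).
n_variants = 400
labels = rng.integers(0, 2, size=n_variants)

# Four hypothetical score columns (placeholders for CADD, phyloP, ...).
# Each column is shifted for the positive class so it carries some signal.
features = rng.normal(size=(n_variants, 4)) + 0.8 * labels[:, None]

# Hypothetical feature subsets, given as column indices.
feature_subsets = {
    "all_features": [0, 1, 2, 3],
    "first_two": [0, 1],
}

results = {}
fold_scores = {}
for name, columns in feature_subsets.items():
    model = LogisticRegression()  # scikit-learn default parameters
    # cv=10 gives 10-fold cross-validation; for classifiers scikit-learn
    # stratifies the folds by label. scoring="roc_auc" yields auROC per fold.
    scores = cross_val_score(
        model, features[:, columns], labels, cv=10, scoring="roc_auc"
    )
    fold_scores[name] = scores
    results[name] = scores.mean()

for name, mean_auc in results.items():
    print(f"{name}: mean auROC over 10 folds = {mean_auc:.3f}")
```

Comparing the mean auROC across the entries of `results` is one simple way to judge how much each feature subset contributes.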
Statistical | : | Somers' D | Accuracy | Precision and Recall | Confusion Matrix | F1 Score | ROC and AUC | Prevalence | Detection Rate | Balanced Accuracy | Cohen's Kappa | Concordance | Gini Coefficient | KS Statistic | Youden's J Index |
Business | : | Population at High Risk of Disease | Risk by Geography | Risk by Demographics | Risk by Clinical Parameters | Optimized Hospital Resource Utilization | Decreased Cost of Care | Decreased Patient Visits |
Infrastructure | : | Log Bytes | Logging/User/IAMPolicy | Logging/User/VPN | CPU Utilization | Memory Usage | Error Count | Prediction Count | Prediction Latencies | Private Endpoint Prediction Latencies | Private Endpoint Response Count |
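Several of the statistical metrics listed above can be derived directly from the four cells of a binary confusion matrix. The sketch below uses their standard textbook definitions, with detection rate taken as true positives over all samples (a common convention, assumed here). The function name `binary_metrics` and the example counts are hypothetical and are not results for this model.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # recall, true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / total

    # Agreement expected by chance, used by Cohen's kappa.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total**2

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": sensitivity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "prevalence": (tp + fn) / total,
        "detection_rate": tp / total,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "youden_j": sensitivity + specificity - 1,
        "cohen_kappa": (accuracy - expected) / (1 - expected),
    }


# Made-up counts for illustration only.
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The threshold-free metrics in the list (ROC/AUC, Gini coefficient, KS statistic) need the predicted scores rather than a single confusion matrix, so they are left out of this sketch.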
Visit Model : kipoi.org
Additional links : biorxiv.org | github.com
Model Category | : | Public |
Date Published | : | July, 2018 |
Healthcare Domain | : | Life Sciences | Provider |
Code | : | kipoi.org |
Health Risk Management |
Health Risk Prediction |