The complex language of eukaryotic gene expression remains incompletely understood. As a result, most of the many noncoding variants statistically associated with human disease act through unknown mechanisms. Basset addresses this challenge: it is an open source package that applies deep convolutional neural networks (CNNs) to learn the functional activity of DNA sequences from genomics data. CNNs simultaneously learn the relevant sequence motifs and the regulatory logic with which they are combined to determine cell-specific DNA accessibility. Basset is trained on a compendium of accessible genomic sites mapped in 164 cell types by DNaseI-seq. With Basset, a researcher can perform a single sequencing assay in their cell type of interest, learn that cell's chromatin accessibility code, and annotate every mutation in the genome with its influence on present accessibility and its latent potential for accessibility. Basset thus offers a powerful computational approach to annotating and interpreting the noncoding genome.
Input Variables : DNA sequences
Output Variables : Genome accessibility
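To make the input and output above more concrete, the following is a minimal sketch of how a DNA sequence can be one-hot encoded, scanned by a single convolutional filter, and used to score a mutation. It is not Basset's code and uses no trained weights: the function names, the hand-written `TOY_FILTER`, and the max-pooled "score" are all assumptions made for illustration, standing in for a real multi-layer model's accessibility prediction.

```python
# Illustrative sketch only: hypothetical names, no trained weights,
# and not the Basset implementation.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 floats (A, C, G, T).

    Any other symbol (for example N) becomes an all-zero row.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv_scan(encoded, filt):
    """Slide one filter (width x 4 weights) along the sequence.

    Returns the ReLU-rectified activation at each valid position.
    """
    width = len(filt)
    activations = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            for channel in range(4):
                total += encoded[start + offset][channel] * filt[offset][channel]
        activations.append(max(0.0, total))
    return activations


def toy_score(seq, filt):
    """Toy stand-in for a model output: the max-pooled filter activation."""
    activations = conv_scan(one_hot(seq), filt)
    return max(activations) if activations else 0.0


def variant_effect(seq, pos, alt, filt):
    """Change in the toy score when the base at index pos is replaced by alt."""
    mutated = seq[:pos] + alt + seq[pos + 1:]
    return toy_score(mutated, filt) - toy_score(seq, filt)


# Hypothetical hand-written filter that responds to the motif "TGA".
TOY_FILTER = [
    [0.0, 0.0, 0.0, 1.0],  # position 1 prefers T
    [0.0, 0.0, 1.0, 0.0],  # position 2 prefers G
    [1.0, 0.0, 0.0, 0.0],  # position 3 prefers A
]

if __name__ == "__main__":
    reference = "CCTGACC"
    print(toy_score(reference, TOY_FILTER))
    print(variant_effect(reference, 3, "C", TOY_FILTER))
```

In a trained network the filter weights are learned from data rather than written by hand, and many filters and layers are combined, but the idea of comparing the prediction for a reference sequence with the prediction for a mutated one carries over.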
Statistical | : | Somers' D | Accuracy | Precision and Recall | Confusion Matrix | F1 Score | ROC and AUC | Prevalence | Detection Rate | Balanced Accuracy | Cohen's Kappa | Concordance | Gini Coefficient | KS Statistic | Youden's J Index |
Business | : | Population at High Risk of Disease | Risk by Geography | Risk by Demographics | Risk by Clinical Parameters | Optimized Hospital Resource Utilization | Decreased Cost of Care | Decreased Patient Visits |
Infrastructure | : | Log Bytes | Logging/User/IAMPolicy | Logging/User/VPN | CPU Utilization | Memory Usage | Error Count | Prediction Count | Prediction Latencies | Private Endpoint Prediction Latencies | Private Endpoint Response Count |
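Several of the statistical metrics listed above can be derived from a single binary confusion matrix. The sketch below shows one common set of definitions, assuming accessible sites are the positive class. The function name `binary_metrics`, the helper `_ratio`, and the choice to return 0.0 for an undefined ratio are illustrative assumptions, not part of the model or this listing.

```python
# Illustrative sketch: common textbook definitions, hypothetical function names.


def _ratio(numerator, denominator):
    """Divide safely; an undefined ratio is reported as 0.0 (a convention chosen here)."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, fn, tn):
    """Compute several metrics from true/false positive and negative counts."""
    total = tp + fp + fn + tn
    recall = _ratio(tp, tp + fn)        # also called sensitivity
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    accuracy = _ratio(tp + tn, total)

    # Agreement expected by chance, used by Cohen's kappa.
    expected = _ratio((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn), total * total)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "prevalence": _ratio(tp + fn, total),
        "detection_rate": _ratio(tp, total),
        "balanced_accuracy": (recall + specificity) / 2,
        "youden_j": recall + specificity - 1,
        "cohen_kappa": _ratio(accuracy - expected, 1 - expected),
    }


if __name__ == "__main__":
    for name, value in binary_metrics(tp=40, fp=10, fn=20, tn=30).items():
        print(f"{name}: {value:.3f}")
```

Threshold-free metrics in the same list, such as AUC, the Gini coefficient, Somers' D, and the KS statistic, need the full set of predicted scores rather than a single confusion matrix, so they are not covered here.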
Visit Model : github.com
Additional links : biorxiv.org
Model Category | : | Public |
Date Published | : | September, 2020 |
Healthcare Domain | : | Provider |
Code | : | github.com |
Health Risk Management |
Health Risk Prediction |