GapPredict is a tool that uses a character-level language model to predict the unresolved nucleotides in gaps. These gaps remain when de novo assembly algorithms cannot fully reconstruct contiguous genome sequences from the short reads produced by short-read DNA sequencing instruments. Genome reads and their reverse complements are used to train the language model. The model is built on an LSTM architecture and is trained with the Adam optimizer and a categorical cross-entropy loss. The authors define a metric, 'sequence percent correctness', to validate GapPredict's output. Filling these gaps improves the quality of draft genomes, which benefits downstream analyses such as structural variant identification and gene annotation.
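The training setup described above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not GapPredict's code. It assumes reads contain only A, C, G and T, and that the model predicts the next base from a fixed-length context of k bases. The helper names (`reverse_complement`, `one_hot`, `training_pairs`, `categorical_cross_entropy`) are hypothetical. The LSTM and the Adam optimizer themselves would come from a deep learning framework and are not shown.

```python
# Minimal sketch of character-level training data preparation.
# Assumptions (not from GapPredict's code): reads use only A/C/G/T, the model
# predicts the next base from the previous k bases, and all names are hypothetical.
import math

ALPHABET = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def one_hot(base: str) -> list[float]:
    """One-hot encode a single nucleotide over the alphabet A, C, G, T."""
    return [1.0 if base == b else 0.0 for b in ALPHABET]


def training_pairs(reads: list[str], k: int) -> list[tuple[str, str]]:
    """Build (context, next base) pairs from each read and its reverse complement."""
    pairs = []
    for read in reads:
        for seq in (read, reverse_complement(read)):
            for i in range(len(seq) - k):
                pairs.append((seq[i:i + k], seq[i + k]))
    return pairs


def categorical_cross_entropy(target: list[float], probs: list[float]) -> float:
    """Cross-entropy between a one-hot target and a predicted distribution."""
    eps = 1e-12  # guard against log(0)
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))
```

For example, `training_pairs(["ACGTA"], 3)` yields the forward pairs `("ACG", "T")` and `("CGT", "A")`, plus `("TAC", "G")` and `("ACG", "T")` from the reverse complement `TACGT`. Adding reverse complements lets the model learn from both strands of the same genomic region.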
Input Variables : Genome assembly with gaps
Output Variables : Genome assembly without gaps
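Going from an assembly with gaps to one without gaps can be illustrated by filling a run of `N` characters one base at a time. Each prediction is conditioned on the preceding k bases. In this sketch, a simple k-mer frequency table stands in for the trained LSTM. The scoring function assumes that 'sequence percent correctness' is the percentage of positions that match a reference sequence. Both choices are assumptions for illustration, not GapPredict's actual method or definition, and all function names are hypothetical.

```python
# Minimal sketch of left-to-right gap filling with a next-base predictor.
# Assumptions: gaps are marked with 'N'; a k-mer frequency table is a stand-in
# for the trained LSTM; percent_correct is an assumed, not official, definition.
from collections import Counter, defaultdict


def build_kmer_model(reads: list[str], k: int) -> dict[str, Counter]:
    """Count which base follows each k-mer in the reads."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for read in reads:
        for i in range(len(read) - k):
            counts[read[i:i + k]][read[i + k]] += 1
    return counts


def fill_gap(seq: str, model: dict[str, Counter], k: int) -> str:
    """Replace each 'N' with the most frequent base following the previous k bases."""
    bases = list(seq)
    for i, base in enumerate(bases):
        if base == "N" and i >= k:
            context = "".join(bases[i - k:i])  # includes bases filled earlier
            options = model.get(context)       # .get does not create new keys
            if options:
                bases[i] = options.most_common(1)[0][0]
    return "".join(bases)


def percent_correct(predicted: str, reference: str) -> float:
    """Assumed metric: percentage of positions where prediction equals reference."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must be the same length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(reference)
```

For example, with a model built from the read `ACGTACGTAC` and k = 3, the gapped sequence `ACGNNNGT` is filled to `ACGTACGT`. Filling proceeds left to right, so each newly predicted base becomes part of the context for the next one. This is why an early error can propagate through the rest of a long gap.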
Statistical | : | Somers' D | Accuracy | Precision and Recall | Confusion Matrix | F1 Score | ROC and AUC | Prevalence | Detection Rate | Balanced Accuracy | Cohen's Kappa | Concordance | Gini Coefficient | KS Statistic | Youden's J Index |
Infrastructure | : | Log Bytes | Logging/User/IAMPolicy | Logging/User/VPN | CPU Utilization | Memory Usage | Error Count | Prediction Count | Prediction Latencies | Private Endpoint Prediction Latencies | Private Endpoint Response Count |
Visit Model : github.com
Additional links : arxiv.org
Model Category | : | Public |
Date Published | : | May, 2021 |
Healthcare Domain | : | Life Sciences |
Code | : | github.com |
Data Privacy |
Synthetic Data Generation |