GapPredict: A Language Model for Resolving Gaps in Draft Genome Assemblies

Eric Chen | Inanc Birol | Jessica Zhang | Justin Chu | René L. Warren

BC Cancer Vancouver | Genome Sciences Centre

GapPredict is a tool that uses a character-level language model to predict the unresolved nucleotides in gaps that remain when de novo assembly algorithms cannot fully reconstruct contiguous genome sequences from the reads produced by short-read DNA sequencing instruments. Sequencing reads and their reverse complements are used to train the language model. The model uses an LSTM architecture and is trained with the Adam optimizer and a categorical cross-entropy loss function. The authors define a metric, 'sequence percent correctness', to validate the GapPredict output. Filling these gaps improves the quality of draft genomes, which has implications for downstream analyses such as structural variation identification and gene annotation.
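The data preparation described above can be sketched in a few lines of Python. This is a minimal illustration, not the GapPredict implementation: the function names, the fixed context length, and the position-wise definition of percent correctness are all assumptions made for the example, since the description here does not give the exact definitions.

```python
from typing import List, Tuple

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(read: str) -> str:
    """Return the reverse complement of a DNA read (A<->T, C<->G, reversed)."""
    return read.translate(COMPLEMENT)[::-1]


def augment_with_reverse_complements(reads: List[str]) -> List[str]:
    """Pair every read with its reverse complement, as the training set does."""
    augmented = []
    for read in reads:
        augmented.append(read)
        augmented.append(reverse_complement(read))
    return augmented


def next_base_examples(read: str, context_len: int) -> List[Tuple[str, str]]:
    """Build (context, next base) pairs for a character-level model.

    context_len is a hypothetical hyperparameter chosen for illustration.
    """
    return [
        (read[i:i + context_len], read[i + context_len])
        for i in range(len(read) - context_len)
    ]


def percent_correct(predicted: str, reference: str) -> float:
    """Assumed metric: percentage of reference positions predicted correctly.

    Positions missing from a shorter prediction count as incorrect.
    """
    if not reference:
        raise ValueError("reference sequence must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(reference)
```

Each (context, next base) pair would then be one-hot encoded and passed to the LSTM, with the categorical cross-entropy loss computed over the four possible bases.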

Input Variables : Genome assembly with gaps
Output Variables : Genome assembly without gaps
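As a rough illustration of the input side, the sketch below locates gaps in an assembled sequence and returns the sequence flanking each one. It assumes gaps are written as runs of `N`, the usual convention in scaffold FASTA files. The `find_gaps` helper and its `flank_len` parameter are hypothetical names introduced for this example, not part of GapPredict.

```python
import re
from typing import List, Tuple

# Assumption: a gap is any run of one or more N (or n) characters.
GAP_PATTERN = re.compile(r"[Nn]+")


def find_gaps(sequence: str, flank_len: int = 5) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, left_flank, right_flank) for each gap.

    start and end are 0-based, end-exclusive coordinates of the N run.
    Flanks are truncated at the ends of the sequence.
    """
    gaps = []
    for match in GAP_PATTERN.finditer(sequence):
        start, end = match.span()
        left_flank = sequence[max(0, start - flank_len):start]
        right_flank = sequence[end:end + flank_len]
        gaps.append((start, end, left_flank, right_flank))
    return gaps
```

For example, `find_gaps("ACGTNNNNGGCA", flank_len=3)` reports a single gap spanning positions 4 to 8 with flanks `CGT` and `GGC`.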

Metrics to Monitor

Statistical	:	Somers' D | Accuracy | Precision and Recall | Confusion Matrix | F1 Score | ROC and AUC | Prevalence | Detection Rate | Balanced Accuracy | Cohen's Kappa | Concordance | Gini Coefficient | KS Statistic | Youden's J Index
Infrastructure	:	Log Bytes | Logging/User/IAMPolicy | Logging/User/VPN | CPU Utilization | Memory Usage | Error Count | Prediction Count | Prediction Latencies | Private Endpoint Prediction Latencies | Private Endpoint Response Count

Visit Model : github.com

Additional links : arxiv.org

Model Category	:	Public
Date Published	:	May, 2021
Healthcare Domain	:	Life Sciences
Code	:	github.com
