
4 research outputs found

The A, C, G, and T of Genome Assembly

Authors: Ekti, Ali
Serpedin, Erchin
Sohail, Muhammad
Wajid, Bilal
Publication venue: Hindawi Limited
Publication date: 01/01/2016

In its two decades of history, genome assembly has produced significant research in both biotechnology and computational biology. This contribution delineates sequencing platforms and their characteristics, examines key steps involved in filtering and processing raw data, explains assembly frameworks, and discusses quality statistics for assessing the assembled sequence. Furthermore, the paper explores recent Ubuntu-based software environments oriented towards genome assembly, as well as some avenues for future research.
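The abstract refers to quality statistics for assessing an assembled sequence but does not name them. As an illustration only, one widely reported contiguity statistic is N50: the largest length L such that contigs of length at least L together cover at least half of the total assembled length. The minimal sketch below assumes the contig lengths are already available as integers; the function name `n50` is hypothetical and is not taken from the paper.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50 of an assembly given its contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together cover at least half of the total assembled length.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    covered = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= total:  # integer test avoids float rounding at exactly half
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # → 8
```

Comparing `2 * covered` with `total` keeps the halfway test in integer arithmetic, so an assembly whose longest contigs cover exactly half of the total is handled without rounding issues.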

Directory of Open Access Journals

Texas A&M Repository

PubMed Central

The A, C, G, and T of Genome Assembly

Authors: Ali R. Ekti
Bilal Wajid
Erchin Serpedin
Muhammad U. Sohail
Publication venue: Hindawi Limited
Publication date: 01/01/2016

Crossref

Directory of Open Access Journals

PubMed Central

Texas A&M Repository

Information Theory, Graph Theory and Bayesian Statistics based improved and robust methods in Genome Assembly

Author: Wajid, Bilal
Publication date: 29/10/2015

Bioinformatics skills required for genome sequencing often represent a significant hurdle for many researchers working in computational biology. This dissertation highlights the significance of genome assembly as a research area, focuses on its need to remain accurate, provides details about the characteristics of the raw data, examines some key metrics, emphasizes some tools, and outlines the whole pipeline for next-generation sequencing. Currently, a major effort is being put towards assembling the genomes of all living organisms. Given the importance of comparative genome assembly, this dissertation explores the principle of Minimum Description Length (MDL) and its two variants, Two-Part MDL and Sophisticated MDL, for identifying the optimal reference sequence for genome assembly. Thereafter, a Modular Approach to Reference Assisted Genome Assembly Pipeline, referred to as MARAGAP, is developed. MARAGAP uses the MDL principle to determine the optimal reference sequence for the assembly. The optimal reference sequence is used as a template to infer inversions, insertions, deletions and Single Nucleotide Polymorphisms (SNPs) in the target genome. MARAGAP uses an algorithmic approach to detect and correct inversions and deletions, a de Bruijn graph based approach to infer insertions, an affine-match affine-gap local alignment tool to estimate the locations of insertions, and a Bayesian estimation framework for detecting SNPs (called BECA). BECA effectively capitalizes on the 'alignment-layout-consensus' paradigm and Quality (Q-) values for detecting and correcting SNPs by evaluating a number of probabilistic measures. However, BECA conducts this entire process only once. BECA's framework is therefore extended by using Gibbs sampling to run further iterations of BECA. After each assembly, the reference sequence is updated and the probabilistic score for each base call is renewed.
The revised reference sequence and probabilities are then used to identify the alignments and the consensus sequence, yielding an algorithm referred to as Gibbs-BECA. Gibbs-BECA further improves performance, both by rectifying more SNPs and by remaining robust even in the presence of a poor reference sequence. Lastly, another major effort in this dissertation was the development of two cohesive software platforms, referred to as Baari and Genobuntu, that combine many different genome assembly pipelines in two distinct environments. Baari and Genobuntu support pre-assembly tools, genome assemblers, and post-assembly tools. Additionally, these platforms provide a library of tools developed by the authors for Next-Generation Sequencing (NGS) data, along with commonly used biological software. Baari and Genobuntu are free and easily distributable, and they facilitate building laboratories and software workstations both for personal use and for college or university laboratories. Baari is a customized Ubuntu OS packed with the tools mentioned above, whereas Genobuntu is a software package containing the same tools for users who already have Ubuntu OS installed on their systems.
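The abstract describes BECA as a Bayesian framework that uses Quality (Q-) values to detect and correct SNPs, but it does not spell out BECA's probabilistic measures. The sketch below is therefore only a generic single-position Bayesian base call under stated assumptions (uniform prior over A/C/G/T, independent reads, errors spread evenly across the three other bases); it is not BECA or Gibbs-BECA. The names `base_posteriors` and `call_snp`, and the 0.99 `threshold`, are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

BASES = "ACGT"


def base_posteriors(pileup: List[Tuple[str, int]]) -> Dict[str, float]:
    """Posterior probability of each candidate base at one aligned position.

    pileup holds (observed_base, phred_quality) pairs from the reads aligned
    to this position. Illustrative assumptions, not BECA's published model:
    uniform prior over A/C/G/T, independent reads, and sequencing errors
    spread evenly over the three other bases.
    """
    log_like = {b: 0.0 for b in BASES}
    for observed, q in pileup:
        if observed not in BASES:
            continue  # skip ambiguous calls such as 'N'
        # Phred definition: Q = -10 * log10(P(error)). Capping at 0.75 keeps
        # every probability positive; at 0.75 the read is uninformative.
        err = min(10 ** (-q / 10), 0.75)
        for b in BASES:
            log_like[b] += math.log(1 - err if observed == b else err / 3)
    # The uniform prior cancels; normalize in log space for numerical stability.
    peak = max(log_like.values())
    weights = {b: math.exp(v - peak) for b, v in log_like.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


def call_snp(ref_base: str, pileup: List[Tuple[str, int]],
             threshold: float = 0.99) -> Optional[str]:
    """Return the alternative base if it confidently differs from ref_base."""
    post = base_posteriors(pileup)
    best = max(post, key=post.get)
    if best != ref_base and post[best] >= threshold:
        return best
    return None


if __name__ == "__main__":
    reads = [("G", 30)] * 4 + [("A", 20)]
    print(call_snp("A", reads))  # → G
```

Accumulating log-likelihoods rather than multiplying raw probabilities avoids floating-point underflow at high coverage, and subtracting the peak before exponentiating keeps the normalization stable.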

Texas A&M Repository

Supersonic MiB

Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref