4,508 research outputs found

The Mathematics of Phylogenomics

Author: Pachter Lior
Sturmfels Bernd
Publication date: 01/01/2004

The grand challenges in biology today are being shaped by powerful high-throughput technologies that have revealed the genomes of many organisms, global expression patterns of genes and detailed information about variation within populations. We are therefore able to ask, for the first time, fundamental questions about the evolution of genomes, the structure of genes and their regulation, and the connections between genotypes and phenotypes of individuals. The answers to these questions are all predicated on progress in a variety of computational, statistical, and mathematical fields. The rapid growth in the characterization of genomes has led to the advancement of a new discipline called Phylogenomics. This discipline results from the combination of two major fields in the life sciences: Genomics, i.e., the study of the function and structure of genes and genomes; and Molecular Phylogenetics, i.e., the study of the hierarchical evolutionary relationships among organisms and their genomes. The objective of this article is to offer mathematicians a first introduction to this emerging field, and to discuss specific mathematical problems and developments arising from phylogenomics. Comment: 41 pages, 4 figures

arXiv.org e-Print Archive

CiteSeerX

Caltech Authors

The EM Algorithm and the Rise of Computational Biology

Author: Jun S. Liu
Xiaodan Fan
Yuan Yuan
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2010

In the past decade computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively-minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology problems surrounding the "central dogma" of molecular biology: from DNA to RNA and then to proteins. Topics of this article include sequence motif discovery, protein sequence alignment, population genetics, evolutionary models and mRNA expression microarray data analysis. Comment: Published in Statistical Science (http://www.imstat.org/sts/) at http://dx.doi.org/10.1214/09-STS312 by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

CiteSeerX

Crossref

A biophysical approach to large-scale protein-DNA binding data

Author: Manke T.
Roider H.
Vingron M.
Publication date: 01/01/2008

About this book: Cutting-edge genome analysis methods from leading bioinformaticians. An accurate description of current scientific developments in the field of bioinformatics and their computational implementation is presented through research of the BioSapiens Network of Excellence. Bioinformatics is essential for annotating the structure and function of genes and proteins, for the analysis of complete genomes, and for molecular biology and biochemistry. Included is an overview of bioinformatics and the full spectrum of genome annotation approaches, including genome analysis and gene prediction, gene regulation analysis and expression, genome variation and QTL analysis, large-scale protein annotation of function and structure, annotation and prediction of protein interactions, and the organization and annotation of molecular networks and biochemical pathways. Also covered are a technical framework to organize and represent genome data using the DAS technology and work on the annotation of two large genomic sets: HIV/HCV viral genomes and splicing alternatives potentially encoded in 1% of the human genome

MPG.PuRe

Statistical analysis on detecting recombination sites in DNA-beta satellites associated with the old world geminiviruses

Author: Xu Kai
Yoshida Ruriko
Publication date: 01/01/2010

Although an exchange of genetic information by recombination plays an important role in the evolution of viruses, it is not clear how it generates diversity. Geminiviruses are plant viruses that have ambisense single-stranded circular DNA genomes and are among the most economically important plant viruses in agricultural production. Small circular single-stranded DNA satellites, termed DNA-β, have recently been found associated with some geminivirus infections. In this paper we analyze a satellite molecule DNA-β of geminiviruses for recombination events using phylogenetic and statistical analysis, and we find that one strain from ToLCMaB has a recombination pattern and is possibly a recombinant molecule between two strains from two species, PaLCuB-[IN:Chi:05] (major parent) and ToLCB-[IN:CP:04] (minor parent). Comment: 8 figures and 2 tables. To appear in Frontiers in Systems Biology

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

PubMed Central

Frontiers - Publisher Connector


EpiAlign: an alignment-based bioinformatic tool for comparing chromatin state sequences.

Author: Ge Xinzhou
Kwon Soo Bin
Li Jingyi Jessica
Li Wei Vivian
Xie Lingjue
Zhang Haowen
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

The availability of genome-wide epigenomic datasets enables in-depth studies of epigenetic modifications and their relationships with chromatin structures and gene expression. Various alignment tools have been developed to align nucleotide or protein sequences in order to identify structurally similar regions. However, there are currently no alignment methods specifically designed for comparing multi-track epigenomic signals and detecting common patterns that may explain functional or evolutionary similarities. We propose a new local alignment algorithm, EpiAlign, designed to compare chromatin state sequences learned from multi-track epigenomic signals and to identify locally aligned chromatin regions. EpiAlign is a dynamic programming algorithm that novelly incorporates varying lengths and frequencies of chromatin states. We demonstrate the efficacy of EpiAlign through extensive simulations and studies on the real data from the NIH Roadmap Epigenomics project. EpiAlign is able to extract recurrent chromatin state patterns along a single epigenome, and many of these patterns carry cell-type-specific characteristics. EpiAlign can also detect common chromatin state patterns across multiple epigenomes, and it will serve as a useful tool to group and distinguish epigenomic samples based on genome-wide or local chromatin state patterns
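As a rough illustration of the kind of dynamic programming that local alignment relies on, the sketch below scores the best local alignment between two sequences of chromatin-state labels. It is plain Smith-Waterman scoring with a linear gap penalty, not the EpiAlign algorithm itself: it does not model the varying lengths or frequencies of chromatin states described above, and the function name, state labels, and scoring parameters are all assumptions chosen for the example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two state sequences.

    Generic Smith-Waterman scoring (hypothetical parameters), shown only
    to illustrate the dynamic programming idea; it is not EpiAlign.
    """
    cols = len(b) + 1
    prev = [0] * cols  # scores for the previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            # Extend diagonally with a match or mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical chromatin-state labels for two short regions.
region_a = ["E1", "E2", "E3", "E4"]
region_b = ["E9", "E2", "E3", "E7"]
shared_score = local_alignment_score(region_a, region_b)
```

With these assumed parameters, the two regions share the run E2, E3, so the best local score comes from those two matches.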

eScholarship - University of California

Rapid Bursts of Androgen-Binding Protein (Abp) Gene Duplication Occurred Independently in Diverse Mammals

Author: Blakely Tyler D.
Heger Andreas
Karn Robert C.
Laukaitis Christina M.
Munclinger Pavel
Ponting Chris P.
Publication venue: Digital Commons @ Butler University
Publication date: 01/01/2008

Background: The draft mouse (Mus musculus) genome sequence revealed an unexpected proliferation of gene duplicates encoding a family of secretoglobin proteins including the androgen-binding protein (ABP) α, β and γ subunits. Further investigation of 14 α-like (Abpa) and 13 β- or γ-like (Abpbg) undisrupted gene sequences revealed a rich diversity of developmental stage-, sex- and tissue-specific expression. Despite these studies, our understanding of the evolution of this gene family remains incomplete. Questions arise from imperfections in the initial mouse genome assembly and a dearth of information about the gene family structure in other rodents and mammals. Results: Here, we interrogate the latest 'finished' mouse (Mus musculus) genome sequence assembly to show that the Abp gene repertoire is, in fact, twice as large as reported previously, with 30 Abpa and 34 Abpbg genes and pseudogenes. All of these have arisen since the last common ancestor with rat (Rattus norvegicus). We then demonstrate, by sequencing homologs from species within the Mus genus, that this burst of gene duplication occurred very recently, within the past seven million years. Finally, we survey Abp orthologs in genomes from across the mammalian clade and show that bursts of Abp gene duplications are not specific to the murid rodents; they also occurred recently in the lagomorph (rabbit, Oryctolagus cuniculus) and ruminant (cattle, Bos taurus) lineages, although not in other mammalian taxa. Conclusion: We conclude that Abp genes have undergone repeated bursts of gene duplication and adaptive sequence diversification driven by these genes' participation in chemosensation and/or sexual identification

Digital Commons @ Butler University

Functional Analysis of Intergenic Regions for Gene Discovery

Author: Fu Li M.
Publication venue: 'IntechOpen'
Publication date: 02/09/2011

IntechOpen

Crossref

Whole Genome Annotation: In Silico Analysis

Author: Adriana Carneiro
Amjad Ali
Anderson Miyoshi
Anderson Santos
Anne Pinto
Artur Silva
Aryane Magalhães
Eudes Barbosa
Louise Cerdeira
Paula Schneider
Rommel Ramos
Sintia Almeida
Siomar Soares
Vasco Azevedo
Vinicius Abreu
Publication venue: 'IntechOpen'
Publication date: 02/11/2011

IntechOpen

Evolution of the Set of Signal Transduction Proteins in 10 Species of Shewanella

Author: Shanafield Harold Arthur
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/12/2008

The recent completion of the sequencing of several species of the Shewanella genus provides a unique opportunity for comparative genomics studies. We chose the first 10 fully sequenced Shewanella genomes to investigate the evolution of signal transduction proteins (ST). ST is a universal and highly regulated system, and as a very well-studied system provides an excellent starting point for investigation. Furthermore, Shewanella have been shown to have a large number of two-component systems and diguanylate cyclases relative to their genome size. In this study we investigate the evolution of signal transduction across several Shewanella strains by utilizing a domain-level approach for determining homology and orthology of the parent proteins. Proteins were broken down into their constituent domains and domain-sized sequences and compared using a reciprocal best BLAST hit approach to determine homology between all of the species. Analysis of homologous domains and proteins revealed several levels of conservation and a core group of signal transduction proteins common to all members. Further analysis of domain homology provided putative annotations of previously unrecognized sequences and highlighted deficiencies in specific Pfam domain models. Analysis of paralogous domains and proteins showed agreement with 16S rRNA-based estimates of evolution, although the position of S. oneidensis MR-1 was novel
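The reciprocal best hit idea mentioned in this abstract can be sketched in a few lines. The example below is a minimal illustration, not the study's pipeline: it assumes the BLAST searches have already been run in both directions and reduced to (query, subject, bit score) tuples, and the function names, sequence identifiers, and scores are hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, score) tuples; this input
    shape is an assumption made for the example.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical domain sequences from two genomes, A and B.
a_vs_b = [("A1", "B1", 200), ("A1", "B2", 150), ("A2", "B2", 90), ("A3", "B1", 120)]
b_vs_a = [("B1", "A1", 198), ("B2", "A2", 160), ("B2", "A1", 140)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Here A3's best hit is B1, but B1's best hit is A1, so A3 is left unpaired. Real analyses usually add thresholds (for example on E-value or alignment coverage) before accepting a pair; those are omitted from this sketch.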

University of Tennessee, Knoxville: Trace

Killing Two Birds with One Stone: The Concurrent Development of the Novel Alignment Free Tree Building Method, Scrawkov-Phy, and the Extensible Phyloinformatics Utility, EMU-Phy.

Author: Fisk J. Nick
Publication venue: RIT Scholar Works
Publication date: 27/03/2016

Many components of phylogenetic inference belong to the most computationally challenging and complex domain of problems. To further escalate the challenge, the genomics revolution has exponentially increased the amount of data available for analysis. This, combined with the foundational nature of phylogenetic analysis, has prompted the development of novel methods for managing and analyzing phylogenomic data, as well as improving or intelligently utilizing current ones. In this study, a novel alignment-free tree-building algorithm using Quasi-Hidden Markov Models (QHMMs), Scrawkov-Phy, is introduced. Additionally, exploratory work in the design and implementation of an extensible phyloinformatics tool, EMU-Phy, is described. Lastly, features of the best-practice tools are inspected and provisionally incorporated into Scrawkov-Phy to evaluate the algorithm’s suitability for said features. This study shows that Scrawkov-Phy, as utilized through EMU-Phy, captures phylogenetic signal and reconstructs reasonable phylogenies without the need for multiple-sequence alignment or high-order statistical models. There are numerous additions to both Scrawkov-Phy and EMU-Phy which would improve their efficacy, and the results of the provisional study show that such additions are compatible

RIT Scholar Works