Search CORE

280 research outputs found

Bounded prefix-suffix duplication

Author: A. Ehrenfeucht
D. Gusfield
D. Knuth
D.B. Searls
D.P. Bovet
J. Dassow
J. Kärkkäinen
M. Crochemore
M. Frazier
M.-W. Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

We consider a restricted variant of the prefix-suffix duplication operation, called bounded prefix-suffix duplication. It consists of the iterative duplication of a prefix or suffix of a given word, where the length of the duplicated factor is bounded by a constant. We give a sufficient condition for a class of languages to be closed under bounded prefix-suffix duplication. Consequently, the class of regular languages is closed under bounded prefix-suffix duplication; furthermore, we propose an algorithm deciding whether a regular language is a finite k-prefix-suffix duplication language. An efficient algorithm solving the membership problem for the k-prefix-suffix duplication of a language is also presented. Finally, we define the k-prefix-suffix duplication distance between two words, extend it to languages, and show how it can be computed for regular languages.
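To make the operation concrete, here is a minimal Python sketch. It assumes that one step of k-bounded prefix-suffix duplication turns a word uv into uuv (or vu into vuu) for a nonempty factor u with |u| ≤ k. Membership of a word w in the iterated closure of a finite language is then decided by undoing duplications: every reduction shortens w, so a breadth-first search over all reducts terminates. This naive search is for illustration only. It is not the efficient membership algorithm presented in the paper, and the function names are hypothetical.

```python
from collections import deque
from typing import Iterator


def one_step_reductions(w: str, k: int) -> Iterator[str]:
    """Yield every word that becomes w after one k-bounded prefix or suffix duplication."""
    n = len(w)
    for m in range(1, min(k, n // 2) + 1):
        # w = u u v with |u| = m came from u v (prefix duplication)
        if w[:m] == w[m:2 * m]:
            yield w[m:]
        # w = v u u with |u| = m came from v u (suffix duplication)
        if w[n - 2 * m:n - m] == w[n - m:]:
            yield w[:n - m]


def in_k_ps_closure(w: str, base: set[str], k: int) -> bool:
    """Naive membership test: is w reachable from some word of the finite set base
    by iterated k-bounded prefix-suffix duplication?"""
    seen = {w}
    queue = deque([w])
    while queue:
        x = queue.popleft()
        if x in base:
            return True
        for y in one_step_reductions(x, k):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False
```

For example, with base = {"ab"} and k = 1, the word "aabb" is accepted, while "abab" is accepted only once k ≥ 2, because "abab" can only arise by duplicating the two-letter prefix "ab".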

Crossref

Archivo Digital UPM (Univ. Politécnica de Madrid)

Developing and applying heterogeneous phylogenetic models with XRate

Author: A Heger
A Siepel
A Varadarajan
AJ Drummond
B Knudsen
Christos A. Ouzounis
D Ayres
DB Searls
E Birney
G Lunter
GSC Slater
Ian Holmes
IM Meyer
J Felsenstein
J Goecks
J Watts
JS Pedersen
L Stein
M Garber
M Hasegawa
M Kimura
M Zuker
ME Skinner
N Saitou
O Penn
Oscar Westesson
PS Klosterman
RK Bradley
SR Eddy
TH Jukes
WJ Kent
Z Yang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2012

Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models that take into account the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, coding models by hand becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models that would previously have been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package: http://biowiki.org/DART. Comment: 34 pages, 3 figures, glossary of XRate model terminology.
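As a point of reference for what modeling sequence evolution on a tree involves, the sketch below computes the likelihood of one alignment column under the simplest homogeneous model, Jukes-Cantor (JC69), using Felsenstein's pruning algorithm. It is plain Python, not XRate's grammar format or API. The nested-list tree encoding and all function names are assumptions made for this illustration. Heterogeneous models of the kind described above would replace the single substitution model with ones that depend on the gene feature being modeled.

```python
import math

BASES = "ACGT"


def jc69_prob(i: str, j: str, t: float) -> float:
    """Jukes-Cantor P(end base j | start base i) after branch length t
    (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial_likelihood(node) -> dict[str, float]:
    """Felsenstein pruning. A node is either a leaf base such as "A" or a
    list of (child, branch_length) pairs."""
    if isinstance(node, str):
        return {b: 1.0 if b == node else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in node:
        child_lik = partial_likelihood(child)
        for parent in BASES:
            # Sum over the unobserved base at the child end of this branch.
            result[parent] *= sum(jc69_prob(parent, c, t) * child_lik[c] for c in BASES)
    return result


def site_likelihood(tree) -> float:
    """Likelihood of one alignment column, assuming uniform base frequencies at the root."""
    root = partial_likelihood(tree)
    return sum(0.25 * root[b] for b in BASES)
```

For example, `site_likelihood([([("A", 0.1), ("A", 0.1)], 0.05), ("G", 0.2)])` scores a column with bases A, A and G on a three-leaf tree.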

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

PubMed Central

The Francis Crick Institute


The computational linguistics of biological sequences

Author: Searls D.
Publication venue: 'Stanford University Press'
Publication date: 31/12/1995

This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology, which was held in the United Kingdom from July 16 to 19, 1995. Protein sequences are analogous to nucleic acid sequences in many respects, particularly in their folding behavior. Proteins have a much richer variety of interactions, but in theory the same linguistic principles could be brought to bear in describing dependencies between distant residues that arise by virtue of three-dimensional structure. This tutorial will concentrate on nucleic acid sequences.
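The kind of long-range dependency discussed here is easiest to see in RNA secondary structure, where the i-th base of a stem pairs with the i-th base counted from the other end. The sketch below is a membership test for a toy context-free grammar of stem-loops. The grammar, the Watson-Crick-only pairing rule and the `min_stem`/`min_loop` thresholds are illustrative assumptions, not taken from the tutorial.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def is_stem_loop(s: str, min_stem: int = 3, min_loop: int = 3) -> bool:
    """Membership test for the toy grammar
         S -> x S x'  |  x L x'     (x' is the Watson-Crick complement of x)
         L -> any string of length >= min_loop
    restricted to stems of at least min_stem nested base pairs."""
    def stem(t: str, depth: int) -> bool:
        # The outermost pair must be complementary.
        if len(t) < 2 or COMPLEMENT.get(t[0]) != t[-1]:
            return False
        inner = t[1:-1]
        # Either close the stem here (inner becomes the loop) or keep pairing inward.
        if depth + 1 >= min_stem and len(inner) >= min_loop:
            return True
        return stem(inner, depth + 1)

    return stem(s, 0)
```

Because the pairing is nested, recognising such stems of unbounded length is beyond any finite automaton, which is why grammars above the regular level are used for this kind of structure.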

UNT (University of North Texas) Digital Library

Evaluation of the Kinetic Properties of the Sporulation Protein SpoIIE of Bacillus subtilis by Inclusion in a Model Membrane

Author: Michael D. Yudkin
Stephanie Allen
Tim Searls
Xingyong Chen
Publication venue: American Society for Microbiology
Publication date: 01/05/2004

Starvation induces Bacillus subtilis to initiate a developmental process (sporulation) that includes asymmetric cell division to form the prespore and the mother cell. The integral membrane protein SpoIIE is essential for the prespore-specific activation of the transcription factor σ(F), and it also has a morphogenic activity required for asymmetric division. An increase in the local concentration of SpoIIE at the polar septum of B. subtilis precedes dephosphorylation of the anti-anti-sigma factor SpoIIAA in the prespore. After closure and invagination of the asymmetric septum, the phosphatase activity of SpoIIE increases severalfold, but the reason for this dramatic change in activity has not been determined. The central domain of SpoIIE has been seen to self-associate (I. Lucet et al., EMBO J. 19:1467-1475, 2000), suggesting that activation of the C-terminal PP2C-like phosphatase domain might be due to conformational changes brought about by the increased local concentration of SpoIIE in the sporulating septum. Here we report the inclusion of purified SpoIIE protein into a model membrane as a method for studying the effect of local concentration in a lipid bilayer on activity. In vitro assays indicate that the membrane-bound enzyme maintains dephosphorylation rates similar to those of the highly active micellar state at all molar ratios of protein to lipid. Atomic force microscopy images indicate that increased local concentration does not lead to self-association.

Crossref

PubMed Central

Smarter Vaccine Design Will Circumvent Regulatory T Cell-Mediated Evasion in Chronic HIV and HCV Infection

Author: Andres H. Gutierrez
Anne Searls De Groot
Chris Bailey-Kellogg
Frances Terry
Leonard Moise
Phyllis Losikoff
Ryan Tassone
Stephen H. Gregory
William D Martin
Publication venue: Dartmouth Digital Commons
Publication date: 01/01/2014

Despite years of research, vaccines against HIV and HCV are not yet available, largely because of effective viral immunoevasive mechanisms. A novel escape mechanism observed in viruses that cause chronic infection is the suppression of virus-specific effector CD4(+) and CD8(+) T cells through stimulation of regulatory T cells (Tregs) educated on host sequences during tolerance induction. Viral class II MHC epitopes that share a T cell receptor (TCR)-face with host epitopes may activate Tregs capable of suppressing protective responses. We designed an immunoinformatic algorithm, JanusMatrix, to identify such epitopes and discovered that, among human-host viruses, chronic viruses appear more human-like than viruses that cause acute infection. Furthermore, an HCV epitope that activates Tregs in chronically infected patients, but not in patients who cleared the infection (clearers), shares a TCR-face with numerous human sequences. To boost the weak CD4(+) T cell responses associated with persistent infection, vaccines for HIV and HCV must circumvent potential Treg activation that can handicap efficacy. Epitope-driven approaches to vaccine design that carefully consider the T cell subsets primed during immunization will advance HIV and HCV vaccine development.
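The core idea behind a JanusMatrix-style search, comparing only the TCR-facing residues of a viral epitope with host sequences, can be sketched as follows. This is a simplified illustration, not the published JanusMatrix algorithm: it assumes a 9-mer class II binding core whose TCR-facing residues are at positions 2, 3, 5, 7 and 8 (with positions 1, 4, 6 and 9 facing the MHC), and it omits the HLA-binding predictions a real tool would apply. All names and sequences here are hypothetical.

```python
# Assumed TCR-facing positions in a 9-mer class II core: 2, 3, 5, 7, 8 (1-based).
TCR_FACE = (1, 2, 4, 6, 7)  # the same positions, 0-based


def tcr_face(core: str) -> str:
    """Return the residues of a 9-mer core that are assumed to face the TCR."""
    if len(core) != 9:
        raise ValueError("expected a 9-mer binding core")
    return "".join(core[i] for i in TCR_FACE)


def ninemers(seq: str) -> list[str]:
    """All overlapping 9-mers of a protein sequence (empty if shorter than 9)."""
    return [seq[i:i + 9] for i in range(len(seq) - 8)]


def host_cross_matches(viral_core: str, host_proteins: list[str]) -> list[str]:
    """Host 9-mers whose assumed TCR face is identical to that of the viral core."""
    face = tcr_face(viral_core)
    return sorted({core for seq in host_proteins
                   for core in ninemers(seq) if tcr_face(core) == face})
```

For example, the hypothetical viral core "GKLAVRTFS" and the host 9-mer "AKLEVDTFY" differ at every assumed MHC-facing position but share the TCR face "KLVTF", so the host peptide is reported as a cross-match.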

Crossref

Directory of Open Access Journals

Frontiers - Publisher Connector

PubMed Central

Dartmouth Digital Commons (Dartmouth College)

DigitalCommons@URI

Category Theoretic Analysis of Hierarchical Protein Materials and Social Networks

Author: A Fritsch
AL Barabasi
B Alberts
BC Pierce
CM Schneider
D Eisenberg
D Taylor
DA Fletcher
David I. Spivak
DB Searls
DI Spivak
E Moggi
E Rodriguez
Elizabeth Wood
EM Marcotte
FW Lawvere
GB Olson
H Jeong
H Peterlik
I Lee
J Aizenberg
J Verdasca
JD Currey
K Hofstetter
Laurent Kreplak
M Barr
M Moortgat
Markus J. Buehler
MD Hauser
MJ Buehler
MS Szalay
N Chomsky
N Huebsch
NM Pugno
O Mason
P Csermely
P Fratzl
P Nurse
P Wadler
R Brown
R Lakes
R Milo
R Paparcone
R Pastor-Satorras
RC Strohman
RT Oehrle
S Awodey
S Eilenberg
S Keten
SM Lane
SW Cranford
T Ackbarow
Tristan Giesa
WW Powell
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2011

Materials in biology span all the scales from Angstroms to meters and typically consist of complex hierarchical assemblies of simple building blocks. Here we describe an application of category theory to describe structural and resulting functional properties of biological protein materials by developing so-called ologs. An olog is like a “concept web” or “semantic network” except that it follows a rigorous mathematical formulation based on category theory. This key difference ensures that an olog is unambiguous, highly adaptable to evolution and change, and suitable for sharing concepts with other ologs. We consider simple cases of beta-helical and amyloid-like protein filaments subjected to axial extension and develop an olog representation of their structural and resulting mechanical properties. We also construct a representation of a social network in which people send text messages to their nearest neighbors and act as a team to perform a task. We show that the olog for the protein and the olog for the social network feature identical category-theoretic representations, and we proceed to precisely explicate the analogy or isomorphism between them. The examples presented here demonstrate that the intrinsic nature of a complex system, which in particular includes a precise relationship between structure and function at different hierarchical levels, can be effectively represented by an olog. This, in turn, allows for comparative studies between disparate materials or fields of application, and results in novel approaches to derive functionality in the design of de novo hierarchical systems.
We discuss opportunities and challenges associated with the description of complex biological materials by using ologs as a powerful tool for analysis and design in the context of materiomics, and we present the potential impact of this approach for engineering, life sciences, and medicine.
Presidential Early Career Award for Scientists and Engineers (N000141010562); United States Army Research Office Multidisciplinary University Research Initiative (W911NF0910541); United States Office of Naval Research (grant N000141010841); Massachusetts Institute of Technology, Dept. of Mathematics; Studienstiftung des deutschen Volkes; Clark Barwick; Jacob Luri
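An olog can be read as a small category: its types are objects and its aspects are arrows. Saying that two ologs have identical representations amounts to exhibiting an isomorphism between them. The Python sketch below checks the generator-level part of that condition for two finite ologs. The class, the function and the toy types in the test are hypothetical and are not the ologs constructed in the paper. A full check would also compare the commutative diagrams (facts) that each olog declares.

```python
from dataclasses import dataclass


@dataclass
class Olog:
    types: set[str]                      # boxes, e.g. "a filament"
    aspects: dict[str, tuple[str, str]]  # arrow label -> (source type, target type)


def is_isomorphism(src: Olog, dst: Olog,
                   on_types: dict[str, str], on_aspects: dict[str, str]) -> bool:
    """Check that the maps are bijections on types and on aspects, and that every
    aspect A -> B is sent to an aspect F(A) -> F(B). Commutative 'facts'
    (path equations) are not checked in this sketch."""
    types_bijective = (set(on_types) == src.types
                       and set(on_types.values()) == dst.types
                       and len(src.types) == len(dst.types))
    aspects_bijective = (set(on_aspects) == set(src.aspects)
                         and set(on_aspects.values()) == set(dst.aspects)
                         and len(src.aspects) == len(dst.aspects))
    if not (types_bijective and aspects_bijective):
        return False
    return all(dst.aspects[on_aspects[name]] == (on_types[s], on_types[t])
               for name, (s, t) in src.aspects.items())
```

Mapping each type and aspect of one olog onto the other, and then checking sources and targets, mirrors at toy scale the protein-versus-social-network comparison described in the abstract.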

arXiv.org e-Print Archive

Public Library of Science (PLOS)

CiteSeerX

DSpace@MIT

Crossref

Directory of Open Access Journals

PubMed Central

Publikationsserver der RWTH Aachen University

The Francis Crick Institute

Are grammatical representations useful for learning from biological sequence data?— a case study

Author: A. Srinivasan
A. Whittaker
C. Rawlings
C.H. Bryant
Ling C.
S. Topp
S.H. Muggleton
Searls D.
Publication venue: 'Mary Ann Liebert Inc'
Publication date: 01/10/2001

This paper investigates whether Chomsky-like grammar representations are useful for learning cost-effective, comprehensible predictors of members of biological sequence families. The Inductive Logic Programming (ILP) Bayesian approach to learning from positive examples is used to generate a grammar for recognising a class of proteins known as human neuropeptide precursors (NPPs). Five of the co-authors of this paper collectively have extensive expertise in NPPs and general bioinformatics methods. Their motivation for generating an NPP grammar was that none of the existing bioinformatics methods could provide sufficient cost savings during the search for new NPPs. Prior to this project, experienced specialists at SmithKline Beecham had tried for many months to hand-code such a grammar, without success. Our best predictor makes the search for novel NPPs more than 100 times more efficient than randomly selecting proteins for synthesis and testing them for biological activity. As far as these authors are aware, this is both the first biological grammar learnt using ILP and the first real-world scientific application of the ILP Bayesian approach to learning from positive examples. A group of features is derived from this grammar. Other groups of features of NPPs are derived using other learning strategies. Amalgams of these groups are formed. A recognition model is generated for each amalgam using C4.5 and C4.5rules, and its performance is measured using both predictive accuracy and a new cost function, Relative Advantage (RA). The highest RA was achieved by a model that includes grammar-derived features. This RA is significantly higher than the best RA achieved without the grammar-derived features.
Predictive accuracy is not a good measure of performance for this domain because it does not discriminate well between NPP recognition models: despite covering varying numbers of the (rare) positives, all the models are awarded a similar (high) score by predictive accuracy because they all exclude most of the (abundant) negatives.
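The point about predictive accuracy can be checked with a little arithmetic. The counts in the usage note are hypothetical (10 positives among 10,000 proteins). The enrichment ratio, precision divided by the positive base rate, is a simple stand-in in the spirit of the "more than 100 times more efficient than random selection" comparison; it is not the Relative Advantage (RA) function defined in the paper.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def enrichment(tp: int, fp: int, tn: int, fn: int) -> float:
    """Precision divided by the positive base rate: how many times more often a
    predicted positive is a true positive than a randomly chosen case is."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    base_rate = (tp + fn) / (tp + fp + tn + fn)
    return precision / base_rate
```

A model that finds no NPPs at all, `accuracy(0, 0, 9990, 10)`, scores 0.999, while one that recovers 8 of the 10 NPPs at the cost of 20 false positives, `accuracy(8, 20, 9970, 2)`, scores 0.9978 even though its enrichment is roughly 286 times that of random selection.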

University of Salford Institutional Repository

Crossref

Hybrid Modeling, HMM/NN Architectures, and Protein Applications

Author: Baldi P.
Dempster A. P.
Lapedes A.
Myers E. W.
Pierre Baldi
Searls D.
Yves Chauvin
Publication venue: 'MIT Press - Journals'

Crossref

Applications of Generalized Pair Hidden Markov Models to Alignment and Gene Finding Problems

Author: Dayhoff M.O.
Korf I.
Kulp D.
Lior Pachter
Marina Alexandersson
Müller T.
Searls D.B.
Simon Cawley
Publication venue: 'Mary Ann Liebert Inc'

Crossref

Automating Genomic Data Mining via a Sequence-based Matrix Format and Associative Rule Set

Author: BFJ Manly
CI Castillo-Davis
David Johnson
DB Searls
DD Womble
E Badidi
F Antequera
J Krueger
J Theilhaber
JD Wren
JF Costello
JM Claverie
Jonathan D Wren
JR Quinlan
K Davies
K Nakai
L Stein
Le Gruenwald
LV Zhang
M Ashburner
M Gardiner-Garden
M Safran
P Clark
RS Michalski
S Foissac
S Muggleton
SP Shah
TV Venkatesh
V Bajic
W Frawley
WM Shui
Y Liu
Publication venue: BioMed Central
Publication date: 01/01/2005

There is an enormous amount of information encoded in each genome – enough to create living, responsive and adaptive organisms. Raw sequence data alone is not enough to understand function, mechanisms or interactions. Changes in a single base pair can lead to disease, such as sickle-cell anemia, while some large megabase deletions have no apparent phenotypic effect. Genomic features are varied in their data types, and annotation of these features is spread across multiple databases. Herein, we develop a method to automate the exploration of genomes by iteratively searching sequence data for correlations and building upon them. First, to integrate and compare different annotation sources, a sequence matrix (SM) is developed to contain position-dependent information. Second, a classification tree is developed for matrix row types, specifying how each data type is to be treated with respect to other data types for analysis purposes. Third, correlative analyses are developed to analyze features of each matrix row in terms of the other rows, guided by the classification tree as to which analyses are appropriate. A prototype was developed and successfully detected coinciding genomic features among genes, exons, repetitive elements and CpG islands.
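The sequence-matrix idea (one row per annotation source, one column per base position, followed by correlating rows against each other) can be sketched in a few lines. The interval encoding (0-based, half-open), the row names and the overlap statistic used here are assumptions for illustration. They are not the paper's classification tree or its correlative analyses.

```python
def sequence_matrix(length: int,
                    annotations: dict[str, list[tuple[int, int]]]) -> dict[str, list[int]]:
    """One row per annotation source, one 0/1 cell per base position.
    Intervals are 0-based and half-open: (start, end) covers start..end-1."""
    matrix = {}
    for row, intervals in annotations.items():
        cells = [0] * length
        for start, end in intervals:
            # Clip intervals to the sequence before marking positions.
            for pos in range(max(start, 0), min(end, length)):
                cells[pos] = 1
        matrix[row] = cells
    return matrix


def coincidence(matrix: dict[str, list[int]], a: str, b: str) -> float:
    """Fraction of positions annotated in row a that are also annotated in row b."""
    in_a = sum(matrix[a])
    both = sum(x & y for x, y in zip(matrix[a], matrix[b]))
    return both / in_a if in_a else 0.0
```

For a hypothetical 20-base region with exons at (2, 6) and (10, 14) and a CpG island at (0, 5), three of the eight exon positions fall inside the CpG island, so `coincidence(m, "exon", "CpG island")` is 0.375. The statistic is asymmetric, which is one reason a per-row-type rule set for choosing analyses is useful.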

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central