Learning a Hybrid Architecture for Sequence Regression and Annotation

Author: Carin Lawrence
Hartemink Alexander J.
Henao Ricardo
Zhang Yizhe
Zhong Jianling
Publication date: 16/12/2015

When learning a hidden Markov model (HMM), sequential observations can often be complemented by real-valued summary response variables generated from the path of hidden states. Such settings arise in numerous domains, including many applications in biology, like motif discovery and genome annotation. In this paper, we present a flexible framework for jointly modeling both latent sequence features and the functional mapping that relates the summary response variables to the hidden state sequence. The algorithm is compatible with a rich set of mapping functions. Results show that the availability of additional continuous response variables can simultaneously improve the annotation of the sequential observations and yield good prediction performance in both synthetic data and real-world datasets. Comment: AAAI 201

arXiv.org e-Print Archive

DukeSpace

Association for the Advancement of Artificial Intelligence: AAAI Publications

An intuitionistic approach to scoring DNA sequences against transcription factor binding site motifs

Author: A Sandelin
A Sharov
A Tomovic
Adrian J Shepherd
Armando Blanco
C Lawrence
D Denning
E Baker
E Szmidt
E Wingender
F Garcia
F Lam
F Lopez
F Offner
F Zare-Mirakabad
Fernando Garcia-Alcalde
G Chamilos
G Diop
G Hertz
J Hanley
J Hughes
J Sainz
J Van Helden
J Zhao
K Atanassov
K Won
L Liang
L Zadeh
M Bulyk
M Das
M Eisen
N Dror
N Kim
P Benos
P Bochud
P Schling
R Gordan
S De
T Bailey
T Fawcett
T Hehlgans
T Tamura
V Khatibi
W Hung
W Wasserman
X Chen
Y Haudry
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Background: Transcription factors (TFs) control transcription by binding to specific regions of DNA called transcription factor binding sites (TFBSs). The identification of TFBSs is a crucial problem in computational biology and includes the subtask of predicting the location of known TFBS motifs in a given DNA sequence. It has previously been shown that, when scoring matches to known TFBS motifs, interdependencies between positions within a motif should be taken into account. However, this remains a challenging task because sequences similar to those of known TFBSs can occur by chance with a relatively high frequency. Here we present a new method for matching sequences to TFBS motifs based on intuitionistic fuzzy sets (IFS) theory, an approach that has been shown to be particularly appropriate for tackling problems that embody a high degree of uncertainty. Results: We propose SCintuit, a new scoring method for measuring sequence-motif affinity based on IFS theory. Unlike existing methods that consider dependencies between positions, SCintuit is designed to prevent overestimation of less conserved positions of TFBSs. For a given pair of bases, SCintuit is computed not only as a function of their combined probability of occurrence, but also taking into account the individual importance of each single base at its corresponding position. We used SCintuit to identify known TFBSs in DNA sequences. Our method provides excellent results when dealing with both synthetic and real data, outperforming the sensitivity and the specificity of two existing methods in all the experiments we performed. Conclusions: The results show that SCintuit improves the prediction quality for TFs of the existing approaches without compromising sensitivity. In addition, we show how SCintuit can be successfully applied to real research problems. In this study the reliability of the IFS theory for motif discovery tasks is proven.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

UCL Discovery

Repositorio Institucional Universidad de Granada

Birkbeck Institutional Research Online

BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction

Author: Ari Eszter
Horváth Arnold
Ittzés Péter
Jakó Éena
Podani János
Publication venue: 'Elsevier BV'
Publication date: 01/01/2009

A novel discrete mathematical approach is proposed as an additional tool for molecular systematics which does not require prior statistical assumptions concerning the evolutionary process. The method is based on algorithms generating mathematical representations directly from DNA/RNA or protein sequences, followed by the output of numerical (scalar or vector) and visual characteristics (graphs). The binary encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (or ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to get an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called the Boolean analysis or BOOL-AN.
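The step from raw vector data to a distance matrix can be illustrated with a minimal Python sketch. It does not reproduce the ICF descriptor or the BOOL-AN software: the taxon names and vectors are hypothetical placeholders, Euclidean distance is only one assumed choice among the "different distance matrices" the abstract mentions, and `closest_pair` shows just the first merge decision a UPGMA-style clustering would make.

```python
import math
from itertools import combinations


def distance_matrix(vectors):
    """Pairwise Euclidean distances between named descriptor vectors."""
    names = sorted(vectors)
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = math.dist(vectors[a], vectors[b])
        dist[a][b] = dist[b][a] = d  # the matrix is symmetric
    return dist


def closest_pair(dist):
    """Return the two taxa with the smallest distance (first UPGMA merge)."""
    return min(combinations(sorted(dist), 2), key=lambda p: dist[p[0]][p[1]])


# Hypothetical descriptor vectors, not real ICF output.
example_vectors = {
    "taxon_a": [0.0, 0.0],
    "taxon_b": [3.0, 4.0],
    "taxon_c": [0.0, 1.0],
}
example_dist = distance_matrix(example_vectors)
print(closest_pair(example_dist))
```

A full tree would repeat the merge step on an updated matrix; the resulting matrix could equally be handed to an existing neighbor-joining or ordination routine.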

Crossref

Repository of the Academy's Library

A flexible integrative approach based on random forest improves prediction of transcription factor binding sites

Author: Abeel
Afflerbach
Angarica
Bailey
Bart Hooghe
Bauer
Benos
Breiman
Bulyk
Burden
Calladine
Camenisch
Chen
Cho
Cordell
Davis
Dickerson
Ehret
Ernst
Frans van Roy
Friedel
Fujii
Fulton
Gama-Castro
Gardiner
Gartenberg
Gershenzon
Goodsell
Gorin
Gowrisankar
Greenbaum
Gunewardena
Hall
Hendrickson
Hu
Juo
Kajimura
Kaplan
Karas
Kel
Kim
Lavery
Lewis
Liu
Liu
Liu
Long
Lu
Lu
Lu
Lunetta
Man
Marco
Marinescu
Martinez-Hackert
Matys
Medina-Rivera
Meysman
Michel
Mokry
Morozov
Narang
Naughton
O'Flanagan
Olson
Paillard
Pan
Parker
Parvin
Pieter De Bleser
Ponomarenko
Portales-Casamar
Powell
Pudimat
Ramsey
Rohs
Rohs
Rohs
Ruiz
Satchwell
Schneider
Shakked
Sharon
Shi
Spolar
Stefan Broos
Stormo
Svozil
Thayer
Tomovic
Toro-Roman
Travers
Tullius
Wunderlich
Zhang
Zhang
Zhu
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2012

Transcription factor binding sites (TFBSs) are DNA sequences of 6-15 base pairs. Interaction of these TFBSs with transcription factors (TFs) is largely responsible for most spatiotemporal gene expression patterns. Here, we evaluate to what extent sequence-based prediction of TFBSs can be improved by taking into account the positional dependencies of nucleotides (NPDs) and the nucleotide sequence-dependent structure of DNA. We make use of the random forest algorithm to flexibly exploit both types of information. Results in this study show that both the structural method and the NPD method can be valuable for the prediction of TFBSs. Moreover, their predictive values seem to be complementary, even to the widely used position weight matrix (PWM) method. This led us to combine all three methods. Results obtained for five eukaryotic TFs with different DNA-binding domains show that our method improves classification accuracy for all five eukaryotic TFs compared with other approaches. Additionally, we contrast the results of seven smaller prokaryotic sets with high-quality data and show that with the use of high-quality data we can significantly improve prediction performance. Models developed in this study can be of great use for gaining insight into the mechanisms of TF binding.
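For readers unfamiliar with the PWM baseline named in this abstract, the following is a minimal Python sketch of log-odds PWM scoring. It is not the paper's random forest model. The aligned example sites, the pseudocount of 1, and the uniform background of 0.25 are all assumptions chosen for illustration.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length aligned binding sites."""
    pwm = []
    for pos in range(len(sites[0])):
        # Start every base at the pseudocount so no probability is zero.
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in "ACGT"})
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds; positions are treated as independent."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_match(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max(
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Hypothetical aligned sites, for illustration only.
example_sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
example_pwm = build_pwm(example_sites)
print(best_match(example_pwm, "GGCGTATAATGCGC"))
```

Because `score_window` adds columns independently, it cannot capture the positional dependencies between nucleotides that the abstract evaluates, which is the limitation motivating the combined approach.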

Crossref

Ghent University Academic Bibliography

PubMed Central

ASAP-SML: An Antibody Sequence Analysis Pipeline Using Statistical Testing and Machine Learning

Author: Hassoun Soha
Li Xinmeng
Van Deventer James A.
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 08/03/2020

Antibodies are capable of potently and specifically binding individual antigens and, in some cases, disrupting their functions. The key challenge in generating antibody-based inhibitors is the lack of fundamental information relating sequences of antibodies to their unique properties as inhibitors. We develop a pipeline, Antibody Sequence Analysis Pipeline using Statistical testing and Machine Learning (ASAP-SML), to identify features that distinguish one set of antibody sequences from antibody sequences in a reference set. The pipeline extracts feature fingerprints from sequences. The fingerprints represent germline, CDR canonical structure, isoelectric point and frequent positional motifs. Machine learning and statistical significance testing techniques are applied to antibody sequences and extracted feature fingerprints to identify distinguishing feature values and combinations thereof. To demonstrate how it works, we applied the pipeline to sets of antibody sequences known to bind or inhibit the activities of matrix metalloproteinases (MMPs), a family of zinc-dependent enzymes that promote cancer progression and undesired inflammation under pathological conditions, against reference datasets that do not bind or inhibit MMPs. ASAP-SML identifies features and combinations of feature values found in the MMP-targeting sets that are distinct from those in the reference sets.

arXiv.org e-Print Archive

Directory of Open Access Journals

A Neural Model of How the Brain Computes Heading from Optic Flow in Realistic Scenes

Author: Browning Andrew N.
Grossberg Stephen
Mingolla Ennio
Publication venue: Boston University Center for Adaptive Systems and Department of Cognitive and Neural Systems
Publication date: 01/12/2008

Animals avoid obstacles and approach goals in novel cluttered environments using visual information, notably optic flow, to compute heading, or direction of travel, with respect to objects in the environment. We present a neural model of how heading is computed that describes interactions among neurons in several visual areas of the primate magnocellular pathway, from retina through V1, MT+, and MSTd. The model produces outputs which are qualitatively and quantitatively similar to human heading estimation data in response to complex natural scenes. The model estimates heading to within 1.5° in random dot or photo-realistically rendered scenes and within 3° in video streams from driving in real-world environments. Simulated rotations of less than 1 degree per second do not affect model performance, but faster simulated rotation rates deteriorate performance, as in humans. The model is part of a larger navigational system that identifies and tracks objects while navigating in cluttered environments. National Science Foundation (SBE-0354378, BCS-0235398); Office of Naval Research (N00014-01-1-0624); National Geospatial-Intelligence Agency (NMA201-01-1-2016)
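The idea of reading heading from optic flow can be illustrated with a small geometric sketch in Python. This is not the neural model described in the abstract: it is a plain least-squares estimate of the focus of expansion, assuming pure forward translation with no rotation. The image points, flow vectors, and the focal length are hypothetical values chosen for illustration.

```python
import math


def focus_of_expansion(points, flows):
    """Least-squares focus of expansion for purely translational flow.

    Each flow vector (u, v) at image point (x, y) should point away from
    the focus (x0, y0), which gives the constraint v*x0 - u*y0 = v*x - u*y.
    """
    saa = sab = sbb = sac = sbc = 0.0
    for (x, y), (u, v) in zip(points, flows):
        a, b = v, -u
        c = v * x - u * y
        saa += a * a
        sab += a * b
        sbb += b * b
        sac += a * c
        sbc += b * c
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("flow field does not constrain a unique focus")
    # Solve the 2x2 normal equations with Cramer's rule.
    x0 = (sac * sbb - sab * sbc) / det
    y0 = (saa * sbc - sab * sac) / det
    return x0, y0


def heading_degrees(x0, focal_length):
    """Horizontal heading angle implied by the focus, for an assumed focal length."""
    return math.degrees(math.atan2(x0, focal_length))


# Synthetic radial flow expanding from a hypothetical focus at (3, -2).
example_points = [(0, 0), (10, 0), (0, 10), (5, 7), (-4, 3)]
example_flows = [(0.5 * (x - 3), 0.5 * (y + 2)) for x, y in example_points]
print(focus_of_expansion(example_points, example_flows))
```

Rotation adds a flow component that does not radiate from a single point, so this simple estimate degrades under rotation; the abstract reports that the model, like human observers, is also affected once rotation rates exceed about 1 degree per second.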

Boston University Institutional Repository (OpenBU)

Cortical Dynamics of Navigation and Steering in Natural Scenes: Motion-Based Object Segmentation, Heading, and Obstacle Avoidance

Author: Browning Andrew N.
Grossberg Stephen
Mingolla Ennio
Publication venue: Boston University Center for Adaptive Systems and Department of Cognitive and Neural Systems
Publication date: 01/12/2008

Visually guided navigation through a cluttered natural scene is a challenging problem that animals and humans accomplish with ease. The ViSTARS neural model proposes how primates use motion information to segment objects and determine heading for purposes of goal approach and obstacle avoidance in response to video inputs from real and virtual environments. The model produces trajectories similar to those of human navigators. It does so by predicting how computationally complementary processes in cortical areas MT-/MSTv and MT+/MSTd compute object motion for tracking and self-motion for navigation, respectively. The model retina responds to transients in the input stream. Model V1 generates a local speed and direction estimate. This local motion estimate is ambiguous due to the neural aperture problem. Model MT+ interacts with MSTd via an attentive feedback loop to compute accurate heading estimates in MSTd that quantitatively simulate properties of human heading estimation data. Model MT interacts with MSTv via an attentive feedback loop to compute accurate estimates of speed, direction and position of moving objects. This object information is combined with heading information to produce steering decisions wherein goals behave like attractors and obstacles behave like repellers. These steering decisions lead to navigational trajectories that closely match human performance. National Science Foundation (SBE-0354378, BCS-0235398); Office of Naval Research (N00014-01-1-0624); National Geospatial-Intelligence Agency (NMA201-01-1-2016)

Boston University Institutional Repository (OpenBU)