Fast motif recognition via application of statistical thresholds

Author: C Boucher, Christina Boucher, E Eskin, E Wingender, FYL Chin, G Pavesi, I Ben-Gal, J Buhler, J Davila, James King, M Frances, M Li, M Tompa, MC Frith, N Pisanti, P Pevzner, PA Evans, S Rajasekaran, S Sze, S van Dongen, TL Bailey, WS Feng
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Improving the accuracy and efficiency of motif recognition is an important computational challenge, with applications to detecting transcription factor binding sites in genomic data. Closely related to motif recognition is the Consensus String decision problem: given a parameter d and a set of ℓ-length strings S = {s1,...,sn}, does there exist a consensus string whose Hamming distance from every string in S is at most d? A set of strings S is pairwise bounded if the Hamming distance between every pair of strings in S is at most 2d. Determining whether a set is pairwise bounded is trivial, and a set cannot have a consensus string unless it is pairwise bounded (by the triangle inequality, two strings that are each within distance d of a common consensus are within distance 2d of each other). We use Consensus String to determine whether or not a pairwise bounded set has a consensus. Unfortunately, Consensus String is NP-complete. The lack of an efficient method for solving it has made Consensus String a computational bottleneck in MCL-WMR, a motif recognition program capable of solving difficult motif recognition problem instances.

Results: We focus on developing a method that solves Consensus String quickly with a small probability of error. We apply this heuristic to develop a new motif recognition program, sMCL-WMR, which has impressive accuracy and efficiency. We demonstrate the performance of sMCL-WMR in detecting weak motifs in large data sets and in real genomic data sets, and compare its performance to that of other leading motif recognition …

CiteSeerX

Springer - Publisher Connector
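
The Consensus String decision problem and the pairwise-bounded test described above can be made concrete with a small brute-force sketch. This is not the statistical-threshold method of sMCL-WMR, which the abstract does not detail; it simply enumerates all |Σ|^ℓ candidate strings, which illustrates why the exact problem becomes a bottleneck. The function names and the default DNA alphabet are illustrative assumptions.

```python
from __future__ import annotations

from itertools import combinations, product


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_pairwise_bounded(strings: list[str], d: int) -> bool:
    """Necessary condition: every pair of strings is within Hamming distance 2d."""
    return all(hamming(a, b) <= 2 * d for a, b in combinations(strings, 2))


def find_consensus(strings: list[str], d: int, alphabet: str = "ACGT") -> str | None:
    """Exact (brute-force) Consensus String decision.

    Returns a string within Hamming distance d of every input string, or None.
    Illustrative only: runs in O(|alphabet|^l * n * l) time and is NOT the
    statistical-threshold heuristic used by sMCL-WMR.
    """
    if not strings:
        return None
    # Cheap filter: by the triangle inequality, a set with a consensus is pairwise bounded.
    if not is_pairwise_bounded(strings, d):
        return None
    length = len(strings[0])
    for letters in product(alphabet, repeat=length):
        candidate = "".join(letters)
        if all(hamming(candidate, s) <= d for s in strings):
            return candidate
    return None
```

For example, find_consensus(["ACGT", "ACGA", "TCGT"], 1) returns "ACGT". The binary set ["000", "011", "101", "110"] is pairwise bounded for d = 1 (every pair differs in exactly two positions), yet find_consensus(["000", "011", "101", "110"], 1, alphabet="01") returns None, showing that pairwise boundedness is necessary but not sufficient.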

PATTERNA: transcriptome-wide search for functional RNA elements via structural data signatures.

Author: Aviran Sharon, Ledda Mirko
Publication venue: eScholarship, University of California
Publication date: 01/03/2018

Establishing a link between RNA structure and function remains a great challenge in RNA biology. The emergence of high-throughput structure profiling experiments is revolutionizing our ability to decipher structure, yet principled approaches for extracting information on structural elements directly from these data sets are lacking. We present PATTERNA, an unsupervised pattern recognition algorithm that rapidly mines RNA structure motifs from profiling data. We demonstrate that PATTERNA detects motifs with accuracy comparable to that of commonly used thermodynamic models, and we highlight its utility in automating data-directed structure modeling from large data sets. PATTERNA is versatile and compatible with diverse profiling techniques and experimental conditions.

eScholarship - University of California

Comprehensive structural classification of ligand binding motifs in proteins

Author: Akira R. Kinjo, Altschul, Andreeva, Bachhawat, Barber, Berman, Berry, Beuth, Brakoulias, Carvalho, Chen, Davies, Diamond, Dias, Du, Dunn, Friedberg, Garcia-Molina, Gold, Goldstein, Gonzalez, Grishin, Gross, Guilloteau, Gutteridge, Haruki Nakamura, Herter, Hoff, Ikura, Jonassen, Jones, Kawabata, Kinjo, Kinoshita, Kobayashi, Kolodny, Krishna, Krissinel, Lang, Laronde-Leblanc, Lawler, Lee, Malikayil, Minai, Murzin, Nagano, Orengo, Pattabhi, Polacco, Porter, Ridder, Rognan, Russell, Schubert, Shulman-Peleg, Standley, Stark, Stewart, Stoll, Tari, Taylor, Wallace, Wangikar, Watts, Westbrook, Whitlow, Wolfson, Xiao, Xie
Publication venue: Elsevier BV
Publication date: 07/10/2008

Comprehensive knowledge of protein-ligand interactions should provide a useful basis for annotating protein functions, studying protein evolution, engineering enzymatic activity, and designing drugs. To investigate the diversity and universality of ligand binding sites in protein structures, we conducted an all-against-all atomic-level structural comparison of over 180,000 ligand binding sites found in all the known structures in the Protein Data Bank, using a recently developed database search and alignment algorithm. By applying a hybrid top-down/bottom-up clustering analysis to the comparison results, we determined approximately 3000 well-defined structural motifs of ligand binding sites. Apart from a handful of exceptions, most structural motifs were found to be confined within single families or superfamilies and to be associated with particular ligands. Furthermore, we analyzed the components of the similarity network and enumerated more than 4000 pairs of ligand binding sites that were shared across different protein folds. Comment: 13 pages, 8 figures

arXiv.org e-Print Archive
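
The final step described above, analyzing the components of a similarity network and listing binding-site pairs shared across folds, can be sketched with a union-find structure. This is a simplified illustration under stated assumptions: the site IDs, fold labels, similarity scores, and the 0.5 threshold are hypothetical, and a plain score threshold stands in for the paper's hybrid top-down/bottom-up clustering, which the abstract does not specify.

```python
from collections import defaultdict


def similarity_edges(similarities, threshold):
    """Keep the site pairs whose similarity score is at least `threshold`."""
    return [pair for pair, score in similarities.items() if score >= threshold]


def connected_components(nodes, edges):
    """Group nodes into connected components with a union-find structure."""
    parent = {n: n for n in nodes}

    def find(x):
        # Path halving: point each visited node at its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return list(groups.values())


def cross_fold_pairs(edges, fold_of):
    """Similar site pairs whose proteins belong to different folds."""
    return [(a, b) for a, b in edges if fold_of[a] != fold_of[b]]


# Hypothetical toy data: site IDs, fold labels and scores are invented for illustration.
similarities = {
    ("site1", "site2"): 0.9,
    ("site2", "site3"): 0.8,
    ("site4", "site5"): 0.7,
    ("site1", "site4"): 0.2,
}
fold_of = {"site1": "foldA", "site2": "foldA", "site3": "foldB",
           "site4": "foldC", "site5": "foldC"}

edges = similarity_edges(similarities, threshold=0.5)
components = connected_components(fold_of.keys(), edges)
shared = cross_fold_pairs(edges, fold_of)
```

With this toy data, components holds {site1, site2, site3} and {site4, site5}, and shared is [("site2", "site3")], the only above-threshold pair that spans two folds.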

Spectral Sequence Motif Discovery

Author: Colombo Nicolò, Vlassis Nikos
Publication date: 01/01/2014

Sequence discovery tools play a central role in several fields of computational biology. In the framework of transcription factor binding studies, motif finding algorithms of increasingly high performance are required to process the big datasets produced by new high-throughput sequencing technologies. Most existing algorithms are computationally demanding and often cannot handle the size of new experimental data. We present a new motif discovery algorithm built on a recent machine learning technique known as the method of moments. Because it is based on spectral decompositions, this method is robust under model misspecification and is not prone to locally optimal solutions. The result is an algorithm that is extremely fast and designed for the analysis of big sequencing data. In a few minutes, we can process datasets of hundreds of thousands of sequences and extract motif profiles that match those computed by various state-of-the-art algorithms. Comment: 20 pages, 3 figures, 1 table

arXiv.org e-Print Archive

CiteSeerX

Common CHD8 Genomic Targets Contrast With Model-Specific Transcriptional Impacts of CHD8 Haploinsufficiency.

Author: Catta-Preta Rinaldo, Lim Kenneth, Nord Alex S, Wade A Ayanna
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

The packaging of DNA into chromatin determines the transcriptional potential of cells and is central to eukaryotic gene regulation. Case sequencing studies have revealed mutations in proteins that regulate chromatin state, known as chromatin remodeling factors, with causal roles in neurodevelopmental disorders. CHD8, which encodes the chromatin remodeling factor chromodomain helicase DNA binding protein 8, has among the highest de novo loss-of-function mutation rates in patients with autism spectrum disorder (ASD). However, the mechanisms underlying CHD8 pathology have yet to be elucidated. We analyzed published transcriptomic data from in vitro and in vivo CHD8 knockdown and knockout models, together with CHD8 binding across published ChIP-seq datasets, to identify convergent mechanisms of gene regulation by CHD8. Differentially expressed genes (DEGs) varied across models, but overlap was observed among downregulated genes involved in neuronal development and function, cell cycle, chromatin dynamics, and RNA processing, and among upregulated genes involved in metabolism and immune response. Despite the variability in transcriptional changes and in the cells and tissues represented across the ChIP-seq analyses, we found a surprisingly consistent set of high-affinity CHD8 genomic interactions. CHD8 was enriched near promoters of genes involved in basic cell functions and gene regulation. Overlap between high-affinity CHD8 targets and DEGs shows that reduced CHD8 dosage directly relates to decreased expression of cell cycle, chromatin organization, and RNA processing genes, but only in a subset of studies. This meta-analysis verifies CHD8 as a master regulator of gene expression and reveals a consistent set of high-affinity CHD8 targets across human, mouse, and rat in vivo and in vitro studies. These conserved regulatory targets include many genes that are also implicated in ASD.
Our findings suggest a model in which perturbation of dosage-sensitive CHD8 genomic interactions with a highly conserved set of regulatory targets leads to model-specific downstream transcriptional impacts.

eScholarship - University of California

FigShare

Hardware design of LIF with Latency neuron model with memristive STDP synapses

Author: Acciarito Simone, Cardarilli Gian Carlo, Cristini Alessandro, Di Nunzio Luca, Fazzolari Rocco, Khanal Gaurav Mani, Re Marco, Susi Gianluca
Publication venue: Elsevier BV
Publication date: 01/01/2017

In this paper, the hardware implementation of a neuromorphic system is presented. This system is composed of a Leaky Integrate-and-Fire with Latency (LIFL) neuron and a Spike-Timing Dependent Plasticity (STDP) synapse. The LIFL neuron model can encode more information than the common Integrate-and-Fire models typically considered for neuromorphic implementations. In our system, the LIFL neuron is implemented using CMOS circuits, while a memristor is used to implement the STDP synapse. A description of the entire circuit is provided. Finally, the capabilities of the proposed architecture have been evaluated by simulating a motif composed of three neurons and two synapses. The simulation results confirm the validity of the proposed system and its suitability for the design of more complex spiking neural networks.

arXiv.org e-Print Archive
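
As a rough software illustration of the two building blocks, the sketch below pairs a basic leaky integrate-and-fire neuron (Euler integration) with a standard pair-based exponential STDP rule. It deliberately omits the latency mechanism that distinguishes LIFL, and all parameter values (time constants, threshold, learning rates, weight bounds) are assumptions chosen for readability, not values from the paper. Clipping the weight to [0, 1] is only an assumed stand-in for the bounded conductance of a memristor.

```python
import math


def simulate_lif(current, dt=1e-4, tau_m=20e-3, v_rest=0.0, v_th=1.0, v_reset=0.0, r_m=1.0):
    """Euler integration of a basic leaky integrate-and-fire neuron.

    current: input current per time step (arbitrary units, assumed).
    Returns spike times in seconds. The LIFL latency mechanism is NOT modeled.
    """
    v = v_rest
    spike_times = []
    for step, i_in in enumerate(current):
        # tau_m * dV/dt = -(V - V_rest) + R_m * I
        v += (dt / tau_m) * (-(v - v_rest) + r_m * i_in)
        if v >= v_th:
            spike_times.append(step * dt)
            v = v_reset  # reset the membrane potential after firing
    return spike_times


def stdp_delta(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau_plus=20e-3, tau_minus=20e-3):
    """Pair-based exponential STDP weight change for one pre/post spike pair."""
    delta_t = t_post - t_pre
    if delta_t > 0:  # post fires after pre: potentiation
        return a_plus * math.exp(-delta_t / tau_plus)
    if delta_t < 0:  # post fires before pre: depression
        return -a_minus * math.exp(delta_t / tau_minus)
    return 0.0


def update_weight(w, pre_spikes, post_spikes, w_min=0.0, w_max=1.0):
    """Sum STDP over all spike pairs, then clip to [w_min, w_max].

    The clipping is an assumed stand-in for a memristor's bounded conductance.
    """
    for t_pre in pre_spikes:
        for t_post in post_spikes:
            w += stdp_delta(t_pre, t_post)
    return min(w_max, max(w_min, w))
```

With a constant suprathreshold input, simulate_lif([2.0] * 1000) fires periodically (the first spike lands near 14 ms), whereas simulate_lif([0.5] * 1000) never reaches threshold and returns an empty list.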

Text authorship identified using the dynamics of word co-occurrence networks

Author: Akimushkin Camilo, Amancio Diego R., Oliveira Jr Osvaldo N.
Publication venue: Public Library of Science (PLoS)
Publication date: 29/07/2016

The identification of authorship in disputed documents still requires human expertise, which is now infeasible for many tasks owing to the large volumes of text and numbers of authors in practical applications. In this study, we introduce a methodology based on the dynamics of word co-occurrence networks representing written texts to classify a corpus of 80 texts by 8 authors. The texts were divided into sections with an equal number of linguistic tokens, from which time series were created for 12 topological metrics. The series were shown to be stationary (p-value > 0.05), which permits the use of distribution moments as learning attributes. With an optimized supervised learning procedure using a Radial Basis Function Network, 68 out of 80 texts were correctly classified, i.e. a remarkable 85% author matching success rate. Therefore, fluctuations in purely dynamic network metrics were found to characterize authorship, opening the way for the description of texts in terms of small evolving networks. Moreover, the approach allows texts with diverse characteristics to be compared in a simple, fast fashion.

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare
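
The feature-extraction pipeline described in the last abstract (equal-size text sections, one co-occurrence network per section, a time series of network metrics, and distribution moments as features) can be sketched in a few functions. The abstract does not list the 12 topological metrics, so average degree and density here are hypothetical stand-ins; linking adjacent words is also an assumed network construction, and the classification step (a Radial Basis Function Network in the study) is left out. For reference, the reported success rate follows from 68/80 = 0.85.

```python
import re
import statistics


def tokenize(text):
    """Lowercase word tokens (a deliberately simple, assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def sections(tokens, size):
    """Split tokens into consecutive sections of exactly `size` tokens (remainder dropped)."""
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


def cooccurrence_graph(tokens):
    """Undirected graph linking each pair of adjacent distinct words."""
    adj = {}
    for a, b in zip(tokens, tokens[1:]):
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def metrics(adj):
    """Two illustrative topological metrics (hypothetical stand-ins for the study's 12)."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    avg_degree = 2 * m / n if n else 0.0
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    return {"avg_degree": avg_degree, "density": density}


def moments(series):
    """Mean, population standard deviation and population skewness of a series."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std == 0:
        return mean, 0.0, 0.0
    skew = sum((x - mean) ** 3 for x in series) / len(series) / std ** 3
    return mean, std, skew


def feature_vector(text, section_size=50):
    """Moments of each metric's time series across equal-size sections of the text."""
    per_section = [metrics(cooccurrence_graph(sec))
                   for sec in sections(tokenize(text), section_size)]
    if not per_section:
        raise ValueError("text is shorter than one section")
    features = []
    for name in ("avg_degree", "density"):
        features.extend(moments([s[name] for s in per_section]))
    return features
```

On the toy input "the cat sat on the mat " * 20 with section_size=6, every section yields the same 5-node, 5-edge network, so feature_vector returns [2.0, 0.0, 0.0, 0.5, 0.0, 0.0]. On real texts the sections differ, and it is the fluctuation captured by the standard deviation and skewness that the study reports as characteristic of authorship.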