8,874 research outputs found
Fast motif recognition via application of statistical thresholds
Background: Improving the accuracy and efficiency of motif recognition is an important computational challenge that has application to detecting transcription factor binding sites in genomic data. Closely related to motif recognition is the Consensus String decision problem that asks, given a parameter d and a set of â„“-length strings S = {s1,...,sn}, whether there exists a consensus string that has Hamming distance at most d from any string in S. A set of strings S is pairwise bounded if the Hamming distance between any pair of strings in S is at most 2d. It is trivial to determine whether a set is pairwise bounded, and a set cannot have a consensus string unless it is pairwise bounded. We use Consensus String to determine whether or not a pairwise bounded set has a consensus. Unfortunately, Consensus String is NP-complete. The lack of an efficient method to solve the Consensus String problem has caused it to become a computational bottleneck in MCL-WMR, a motif recognition program capable of solving difficult motif recognition problem instances. Results: We focus on the development of a method for solving Consensus String quickly with a small probability of error. We apply this heuristic to develop a new motif recognition program, sMCL-WMR, which has impressive accuracy and efficiency. We demonstrate the performance of sMCL-WMR in detecting weak motifs in large data sets and in real genomic data sets, and compare the performance to other leading motif recognitio
Recommended from our members
PATTERNA: transcriptome-wide search for functional RNA elements via structural data signatures.
Establishing a link between RNA structure and function remains a great challenge in RNA biology. The emergence of high-throughput structure profiling experiments is revolutionizing our ability to decipher structure, yet principled approaches for extracting information on structural elements directly from these data sets are lacking. We present PATTERNA, an unsupervised pattern recognition algorithm that rapidly mines RNA structure motifs from profiling data. We demonstrate that PATTERNA detects motifs with an accuracy comparable to commonly used thermodynamic models and highlight its utility in automating data-directed structure modeling from large data sets. PATTERNA is versatile and compatible with diverse profiling techniques and experimental conditions
Comprehensive structural classification of ligand binding motifs in proteins
Comprehensive knowledge of protein-ligand interactions should provide a
useful basis for annotating protein functions, studying protein evolution,
engineering enzymatic activity, and designing drugs. To investigate the
diversity and universality of ligand binding sites in protein structures, we
conducted the all-against-all atomic-level structural comparison of over
180,000 ligand binding sites found in all the known structures in the Protein
Data Bank by using a recently developed database search and alignment
algorithm. By applying a hybrid top-down-bottom-up clustering analysis to the
comparison results, we determined approximately 3000 well-defined structural
motifs of ligand binding sites. Apart from a handful of exceptions, most
structural motifs were found to be confined within single families or
superfamilies, and to be associated with particular ligands. Furthermore, we
analyzed the components of the similarity network and enumerated more than 4000
pairs of ligand binding sites that were shared across different protein folds.Comment: 13 pages, 8 figure
Spectral Sequence Motif Discovery
Sequence discovery tools play a central role in several fields of
computational biology. In the framework of Transcription Factor binding
studies, motif finding algorithms of increasingly high performance are required
to process the big datasets produced by new high-throughput sequencing
technologies. Most existing algorithms are computationally demanding and often
cannot support the large size of new experimental data. We present a new motif
discovery algorithm that is built on a recent machine learning technique,
referred to as Method of Moments. Based on spectral decompositions, this method
is robust under model misspecification and is not prone to locally optimal
solutions. We obtain an algorithm that is extremely fast and designed for the
analysis of big sequencing data. In a few minutes, we can process datasets of
hundreds of thousand sequences and extract motif profiles that match those
computed by various state-of-the-art algorithms.Comment: 20 pages, 3 figures, 1 tabl
Recommended from our members
Common CHD8 Genomic Targets Contrast With Model-Specific Transcriptional Impacts of CHD8 Haploinsufficiency.
The packaging of DNA into chromatin determines the transcriptional potential of cells and is central to eukaryotic gene regulation. Case sequencing studies have revealed mutations to proteins that regulate chromatin state, known as chromatin remodeling factors, with causal roles in neurodevelopmental disorders. Chromodomain helicase DNA binding protein 8 (CHD8) encodes a chromatin remodeling factor with among the highest de novo loss-of-function mutation rates in patients with autism spectrum disorder (ASD). However, mechanisms associated with CHD8 pathology have yet to be elucidated. We analyzed published transcriptomic data across CHD8 in vitro and in vivo knockdown and knockout models and CHD8 binding across published ChIP-seq datasets to identify convergent mechanisms of gene regulation by CHD8. Differentially expressed genes (DEGs) across models varied, but overlap was observed between downregulated genes involved in neuronal development and function, cell cycle, chromatin dynamics, and RNA processing, and between upregulated genes involved in metabolism and immune response. Considering the variability in transcriptional changes and the cells and tissues represented across ChIP-seq analysis, we found a surprisingly consistent set of high-affinity CHD8 genomic interactions. CHD8 was enriched near promoters of genes involved in basic cell functions and gene regulation. Overlap between high-affinity CHD8 targets and DEGs shows that reduced dosage of CHD8 directly relates to decreased expression of cell cycle, chromatin organization, and RNA processing genes, but only in a subset of studies. This meta-analysis verifies CHD8 as a master regulator of gene expression and reveals a consistent set of high-affinity CHD8 targets across human, mouse, and rat in vivo and in vitro studies. These conserved regulatory targets include many genes that are also implicated in ASD. Our findings suggest a model where perturbation to dosage-sensitive CHD8 genomic interactions with a highly-conserved set of regulatory targets leads to model-specific downstream transcriptional impacts
Hardware design of LIF with Latency neuron model with memristive STDP synapses
In this paper, the hardware implementation of a neuromorphic system is
presented. This system is composed of a Leaky Integrate-and-Fire with Latency
(LIFL) neuron and a Spike-Timing Dependent Plasticity (STDP) synapse. LIFL
neuron model allows to encode more information than the common
Integrate-and-Fire models, typically considered for neuromorphic
implementations. In our system LIFL neuron is implemented using CMOS circuits
while memristor is used for the implementation of the STDP synapse. A
description of the entire circuit is provided. Finally, the capabilities of the
proposed architecture have been evaluated by simulating a motif composed of
three neurons and two synapses. The simulation results confirm the validity of
the proposed system and its suitability for the design of more complex spiking
neural network
Text authorship identified using the dynamics of word co-occurrence networks
The identification of authorship in disputed documents still requires human
expertise, which is now unfeasible for many tasks owing to the large volumes of
text and authors in practical applications. In this study, we introduce a
methodology based on the dynamics of word co-occurrence networks representing
written texts to classify a corpus of 80 texts by 8 authors. The texts were
divided into sections with equal number of linguistic tokens, from which time
series were created for 12 topological metrics. The series were proven to be
stationary (p-value>0.05), which permits to use distribution moments as
learning attributes. With an optimized supervised learning procedure using a
Radial Basis Function Network, 68 out of 80 texts were correctly classified,
i.e. a remarkable 85% author matching success rate. Therefore, fluctuations in
purely dynamic network metrics were found to characterize authorship, thus
opening the way for the description of texts in terms of small evolving
networks. Moreover, the approach introduced allows for comparison of texts with
diverse characteristics in a simple, fast fashion
- …