MEMOFinder: combining _de_ _novo_ motif prediction methods with a database of known motifs
*Background:* Methods for finding overrepresented sequence motifs are useful in several key areas of computational biology. They aim at detecting very weak signals responsible for biological processes requiring robust sequence identification like transcription-factor binding to DNA or docking sites in proteins. Currently, general performance of the model-based motif-finding methods is unsatisfactory; however, different methods are successful in different cases. This leads to the practical problem of combining results of different motif-finding tools, taking into account current knowledge collected in motif databases.
*Results:* We propose a new complete service allowing researchers to submit their sequences for analysis by four different motif-finding methods for clustering and comparison with a reference motif database. It is tailored for regulatory motif detection; however, it allows a substantial amount of configuration regarding sequence background, motif database and parameters for motif-finding methods.
*Availability:* The method is available online as a webserver at: http://bioputer.mimuw.edu.pl/software/mmf/. In addition, the source code is released under the GNU General Public License.
Novel Algorithms for LDD Motif Search
Background: Motifs are crucial patterns that have numerous applications including the identification of transcription factors and their binding sites, composite regulatory patterns, similarity between families of proteins, etc. Several motif models have been proposed in the literature. The (l,d)-motif model is one of these that has been studied widely. However, this model will sometimes report more spurious motifs than expected. We interpret a motif as a biologically significant entity that is evolutionarily preserved within some distance. It may be highly improbable that the motif undergoes the same number of changes in each of the species. To address this issue, in this paper, we introduce a new model which is more general than the (l,d)-motif model. This model is called the (l,d1,d2)-motif model (LDDMS) and is NP-hard as well. We present three elegant as well as efficient algorithms to solve the LDDMS problem, i.e., LDDMS1, LDDMS2 and LDDMS3. They are all exact algorithms. Results: We performed both theoretical analyses and empirical tests on these algorithms. Theoretical analyses demonstrate that our algorithms have less computational cost than the pattern-driven approach. Empirical results on both simulated datasets and real datasets show that each of the three algorithms has some advantages on some (l,d1,d2) instances. Conclusions: We proposed the LDDMS model, which is more practically relevant. We also proposed three efficient exact algorithms to solve the problem. Besides, our algorithms can be nicely parallelized. We believe that the idea in this new model can also be extended to other motif search problems such as Edit-distance-based Motif Search (EMS) and Simple Motif Search (SMS)
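For orientation, the classic (l,d)-motif model that this abstract generalizes can be sketched with a brute-force, pattern-driven search: a string M of length l is reported when every input sequence contains an l-mer within Hamming distance d of M. This is only an illustrative sketch of the baseline model, not the LDDMS algorithms from the paper; the function names are hypothetical, the alphabet is assumed to be DNA, and every sequence is assumed to be at least l characters long.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(candidate, sequence):
    """Smallest Hamming distance between candidate and any l-mer of sequence."""
    l = len(candidate)
    return min(
        hamming(candidate, sequence[i:i + l])
        for i in range(len(sequence) - l + 1)
    )


def ld_motifs(sequences, l, d, alphabet="ACGT"):
    """Enumerate all |alphabet|**l candidates (pattern-driven search) and keep
    those that occur in every sequence with at most d mismatches."""
    motifs = []
    for letters in product(alphabet, repeat=l):
        candidate = "".join(letters)
        if all(min_distance(candidate, s) <= d for s in sequences):
            motifs.append(candidate)
    return motifs


if __name__ == "__main__":
    toy = ["ACGTAC", "TTACGA", "GACGTT"]  # made-up sequences
    print(ld_motifs(toy, l=3, d=0))
```

The enumeration grows exponentially in l, which is why exact algorithms that avoid scanning the full candidate space are of interest.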
Transcription Factor-DNA Binding Via Machine Learning Ensembles
We present ensemble methods in a machine learning (ML) framework combining
predictions from five known motif/binding site exploration algorithms. For a
given TF the ensemble starts with position weight matrices (PWM's) for the
motif, collected from the component algorithms. Using dimension reduction, we
identify significant PWM-based subspaces for analysis. Within each subspace a
machine classifier is built for identifying the TF's gene (promoter) targets
(Problem 1). These PWM-based subspaces form an ML-based sequence analysis tool.
Problem 2 (finding binding motifs) is solved by agglomerating k-mer (string)
feature PWM-based subspaces that stand out in identifying gene targets. We
approach Problem 3 (binding sites) with a novel machine learning approach that
uses promoter string features and ML importance scores in a classification
algorithm locating binding sites across the genome. For target gene
identification this method improves performance (measured by the F1 score) by
about 10 percentage points over (a) the motif scanning method and (b) the
coexpression-based association method. The top motif outperformed the five component
algorithms as well as two other common algorithms (BEST and DEME). For
identifying individual binding sites on a benchmark cross species database
(Tompa et al., 2005) we match the best performer without much human
intervention. It also improved the performance on mammalian TFs.
The ensemble can integrate orthogonal information from different weak
learners (potentially using entirely different types of features) into a
machine learner that can perform consistently better for more TFs. The TF gene
target identification component (problem 1 above) is useful in constructing a
transcriptional regulatory network from known TF-target associations. The
ensemble is easily extendable to include more tools as well as future PWM-based
information. Comment: 33 page
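The motif scanning baseline mentioned in this abstract can be illustrated with a generic log-odds PWM scan. The sketch below is not the ensemble method itself; the count matrix, the uniform background of 0.25, the pseudocount, and the function names are all assumptions chosen for illustration, and the input is assumed to contain only the letters A, C, G and T.

```python
import math


def log_odds_pwm(counts, background=0.25, pseudocount=1.0):
    """Turn per-position base counts into log2 odds against a uniform background.

    counts is a list of dicts (one per motif position) mapping base -> count.
    """
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2((column.get(base, 0) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def best_site(pwm, sequence):
    """Score every window of the sequence and return (score, start) of the best one."""
    width = len(pwm)
    scored = [
        (sum(pwm[j][sequence[i + j]] for j in range(width)), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scored)


if __name__ == "__main__":
    # Hypothetical counts for a 3-bp motif that strongly prefers "ACG".
    counts = [{"A": 8}, {"C": 8}, {"G": 8}]
    score, start = best_site(log_odds_pwm(counts), "TTACGTT")
    print(start, round(score, 2))
```

A promoter would then be called a target when its best window exceeds some score threshold; the choice of threshold is left open here.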
motifDiverge: a model for assessing the statistical significance of gene regulatory motif divergence between two DNA sequences
Next-generation sequencing technology enables the identification of thousands
of gene regulatory sequences in many cell types and organisms. We consider the
problem of testing if two such sequences differ in their number of binding site
motifs for a given transcription factor (TF) protein. Binding site motifs
impart regulatory function by providing TFs the opportunity to bind to genomic
elements and thereby affect the expression of nearby genes. Evolutionary
changes to such functional DNA are hypothesized to be major contributors to
phenotypic diversity within and between species; but despite the importance of
TF motifs for gene expression, no method exists to test for motif loss or gain.
Assuming that motif counts are Binomially distributed, and allowing for
dependencies between motif instances in evolutionarily related sequences, we
derive the probability mass function of the difference in motif counts between
two nucleotide sequences. We provide a method to numerically estimate this
distribution from genomic data and show through simulations that our estimator
is accurate. Finally, we introduce the R package motifDiverge that
implements our methodology and illustrate its application to gene regulatory
enhancers identified by a mouse developmental time course experiment. While
this study was motivated by analysis of regulatory motifs, our results can be
applied to any problem involving two correlated Bernoulli trials
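As a simplified illustration of the quantity described above, the probability mass function of the difference between two Binomial motif counts can be computed by direct convolution when the two counts are assumed independent. This deliberately drops the dependence between evolutionarily related sequences that the paper models, so it is a sketch of the baseline case only; the function names and parameter values are hypothetical, and the code is in Python rather than the R of the motifDiverge package.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p); zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def diff_pmf(n1, p1, n2, p2):
    """Map each d to P(X - Y = d) for independent X ~ Bin(n1, p1), Y ~ Bin(n2, p2)."""
    return {
        d: sum(
            binom_pmf(y + d, n1, p1) * binom_pmf(y, n2, p2)
            for y in range(n2 + 1)
        )
        for d in range(-n2, n1 + 1)
    }


if __name__ == "__main__":
    # Hypothetical example: 200 candidate positions in each sequence.
    pmf = diff_pmf(200, 0.02, 200, 0.01)
    tail = sum(prob for d, prob in pmf.items() if d >= 5)
    print(f"P(X - Y >= 5) = {tail:.4f}")
```

Summing the tail of this distribution at the observed difference gives a one-sided test under the independence assumption.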
Identification of the catalytic motif of the microbial ribosome inactivating cytotoxin colicin E3
Colicin E3 is a cytotoxic ribonuclease that specifically cleaves 16S rRNA at the ribosomal A-site to abolish protein synthesis in sensitive Escherichia coli cells. We have performed extensive mutagenesis of the 96-residue colicin E3 cytotoxic domain (E3 rRNase), assayed mutant colicins for in vivo cytotoxicity, and tested the corresponding E3 rRNase domains for their ability to inactivate ribosome function in vitro. From 21 alanine mutants, we identified five positions where mutation resulted in a colicin with no measurable cytotoxicity (Y52, D55, H58, E62, and Y64) and four positions (R40, R42, E60, and R90) where mutation caused a significant reduction in cytotoxicity. Mutations that were found to have large in vivo and in vitro effects were tested for structural integrity through circular dichroism and fluorescence spectroscopy using purified rRNase domains. Our data indicate that H58 and E62 likely act as the acid–base pair during catalysis with other residues likely involved in transition state stabilization. Both the Y52 and Y64 mutants were found to be highly destabilized and this is the likely origin of the loss of their cytotoxicity. The identification of important active site residues and sequence alignments of known rRNase homologs has allowed us to identify other proteins containing the putative rRNase active site motif. Proteins that contained this active site motif included three hemagglutinin-type adhesins and we speculate that these have evolved to deliver a cytotoxic rRNase into eukaryotic cells during pathogenesis
MOTIFATOR: detection and characterization of regulatory motifs using prokaryote transcriptome data
Summary: Unraveling regulatory mechanisms (e.g. identification of motifs in cis-regulatory regions) remains a major challenge in the analysis of transcriptome experiments. Existing applications identify putative motifs from gene lists obtained at a rather arbitrary cutoff and require additional manual processing steps. Our standalone application MOTIFATOR identifies optimal parameters for motif discovery and creates an interactive visualization of the results. Discovered putative motifs are functionally characterized, thereby providing valuable insight into the biological processes that could be controlled by the motif.
iLIR : a web resource for prediction of Atg8-family interacting proteins
Macroautophagy was initially considered to be a nonselective process for bulk breakdown of cytosolic material. However, recent evidence points toward a selective mode of autophagy mediated by the so-called selective autophagy receptors (SARs). SARs act by recognizing and sorting diverse cargo substrates (e.g., proteins, organelles, pathogens) to the autophagic machinery. Known SARs are characterized by a short linear sequence motif (LIR-, LRS-, or AIM-motif) responsible for the interaction between SARs and proteins of the Atg8 family. Interestingly, many LIR-containing proteins (LIRCPs) are also involved in autophagosome formation and maturation and a few of them in regulating signaling pathways. Despite recent research efforts to experimentally identify LIRCPs, only a few dozen of this class of—often unrelated—proteins have been characterized so far using tedious cell biological, biochemical, and crystallographic approaches. The availability of an ever-increasing number of complete eukaryotic genomes provides a grand challenge for characterizing novel LIRCPs throughout the eukaryotes. Along these lines, we developed iLIR, a freely available web resource, which provides in silico tools for assisting the identification of novel LIRCPs. Given an amino acid sequence as input, iLIR searches for instances of short sequences compliant with a refined sensitive regular expression pattern of the extended LIR motif (xLIR-motif) and retrieves characterized protein domains from the SMART database for the query. Additionally, iLIR scores xLIRs against a custom position-specific scoring matrix (PSSM) and identifies potentially disordered subsequences with protein interaction potential overlapping with detected xLIR-motifs. Here we demonstrate that proteins satisfying these criteria make good LIRCP candidates for further experimental verification. Domain architecture is displayed in an informative graphic, and detailed results are also available in tabular form. 
We anticipate that iLIR will assist with elucidating the full complement of LIRCPs in eukaryotes
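The regular-expression step can be sketched with the commonly cited canonical LIR core, an aromatic residue followed by any two residues and then a branched hydrophobic one ([WFY]xx[LIV]). This is an assumption made for illustration: it is not the refined xLIR pattern used by iLIR, which the abstract does not spell out, and it omits the PSSM scoring and disorder prediction. The function name and the example sequence are hypothetical.

```python
import re

# Canonical core LIR pattern (assumed here, not the iLIR xLIR pattern).
# The lookahead lets overlapping matches be reported.
CORE_LIR = re.compile(r"(?=([WFY]..[LIV]))")


def find_core_lir(protein):
    """Return (1-based start position, matched tetrapeptide) for each core LIR hit."""
    return [
        (match.start() + 1, match.group(1))
        for match in CORE_LIR.finditer(protein.upper())
    ]


if __name__ == "__main__":
    # Made-up sequence for illustration only.
    for position, motif in find_core_lir("MDDDWTHLSSAFAYLVL"):
        print(position, motif)
```

A pattern this short matches many proteins by chance, which is one reason a resource like iLIR layers additional scoring on top of the pattern search.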
Identification and functional analysis of novel phosphorylation sites in the RNA surveillance protein Upf1.
One third of inherited genetic diseases are caused by mRNAs harboring premature termination codons as a result of nonsense mutations. These aberrant mRNAs are degraded by the Nonsense-Mediated mRNA Decay (NMD) pathway. A central component of the NMD pathway is Upf1, an RNA-dependent ATPase and helicase. Upf1 is a known phosphorylated protein, but only portions of this large protein have been examined for phosphorylation sites and the functional relevance of its phosphorylation has not been elucidated in Saccharomyces cerevisiae. Using tandem mass spectrometry analyses, we report the identification of 11 putative phosphorylated sites in S. cerevisiae Upf1. Five of these phosphorylated residues are located within the ATPase and helicase domains and are conserved in higher eukaryotes, suggesting a biological significance for their phosphorylation. Indeed, functional analysis demonstrated that a small carboxy-terminal motif harboring at least three phosphorylated amino acids is important for three Upf1 functions: ATPase activity, NMD activity and the ability to promote translation termination efficiency. We provide evidence that two tyrosines within this phospho-motif (Y-738 and Y-742) act redundantly to promote ATP hydrolysis, NMD efficiency and translation termination fidelity
