9,403 research outputs found

    Regulatory motif discovery using a population clustering evolutionary algorithm

    Get PDF
    This paper describes a novel evolutionary algorithm for regulatory motif discovery in DNA promoter sequences. The algorithm uses data clustering to logically distribute the evolving population across the search space. Mating then takes place within local regions of the population, promoting overall solution diversity and encouraging discovery of multiple solutions. Experiments using synthetic data sets have demonstrated the algorithm's capacity to find position frequency matrix models of known regulatory motifs in relatively long promoter sequences. These experiments have also shown the algorithm's ability to maintain diversity during search and discover multiple motifs within a single population. The utility of the algorithm for discovering motifs in real biological data is demonstrated by its ability to find meaningful motifs within muscle-specific regulatory sequences

    DLocalMotif: a discriminative approach for discovering local motifs in protein sequences

    Get PDF
    Motivation: Local motifs are patterns of DNA or protein sequences that occur within a sequence interval relative to a biologically defined anchor or landmark. Current protein motif discovery methods do not adequately consider such constraints to identify biologically significant motifs that are only weakly over-represented but spatially confined. Using negatives, i.e. sequences known to not contain a local motif, can further increase the specificity of their discovery

    A survey of DNA motif finding algorithms

    Get PDF
    Background: Unraveling the mechanisms that regulate gene expression is a major challenge in biology. An important task in this challenge is to identify regulatory elements, especially the binding sites in deoxyribonucleic acid (DNA) for transcription factors. These binding sites are short DNA segments that are called motifs. Recent advances in genome sequence availability and in high-throughput gene expression analysis technologies have allowed for the development of computational methods for motif finding. As a result, a large number of motif finding algorithms have been implemented and applied to various motif models over the past decade. This survey reviews the latest developments in DNA motif finding algorithms.Results: Earlier algorithms use promoter sequences of coregulated genes from single genome and search for statistically overrepresented motifs. Recent algorithms are designed to use phylogenetic footprinting or orthologous sequences and also an integrated approach where promoter sequences of coregulated genes and phylogenetic footprinting are used. All the algorithms studied have been reported to correctly detect the motifs that have been previously detected by laboratory experimental approaches, and some algorithms were able to find novel motifs. However, most of these motif finding algorithms have been shown to work successfully in yeast and other lower organisms, but perform significantly worse in higher organisms.Conclusion: Despite considerable efforts to date, DNA motif finding remains a complex challenge for biologists and computer scientists. Researchers have taken many different approaches in developing motif discovery tools and the progress made in this area of research is very encouraging. Performance comparison of different motif finding tools and identification of the best tools have proven to be a difficult task because tools are designed based on algorithms and motif models that are diverse and complex and our incomplete understanding of the biology of regulatory mechanism does not always provide adequate evaluation of underlying algorithms over motif models.Peer reviewedComputer Scienc

    Automated linear motif discovery from protein interaction network

    Get PDF
    Master'sMASTER OF SCIENC

    A methodology for determining amino-acid substitution matrices from set covers

    Full text link
    We introduce a new methodology for the determination of amino-acid substitution matrices for use in the alignment of proteins. The new methodology is based on a pre-existing set cover on the set of residues and on the undirected graph that describes residue exchangeability given the set cover. For fixed functional forms indicating how to obtain edge weights from the set cover and, after that, substitution-matrix elements from weighted distances on the graph, the resulting substitution matrix can be checked for performance against some known set of reference alignments and for given gap costs. Finding the appropriate functional forms and gap costs can then be formulated as an optimization problem that seeks to maximize the performance of the substitution matrix on the reference alignment set. We give computational results on the BAliBASE suite using a genetic algorithm for optimization. Our results indicate that it is possible to obtain substitution matrices whose performance is either comparable to or surpasses that of several others, depending on the particular scenario under consideration

    iLIR : a web resource for prediction of Atg8-family interacting proteins

    Get PDF
    Macroautophagy was initially considered to be a nonselective process for bulk breakdown of cytosolic material. However, recent evidence points toward a selective mode of autophagy mediated by the so-called selective autophagy receptors (SARs). SARs act by recognizing and sorting diverse cargo substrates (e.g., proteins, organelles, pathogens) to the autophagic machinery. Known SARs are characterized by a short linear sequence motif (LIR-, LRS-, or AIM-motif) responsible for the interaction between SARs and proteins of the Atg8 family. Interestingly, many LIR-containing proteins (LIRCPs) are also involved in autophagosome formation and maturation and a few of them in regulating signaling pathways. Despite recent research efforts to experimentally identify LIRCPs, only a few dozen of this class of—often unrelated—proteins have been characterized so far using tedious cell biological, biochemical, and crystallographic approaches. The availability of an ever-increasing number of complete eukaryotic genomes provides a grand challenge for characterizing novel LIRCPs throughout the eukaryotes. Along these lines, we developed iLIR, a freely available web resource, which provides in silico tools for assisting the identification of novel LIRCPs. Given an amino acid sequence as input, iLIR searches for instances of short sequences compliant with a refined sensitive regular expression pattern of the extended LIR motif (xLIR-motif) and retrieves characterized protein domains from the SMART database for the query. Additionally, iLIR scores xLIRs against a custom position-specific scoring matrix (PSSM) and identifies potentially disordered subsequences with protein interaction potential overlapping with detected xLIR-motifs. Here we demonstrate that proteins satisfying these criteria make good LIRCP candidates for further experimental verification. Domain architecture is displayed in an informative graphic, and detailed results are also available in tabular form. We anticipate that iLIR will assist with elucidating the full complement of LIRCPs in eukaryotes
    corecore