
    Regulatory motif discovery using a population clustering evolutionary algorithm

    This paper describes a novel evolutionary algorithm for regulatory motif discovery in DNA promoter sequences. The algorithm uses data clustering to logically distribute the evolving population across the search space. Mating then takes place within local regions of the population, promoting overall solution diversity and encouraging discovery of multiple solutions. Experiments using synthetic data sets have demonstrated the algorithm's capacity to find position frequency matrix models of known regulatory motifs in relatively long promoter sequences. These experiments have also shown the algorithm's ability to maintain diversity during search and discover multiple motifs within a single population. The utility of the algorithm for discovering motifs in real biological data is demonstrated by its ability to find meaningful motifs within muscle-specific regulatory sequences.
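    The position frequency matrix models mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's evolutionary algorithm: it only builds a matrix from a few aligned sites and scans a sequence for the best-scoring window. The function names, the pseudocount, and the example sites are assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def position_frequency_matrix(sites):
    """Count each base at each column of equal-length aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pfm = []
    for col in range(length):
        counts = Counter(site[col] for site in sites)
        pfm.append({base: counts.get(base, 0) for base in ALPHABET})
    return pfm


def best_match(pfm, sequence, pseudocount=1.0):
    """Return (offset, log_score) of the best-scoring window in sequence.

    The score is the sum of log smoothed column frequencies; the
    pseudocount (an assumed value) avoids log(0) for unseen bases.
    """
    width = len(pfm)
    best_offset, best_score = None, float("-inf")
    for offset in range(len(sequence) - width + 1):
        score = 0.0
        for col, base in enumerate(sequence[offset:offset + width]):
            total = sum(pfm[col].values()) + pseudocount * len(ALPHABET)
            score += math.log((pfm[col].get(base, 0) + pseudocount) / total)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


# Hypothetical aligned sites, used only to exercise the functions.
sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
pfm = position_frequency_matrix(sites)
offset, score = best_match(pfm, "GGGGTATAATGGGG")
```

    In a motif-discovery setting, a matrix like `pfm` would be one candidate solution, and a scan like `best_match` would be one ingredient of its fitness evaluation.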

    Computational identification and analysis of noncoding RNAs - Unearthing the buried treasures in the genome

    The central dogma of molecular biology states that genetic information flows from DNA to RNA to protein. This dogma has exerted a substantial influence on our understanding of the genetic activities in cells. Under this influence, the prevailing assumption until the recent past was that genes are basically repositories for protein coding information, and proteins are responsible for most of the important biological functions in all cells. Meanwhile, the importance of RNAs has remained rather obscure, and RNA was mainly viewed as a passive intermediary that bridges the gap between DNA and protein. Except for classic examples such as tRNAs (transfer RNAs) and rRNAs (ribosomal RNAs), functional noncoding RNAs were considered to be rare. However, this view has experienced a dramatic change during the last decade, as systematic screening of various genomes identified a myriad of noncoding RNAs (ncRNAs), which are RNA molecules that function without being translated into proteins [11], [40]. It has been realized that many ncRNAs play important roles in various biological processes. As RNAs can interact with other RNAs and DNAs in a sequence-specific manner, they are especially useful in tasks that require highly specific nucleotide recognition [11]. Good examples are the miRNAs (microRNAs) that regulate gene expression by targeting mRNAs (messenger RNAs) [4], [20], and the siRNAs (small interfering RNAs) that take part in the RNAi (RNA interference) pathways for gene silencing [29], [30]. Recent developments show that ncRNAs are extensively involved in many gene regulatory mechanisms [14], [17]. The roles of ncRNAs known to this day are truly diverse. These include transcription and translation control, chromosome replication, RNA processing and modification, and protein degradation and translocation [40], just to name a few. These days, it is even claimed that ncRNAs dominate the genomic output of higher organisms such as mammals, and it is being suggested that the greater portion of their genome (which does not encode proteins) is dedicated to the control and regulation of cell development [27]. As more and more evidence piles up, greater attention is paid to ncRNAs, which have been neglected for a long time. Researchers began to realize that the vast majority of the genome that was regarded as “junk,” mainly because it was not well understood, may indeed hold the key for the best kept secrets in life, such as the mechanism of alternative splicing, the control of epigenetic variations and so forth [27]. The complete range and extent of the role of ncRNAs are not so obvious at this point, but it is certain that a comprehensive understanding of cellular processes is not possible without understanding the functions of ncRNAs [47].

    A new procedure to analyze RNA non-branching structures

    RNA structure prediction and structural motif analysis are challenging tasks in the investigation of RNA function. We propose a novel procedure to detect structural motifs shared between two RNAs (a reference and a target). In particular, we developed two core modules: (i) nbRSSP_extractor, to assign a unique structure to the reference RNA encoded by a set of non-branching structures; (ii) SSD_finder, to detect structural motifs that the target RNA shares with the reference, by means of a new score function that rewards the relative distance of the target non-branching structures compared to the reference ones. We integrated these algorithms with already existing software to reach a coherent pipeline able to perform the following two main tasks: prediction of RNA structures (integration of RNALfold and nbRSSP_extractor) and search for chains of matches (integration of Structator and SSD_finder).

    Utilization of tmRNA sequences for bacterial identification

    In recent years, molecular approaches based on nucleotide sequences of ribosomal RNA (rRNA) have become widely used tools for identification of bacteria [1-4]. The high degree of evolutionary conservation makes 16S and 23S rRNA molecules very suitable for phylogenetic studies above the species level [3-5]. More than 16,000 sequences of 16S rRNA are presently available in public databases [4,6]. The 16S rRNA sequences are commonly used to design fluorescently labeled oligonucleotide probes. Fluorescence in situ hybridization (FISH) with these probes followed by observation with epifluorescence microscopy allows the identification of a specific microorganism in a mixture with other bacteria [2-4]. By shifting probe target sites from conservative to increasingly variable regions of rRNA, it is possible to adjust the probe specificity from kingdom to species level. Nevertheless, 16S rRNA sequences of closely related strains, subspecies, or even of different species are often identical and therefore cannot be used as differentiating markers [3]. Another restriction concerns the accessibility of target sites to the probe in FISH experiments. The presence of secondary structures, or protection of rRNA segments by ribosomal proteins in fixed cells, can limit the choice of variable regions as in situ targets for oligonucleotide probes [7,8]. One way to overcome the limitations of in situ identification of bacteria is to use molecules other than rRNA for phylogenetic identification of bacteria, for which nucleotide sequences would be sufficiently divergent to design species-specific probes, and which would be more accessible to oligonucleotide probes. For this purpose we investigated the possibility of using tmRNA (also known as 10Sa RNA; [9-11]). This molecule was discovered in E. coli and described as a small, stable RNA, present at ~1,000 copies per cell [9,11]. The high copy number is an important prerequisite for FISH, which works best with naturally amplified target molecules. In E. coli, tmRNA is encoded by the ssrA gene, is 363 nucleotides long and has properties of both tRNA and mRNA [12,13]. tmRNA was shown to be involved in the degradation of truncated proteins: the tmRNA associates with ribosomes stalled on mRNAs lacking stop codons, finally resulting in the addition of a C-terminal peptide tag to the truncated protein. The peptide tag directs the abnormal protein to proteolysis [14,15]. So far, 165 tmRNA sequences have been determined (August 2001; The tmRNA Website: http://www.indiana.edu/~tmrna/) [16,17]. The tmRNA is likely to be present in all bacteria and has also been found in algae chloroplasts, the cyanelle of Cyanophora paradoxa and the mitochondrion of the flagellate Reclinomonas americana [10,17,18].

    Molecular biology techniques as a tool for detection and characterisation of Mycobacterium avium subsp. paratuberculosis

    Mycobacterium avium subsp. paratuberculosis (M. paratuberculosis) is the causative agent of paratuberculosis, also known as Johne’s disease, a chronic intestinal infection in cattle and other ruminants. Paratuberculosis is characterised by diarrhea and weight loss that occur after a period of a few months up to several years without any clinical signs. The considerable economic losses to dairy and beef cattle producers are caused by reduced milk production and poor reproduction performance in subclinically infected animals. Early diagnosis of infected cattle is essential to prevent the spread of the disease. Efforts have been made to eradicate paratuberculosis by using a detection and cull strategy, but eradication is hampered by the lack of suitable and sensitive diagnostic methods. This thesis, based on five scientific investigations, describes the development of different DNA amplification strategies for detection and characterisation of M. paratuberculosis. Various ways to pre-treat bacterial cultures, tissue specimens and fecal samples prior to PCR analysis were investigated. Internal positive PCR control molecules were developed and used in PCR analyses to improve the reliability and to facilitate the interpretation of the results. The sensitivity of the ultimate methods was found to approximate that of culture and allowed detection of low numbers of M. paratuberculosis expected to be found in subclinically infected animals. Genomic DNA of a Swedish mycobacterial isolate, incorrectly identified by PCR as M. paratuberculosis, was characterised. The isolate was closely related to M. cookii and harboured one copy of a DNA segment with 94% similarity to IS900, the target sequence used in diagnostic PCR for detection of M. paratuberculosis. This finding highlighted the urgency of developing or evaluating PCR systems based on genes other than IS900. A PCR-based fingerprinting method using primers targeting the enterobacterial intergenic consensus sequence (ERIC) and the IS900 sequence was developed and successfully used to distinguish M. paratuberculosis from closely related mycobacteria, including the above-mentioned mycobacterial isolate. In conclusion, the molecular biology techniques developed in these studies have proved useful for accelerating the diagnostic detection and characterisation of M. paratuberculosis.

    Inferring stabilizing mutations from protein phylogenies: application to influenza hemagglutinin

    One selection pressure shaping sequence evolution is the requirement that a protein fold with sufficient stability to perform its biological functions. We present a conceptual framework that explains how this requirement causes the probability that a particular amino acid mutation is fixed during evolution to depend on its effect on protein stability. We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (ΔΔG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. As a further test, we use our phylogenetic inference approach to predict stabilizing mutations to influenza hemagglutinin. We introduce these mutations into a temperature-sensitive influenza virus with a defect in its hemagglutinin gene and experimentally demonstrate that some of the mutations allow the virus to grow at higher temperatures. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin. This approach also makes a mathematical link between phylogenetics and experimentally measurable protein properties, potentially paving the way for more accurate analyses of molecular evolution.

    Protein Repeats from First Principles

    Some natural proteins display recurrent structural patterns. Despite being highly similar at the tertiary structure level, repeating patterns within a single repeat protein can be extremely variable at the sequence level. We use a mathematical definition of a repetition and investigate the occurrences of these in sequences of different protein families. We found that long stretches of perfect repetitions are infrequent in individual natural proteins, even for those which are known to fold into structures of recurrent structural motifs. We found that natural repeat proteins are indeed repetitive in their families, exhibiting abundant stretches of 6 amino acids or longer that are perfect repetitions in the reference family. We provide a systematic quantification for this repetitiveness. We show that this form of repetitiveness is not exclusive of repeat proteins, but also occurs in globular domains. A by-product of this work is a fast quantification of the likelihood of a protein to belong to a family.

    Fil: Turjanski, Pablo Guillermo. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
    Fil: Parra, Rodrigo Gonzalo. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentina
    Fil: Espada, Rocío. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentina
    Fil: Becher, Veronica Andrea. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
    Fil: Ferreiro, Diego. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentin
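    The family-level repetitiveness described in the "Protein Repeats from First Principles" abstract can be approximated with a rough sketch. The abstract does not give the paper's mathematical definition of a repetition, so the code below substitutes a simpler stand-in: it marks the parts of a query sequence covered by exact substrings of length k = 6 that also occur in a set of family sequences. The function names and the toy sequences are hypothetical.

```python
def family_kmers(family, k=6):
    """Collect every length-k substring that occurs in any family sequence."""
    kmers = set()
    for seq in family:
        for i in range(len(seq) - k + 1):
            kmers.add(seq[i:i + k])
    return kmers


def repeated_stretches(query, family, k=6):
    """Return maximal [start, end) intervals of query covered by family k-mers.

    Assumption: exact k-mer membership stands in for the paper's
    definition of a perfect repetition, which the abstract does not state.
    """
    kmers = family_kmers(family, k)
    stretches = []
    for i in range(len(query) - k + 1):
        if query[i:i + k] in kmers:
            start, end = i, i + k
            if stretches and start <= stretches[-1][1]:
                # Overlapping or adjacent hit: extend the previous interval.
                stretches[-1] = (stretches[-1][0], end)
            else:
                stretches.append((start, end))
    return stretches


def coverage(query, family, k=6):
    """Fraction of query positions that fall inside a covered interval."""
    if not query:
        return 0.0
    covered = sum(end - start for start, end in repeated_stretches(query, family, k))
    return covered / len(query)


# Toy sequences, invented only to exercise the functions.
family = ["MKTAYIAKQRQISFVK", "GGTAYIAKQRPP"]
query = "AATAYIAKQRLL"
stretches = repeated_stretches(query, family)
fraction = coverage(query, family)
```

    When the query is itself a member of the family, it would need to be left out of `family` before the call; otherwise every position is trivially covered.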