Regulatory motif discovery using a population clustering evolutionary algorithm

Author: Lones Michael A.
Tyrrell Andy M.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2007

This paper describes a novel evolutionary algorithm for regulatory motif discovery in DNA promoter sequences. The algorithm uses data clustering to logically distribute the evolving population across the search space. Mating then takes place within local regions of the population, promoting overall solution diversity and encouraging discovery of multiple solutions. Experiments using synthetic data sets have demonstrated the algorithm's capacity to find position frequency matrix models of known regulatory motifs in relatively long promoter sequences. These experiments have also shown the algorithm's ability to maintain diversity during search and discover multiple motifs within a single population. The utility of the algorithm for discovering motifs in real biological data is demonstrated by its ability to find meaningful motifs within muscle-specific regulatory sequences
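
As a minimal sketch of the position frequency matrix (PFM) model this abstract mentions, the Python below builds a PFM from a few aligned sites and scans a longer sequence for the best-matching window. The example sites, the pseudocount, the uniform background, and all function names are illustrative assumptions; this is not the authors' evolutionary algorithm or data.

```python
import math

BASES = "ACGT"

def build_pfm(sites, pseudocount=1.0):
    """Return per-position base frequencies for equal-length aligned sites."""
    length = len(sites[0])
    pfm = []
    for pos in range(length):
        # Start from the pseudocount so unseen bases keep a non-zero frequency.
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pfm.append({base: counts[base] / total for base in BASES})
    return pfm

def log_odds(pfm, window, background=0.25):
    """Sum of log2(frequency / background) over the window (uniform background assumed)."""
    return sum(math.log2(col[base] / background) for col, base in zip(pfm, window))

def best_match(pfm, sequence):
    """Return (score, offset) of the highest-scoring window in the sequence."""
    width = len(pfm)
    scored = [(log_odds(pfm, sequence[i:i + width]), i)
              for i in range(len(sequence) - width + 1)]
    return max(scored)

# Hypothetical aligned binding sites and a toy promoter fragment.
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
pfm = build_pfm(sites)
score, offset = best_match(pfm, "GGCGCTATAATGCGC")
```

In a search algorithm of the kind described, a score like this could serve as part of a fitness function for candidate matrices, though the abstract does not say how fitness is actually computed.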

White Rose Research Online

Computational identification and analysis of noncoding RNAs - Unearthing the buried treasures in the genome

Author: Vaidyanathan P. P.
Yoon Byung-Jun
Publication date: 01/01/2007

The central dogma of molecular biology states that genetic information flows from DNA to RNA to protein. This dogma has exerted a substantial influence on our understanding of the genetic activities in the cells. Under this influence, the prevailing assumption until the recent past was that genes are basically repositories for protein coding information, and proteins are responsible for most of the important biological functions in all cells. Meanwhile, the importance of RNAs has remained rather obscure, and RNA was mainly viewed as a passive intermediary that bridges the gap between DNA and protein. Except for classic examples such as tRNAs (transfer RNAs) and rRNAs (ribosomal RNAs), functional noncoding RNAs were considered to be rare. However, this view has experienced a dramatic change during the last decade, as systematic screening of various genomes identified myriads of noncoding RNAs (ncRNAs), which are RNA molecules that function without being translated into proteins [11], [40]. It has been realized that many ncRNAs play important roles in various biological processes. As RNAs can interact with other RNAs and DNAs in a sequence-specific manner, they are especially useful in tasks that require highly specific nucleotide recognition [11]. Good examples are the miRNAs (microRNAs) that regulate gene expression by targeting mRNAs (messenger RNAs) [4], [20], and the siRNAs (small interfering RNAs) that take part in the RNAi (RNA interference) pathways for gene silencing [29], [30]. Recent developments show that ncRNAs are extensively involved in many gene regulatory mechanisms [14], [17]. The roles of ncRNAs known to this day are truly diverse. These include transcription and translation control, chromosome replication, RNA processing and modification, and protein degradation and translocation [40], just to name a few.
These days, it is even claimed that ncRNAs dominate the genomic output of higher organisms such as mammals, and it is being suggested that the greater portion of their genome (which does not encode proteins) is dedicated to the control and regulation of cell development [27]. As more and more evidence piles up, greater attention is paid to ncRNAs, which have been neglected for a long time. Researchers began to realize that the vast majority of the genome that was regarded as “junk,” mainly because it was not well understood, may indeed hold the key to the best-kept secrets in life, such as the mechanism of alternative splicing, the control of epigenetic variations and so forth [27]. The complete range and extent of the role of ncRNAs are not so obvious at this point, but it is certain that a comprehensive understanding of cellular processes is not possible without understanding the functions of ncRNAs [47].

Caltech Authors

A new procedure to analyze RNA non-branching structures

Author: FISCON GIULIA
G. Iannello
P. Paci
T. Colombo
Publication venue: 'Bentham Science Publishers Ltd.'
Publication date: 01/01/2015

RNA structure prediction and structural motif analysis are challenging tasks in the investigation of RNA function. We propose a novel procedure to detect structural motifs shared between two RNAs (a reference and a target). In particular, we developed two core modules: (i) nbRSSP_extractor, to assign a unique structure to the reference RNA encoded by a set of non-branching structures; (ii) SSD_finder, to detect structural motifs that the target RNA shares with the reference, by means of a new score function that rewards the relative distance of the target non-branching structures compared to the reference ones. We integrated these algorithms with already existing software to build a coherent pipeline able to perform the following two main tasks: prediction of RNA structures (integration of RNALfold and nbRSSP_extractor) and search for chains of matches (integration of Structator and SSD_finder).

Archivio della ricerca- Università di Roma La Sapienza

Utilization of tmRNA sequences for bacterial identification

Author: Amann R.
Kulakauskas S.
Le Bourhis G.
Schönhuber W.
Tremblay J.
Publication date: 07/09/2001

In recent years, molecular approaches based on nucleotide sequences of ribosomal RNA (rRNA) have become widely used tools for identification of bacteria [1-4]. The high degree of evolutionary conservation makes 16S and 23S rRNA molecules very suitable for phylogenetic studies above the species level [3-5]. More than 16,000 sequences of 16S rRNA are presently available in public databases [4,6]. The 16S rRNA sequences are commonly used to design fluorescently labeled oligonucleotide probes. Fluorescence in situ hybridization (FISH) with these probes followed by observation with epifluorescence microscopy allows the identification of a specific microorganism in a mixture with other bacteria [2-4]. By shifting probe target sites from conserved to increasingly variable regions of rRNA, it is possible to adjust the probe specificity from kingdom to species level. Nevertheless, 16S rRNA sequences of closely related strains, subspecies, or even of different species are often identical and therefore cannot be used as differentiating markers [3]. Another restriction concerns the accessibility of target sites to the probe in FISH experiments. The presence of secondary structures, or protection of rRNA segments by ribosomal proteins in fixed cells, can limit the choice of variable regions as in situ targets for oligonucleotide probes [7,8]. One way to overcome the limitations of in situ identification of bacteria is to use molecules other than rRNA for phylogenetic identification of bacteria, for which nucleotide sequences would be sufficiently divergent to design species-specific probes, and which would be more accessible to oligonucleotide probes. For this purpose we investigated the possibility of using tmRNA (also known as 10Sa RNA; [9-11]). This molecule was discovered in E. coli and described as a small stable RNA, present at ~1,000 copies per cell [9,11].
The high copy number is an important prerequisite for FISH, which works best with naturally amplified target molecules. In E. coli, tmRNA is encoded by the ssrA gene, is 363 nucleotides long and has properties of tRNA and mRNA [12,13]. tmRNA was shown to be involved in the degradation of truncated proteins: the tmRNA associates with ribosomes stalled on mRNAs lacking stop codons, finally resulting in the addition of a C-terminal peptide tag to the truncated protein. The peptide tag directs the abnormal protein to proteolysis [14,15]. So far, 165 tmRNA sequences have been determined (August 2001; The tmRNA Website: http://www.indiana.edu/~tmrna/) [16,17]. The tmRNA is likely to be present in all bacteria and has also been found in algae chloroplasts, the cyanelle of Cyanophora paradoxa and the mitochondrion of the flagellate Reclinomonas americana [10,17,18].

MPG.PuRe

Molecular biology techniques as a tool for detection and characterisation of Mycobacterium avium subsp. paratuberculosis

Author: Englund Stina
Publication date: 01/05/2002

Mycobacterium avium subsp. paratuberculosis (M. paratuberculosis) is the causative agent of paratuberculosis, also known as Johne’s disease, a chronic intestinal infection in cattle and other ruminants. Paratuberculosis is characterised by diarrhea and weight loss that occur after a period of a few months up to several years without any clinical signs. The considerable economic losses to dairy and beef cattle producers are caused by reduced milk production and poor reproduction performance in subclinically infected animals. Early diagnosis of infected cattle is essential to prevent the spread of the disease. Efforts have been made to eradicate paratuberculosis by using a detection and cull strategy, but eradication is hampered by the lack of suitable and sensitive diagnostic methods. This thesis, based on five scientific investigations, describes the development of different DNA amplification strategies for detection and characterisation of M. paratuberculosis. Various ways to pre-treat bacterial cultures, tissue specimens and fecal samples prior to PCR analysis were investigated. Internal positive PCR control molecules were developed and used in PCR analyses to improve the reliability and to facilitate the interpretation of the results. The sensitivity of the final methods was found to be approximately that of culture and allowed detection of low numbers of M. paratuberculosis expected to be found in subclinically infected animals. Genomic DNA of a Swedish mycobacterial isolate, incorrectly identified by PCR as M. paratuberculosis, was characterised. The isolate was closely related to M. cookii and harboured one copy of a DNA segment with 94% similarity to IS900, the target sequence used in diagnostic PCR for detection of M. paratuberculosis. This finding highlighted the urgency of developing or evaluating PCR systems based on genes other than IS900.
A PCR-based fingerprinting method using primers targeting the enterobacterial intergenic consensus sequence (ERIC) and the IS900 sequence was developed and successfully used to distinguish M. paratuberculosis from closely related mycobacteria, including the above-mentioned mycobacterial isolate. In conclusion, the molecular biology techniques developed in these studies have proved useful for accelerating the diagnostic detection and characterisation of M. paratuberculosis.

Epsilon Open Archive

Inferring stabilizing mutations from protein phylogenies : application to influenza hemagglutinin

Author: Eugene I. Shakhnovich
Jesse D. Bloom
Matthew J. Glassman
Publication venue: International Society for Computational Biology
Publication date: 01/04/2009

One selection pressure shaping sequence evolution is the requirement that a protein fold with sufficient stability to perform its biological functions. We present a conceptual framework that explains how this requirement causes the probability that a particular amino acid mutation is fixed during evolution to depend on its effect on protein stability. We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (ΔΔG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. As a further test, we use our phylogenetic inference approach to predict stabilizing mutations to influenza hemagglutinin. We introduce these mutations into a temperature-sensitive influenza virus with a defect in its hemagglutinin gene and experimentally demonstrate that some of the mutations allow the virus to grow at higher temperatures. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin. This approach also makes a mathematical link between phylogenetics and experimentally measurable protein properties, potentially paving the way for more accurate analyses of molecular evolution

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Caltech Authors

Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing.

Author: Korf Ian
Segal David J
Zykovich Artem
Publication venue: eScholarship, University of California
Publication date: 01/12/2009

Transcription factor-DNA interactions are some of the most important processes in biology because they directly control hereditary information. The targets of most transcription factors are unknown. In this report, we introduce Bind-n-Seq, a new high-throughput method for analyzing protein-DNA interactions in vitro, with several advantages over current methods. The procedure has three steps: (i) binding proteins to randomized oligonucleotide DNA targets, (ii) sequencing the bound oligonucleotides with massively parallel technology, and (iii) finding motifs among the sequences. De novo binding motifs determined by this method for the DNA-binding domains of two well-characterized zinc-finger proteins were similar to those described previously. Furthermore, calculations of the relative affinity of the proteins for specific DNA sequences correlated significantly with previous studies (R² = 0.9). These results present Bind-n-Seq as a highly rapid and parallel method for determining in vitro binding sites and relative affinities.
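
Step (iii) of the procedure can be illustrated with a simple k-mer enrichment count, which compares how often each k-mer appears in bound reads versus a background pool. This is only a hedged sketch: the toy reads, the pseudocount, and the function names are assumptions, and the abstract does not specify which motif-finding software the authors used.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every overlapping k-mer across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def enrichment(bound, background, k, pseudocount=1.0):
    """Ratio of smoothed k-mer frequency in bound reads to that in background reads."""
    bound_counts = kmer_counts(bound, k)
    bg_counts = kmer_counts(background, k)
    bound_total = sum(bound_counts.values())
    bg_total = sum(bg_counts.values())
    kmers = set(bound_counts) | set(bg_counts)
    n = len(kmers)
    scores = {}
    for kmer in kmers:
        bound_freq = (bound_counts[kmer] + pseudocount) / (bound_total + pseudocount * n)
        bg_freq = (bg_counts[kmer] + pseudocount) / (bg_total + pseudocount * n)
        scores[kmer] = bound_freq / bg_freq
    return scores

# Hypothetical reads; real data would be millions of sequenced oligonucleotides.
bound_reads = ["ACGTGGGCGT", "TTGTGGGCGA", "CGTGGGCGAA"]
background_reads = ["ACGTACGTAC", "TTGACCATGA", "CGATCGATAA"]
scores = enrichment(bound_reads, background_reads, k=6)

# Sort first so ties are broken alphabetically and the result is deterministic.
top_kmer = max(sorted(scores), key=scores.get)
```

Enriched k-mers like `top_kmer` would typically be the starting point for assembling a longer motif; a ratio of this kind is also one plausible proxy for relative affinity, although the abstract does not describe how the authors computed theirs.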

eScholarship - University of California

Dynamic scaffolds for neuronal signaling: in silico analysis of the TANC protein family

Author: Gasparini Alessandra
Leonardi Emanuela
Murgia Alessandra
Tosatto Silvio C. E.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Archivio istituzionale della ricerca - Università di Padova

Protein Repeats from First Principles

Author: Becher Veronica Andrea
Espada Rocío
Ferreiro Diego
Parra Rodrigo Gonzalo
Turjanski Pablo Guillermo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2016

Some natural proteins display recurrent structural patterns. Despite being highly similar at the tertiary structure level, repeating patterns within a single repeat protein can be extremely variable at the sequence level. We use a mathematical definition of a repetition and investigate the occurrences of these in sequences of different protein families. We found that long stretches of perfect repetitions are infrequent in individual natural proteins, even for those which are known to fold into structures of recurrent structural motifs. We found that natural repeat proteins are indeed repetitive in their families, exhibiting abundant stretches of 6 amino acids or longer that are perfect repetitions in the reference family. We provide a systematic quantification for this repetitiveness. We show that this form of repetitiveness is not exclusive to repeat proteins, but also occurs in globular domains. A by-product of this work is a fast quantification of the likelihood that a protein belongs to a family.

Affiliation: Turjanski, Pablo Guillermo. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
Affiliation: Parra, Rodrigo Gonzalo. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentina
Affiliation: Espada, Rocío. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentina
Affiliation: Becher, Veronica Andrea. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
Affiliation: Ferreiro, Diego. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentina
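
The idea of stretches of 6 amino acids or longer that recur within a reference family can be sketched as follows. This simplified Python example treats a repetition as an exact length-k window of the query that also appears in some family sequence; that reading, the toy sequences, and the function names are assumptions, and the authors' formal definition of a repetition may differ.

```python
def family_kmers(family, k=6):
    """Collect every length-k window that occurs in any family sequence."""
    return {seq[i:i + k] for seq in family for i in range(len(seq) - k + 1)}

def repeated_coverage(query, family, k=6):
    """Fraction of query residues inside a length-k window also found in the family."""
    known = family_kmers(family, k)
    covered = [False] * len(query)
    for i in range(len(query) - k + 1):
        if query[i:i + k] in known:
            # Mark all k residues of the matching window as covered.
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / len(query) if query else 0.0

# Hypothetical family members and query; the query itself is not in the family.
family = ["MKTLLGADVNAKDKDG", "GADVNARDSNGRTPLH"]
query = "AAGADVNAKDWW"
coverage = repeated_coverage(query, family)
```

Because any exact shared stretch longer than k contains overlapping length-k windows, merging the covered positions also captures the longer stretches. A coverage value of this kind is one rough way to express how strongly a sequence resembles a family.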

CONICET Digital