Assessing the effects of data selection and representation on the development of reliable E. coli sigma 70 promoter region predictors

Author: Abbas Mostafa M.
El-Manzalawy Yasser
Mohie-Eldin Mostafa M.
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2015

As the number of sequenced bacterial genomes increases, the need for rapid and reliable tools for the annotation of functional elements (e.g., transcriptional regulatory elements) becomes more pressing. Promoters are the key regulatory elements, which recruit the transcriptional machinery through binding to a variety of regulatory proteins (known as sigma factors). The identification of promoter regions is very challenging because these regions do not adhere to specific sequence patterns or motifs and are difficult to determine experimentally. Machine learning represents a promising and cost-effective approach for computational identification of prokaryotic promoter regions. However, the quality of the predictors depends on several factors, including: i) training data; ii) data representation; iii) classification algorithms; iv) evaluation procedures. In this work, we create several variants of E. coli promoter data sets and use them to experimentally examine the effect of these factors on the predictive performance of E. coli σ70 promoter models. Our results suggest that under some combinations of the first three criteria, a prediction model might perform very well in cross-validation experiments while its performance on independent test data is drastically worse. This emphasizes the importance of evaluating promoter region predictors using independent test data, which corrects for the over-optimistic performance that might be estimated using the cross-validation procedure. Our analysis of the tested models shows that good prediction models often perform well regardless of how the non-promoter data was obtained. On the other hand, poor prediction models seem to be more sensitive to the choice of non-promoter sequences. Interestingly, the best performing sequence-based classifiers outperform the best performing structure-based classifiers in both cross-validation and independent test performance evaluation experiments. Finally, we propose a meta-predictor method combining two top performing sequence-based and structure-based classifiers and compare its performance with some of the state-of-the-art E. coli σ70 promoter prediction methods.

NPRP grant No. 4-1454-1-233 from the Qatar National Research Fund (a member of Qatar Foundation).

Qatar University Institutional Repository

Directory of Open Access Journals

PubMed Central

FigShare

On the spontaneous stochastic dynamics of a single gene: complexity of the molecular interplay at the promoter

Author: A Agresti
A Becskei
A Belle
A Benecke
A Martinez Arias
A Nagaich
A Raj
A Raj
A Sigal
A Sánchez
A Warmflash
Antoine Coulon
AS Ribeiro
B Kaufmann
B Li
C Adams
C Kuttler
CD Cox
CD Cox
D Austin
D Browning
D Volfson
DR Rigney
G Ackers
G Hager
G Hornung
G Innocentini
G Li
G Süel
Guillaume Beslon
H Chang
HD Kim
I Dodd
I Golding
I Lestas
J Ansel
J Goutsias
J McNally
J Mellor
J Paulsson
J Paulsson
J Peccoud
J Pedraza
J Raser
J van Zon
J Veening
J Vilar
JA Bernstein
JJ Kupiec
JM Berg
JZ Hearon
L Bintu
L Bintu
L Saiz
L Saiz
L Saiz
L Verdone
M Becker
M Dunlop
M Elowitz
M Samoilov
M Simpson
M Simpson
M Thattai
M Tomschik
MS Ko
N Chabrier-Rivier
N Maheshri
N Mitarai
N van Kampen
Olivier Gandrillon
P Lu
P Paszek
P Swain
P Warren
PS Swain
R Métivier
R Métivier
R Métivier
R Phair
T Degenhardt
T Jenuwein
T Lipniacki
T Misteli
T Neildez-Nguyen
TB Kepler
TC Voss
TI Lee
TS Karpova
V Lemaire
V Shahrezaei
WJ Blake
Y Setty
Y Shang
Y Tao
Y Wang
Y Wang
Z Wang
Publication venue: BioMed Central
Publication date: 01/01/2010

BACKGROUND: Gene promoters can be in various epigenetic states and undergo interactions with many molecules in a highly transient, probabilistic and combinatorial way, resulting in complex global dynamics as observed experimentally. However, models of stochastic gene expression commonly consider promoter activity as a two-state on/off system. We consider here a model of single-gene stochastic expression that can represent arbitrary prokaryotic or eukaryotic promoters, based on the combinatorial interplay between molecules and epigenetic factors, including energy-dependent remodeling and enzymatic activities. RESULTS: We show that, considering the mere molecular interplay at the promoter, a single gene can demonstrate an elaborate spontaneous stochastic activity (e.g., multi-periodic multi-relaxation dynamics), similar to what is known to occur at the gene-network level. Characterizing this generic model with indicators of dynamic and steady-state properties (including power spectra and distributions), we reveal the potential activity of any promoter and its influence on gene expression. In particular, we can reproduce, based on biologically relevant mechanisms, the strongly periodic patterns of promoter occupancy by transcription factors (TF) and chromatin remodeling as observed experimentally on eukaryotic promoters. Moreover, we link several of its characteristics to properties of the underlying biochemical system. The model can also be used to identify behaviors of interest (e.g., stochasticity induced by high TF concentration) on minimal systems and to test their relevance in larger and more realistic systems. We finally show that TF concentrations can regulate many aspects of the stochastic activity with considerable flexibility and complexity. CONCLUSIONS: This tight promoter-mediated control of stochasticity may constitute a powerful asset for the cell. Remarkably, a strongly periodic activity that demonstrates a complex TF concentration-dependent control is obtained when molecular interactions have typical characteristics observed on eukaryotic promoters (high mobility, functional redundancy, many alternate states/pathways). We also show that this regime results in a direct and indirect energetic cost. Finally, this model can constitute a framework for unifying various experimental approaches. Collectively, our results show that a gene - the basic building block of complex regulatory networks - can itself demonstrate a significantly complex behavior.

Crossref

Springer - Publisher Connector

INRIA a CCSD electronic archive server

PubMed Central

Hal-Diderot

Recognition of Promoters in DNA Sequences Using Weightily Averaged One-dependence Estimators

Author: Htike Zaw Zaw
Win Shoon Lei
Publication venue: The Authors. Published by Elsevier B.V.
Publication date: 09/11/2013

The completion of the human genome project in the last decade has generated a strong demand for computational analysis techniques in order to fully exploit the acquired human genome database. The human genome project generated a perplexing mass of genetic data which necessitates automatic genome annotation. There is a growing interest in the process of gene finding and gene recognition from DNA sequences. In genetics, a promoter is a segment of DNA that marks the starting point of transcription of a particular gene. Therefore, recognizing promoters is one step towards gene finding in DNA sequences. Promoters also play a fundamental role in many other vital cellular processes. Aberrant promoters can cause a wide range of diseases including cancers. This paper describes a state-of-the-art machine learning based approach called weightily averaged one-dependence estimators to tackle the problem of recognizing promoters in genetic sequences. To lower the computational complexity and to increase the generalization capability of the system, we employ an entropy-based feature extraction approach to select relevant nucleotides that are directly responsible for promoter recognition. We carried out experiments on a dataset extracted from the biological literature for a proof-of-concept. The proposed system has achieved an accuracy of 97.17% in classifying promoters. The experimental results demonstrate the efficacy of our framework and encourage us to extend the framework to recognize promoter sequences in various species of higher eukaryotes.

Elsevier - Publisher Connector

The International Islamic University Malaysia Repository

Orthopoxvirus Genome Evolution: The Role of Gene Loss

Author: Hatcher Eneida L.
Hendrickson Robert Curtis
Lefkowitz Elliot J.
Wang Chunlin
Publication venue: Molecular Diversity Preservation International (MDPI)
Publication date: 01/09/2010

Poxviruses are highly successful pathogens, known to infect a variety of hosts. The family Poxviridae includes Variola virus, the causative agent of smallpox, which has been eradicated as a public health threat but could potentially reemerge as a bioterrorist threat. The risk scenario includes other animal poxviruses and genetically engineered manipulations of poxviruses. Studies of orthologous gene sets have established the evolutionary relationships of members within the Poxviridae family. It is not clear, however, how variations between family members arose in the past, an important issue in understanding how these viruses may vary and possibly produce future threats. Using a newly developed poxvirus-specific tool, we predicted accurate gene sets for viruses with completely sequenced genomes in the genus Orthopoxvirus. Employing sensitive sequence comparison techniques together with comparison of syntenic gene maps, we established the relationships between all viral gene sets. These techniques allowed us to unambiguously identify the gene loss/gain events that have occurred over the course of orthopoxvirus evolution. It is clear that no existing Orthopoxvirus species has acquired protein-coding genes unique to that species. The genes of every existing species are all present in members of the species Cowpox virus, and cowpox virus strains contain every gene present in any other orthopoxvirus strain. These results support a theory of reductive evolution in which the reduction in size of the core gene set of a putative ancestral virus played a critical role in speciation and confining any newly emerging virus species to a particular environmental (host or tissue) niche.

Multidisciplinary Digital Publishing Institute

Crossref

Directory of Open Access Journals

PubMed Central

Towards combinatorial transcriptional engineering

Author: Kanodia Harsh
Loake Gary
Mehrotra Rajesh
Mehrotra Sandhya
Renganaath Kaushik
Publication venue: 'Elsevier BV'
Publication date: 01/05/2017

Edinburgh Research Explorer

Two novel PIWI families: roles in inter-genomic conflicts in bacteria and Mediator-dependent modulation of transcription in eukaryotes

Author: A Aguilera
A Boland
A Boland
A Maxwell Burroughs
AA Bourniquel
AC Seila
AM Burroughs
AM McRobbie
AS Konagurthu
B Ren
C Cogoni
C Cole
C Matranga
CJ Hengartner
CO Samuelsen
CS Gillmor
DL Chalker
DN Cox
E Valen
EP Murchison
F Frank
F Lai
FM Cernilogar
H Boubakri
H Elmlund
H Tabara
HM Bourbon
I Carrera
J Brennecke
J Rudolf
JB Ma
JC Andrau
JJ Song
JS Parker
JS Parker
JS Parker
JS Yang
K Miyoshi
K Mochizuki
KN Kreuzer
KS Makarova
KS Makarova
L Aravind
L Aravind
L Aravind
L Aravind
L Aravind
L Aravind
L Cerutti
L Holm
Lakshminarayan M Iyer
LS Gunawardane
LS Johnson
M Ameyar-Zazoua
M Halic
M Huynen
M Remmert
ME Fairman-Williams
MR Singleton
MT Knuesel
N Ding
NE Murray
NK Raghavendra
ON Voloshin
PB Kwak
PJ Leuschner
R Aliyari
RA Pugh
RC Conaway
RC Conaway
RC Edgar
RD Finn
RJ Taft
RJ Taft
RJ Taft
S Akoulitchev
S Djuranovic
S Kuchin
S Malik
SA Muljo
SF Altschul
SI Grewal
T Itoh
T Kogoma
T Lassmann
TA Bickle
TA Rand
V Gobert
X Zhu
Y Wang
Y Wang
Y Wang
YR Yuan
Z Bukowy
Publication venue: 'Springer Science and Business Media LLC'

Crossref

SelenoDB 1.0 : a database of selenoprotein genes, proteins and SECIS elements

Author: Allmang
Altschul
Ashurst
Axley
Baranov
Benson
Berry
Berry
Birney
Castellano
Castellano
Castellano
Eddy
Gromer
Gromer
Hatfield
Hatfield
Hubbard
Kim
Krol
Kryukov
Kryukov
Kryukov
Lee
Lescure
Marla J. Berry
Martin-Romero
Parra
Pruitt
Roderic Guigó
Sergi Castellano
Slater
Taskov
The UniProt Consortium
Vadim N. Gladyshev
Wheelan
Zhang
Zhang
Publication venue: Oxford University Press
Publication date: 01/01/2008

Selenoproteins are a diverse group of proteins usually misidentified and misannotated in sequence databases. The presence of an in-frame UGA (stop) codon in the coding sequence of selenoprotein genes precludes their identification and correct annotation. The in-frame UGA codons are recoded to cotranslationally incorporate selenocysteine, a rare selenium-containing amino acid. The development of ad hoc experimental and, more recently, computational approaches has allowed the efficient identification and characterization of the selenoproteomes of a growing number of species. Today, dozens of selenoprotein families have been described and more are being discovered in recently sequenced species, but the correct genomic annotation is not available for the majority of these genes. SelenoDB is a long-term project that aims to provide, through the collaborative effort of experimental and computational researchers, automatic and manually curated annotations of selenoprotein genes, proteins and SECIS elements. Version 1.0 of the database includes an initial set of eukaryotic genomic annotations, with special emphasis on the human selenoproteome, for immediate inspection by selenium researchers or incorporation into more general databases. SelenoDB is freely available at http://www.selenodb.org

Crossref

DigitalCommons@University of Nebraska

PubMed Central

UPF Digital Repository

MPG.PuRe

SelenoDB 1.0 : A Database of Selenoprotein Genes, Proteins and SECIS Elements

Author: Berry Marla J.
Castellano Sergi
Gladyshev Vadim N.
Guigo Roderic
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/01/2008