Promoter prediction using physico-chemical properties of DNA

Author: A. Gabrielian
A. Kanhere
A.G. Pedersen
A.V. Sivolob
C.H. Choi
H. Wang
I.H. Witten
J. Platt
J.W. Fickett
K. Breslauer
K. Florquin
L. Tsai
M. Hassan el
R.D. Blake
S. Lisser
S.C. Satchwell
S.S. Keerthi
T. Ota
U. Ohler
V.B. Bajic
V.I. Ivanov
Y. Fukue
Y. Suzuki
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006

Locating promoters within a section of DNA is a difficult and important task in DNA analysis. We document an approach that treats DNA as a complex molecule, using several models of its physico-chemical properties. A support vector machine is trained to recognise promoters by their distinctive physical and chemical properties. We demonstrate that combining models improves on the classification accuracy obtained with a single model. We also show that, by examining how the predictive accuracy of these properties varies over the promoter, we can reduce the number of attributes needed. Finally, we apply this method to a real-world problem. The results demonstrate that such an approach has significant merit in its own right. Furthermore, they suggest that a planned combined approach to promoter prediction, using both physico-chemical and sequence-based techniques, will give better results.
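As a rough illustration of what "physico-chemical properties as attributes" can look like in practice, the sketch below turns a DNA string into a fixed-length feature vector by averaging one property over sliding windows. It is not the authors' pipeline. The property used (hydrogen bonds per Watson-Crick base pair, 2 for A-T and 3 for G-C) and the names `HBONDS`, `property_profile` and `window_features` are assumptions chosen for illustration, since the abstract does not list the property models actually used. A vector of this kind could then be passed to any classifier, including a support vector machine.

```python
# Hydrogen bonds per Watson-Crick base pair: A-T pairs have 2, G-C pairs have 3.
# Stand-in property only (assumption): the abstract does not name the
# physico-chemical models the authors used.
HBONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def property_profile(seq, table=HBONDS):
    """Map each base of `seq` to its property value."""
    return [table[base] for base in seq.upper()]


def window_features(seq, window=4, step=2, table=HBONDS):
    """Average the property over sliding windows: one attribute per window."""
    profile = property_profile(seq, table)
    features = []
    for start in range(0, len(profile) - window + 1, step):
        chunk = profile[start:start + window]
        features.append(sum(chunk) / window)
    return features


if __name__ == "__main__":
    # A made-up sequence, not a real promoter.
    print(window_features("TATAAAGGCGCC"))
```

Dropping windows whose attribute carries little predictive signal is one way to picture the attribute reduction that the abstract mentions.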

Crossref

University of Tasmania Open Access Repository

Human Promoter Prediction Using DNA Numerical Representation

Author: Arniker Swarna Bai
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2010

With the emergence of genomic signal processing, numerical representation techniques for the DNA alphabet {A, G, C, T} play a key role in applying digital signal processing and machine learning techniques to the processing and analysis of DNA sequences. The choice of numerical representation affects how well the biological properties of a DNA sequence are reflected in the numerical domain for the detection and identification of the characteristics of special regions of interest within the sequence. This dissertation presents a comprehensive study of various DNA numerical and graphical representation methods and their applications in processing and analyzing long DNA sequences. Discussions of the relative merits and demerits of the various methods, experimental results and possible future developments are also included. Another research focus is promoter prediction in human (Homo sapiens) DNA sequences with a neural-network-based multi-classifier system using DNA numerical representation methods. In spite of the recent development of several computational methods for human promoter prediction, there is a need for performance improvement. In particular, the high false positive rate of the feature-based approaches decreases the prediction reliability and leads to erroneous results in gene annotation. To improve the prediction accuracy and reliability, DigiPromPred, a numerical-representation-based promoter prediction system, is proposed to characterize DNA alphabets in different regions of a DNA sequence. The DigiPromPred system is found to predict promoters with a sensitivity of 90.8% while reducing the false prediction rate for non-promoter sequences with a specificity of 90.4%. The comparative study with state-of-the-art promoter prediction systems for human chromosome 22 shows that our proposed system maintains a good balance between prediction accuracy and reliability.
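To make "numerical representation of the DNA alphabet" concrete, here is a minimal sketch of one widely used scheme, the binary indicator (Voss) representation, in which each nucleotide gets its own 0/1 sequence. The abstract does not say which representations DigiPromPred uses, so this is an illustrative choice, and the function name `voss_indicators` is hypothetical.

```python
def voss_indicators(seq):
    """Return four 0/1 sequences, one per nucleotide, marking where it occurs."""
    seq = seq.upper()
    return {base: [1 if s == base else 0 for s in seq] for base in "ACGT"}


if __name__ == "__main__":
    # Toy input, not taken from any dataset.
    for base, track in voss_indicators("GATTACA").items():
        print(base, track)
```

Each of the four tracks is an ordinary numeric signal, so standard signal-processing or machine-learning tools can be applied to it directly.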
To reduce the system architecture and computational complexity compared to the existing system, a simple feed-forward neural network classifier known as SDigiPromPred is proposed. The SDigiPromPred system is found to predict promoters with sensitivities of 87%, 87% and 99%, while reducing the false prediction rate for non-promoter sequences with specificities of 92%, 94% and 99%, for Human, Drosophila and Arabidopsis sequences respectively, and it offers reconfigurable capability compared to the existing system.
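The sensitivity and specificity figures quoted above follow the standard definitions, which can be sketched as below. The counts in the example are toy numbers picked for easy arithmetic, not results from the dissertation, and the function names are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real promoters that were predicted as promoters."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of non-promoters that were correctly rejected."""
    return true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    # Toy confusion-matrix counts (assumption, for illustration only).
    print(sensitivity(true_pos=90, false_neg=10))
    print(specificity(true_neg=80, false_pos=20))
```

A high false positive count lowers specificity, which is the reliability problem the abstract attributes to feature-based approaches.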

Scholarship at UWindsor

The Proteomic Code: a molecular recognition code for proteins

Author: A Bhakoo
AA Komar
B Benyo
C Levinthal
CB Anfinsen
CR Woese
D Naor
DA Weigent
DR Forsdyke
E Azarya-Sprinzak
E Neher
F Glaser
FHC Crick
G D'Onofrio
G Gamow
H Fan
H Okada
HM Berman
IA Adzhubei
IZ Siemion
J Biro
Jan C Biro
JC Biro
JD Watson
JE Blalock
JE McGuigan
JE Zull
JG Omichinski
JR Heal
JT Wong
K Ikehara
K Nord
KC Gokhale
KI Rother
KL Bost
L Baranyi
L Katz
L Pauling
LB Mekler
M Eilers
M Oresic
M Zuker
ML Chiusano
MO Dayhoff
MS Singer
O Ermolaeva
RS Root-Bernstein
S Brunak
S Walter
SD Seiwert
SK Gupta
T Junier
T Pawson
T Xie
TA Thanaraj
TS Kumarevel
U Segerstéen
W Gu
W Seffens
WL Duax
Y Isogai
Y Shao
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: The Proteomic Code is a set of rules by which information in genetic material is transferred into the physico-chemical properties of amino acids. It determines how individual amino acids interact with each other during folding and in specific protein-protein interactions. The Proteomic Code is part of the redundant Genetic Code. REVIEW: The 25-year-old history of this concept is reviewed from the first independent suggestions by Biro and Mekler, through the works of Blalock, Root-Bernstein, Siemion, Miller and others, followed by the discovery of a Common Periodic Table of Codons and Nucleic Acids in 2003 and culminating in the recent conceptualization of partial complementary coding of interacting amino acids as well as the theory of the nucleic acid-assisted protein folding. METHODS AND CONCLUSIONS: A novel cloning method for the design and production of specific, high-affinity-reacting proteins (SHARP) is presented. This method is based on the concept of proteomic codes and is suitable for large-scale, industrial production of specifically interacting peptides.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

FeatureScan: revealing property-dependent similarity of nucleotide sequences

Author: Blöcker Helmut
Bredohl Björn
Deyneko Igor V.
Kalybaeva Yulia M.
Kauer Gerhard
Kel Alexander E.
Wesely Daniel
Publication venue: Oxford University Press
Publication date: 01/01/2006

FeatureScan is a software package that aims to reveal novel types of DNA sequence similarity by comparing physico-chemical properties. Thirty-eight different parameters of DNA double strands, such as charge, melting enthalpy and conformational parameters, are provided. As input, FeatureScan requires two sequences, a pattern sequence and a target sequence; search conditions are set by selecting a specific DNA parameter and a threshold value. Search results are displayed in FASTA format and directly linked to external genome databases/browsers (ENSEMBL, NCBI, UCSC). An Internet version of FeatureScan is accessible at . As part of the HOBIT initiative () FeatureScan is also accessible as a web service at its above home page. Currently, several preloaded genomes are provided at this Internet website (Homo sapiens, Mus musculus, Rattus norvegicus and four strains of Escherichia coli) as target sequences. Standalone executables of FeatureScan are available on request.
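The pattern/target/threshold search described above can be pictured with a small sketch. This is not FeatureScan's algorithm or interface, neither of which is given in the abstract. It assumes a single property (hydrogen bonds per base pair, 2 for A-T and 3 for G-C), a mean absolute difference as the distance measure, and the hypothetical names `profile` and `scan`.

```python
# Stand-in property (assumption): hydrogen bonds per Watson-Crick base pair.
HBONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def profile(seq, table):
    """Convert a sequence into a list of property values."""
    return [table[base] for base in seq.upper()]


def scan(pattern, target, table=HBONDS, threshold=0.0):
    """Return (position, distance) for each window of `target` whose property
    profile lies within `threshold` of the pattern's profile."""
    p = profile(pattern, table)
    t = profile(target, table)
    hits = []
    for start in range(len(t) - len(p) + 1):
        window = t[start:start + len(p)]
        distance = sum(abs(a - b) for a, b in zip(p, window)) / len(p)
        if distance <= threshold:
            hits.append((start, distance))
    return hits


if __name__ == "__main__":
    # "CTAG" at position 2 shares no aligned letters' identity with "GATC"
    # at positions 0 and 3, yet has the same hydrogen-bond profile.
    print(scan("GATC", "TTCTAGAA"))
```

The example shows the general idea of property-dependent similarity: a window can match in property space even though its letters differ from the pattern.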

CiteSeerX

Crossref

PubMed Central

Annotation of gene promoters by integrative data-mining of ChIP-seq Pol-II enrichment data

Author: Anirban Bhattacharyya
Francisco A Perez
Priyankara Wikramasinghe
Ramana V Davuluri
Ravi Gupta
Sharmistha Pal
Publication venue: Springer Nature
Publication date: 01/01/2010

BACKGROUND: Use of alternative gene promoters that drive widespread cell-type, tissue-type or developmental gene regulation in mammalian genomes is a common phenomenon. Chromatin immunoprecipitation methods coupled with DNA microarray (ChIP-chip) or massively parallel sequencing (ChIP-seq) are enabling genome-wide identification of active promoters in different cellular conditions using antibodies against Pol-II. However, these methods produce enrichment not only near the gene promoters but also inside the genes and other genomic regions due to the non-specificity of the antibodies used in ChIP. Further, the use of these methods is limited by their high cost and strong dependence on cellular type and context. METHODS: We trained and tested different state-of-the-art ensemble and meta classification methods for identification of Pol-II enriched promoter and Pol-II enriched non-promoter sequences, each of length 500 bp. The classification models were trained and tested on a benchmark dataset, using a set of 39 different feature variables that are based on chromatin modification signatures and various DNA sequence features. The best performing model was applied to seven published ChIP-seq Pol-II datasets to provide genome-wide annotation of mouse gene promoters. RESULTS: We present a novel algorithm based on supervised learning methods to discriminate promoter-associated Pol-II enrichment from enrichment elsewhere in the genome in ChIP-chip/seq profiles. We accumulated a dataset of 11,773 promoter and 46,167 non-promoter sequences, each of length 500 bp, generated from RNA Pol-II ChIP-seq data of five tissues (Brain, Kidney, Liver, Lung and Spleen). We evaluated the classification models in building the best predictor and found that Bagging and Random Forest based approaches give the best accuracy.
We implemented the algorithm on seven different published ChIP-seq datasets to provide a comprehensive set of promoter annotations for both protein-coding and non-coding genes in the mouse genome. The resulting annotations contain 13,413 (4,747) protein-coding (non-coding) genes with single promoters and 9,929 (1,858) protein-coding (non-coding) genes with two or more alternative promoters, and a significant number of unassigned novel promoters. CONCLUSION: Our new algorithm can successfully predict the promoters from the genome-wide profile of Pol-II bound regions. In addition, our algorithm performs significantly better than existing promoter prediction methods and can be applied for genome-wide predictions of Pol-II promoters.
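For readers unfamiliar with Bagging, the sketch below shows the core idea in plain Python: train many simple classifiers on bootstrap resamples of the training set and combine them by majority vote. It is a toy under stated assumptions, not the authors' implementation: the base learner is a one-feature threshold rule, the two features are invented, and the names `fit_stump`, `bagging_fit` and `bagging_predict` are hypothetical. The paper's 39 feature variables are not reproduced here.

```python
import random
from collections import Counter


def fit_stump(rows):
    """Fit the one-feature threshold rule with the fewest training errors.

    rows: list of (feature_vector, label) pairs with labels 0 or 1.
    """
    best = None
    for f in range(len(rows[0][0])):
        for x, _ in rows:
            threshold = x[f]
            for label_above in (0, 1):
                errors = sum(
                    1 for xi, yi in rows
                    if (label_above if xi[f] >= threshold else 1 - label_above) != yi
                )
                if best is None or errors < best[0]:
                    best = (errors, f, threshold, label_above)
    _, f, threshold, label_above = best
    return lambda x: label_above if x[f] >= threshold else 1 - label_above


def bagging_fit(rows, n_models=25, seed=0):
    """Train one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_models)]


def bagging_predict(models, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Invented, well-separated toy data: label 1 stands for "promoter".
    rows = [([0.1 * i, 0.5], 0) for i in range(1, 6)]
    rows += [([1.0 + 0.1 * i, 0.5], 1) for i in range(1, 6)]
    models = bagging_fit(rows)
    print(bagging_predict(models, [2.0, 0.5]), bagging_predict(models, [0.0, 0.5]))
```

A Random Forest follows the same resample-and-vote pattern but uses decision trees that also consider only a random subset of features at each split.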

Crossref

Springer - Publisher Connector

PubMed Central

Using cellular fitness to map the structure and function of a major facilitator superfamily effluxer.

Author: Bennett Matthew R
Gomez Marcella M
Kalvapalle Prashant
O'Brien-Gilbert Erin
Perez Anisha M
Shamoo Yousif
Publication venue: eScholarship, University of California
Publication date: 01/01/2017

The major facilitator superfamily (MFS) effluxers are prominent mediators of antimicrobial resistance. The biochemical characterization of MFS proteins is hindered by their complex membrane environment, which makes in vitro biochemical analysis challenging. Since the physicochemical properties of proteins drive the fitness of an organism, we posed the question of whether we could reverse that relationship and derive meaningful biochemical parameters for a single protein simply from the fitness changes it confers under varying strengths of selection. Here, we present a physiological model that uses cellular fitness as a proxy to predict the biochemical properties of the MFS tetracycline efflux pump, TetB, and a family of single amino acid variants. We determined two lumped biochemical parameters roughly describing Km and Vmax for TetB and variants. Including in vivo protein levels in our model allowed for more specific prediction of pump parameters relating to substrate binding affinity and pumping efficiency for TetB and variants. We further demonstrated the general utility of our model by solely using fitness to assay a library of tet(B) variants and estimate their biochemical properties.
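For orientation on what Km and Vmax describe, the standard Michaelis-Menten rate law can be written as a short function. This is textbook enzyme kinetics, not the authors' physiological model (which links fitness to pump parameters and is not spelled out in the abstract). The function name and the numbers in the example are illustrative assumptions.

```python
def michaelis_menten_rate(substrate, vmax, km):
    """Rate v = Vmax * [S] / (Km + [S]) for substrate concentration [S]."""
    if substrate < 0 or km <= 0:
        raise ValueError("substrate must be >= 0 and km must be > 0")
    return vmax * substrate / (km + substrate)


if __name__ == "__main__":
    # Arbitrary units and values, chosen only to show the shape of the curve.
    for s in (0.0, 5.0, 50.0, 500.0):
        print(s, michaelis_menten_rate(s, vmax=10.0, km=5.0))
```

Km is the substrate concentration at which the rate reaches half of Vmax, so a lower Km corresponds to tighter apparent binding and a higher Vmax to a greater maximal pumping rate.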

eScholarship - University of California

DSpace at Rice University

A Phenomenological Model for Predicting Melting Temperatures of DNA Sequences

Author: Bhyravabhotla Jayaram
Khandelwal Garima
Publication venue: Public Library of Science
Publication date: 01/01/2009

We report here a novel method for predicting melting temperatures of DNA sequences based on a molecular-level hypothesis about the phenomena underlying the thermal denaturation of DNA. The model presented here attempts to quantify the energetic components stabilizing the structure of DNA, such as base pairing, stacking, and ionic environment, which are partially disrupted during the process of thermal denaturation. The model gives a Pearson product-moment correlation coefficient (r) of ∼0.98 between experimental and predicted melting temperatures for over 300 sequences of varying lengths, ranging from 15-mers to genomic level, and at different salt concentrations. The approach is implemented as a web tool (www.scfbio-iitd.res.in/chemgenome/Tm_predictor.jsp) for the prediction of melting temperatures of DNA sequences.
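The agreement statistic quoted above, the Pearson product-moment correlation coefficient, can be computed as in the following sketch. The function name is hypothetical and the temperatures in the example are invented for illustration. They are not values from the paper, and the melting-temperature model itself is not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented melting temperatures in degrees Celsius (assumption).
    experimental = [52.0, 61.5, 70.0, 84.5]
    predicted = [53.0, 60.0, 71.5, 83.0]
    print(round(pearson_r(experimental, predicted), 3))
```

A value near 1 means predicted and experimental temperatures rise and fall together almost linearly. It does not by itself show that the absolute errors are small.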

CiteSeerX

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Mining microbial genomes for new natural products and biosynthetic pathways

Author: Austin
Banskota
Barona-Gómez
Bentley
Bergmann
Bok
Caffrey
Chain
Challis
Chen
Corre
de Bruijn
Drake
Erb
Fischbach
Fleischmann
Fleming
Funa
Gregory L. Challis
Gross
Grüschow
Gust
Günter
Haydock
Haynes
Hojati
Hornung
Ikeda
Izumikawa
Kadi
Keller
Koehn
Larsen
Lautru
Li
Lin
McAlpine
Miethke
Minowa
Miyanaga
Muller
Nguyen
Oliynyk
Omura
Paulsen
Petersen
Pfeifer
Rausch
Reid
Song
Stachelhaus
Sudek
Tohyama
Udwary
Wilkinson
Zhao
Zirkle
Publication venue: 'Microbiology Society'
Publication date: 01/06/2008

Analyses of microbial genome sequences have revealed numerous examples of ‘cryptic’ or ‘orphan’ biosynthetic gene clusters, with the potential to direct the production of novel, structurally complex natural products. This article summarizes the various methods that have been developed for discovering the products of cryptic biosynthetic gene clusters in microbes and gives an account of my group's discovery of the products of two such gene clusters in the model actinomycete Streptomyces coelicolor M145. These discoveries hint at new mechanisms, roles and specificities for natural product biosynthetic enzymes. Our efforts to elucidate these are described. The identification of new secondary metabolites of S. coelicolor raises the question: what is their biological function? Progress towards answering this question is also summarized.

Crossref

Warwick Research Archives Portal Repository