Public Library of Science (PLOS)

RBF-TSS: Identification of Transcription Start Site in Human Using Radial Basis Functions Network and Oligonucleotide Positional Frequencies

Author: C Igel
D Karolchik
Darren P. Martin
DL Wheeler
DS Broomhead
DS Prestridge
Eric C. Rouchka
F Schwenker
J Davis
JE Moody
JR Goñi
MJL Orr
OV Kel-Margoulis
R Yamashita
Rami N. Mahdi
RN Mahdi
RV Davuluri
S Sonnenburg
T Werner
TA Down
U Ohler
V Narang
VB Bajic
VB Bajic
VB Bajic
WJ Kent
Y Suzuki
Y Suzuki
YV Kondrakhin
Publication venue: Public Library of Science
Publication date: 01/01/2009

Accurate identification of promoter regions and transcription start sites (TSS) in genomic DNA allows for a more complete understanding of the structure of genes and gene regulation within a given genome. Many recently published methods have achieved high TSS identification accuracy. However, models that capture promoters and TSS more accurately are still needed. A novel method for identifying transcription start sites is proposed that improves on the recognition accuracy of recently published methods. This method incorporates a metric feature based on oligonucleotide positional frequencies, taking into account the nature of promoters. A radial basis function neural network for identifying transcription start sites (RBF-TSS) is employed as the classification algorithm. Using non-overlapping chunks (windows) of size 50 and 500 on the human genome, the proposed method achieves an area under the Receiver Operating Characteristic curve (auROC) of 94.75% and 95.08%, respectively, providing increased performance over existing TSS prediction methods.
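The idea of oligonucleotide positional frequencies can be illustrated with a minimal sketch. The abstract does not specify the exact metric feature or the network configuration, so the function names, the choice of k = 2, and the toy windows below are assumptions made for illustration only; this is not the RBF-TSS implementation.

```python
from collections import Counter


def positional_kmer_counts(sequences, k=2):
    """Count, for each offset in equal-length windows, how often each k-mer starts there."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    counts = [Counter() for _ in range(length - k + 1)]
    for seq in sequences:
        seq = seq.upper()
        for pos in range(length - k + 1):
            counts[pos][seq[pos:pos + k]] += 1
    return counts


def positional_frequencies(sequences, k=2):
    """Turn positional counts into relative frequencies per offset."""
    n = len(sequences)
    return [
        {kmer: c / n for kmer, c in column.items()}
        for column in positional_kmer_counts(sequences, k)
    ]


# Hypothetical toy windows, not data from the paper.
windows = ["TATAAG", "TATACG", "GATAAG"]
freqs = positional_frequencies(windows, k=2)
print(freqs[1])  # every window has "AT" starting at offset 1
```

A table like this, built from known promoter windows, could then be used to score a new window position by position; how the paper turns such frequencies into its metric feature is not stated in the abstract.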

CiteSeerX

Repository for Publications and Research Data

Genomes of coral dinoflagellate symbionts highlight evolutionary adaptations conducive to a symbiotic lifestyle.

Author: Aranda M
Ashoor H
Bajic VB
Baumgarten S
Bayer T
Bhak J
Bougouffa S
Kim H
LaJeunesse TC
Li Y
Liew YJ
Micklem G
Piel J
Ravasi T
Ryu T
Simakov O
Voolstra CR
Wilson MC
Publication venue: Sci Rep
Publication date: 22/12/2016

Despite half a century of research, the biology of dinoflagellates remains enigmatic: they defy many functional and genetic traits attributed to typical eukaryotic cells. Genomic approaches to study dinoflagellates are often stymied by their large, multi-gigabase genomes. Members of the genus Symbiodinium are photosynthetic endosymbionts of stony corals that provide the foundation of coral reef ecosystems. Their smaller genome sizes provide an opportunity to interrogate the evolution and functionality of dinoflagellate genomes and endosymbiosis. We sequenced the genome of the ancestral Symbiodinium microadriaticum and compared it to the genomes of the more derived Symbiodinium minutum and Symbiodinium kawagutii and eukaryote model systems, as well as to transcriptomes from other dinoflagellates. Comparative analyses of genome and transcriptome protein sets show that all dinoflagellates, not only Symbiodinium, possess significantly more transmembrane transporters involved in the exchange of amino acids, lipids, and glycerol than other eukaryotes. Importantly, we find that only Symbiodinium harbor an extensive transporter repertoire associated with the provisioning of carbon and nitrogen. Analyses of these transporters show species-specific expansions, which provide a genomic basis to explain differential compatibilities to an array of hosts and environments, and highlight the putative importance of gene duplications as an evolutionary mechanism in dinoflagellates and Symbiodinium.

OceanRep

Apollo (Cambridge)

Computational analyses of eukaryotic promoters

Author: AD Smith
AD Smith
AD Smith
AD Smith
BA Lewis
BC Foat
C Bock
CD Schmid
CE Lawrence
CE Lawrence
CT Workman
D Das
D Das
DJ Huebert
E Segal
EM Conlon
G Cavalli
GC Yuan
GZ Hertz
HJ Bussemaker
J Friedman
M Tompa
MC Thomas
Michael Q Zhang
MJ Buck
MJ Martinez
MQ Zhang
N Maeda
ND Heintzman
NI Gershenzon
P Carninci
P Gross
P Hong
P Sumazin
PJ Sabo
R Das
RA Rollins
RV Davuluri
S Keles
S Keles
S Sinha
S Sonnenburg
SR Schulze
T Hastie
TA Down
TH Kim
TH Kim
TL Bailey
U Ohler
V Matys
VB Bajic
VB Bajic
VB Bajic
VX Jin
WW Wasserman
X Zhao
Y Suzuki
Publication venue: BioMed Central
Publication date: 01/01/2007

Computational analysis of eukaryotic promoters is one of the most difficult problems in computational genomics and is essential for understanding gene expression profiles and reverse-engineering gene regulation network circuits. Here I give a basic introduction to the problem and a recent update on both experimental and computational approaches. More details may be found in the extended references. This review is based on a summer lecture given at the Max Planck Institute in Berlin in 2005.

Cold Spring Harbor Laboratory Institutional Repository

E2F5 status significantly improves malignancy diagnosis of epithelial ovarian cancer

Background: Ovarian epithelial cancer (OEC) usually presents in the later stages of the disease. Factors, especially those associated with cell-cycle genes, affecting the genesis and tumour progression of ovarian cancer are largely unknown. We hypothesized that over-expressed transcription factors (TFs), as well as those that drive the expression of the OEC over-expressed genes, could be key to OEC genesis and potentially useful tissue and serum markers for malignancy associated with OEC. Methods: Using a combination of computational (selection of candidate TF markers and malignancy prediction) and experimental approaches (tissue microarray and western blotting on patient samples), we identified and evaluated the E2F5 transcription factor, which is involved in cell proliferation, as a promising candidate regulatory target in early-stage disease. Our hypothesis was supported by our tissue array experiments, which showed E2F5 expression only in OEC samples and not in normal or benign tissues, and by significantly positively biased expression in serum samples measured by western blotting. Results: Analysis of clinical cases shows that E2F5 status is characteristic of a different population group than the one covered by CA125, a conventional OEC biomarker. When E2F5 is used in different combinations with CA125 to distinguish malignant from benign cysts, the presence of CA125 or E2F5 increases the sensitivity of OEC detection to 97.9% (from 87.5% if only CA125 is used) and, more importantly, the presence of both CA125 and E2F5 increases the specificity of OEC detection to 72.5% (from 55% if only CA125 is used). This significantly improved accuracy suggests the possibility of improved OEC diagnostics. Furthermore, detection of malignancy status in 86 cases (38 benign, 48 early and late OEC) shows that the use of E2F5 status in combination with other clinical characteristics allows for improved detection of malignant cases, with sensitivity, specificity, F-measure and accuracy of 97.92%, 97.37%, 97.92% and 97.67%, respectively. Conclusions: Overall, our findings, in addition to opening a realistic possibility for improved OEC diagnosis, provide indirect evidence that the cell-cycle regulatory protein E2F5 might play a significant role in OEC pathogenesis.
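The four figures reported for the 86-case analysis can be checked with a short worked example. The abstract gives only the class sizes (38 benign, 48 OEC) and the resulting percentages; the confusion-matrix counts below (47 true positives, 1 false negative, 37 true negatives, 1 false positive) are an assumption, chosen because they are consistent with all four reported values.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall on malignant cases
    specificity = tn / (tn + fp)   # recall on benign cases
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Assumed counts: 48 malignant cases (47 detected), 38 benign cases (37 cleared).
metrics = diagnostic_metrics(tp=47, fn=1, tn=37, fp=1)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
# sensitivity: 97.92%
# specificity: 97.37%
# f_measure: 97.92%
# accuracy: 97.67%
```

Under these assumed counts, one missed malignant case and one misclassified benign case reproduce the reported sensitivity, specificity, F-measure and accuracy.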

ScholarBank@NUS

Semantic prioritization of novel causative genomic variants

Author: Bajic VB
Boudellioua I
Gkoutos GV
Goncalves-Serra E
Hashish Y
Hoehndorf R
Kulmanov M
Mahamad Razali RB
Schoenmakers N
Schofield PN
Publication venue: PLoS Computational Biology
Publication date: 01/04/2017

Discriminating the causative disease variant(s) for individuals with inherited or de novo mutations presents one of the main challenges faced by the clinical genetics community today. Computational approaches for variant prioritization include machine learning methods utilizing a large number of features, including molecular information, interaction networks, or phenotypes. Here, we demonstrate the PhenomeNET Variant Predictor (PVP) system that exploits semantic technologies and automated reasoning over genotype-phenotype relations to filter and prioritize variants in whole exome and whole genome sequencing datasets. We demonstrate the performance of PVP in identifying causative variants on a large number of synthetic whole exome and whole genome sequences, covering a wide range of diseases and syndromes. In a retrospective study, we further illustrate the application of PVP for the interpretation of whole exome sequencing data in patients suffering from congenital hypothyroidism. We find that PVP accurately identifies causative variants in whole exome and whole genome sequencing datasets and provides a powerful resource for the discovery of causal variants. NS was funded by Wellcome Trust (Grant 100585/Z/12/Z) and the National Institute for Health Research Cambridge Biomedical Research Centre. IB, RBMR, MK, YH, VBB, RH were funded by the King Abdullah University of Science and Technology. GVG acknowledges funding from the National Science Foundation (NSF grant number: IOS-1340112) and the European Commission H2020 (Grant Agreement No. 731075).

University of Birmingham Research Portal

Apollo (Cambridge)

FigShare

ContDist: a tool for the analysis of quantitative gene and promoter properties

Author: A Siepel
A Subramanian
AE Vinogradov
AI Su
B Efron
C Anselmi
C Bock
F Antequera
F Eckhardt
F Wright
G Dennis Jr
Gorka Lasso
H Jeong
I Rivals
J Zhu
JR Goni
LD Hurst
LNvan de Lagemaat
M Ashburner
M Hackenberg
M Hackenberg
M Weber
Michael Hackenberg
MJ Lercher
MM Suzuki
P Khatri
P Khatri
R Das
R Drysdale
RD Kornberg
Rune Matthiesen
V Curwen
V Miele
VB Bajic
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: How promoter regions regulate gene expression is complicated and far from fully understood. It is known that histones' regulation of DNA compactness, DNA methylation, transcription factor binding sites and CpG islands play a role in the transcriptional regulation of a gene. Many high-throughput techniques now exist that permit the detection of epigenetic marks and regulatory elements in the promoter regions of thousands of genes. However, the subsequent analysis of such experiments (e.g. of the resulting gene lists) has so far been hampered by the lack of a tool for detailed analysis of promoter regions. Results: We present ContDist, a tool to statistically analyze quantitative gene and promoter properties. The software includes approximately 200 quantitative features of gene and promoter regions for 7 commonly studied species. In contrast to "traditional" ontological analysis, which only works on qualitative data, all the features in the underlying annotation database are quantitative gene and promoter properties. Exploiting the tool's strong focus on the promoter region, we show its usefulness in two case studies: the first on differentially methylated promoters and the second on the fundamental differences between housekeeping and tissue-specific genes. The two case studies both confirm recent findings and reveal previously unreported biological relations. Conclusion: ContDist is a new tool with two important properties: 1) it has a strong focus on the promoter region, which is disregarded by virtually all ontology tools, and 2) it uses quantitative (continuously distributed) features of genes and their promoter regions, which are not available in any other tool. ContDist is available from http://web.bioinformatics.cicbiogune.es/CD/ContDistribution.php

Public Library of Science (PLOS)

Estrogen-Dependent Gene Expression in the Mouse Ovary

Author: A Hellani
AH Charpentier
Ann E. Drummond
BJ Deroo
C Wang
CR Fisher
DB Constam
FJ Diaz
H Zhang
J Brennan
JF Couse
JH Krege
JM Drake
JM Hall
Jock K. Findlay
K Morita
KL Britt
KL Britt
KL Britt
KL Britt
KL Britt
M Brannstrom
M Hsieh
M Hsieh
M Hsieh
M Schena
M Wijgerde
Mai A. Sarraj
MC Russell
ME Jones
ME Jones
MM Matzuk
NC Zachos
R Sekido
RL Robker
S Nilsson
S Tang
S Vainio
Seng H. Liew
SH Liew
TC Wu
Toshi Shioda
V Matys
V Praz
VB Bajic
Y Chen
Y Chen
Y Ren
Z Liu
Publication venue: Public Library of Science
Publication date: 01/01/2011

Estrogen (E) plays a pivotal role in regulating the female reproductive system, particularly the ovary. However, the number and type of ovarian genes influenced by estrogen remain to be fully elucidated. In this study, we have utilized wild-type (WT) and aromatase knockout (ArKO; estrogen-free) mouse ovaries as an in vivo model to profile estrogen-dependent genes. RNA from each individual ovary (n = 3) was analyzed by a microarray-based screen using the Illumina Sentrix Mouse WG-6 BeadChip (45,281 transcripts). Comparative analysis (GeneSpring) showed differential expression profiles of 450 genes influenced by E, with 291 genes up-regulated and 159 down-regulated by 2-fold or greater in the ArKO ovary compared to WT. Genes previously reported to be E-regulated in ArKO ovaries were confirmed, in addition to novel genes not previously reported to be expressed or regulated by E in the ovary. Of the genes involved in 5 diverse functional processes (hormonal processes, reproduction, sex differentiation and determination, apoptosis and cellular processes), 78 had estrogen-responsive elements (ERE). These analyses define the transcriptome regulated by E in the mouse ovary. Further analysis and investigation will increase our knowledge of how E influences follicular development and other ovarian functions.
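The "2-fold or greater" criterion can be made concrete with a small sketch. The study used GeneSpring, and its normalization and statistical testing are not described in the abstract; the function name, the placeholder gene labels and the expression values below are hypothetical and only illustrate the thresholding step.

```python
def classify_fold_change(wt, arko, threshold=2.0):
    """Label a gene by the ArKO/WT expression ratio against a fold-change threshold."""
    if wt <= 0 or arko <= 0:
        raise ValueError("expression values must be positive")
    ratio = arko / wt
    if ratio >= threshold:
        return "up"
    if ratio <= 1 / threshold:
        return "down"
    return "unchanged"


# Hypothetical mean expression values as (WT, ArKO) pairs; not data from the study.
expression = {
    "geneA": (100.0, 250.0),
    "geneB": (80.0, 30.0),
    "geneC": (50.0, 60.0),
}
calls = {gene: classify_fold_change(wt, arko) for gene, (wt, arko) in expression.items()}
print(calls)
```

In practice a fold-change cut-off is usually combined with a significance test across replicates; that step is omitted here.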

Gene prediction in metagenomic fragments: A large scale machine learning approach

Author: A Lukashin
AL Delcher
BE Suzek
Burkhard Morgenstern
CJ van Rijsbergen
CM Bishop
CS Riesenfeld
D Frishman
DA Benson
DJC MacKay
F Sanger
F Wilcoxon
GW Tyson
H Noguchi
HY Ou
IT Nabney
J Besemer
J Handelsman
JC Venter
K Chen
Katharina J Hoff
KE Rudd
L Krause
M Ronaghi
M Tech
M Tech
Maike Tech
MS Rappe
P Hugenholtz
P Nielson
Peter Meinicke
R Amann
R Daniel
R Daniel
R Development Core Team
RA Edwards
Rolf Daniel
S Altschul
S Voget
SG Tringe
T Hastie
T Jarvie
Thomas Lingner
V Torsvik
VB Bajic
W Streit
Publication venue: BioMed Central
Publication date: 01/04/2008

Background: Metagenomics is an approach to the characterization of microbial genomes via the direct isolation of genomic sequences from the environment without prior cultivation. The amount of metagenomic sequence data is growing fast while computational methods for metagenome analysis are still in their infancy. In contrast to genomic sequences of single species, which can usually be assembled and analyzed by many available methods, a large proportion of metagenome data remains as unassembled anonymous sequencing reads. One of the aims of all metagenomic sequencing projects is the identification of novel genes. The short length of most fragments (Sanger sequencing, for example, yields fragments of 700 bp on average) and their unknown phylogenetic origin require approaches to gene prediction that differ from the currently available methods for genomes of single species. In particular, the large size of metagenomic samples requires fast and accurate methods with small numbers of false positive predictions. Results: We introduce a novel gene prediction algorithm for metagenomic fragments based on a two-stage machine learning approach. In the first stage, we use linear discriminants for monocodon usage, dicodon usage and translation initiation sites to extract features from DNA sequences. In the second stage, an artificial neural network combines these features with open reading frame length and fragment GC-content to compute the probability that this open reading frame encodes a protein. This probability is used for the classification and scoring of gene candidates. With large scale training, our method provides fast single fragment predictions with good sensitivity and specificity on artificially fragmented genomic DNA. Additionally, this method is able to predict translation initiation sites accurately and distinguishes complete from incomplete genes with high reliability. Conclusion: Large scale machine learning methods are well-suited for gene prediction in metagenomic DNA fragments. In particular, the combination of linear discriminants and neural networks is promising and should be considered for integration into metagenomic analysis pipelines. The data sets can be downloaded from the URL provided (see Availability and requirements section).
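The raw quantities named in the abstract (monocodon usage, open reading frame length and GC-content) can be sketched as follows. This is only a minimal illustration of the inputs: the function names and the toy sequence are assumptions, and neither the linear discriminants of the first stage nor the neural network of the second stage is implemented here.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_usage(orf):
    """Relative frequency of each in-frame codon (monocodon usage)."""
    orf = orf.upper()
    codons = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]
    if not codons:
        return {}
    counts = Counter(codons)
    return {codon: n / len(codons) for codon, n in counts.items()}


def orf_features(orf):
    """Collect simple per-ORF quantities that a downstream classifier could consume."""
    return {
        "length": len(orf),
        "gc": gc_content(orf),
        "codon_usage": codon_usage(orf),
    }


# Hypothetical toy ORF, not taken from the paper's data sets.
features = orf_features("ATGGCGGCGTAA")
print(features["length"], features["gc"])
```

In the approach described above, quantities like these would first be condensed by trained linear discriminants and then combined by the neural network into a coding probability.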

Repository for Publications and Research Data

Chicken genome analysis reveals novel genes encoding biotin-binding proteins related to avidin family

Author: A Pagano
A Sali
AC Camproux
AE Kel
D Sanchez
DR Flower
DR Flower
GJ Barton
H Nielsen
HB White III
HB White III
HM Berman
HR Nordlund
HR Nordlund
HW Meslar
JV Lehtonen
L Bush
L Bush
LW Hillier
M Wilchek
MJ Wallén
MK Ahlroth
MK Ahlroth
MK Ahlroth
MK Ahlroth
ML Gope
MS Johnson
MS Johnson
N Subramanian
NM Green
O Livnah
O Livnah
OH Laitinen
OH Laitinen
OH Laitinen
OH Laitinen
OH Laitinen
P Tuohimaa
PB Seshagiri
PC Weber
PE Boardman
RA Keinänen
S Freitag
S Kumar
S Kumar
SC Gill
SW Cowan
T Sano
VB Bajic
VP Hytönen
WE Stumph
WL DeLano
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: A chicken egg contains several biotin-binding proteins (BBPs), whose complete DNA and amino acid sequences are not known. In order to identify and characterise these genes and proteins, we studied chicken cDNAs and genes available in the NCBI database and chicken genome database using the reported N-terminal amino acid sequences of chicken egg-yolk BBPs as search strings. RESULTS: Two separate hits showing significant homology to these N-terminal sequences were discovered. One of these hits was located on the chromosome in the immediate proximity of the avidin gene family. Both hits encode proteins with high sequence similarity to avidin, suggesting that chicken BBPs are paralogous to the avidin family. In particular, almost all residues corresponding to biotin binding in avidin are conserved in these putative BBP proteins. One of the DNA sequences found, however, seems to encode a carboxy-terminal extension not present in avidin. CONCLUSION: We describe here the predicted properties of the putative BBP genes and proteins. Our present observations link BBP genes with the avidin gene family and shed more light on the genetic arrangement and variability of this family. In addition, comparative modelling revealed the potential structural elements important for the functional and structural properties of the putative BBP proteins.

Jyväskylä University Digital Archive