106 research outputs found

FRAGS: estimation of coding sequence substitution rates from fragmentary data

Author: Hide Winston A
Seoighe Cathal
Swart Estienne C
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: Rates of substitution in protein-coding sequences can provide important insights into evolutionary processes that are of biomedical and theoretical interest. Increased availability of coding sequence data has enabled researchers to estimate more accurately the coding sequence divergence of pairs of organisms. However the use of different data sources, alignment protocols and methods to estimate substitution rates leads to widely varying estimates of key parameters that define the coding sequence divergence of orthologous genes. Although complete genome sequence data are not available for all organisms, fragmentary sequence data can provide accurate estimates of substitution rates provided that an appropriate and consistent methodology is used and that differences in the estimates obtainable from different data sources are taken into account. RESULTS: We have developed FRAGS, an application framework that uses existing, freely available software components to construct in-frame alignments and estimate coding substitution rates from fragmentary sequence data. Coding sequence substitution estimates for human and chimpanzee sequences, generated by FRAGS, reveal that methodological differences can give rise to significantly different estimates of important substitution parameters. The estimated substitution rates were also used to infer upper-bounds on the amount of sequencing error in the datasets that we have analysed. CONCLUSION: We have developed a system that performs robust estimation of substitution rates for orthologous sequences from a pair of organisms. Our system can be used when fragmentary genomic or transcript data is available from one of the organisms and the other is a completely sequenced genome within the Ensembl database. As well as estimating substitution statistics our system enables the user to manage and query alignment and substitution data

Directory of Open Access Journals

PubMed Central

University of the Western Cape Research Repository

The contribution of exon-skipping events on chromosome 22 to protein coding diversity

Author: Babenko Vladimir N.
Hide Winston A.
van Heusden Peter A.
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 01/01/2001

Completion of the human genome sequence provides evidence for a gene count with lower bound 30,000–40,000. Significant protein complexity may derive in part from multiple transcript isoforms. Recent EST based studies have revealed that alternate transcription, including alternative splicing, polyadenylation and transcription start sites, occurs within at least 30–40% of human genes. Transcript form surveys have yet to integrate the genomic context, expression, frequency, and contribution to protein diversity of isoform variation. We determine here the degree to which protein coding diversity may be influenced by alternate expression of transcripts by exhaustive manual confirmation of genome sequence annotation, and comparison to available transcript data to accurately associate skipped exon isoforms with genomic sequence. Relative expression levels of transcripts are estimated from EST database representation. The rigorous in silico method accurately identifies exon skipping using verified genome sequence. 545 genes have been studied in this first hand-curated assessment of exon skipping on chromosome 22

University of the Western Cape Research Repository

Prioritizing genes of potential relevance to diseases affected by sex hormones: an example of Myasthenia Gravis

Author: Bajic Vladimir B
Hide Winston A
Hofmann Oliver
Kaur Mandeep
MacPherson Cameron R
Schmeier Sebastian
Taylor Stephen
Willcox Nick
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: About 5% of western populations are afflicted by autoimmune diseases, many of which are affected by sex hormones. Autoimmune diseases are complex and involve many genes. Identifying these disease-associated genes contributes to development of more effective therapies. Also, association studies frequently imply genomic regions that contain disease-associated genes but fall short of pinpointing these genes. The identification of disease-associated genes has always been challenging and to date there is no universal and effective method developed. Results: We have developed a method to prioritize disease-associated genes for diseases affected strongly by sex hormones. Our method uses various types of information available for the genes, but no information that directly links genes with the disease. It generates a score for each of the considered genes and ranks genes based on that score. We illustrate our method on early-onset myasthenia gravis (MG) using genes potentially controlled by estrogen and localized in a genomic segment (which contains the MHC and surrounding region) strongly associated with MG. Based on the considered genomic segment 283 genes are ranked for their relevance to MG and responsiveness to estrogen. The top three ranked genes, HLA-G, TAP2 and HLA-DRB1, are implicated in autoimmune diseases, while TAP2 is associated with SNPs characteristic for MG. Within the top 35 prioritized genes our method identifies 90% of the 10 already known MG-associated genes from the considered region without using any information that directly links genes to MG. Among the top eight genes we identified HLA-G and TUBB as new candidates. We show that our ab initio approach outperforms the other methods for prioritizing disease-associated genes. Conclusion: We have developed a method to prioritize disease-associated genes under the potential control of sex hormones.
We demonstrate the success of this method by prioritizing the genes localized in the MHC and surrounding region and evaluating the role of these genes as potential candidates for estrogen control as well as MG. We show that our method outperforms the other methods. The method has a potential to be adapted to prioritize genes relevant to other diseases.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Oxford University Research Archive

University of Melbourne Institutional Repository

CLU: A new algorithm for EST clustering

Author: A Kalyanaraman
Andrey Ptitsyn
AR Williamson
GG Lennon
J Burke
J Quackenbush
K Malde
M Cariaso
MS Boguski
RT Miller
T Kapros
VB Streletc
VB Strelets
Winston Hide
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: The continuous flow of EST data remains one of the richest sources for discoveries in modern biology. The first step in EST data mining is usually associated with EST clustering, the process of grouping of original fragments according to their annotation, similarity to known genomic DNA or each other. Clustered EST data, accumulated in databases such as UniGene, STACK and TIGR Gene Indices have proven to be crucial in research areas from gene discovery to regulation of gene expression. RESULTS: We have developed a new nucleotide sequence matching algorithm and its implementation for clustering EST sequences. The program is based on the original CLU match detection algorithm, which has improved performance over the widely used d2_cluster. The CLU algorithm automatically ignores low-complexity regions like poly-tracts and short tandem repeats. CONCLUSION: CLU represents a new generation of EST clustering algorithm with improved performance over current approaches. An early implementation can be applied in small and medium-size projects. The CLU program is available on an open source basis free of charge. It can be downloaded fro

Crossref

Springer - Publisher Connector

PubMed Central

White Rose Research Online

Integrating Murine Gene Expression Studies to Understand Obstructive Lung Disease due to Chronic Inhaled Endotoxin

Author: Baron Rebecca Marlene
Brass David M.
Bresler Herbert S.
Cernadas Manuela
Christiani David C.
Hide Winston
Hofmann Oliver Marc
Lai Peggy Sue
Meng Quanxin Ryan
Schwartz David A.
Yang Ivana V.
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 13/05/2013

Rationale: Endotoxin is a near ubiquitous environmental exposure that has been associated with both asthma and chronic obstructive pulmonary disease (COPD). These obstructive lung diseases have a complex pathophysiology, making them difficult to study comprehensively in the context of endotoxin. Genome-wide gene expression studies have been used to identify a molecular snapshot of the response to environmental exposures. Identification of differentially expressed genes shared across all published murine models of chronic inhaled endotoxin will provide insight into the biology underlying endotoxin-associated lung disease. Methods: We identified three published murine models with gene expression profiling after repeated low-dose inhaled endotoxin. All array data from these experiments were re-analyzed, annotated consistently, and tested for shared genes found to be differentially expressed. Additional functional comparison was conducted by testing for significant enrichment of differentially expressed genes in known pathways. The importance of this gene signature in smoking-related lung disease was assessed using hierarchical clustering in an independent experiment where mice were exposed to endotoxin, smoke, and endotoxin plus smoke. Results: A 101-gene signature was detected in three murine models, more than expected by chance. The three model systems exhibit additional similarity beyond shared genes when compared at the pathway level, with increasing enrichment of inflammatory pathways associated with longer duration of endotoxin exposure. Genes and pathways important in both asthma and COPD were shared across all endotoxin models. Mice exposed to endotoxin, smoke, and smoke plus endotoxin were accurately classified with the endotoxin gene signature.
Conclusions: Despite the differences in laboratory, duration of exposure, and strain of mouse used in three experimental models of chronic inhaled endotoxin, surprising similarities in gene expression were observed. The endotoxin component of tobacco smoke may play an important role in disease development

Harvard University - DASH

University of Melbourne Institutional Repository

The Francis Crick Institute

Comparison of glioma stem cells to neural stem cells from the adult human brain identifies dysregulated Wnt-signaling and a fingerprint associated with clinical outcome

Author: Altschuler Gabriel
Grasmo-Wendler Unn-Hilde
Helseth Eirik
Hide Winston
Jeong Jieun
Langmoen Iver A.
Murrell Wayne
Myklebost Ola
Sandberg Cecilie Jonsgar
Stangeland Biljana
Strømme Kirsten Kierulf
Vik-Mo Einar Osland
Publication venue: The Authors. Published by Elsevier Inc.
Publication date: 15/08/2013

Glioblastoma is the most common brain tumor. Median survival in unselected patients is <10 months. The tumor harbors stem-like cells that self-renew and propagate upon serial transplantation in mice, although the clinical relevance of these cells has not been well documented. We have performed the first genome-wide analysis that directly relates the gene expression profile of nine enriched populations of glioblastoma stem cells (GSCs) to five identically isolated and cultivated populations of stem cells from the normal adult human brain. Although the two cell types share common stem- and lineage-related markers, GSCs show a more heterogeneous gene expression. We identified a number of pathways that are dysregulated in GSCs. A subset of these pathways has previously been identified in leukemic stem cells, suggesting that cancer stem cells of different origin may have common features. Genes upregulated in GSCs were also highly expressed in embryonic and induced pluripotent stem cells. We found that canonical Wnt-signaling plays an important role in GSCs, but not in adult human neural stem cells. We also identified a 30-gene signature highly overexpressed in GSCs. The expression of these signature genes correlates with clinical outcome and demonstrates the clinical relevance of GSCs

Elsevier - Publisher Connector

Computational disease gene identification: a concert of methods prioritizes type 2 diabetes and obesity candidate genes

Author: Adeyemo Adebowale
Adie Euan
Andrade-Navarro Miguel A.
Brunner Han G.
Hide Winston
Lopez-Bigas Nuria
Oti Martin
Ouzounis Christos
Patti Mary Elizabeth
Perez-Iratxeta Carolina
Semple Colin A. M.
Tiffin Nicki
Turner Frances
van Driel Marc A.
Publication venue: Oxford University Press
Publication date: 01/01/2006

Genome-wide experimental methods to identify disease genes, such as linkage analysis and association studies, generate increasingly large candidate gene sets for which comprehensive empirical analysis is impractical. Computational methods employ data from a variety of sources to identify the most likely candidate disease genes from these gene sets. Here, we review seven independent computational disease gene prioritization methods, and then apply them in concert to the analysis of 9556 positional candidate genes for type 2 diabetes (T2D) and the related trait obesity. We generate and analyse a list of nine primary candidate genes for T2D genes and five for obesity. Two genes, LPL and BCKDHA, are common to these two sets. We also present a set of secondary candidates for T2D (94 genes) and for obesity (116 genes) with 58 genes in common to both diseases

Crossref

PubMed Central

Edinburgh Research Explorer

Radboud Repository (Radboud Univ.)

King's Research Portal

White Rose Research Online

Mice and Men: Their Promoter Properties

Author: Adele Kruger
Alan Christoffels
Bill Pavan
Chikatoshi Kai
Christian Schönbach
David A Hume
John Hancock
Judith Blake
Jun Kawai
Leonard Lipovich
Liang Yang
Lisa Stubbs
Oliver Hofmann
Piero Carninci
Sin Lam Tan
Vladimir B Bajic
Winston Hide
Yoshihide Hayashizaki
Publication venue: Public Library of Science
Publication date: 01/01/2006

Using the two largest collections of Mus musculus and Homo sapiens transcription start sites (TSSs) determined based on CAGE tags, ditags, full-length cDNAs, and other transcript data, we describe the compositional landscape surrounding TSSs with the aim of gaining better insight into the properties of mammalian promoters. We classified TSSs into four types based on compositional properties of regions immediately surrounding them. These properties highlighted distinctive features in the extended core promoters that helped us delineate boundaries of the transcription initiation domain space for both species. The TSS types were analyzed for associations with initiating dinucleotides, CpG islands, TATA boxes, and an extensive collection of statistically significant cis-elements in mouse and human. We found that different TSS types show preferences for different sets of initiating dinucleotides and cis-elements. Through Gene Ontology and eVOC categories and tissue expression libraries we linked TSS characteristics to expression. Moreover, we show a link of TSS characteristics to very specific genomic organization in an example of immune-response-related genes (GO:0006955). Our results shed light on the global properties of the two transcriptomes not revealed before and therefore provide the framework for better understanding of the transcriptional mechanisms in the two species, as well as a framework for development of new and more efficient promoter- and gene-finding tools

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

White Rose Research Online

University of Melbourne Institutional Repository

University of the Western Cape Research Repository

University of Queensland eSpace

Perceptual and technical barriers in sharing and formatting metadata accompanying omics studies

Author: Butte Atul J.
Corbett-Detig Russell
Deshpande Dhrithi
Garmire Lana X.
Hide Winston A.
Huang Yu-Ning
Hunter Christopher I
Love Michael I.
Mangul Serghei
Mons Barend
Moore Jason H.
Reddy T. B. K.
Robinson Mark D.
Ronkowski Cynthia Flaire
Schriml Lynn M.
Wong-Beringer Annie
Publication venue
Publication date: 22/11/2023

Metadata, often termed "data about data," is crucial for organizing, understanding, and managing vast omics datasets. It aids in efficient data discovery, integration, and interpretation, enabling users to access, comprehend, and utilize data effectively. Its significance spans the domains of scientific research, facilitating data reproducibility, reusability, and secondary analysis. However, numerous perceptual and technical barriers hinder the sharing of metadata among researchers. These barriers compromise the reliability of research results and hinder integrative meta-analyses of omics studies. This study highlights the key barriers to metadata sharing, including the lack of uniform standards, privacy and legal concerns, limitations in study design, limited incentives, inadequate infrastructure, and the dearth of well-trained personnel for metadata management and reuse. Proposed solutions include emphasizing the promotion of standardization, educational efforts, the role of journals and funding agencies, incentives and rewards, and the improvement of infrastructure. More accurate, reliable, and impactful research outcomes are achievable if the scientific community addresses these barriers

arXiv.org e-Print Archive

An Assessment of the Role of DNA Adenine Methyltransferase on Gene Expression Regulation in E coli

Author: A Henaut
A Lobner-Olesen
A Martinez-Antonio
A von Heydebreck
Aswin Sai Narain Seshasayee
B Tjaden
BM Bolstad
D Wion
EA Kouzminova
F Al-Shahrour
FR Blattner
GK Smyth
H Salgado
H Zhang
JD Glasner
JL Robbins-Manke
K Sivaraman
K Yamanaka
LJ Rasmussen
MW Covert
P Rice
RC Gentleman
RG Ponder
T Oshima
Winston Hide
WS Somers
Y Kang
Publication venue: Public Library of Science
Publication date: 07/03/2007

N6-Adenine methylation is an important epigenetic signal, which regulates various processes, such as DNA replication and repair and transcription. In γ-proteobacteria, Dam is a stand-alone enzyme that methylates GATC sites, which are non-randomly distributed in the genome. Some of these overlap with transcription factor binding sites. This work describes a global computational analysis of a published Dam knockout microarray alongside other publicly available data to provide insight into the extent to which Dam regulates transcription by interfering with protein binding. The results indicate that DNA methylation by Dam may not globally affect gene transcription by physically blocking access of transcription factors to binding sites. Down-regulation of Dam during stationary phase correlates with the activity of TFs whose binding sites are enriched for GATC sites

Crossref

Directory of Open Access Journals

PubMed Central