
    SpliceMiner: a high-throughput database implementation of the NCBI Evidence Viewer for microarray splice variant analysis

    BACKGROUND: There are far fewer genes in the human genome than there are expressed transcripts; alternative splicing is the reason. Alternatively spliced transcripts are often specific to tissue type, developmental stage, environmental condition, or disease state. Accurate analysis of microarray expression data and design of new arrays for alternative splicing require assessment of probes at the sequence and exon levels. DESCRIPTION: SpliceMiner is a web interface for querying the Evidence Viewer Database (EVDB). EVDB is a comprehensive, non-redundant compendium of splice variant data for human genes. We constructed EVDB as a queryable implementation of the NCBI Evidence Viewer (EV). EVDB is based on data obtained from NCBI Entrez Gene and EV. The automated EVDB build process uses only complete coding sequences, which may or may not include partial or complete 5' and 3' UTRs, and filters redundant splice variants. Unlike EV, which supports only one-at-a-time queries, SpliceMiner supports high-throughput batch queries and provides results in an easily parsable format. SpliceMiner maps probes to splice variants, effectively delineating the variants identified by a probe. CONCLUSION: EVDB can be queried by gene symbol, genomic coordinates, or probe sequence via a user-friendly web-based tool we call SpliceMiner. The EVDB/SpliceMiner combination provides an interface to human splice variant information and, going beyond the very valuable NCBI Evidence Viewer, supports fluent, high-throughput analysis. Integration of EVDB information into microarray analysis and design pipelines has the potential to improve the analysis and bioinformatic interpretation of gene expression data, for both batch and interactive processing. For example, whenever a gene expression value is recognized as important or appears anomalous in a microarray experiment, the interactive mode of SpliceMiner can be used quickly and easily to check for possible splice variant issues.
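
    The abstract does not specify the batch output format, so the following is a minimal sketch, assuming a tab-delimited result file with hypothetical probe_id and variant_accession columns, of how one might consume SpliceMiner batch results to see which probes are isoform-specific:

```python
# Hypothetical sketch: parse a tab-delimited SpliceMiner batch result.
# Column names and file layout are assumptions; the real output may differ.
import csv
from collections import defaultdict

def variants_per_probe(path):
    """Group splice-variant accessions by probe ID from a batch result file."""
    hits = defaultdict(set)
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            hits[row["probe_id"]].add(row["variant_accession"])
    return hits

# Probes hitting a single variant are isoform-specific; multi-variant
# probes report a composite expression signal.
for probe, variants in variants_per_probe("spliceminer_batch.tsv").items():
    print(probe, "targets", len(variants), "splice variant(s)")
```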

    Systems biology via redescription and ontologies (I): finding phase changes with applications to malaria temporal data

    Biological systems are complex and often composed of many subtly interacting components. Furthermore, such systems evolve through time and, as the underlying biology executes its genetic program, the relationships between components change and undergo dynamic reorganization. Characterizing these relationships precisely is a challenging task, but one that must be undertaken if we are to understand these systems in sufficient detail. One set of tools that may prove useful is the formal principles of model building and checking, which could allow the biologist to frame these inherently temporal questions in a sufficiently rigorous framework. In response to these challenges, GOALIE (Gene Ontology Algorithmic Logic and Information Extractor) was developed and has been successfully employed in the analysis of high-throughput biological data (e.g. time-course gene-expression microarray data and neural spike train recordings). The method applies to a wide variety of temporal data, indeed any data for which ontological descriptions exist. This paper describes the algorithms behind GOALIE and its use in the study of the Intraerythrocytic Developmental Cycle (IDC) of Plasmodium falciparum, the parasite responsible for a deadly form of chloroquine-resistant malaria. We focus in particular on the problem of finding phase changes, times of reorganization of transcriptional control.
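
    GOALIE's actual algorithms are given in the paper itself; as a hedged illustration of the phase-change idea only, the sketch below annotates each time window with a set of GO terms and flags boundaries where consecutive windows share few terms. The windowing, the Jaccard metric, and the threshold are all assumptions, not GOALIE's method:

```python
# Minimal sketch of the phase-change idea (not GOALIE's actual algorithm):
# annotate each time window with a set of GO terms and flag boundaries
# where consecutive windows share few terms.

def jaccard(a, b):
    """Similarity between two GO-term sets."""
    return len(a & b) / len(a | b) if a | b else 1.0

def phase_changes(window_terms, threshold=0.3):
    """Return indices of window boundaries where term overlap collapses."""
    return [i for i in range(len(window_terms) - 1)
            if jaccard(window_terms[i], window_terms[i + 1]) < threshold]

# Toy IDC-like example: three windows; transcriptional control reorganizes
# between windows 1 and 2.
windows = [{"GO:0006260", "GO:0006412"},
           {"GO:0006260", "GO:0006412"},
           {"GO:0020033", "GO:0051701"}]
print(phase_changes(windows))  # -> [1]
```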

    SpliceCenter: A suite of web-based bioinformatic applications for evaluating the impact of alternative splicing on RT-PCR, RNAi, microarray, and peptide-based studies

    BACKGROUND: Over 60% of protein-coding genes in vertebrates express mRNAs that undergo alternative splicing. The resulting collection of transcript isoforms poses significant challenges for contemporary biological assays. For example, RT-PCR validation of gene expression microarray results may be unsuccessful if the two technologies target different splice variants. Effective use of sequence-based technologies requires knowledge of the specific splice variant(s) that are targeted. In addition, the critical roles of alternative splice forms in biological function and in disease suggest that assay results may be more informative if analyzed in the context of the targeted splice variant. RESULTS: A number of contemporary technologies are used for analyzing transcripts or proteins. To enable investigation of the impact of splice variation on the interpretation of data derived from those technologies, we have developed SpliceCenter. SpliceCenter is a suite of user-friendly, web-based applications that includes programs for analysis of RT-PCR primer/probe sets, effectors of RNAi, microarrays, and protein-targeting technologies. Both interactive and high-throughput implementations of the tools are provided. The interactive versions of SpliceCenter tools provide visualizations of a gene's alternative transcripts and probe target positions, enabling the user to identify which splice variants are or are not targeted. The high-throughput batch versions accept user query files and provide results in tabular form. When, for example, we used SpliceCenter's batch siRNA-Check to process the Cancer Genome Anatomy Project's large-scale shRNA library, we found that only 59% of the 50,766 shRNAs in the library target all known splice variants of the target gene, 32% target some but not all, and 9% do not target any currently annotated transcript. CONCLUSION: SpliceCenter (http://discover.nci.nih.gov/splicecenter) provides unique, user-friendly applications for assessing the impact of transcript variation on the design and interpretation of RT-PCR, RNAi, gene expression microarrays, antibody-based detection, and mass spectrometry proteomics. The tools are intended for use by bench biologists as well as bioinformaticists.
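
    The all/some/none percentages above follow from classifying each shRNA by how many annotated transcripts of its target gene contain the shRNA's target sequence. Below is a minimal sketch of that classification, with invented input structures rather than SpliceCenter's actual formats:

```python
# Sketch of the all/some/none classification behind batch siRNA-Check
# (input structures are assumptions, not SpliceCenter's actual format).

def classify_shrna(target_seq, transcript_seqs):
    """Classify an shRNA by how many splice variants carry its target site."""
    hits = sum(target_seq in t for t in transcript_seqs)
    if hits == len(transcript_seqs):
        return "all variants"
    return "some variants" if hits else "no annotated transcript"

transcripts = ["AAGCTGGCCAAGT",   # variant 1 (toy sequences)
               "AAGCTGTTTAAGT"]   # variant 2 lacks the target site
print(classify_shrna("CTGGCC", transcripts))  # -> some variants
```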

    RedundancyMiner: De-replication of redundant GO categories in microarray and proteomics analysis

    BACKGROUND: The Gene Ontology (GO) Consortium organizes genes into hierarchical categories based on biological process, molecular function, and subcellular localization. Tools such as GoMiner can leverage GO to perform ontological analysis of microarray and proteomics studies, typically generating a list of significant functional categories. Two or more of the categories are often redundant, in the sense that identical or nearly identical sets of genes map to them. The redundancy might typically inflate the number of reported significant categories threefold, create the illusion of an overly long list of significant categories, and obscure the relevant biological interpretation. RESULTS: We now introduce a new resource, RedundancyMiner, that de-replicates the redundant and nearly redundant GO categories determined by first running GoMiner. The main algorithm of RedundancyMiner, MultiClust, performs a novel form of cluster analysis in which a GO category might belong to several category clusters. Each category cluster follows a "complete linkage" paradigm. The metric is a similarity measure that captures the overlap in gene mapping between pairs of categories. CONCLUSIONS: RedundancyMiner effectively eliminated redundancies from a set of GO categories. For illustration, we have applied it to the clarification of the results arising from two current studies: (1) assessment of the gene expression profiles obtained by laser capture microdissection (LCM) of serial cryosections of the retina at the site of final optic fissure closure in mouse embryos at specific embryonic stages, and (2) analysis of a conceptual data set obtained by examining a list of genes deemed to be "kinetochore" genes.
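
    The abstract names the ingredients of MultiClust (an overlap similarity between the gene sets mapped to categories, complete linkage, and overlapping cluster membership) without giving code. The sketch below is an illustrative stand-in, assuming Jaccard overlap and a fixed cutoff; it is not the published MultiClust implementation:

```python
# Illustrative sketch of de-replicating GO categories by gene-set overlap.
# Jaccard similarity with a complete-linkage criterion stands in for
# MultiClust's actual metric and rules.

def jaccard(a, b):
    return len(a & b) / len(a | b)

def overlapping_clusters(categories, cutoff=0.8):
    """Each category seeds a cluster of all categories it is similar to;
    a category may appear in several clusters (as in MultiClust)."""
    names = list(categories)
    clusters = []
    for seed in names:
        members = {c for c in names
                   if jaccard(categories[seed], categories[c]) >= cutoff}
        # Complete linkage: every pair inside the cluster must also pass.
        if all(jaccard(categories[x], categories[y]) >= cutoff
               for x in members for y in members):
            if members not in clusters:
                clusters.append(members)
    return clusters

cats = {"DNA replication": {"g1", "g2", "g3"},
        "DNA-dependent replication": {"g1", "g2", "g3"},
        "kinetochore": {"g7", "g8"}}
print(overlapping_clusters(cats))  # the two redundant categories cluster
```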

    Adding a Little Reality to Building Ontologies for Biology

    BACKGROUND: Many areas of biology are open to mathematical and computational modelling. The application of discrete, logical formalisms defines the field of biomedical ontologies. Ontologies have been put to many uses in bioinformatics. The most widespread is the description of entities about which data have been collected, allowing integration and analysis across multiple resources. There are now over 60 ontologies in active use, increasingly developed as large, international collaborations. There are, however, many opinions on how ontologies should be authored; that is, on what is appropriate for representation. Recently, a common opinion has been the "realist" approach, which places restrictions upon the style of modelling considered appropriate. METHODOLOGY/PRINCIPAL FINDINGS: Here, we use a number of case studies describing the results of biological experiments. We investigate the ways in which these could be represented using both realist and non-realist approaches, and we consider the limitations and advantages of each of these models. CONCLUSIONS/SIGNIFICANCE: From our analysis, we conclude that while realist principles may enable straightforward modelling for some topics, there are crucial aspects of science and the phenomena it studies that do not fit this approach; realism appears to be over-simplistic, which, perversely, results in overly complex ontological models. We suggest that it is impossible to avoid compromise in ontology modelling; a clearer understanding of these compromises will better enable appropriate modelling, fulfilling the many needs for discrete mathematical models within computational biology.

    Infectious Disease Ontology

    Technological developments have resulted in tremendous increases in the volume and diversity of the data and information that must be processed in the course of biomedical and clinical research and practice. Researchers are at the same time under ever greater pressure to share data and to take steps to ensure that data resources are interoperable. The use of ontologies to annotate data has proven successful in supporting these goals and in providing new possibilities for the automated processing of data and information. In this chapter, we describe different types of vocabulary resources and emphasize those features of formal ontologies that make them most useful for computational applications. We describe current uses of ontologies and discuss future goals for ontology-based computing, focusing on its use in the field of infectious diseases. We review the largest and most widely used vocabulary resources relevant to the study of infectious diseases and conclude with a description of the Infectious Disease Ontology (IDO) suite of interoperable ontology modules that together cover the entire infectious disease domain.

    Universality and Shannon entropy of codon usage

    The distribution functions of the codon usage probabilities, computed over all the available GenBank data for 40 eukaryotic biological species and 5 chloroplasts, do not follow a Zipf law, but are best fitted by the sum of a constant, an exponential, and a linear function of the rank of usage. For mitochondria the analysis is not conclusive. A quantum-mechanics-inspired model is proposed to describe the observed behaviour. These functions are characterized by parameters that strongly depend on the total GC content of the coding regions of biological species. It is predicted that the codon usage is the same in all exonic genes with the same GC content. The Shannon entropy for codons, which also strongly depends on the exonic GC content, is computed.
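
    In symbols, the described fit is roughly f(n) = a + b*exp(-c*n) + d*n for usage rank n. Below is a sketch of fitting that form and computing the codon Shannon entropy; parameter values and data are illustrative, not the paper's:

```python
# Sketch of the rank fit described above: codon usage probability modeled
# as constant + exponential + linear in rank, f(n) = a + b*exp(-c*n) + d*n.
# The toy data and parameter values are illustrative assumptions.
import numpy as np
from scipy.optimize import curve_fit

def rank_model(n, a, b, c, d):
    return a + b * np.exp(-c * n) + d * n

# Toy ranked codon-usage probabilities (61 sense codons, descending order).
ranks = np.arange(1, 62)
probs = rank_model(ranks, 0.005, 0.06, 0.15, -5e-5)
probs /= probs.sum()  # normalize to a probability distribution

params, _ = curve_fit(rank_model, ranks, probs, p0=(0.01, 0.05, 0.1, 0.0))
print("fitted (a, b, c, d):", params)

# Shannon entropy of the codon distribution, S = -sum_i p_i log2 p_i.
entropy = -np.sum(probs * np.log2(probs))
print("codon Shannon entropy (bits):", entropy)
```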

    GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists

    BACKGROUND: Since the inception of the GO annotation project, a variety of tools have been developed that support exploring and searching the GO database. In particular, a variety of tools that perform GO enrichment analysis are currently available. Most of these tools require as input a target set of genes and a background set and seek enrichment in the target set compared to the background set. A few tools also exist that support analyzing ranked lists. The latter typically rely on simulations or on union-bound correction for assigning statistical significance to the results. RESULTS: GOrilla is a web-based application that identifies enriched GO terms in ranked lists of genes, without requiring the user to provide explicit target and background sets. This is particularly useful in the many typical cases where genomic data may be naturally represented as a ranked list of genes (e.g. by level of expression or of differential expression). GOrilla employs a flexible threshold statistical approach to discover GO terms that are significantly enriched at the top of a ranked gene list. Building on a complete theoretical characterization of the underlying distribution, called mHG, GOrilla computes an exact p-value for the observed enrichment, taking threshold multiple testing into account without the need for simulations. This enables rigorous statistical analysis of thousands of genes and thousands of GO terms in a matter of seconds. The output of the enrichment analysis is visualized as a hierarchical structure, providing a clear view of the relations between enriched GO terms. CONCLUSION: GOrilla is an efficient GO analysis tool with unique features that make it a useful addition to the existing repertoire of GO enrichment tools. GOrilla's unique features and advantages over other threshold-free enrichment tools include rigorous statistics, fast running time, and an effective graphical representation. GOrilla is publicly available at http://cbl-gorilla.cs.technion.ac.il.
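
    The mHG statistic can be illustrated as follows: scan every prefix of the ranked list, compute the hypergeometric tail probability of the term members seen so far, and keep the minimum over all thresholds. A minimal sketch of the raw score (GOrilla's exact p-value computation, which corrects for testing all thresholds, is omitted here):

```python
# Sketch of the mHG (minimum hypergeometric) score behind GOrilla's
# flexible-threshold approach: try every prefix of the ranked list and
# keep the best hypergeometric tail. The exact multiple-testing-corrected
# p-value (GOrilla's dynamic program) is not reproduced in this sketch.
from scipy.stats import hypergeom

def mhg_score(labels):
    """labels[i] is 1 if the i-th ranked gene carries the GO term."""
    N, K = len(labels), sum(labels)
    best, k = 1.0, 0
    for n in range(1, N):      # threshold after the top-n genes
        k += labels[n - 1]     # term members seen so far
        # P(X >= k) for X ~ Hypergeometric(N, K, n)
        tail = hypergeom.sf(k - 1, N, K, n)
        best = min(best, tail)
    return best

# Toy ranked list: term members concentrated at the top score well.
print(mhg_score([1, 1, 1, 0, 1, 0, 0, 0, 0, 0]))
```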