Search CORE

1,808 research outputs found

Ontologies and Information Extraction

Author: Nazarenko Adeline
Nédellec Claire
Publication venue
Publication date: 01/01/2005
Field of study

This report argues that, even in the simplest cases, IE is an ontology-driven process. It is not a mere text filtering method based on simple pattern matching and keywords, because the extracted pieces of texts are interpreted with respect to a predefined partial domain model. This report shows that depending on the nature and the depth of the interpretation to be done for extracting the information, more or less knowledge must be involved. This report is mainly illustrated in biology, a domain in which there are critical needs for content-based exploration of the scientific literature and which becomes a major application domain for IE

arXiv.org e-Print Archive

HAL Descartes

HAL-Paris 13

GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies

Author: Kirov Stefan
Schmoyer Denise
Snoddy Jay
Zhang Bing
Publication venue: BioMed Central
Publication date: 01/01/2004
Field of study

BACKGROUND: Microarray and other high-throughput technologies are producing large sets of interesting genes that are difficult to analyze directly. Bioinformatics tools are needed to interpret the functional information in the gene sets. RESULTS: We have created a web-based tool for data analysis and data visualization for sets of genes called GOTree Machine (GOTM). This tool was originally intended to analyze sets of co-regulated genes identified from microarray analysis but is adaptable for use with other gene sets from other high-throughput analyses. GOTree Machine generates a GOTree, a tree-like structure to navigate the Gene Ontology Directed Acyclic Graph for input gene sets. This system provides user friendly data navigation and visualization. Statistical analysis helps users to identify the most important Gene Ontology categories for the input gene sets and suggests biological areas that warrant further study. GOTree Machine is available online at . CONCLUSION: GOTree Machine has a broad application in functional genomic, proteomic and other high-throughput methods that generate large sets of interesting genes; its primary purpose is to help users sort for interesting patterns in gene sets

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

PubMatrix: a tool for multiplex literature mining

Author: Becker Kevin G
Bright Tiffani J
Cheadle Chris
Dennis Glynn
Engel Jim
Hosack Douglas A
Lempicki Richard A
Publication venue: BioMed Central
Publication date: 10/12/2003
Field of study

BACKGROUND: Molecular experiments using multiplex strategies such as cDNA microarrays or proteomic approaches generate large datasets requiring biological interpretation. Text based data mining tools have recently been developed to query large biological datasets of this type of data. PubMatrix is a web-based tool that allows simple text based mining of the NCBI literature search service PubMed using any two lists of keywords terms, resulting in a frequency matrix of term co-occurrence. RESULTS: For example, a simple term selection procedure allows automatic pair-wise comparisons of approximately 1–100 search terms versus approximately 1–10 modifier terms, resulting in up to 1,000 pair wise comparisons. The matrix table of pair-wise comparisons can then be surveyed, queried individually, and archived. Lists of keywords can include any terms currently capable of being searched in PubMed. In the context of cDNA microarray studies, this may be used for the annotation of gene lists from clusters of genes that are expressed coordinately. An associated PubMatrix public archive provides previous searches using common useful lists of keyword terms. CONCLUSIONS: In this way, lists of terms, such as gene names, or functional assignments can be assigned genetic, biological, or clinical relevance in a rapid flexible systematic fashion

Springer - Publisher Connector

PubMed Central

BiologicalNetworks: visualization and analysis tool for systems biology

Author: Baitaluk Michael
Gupta Amarnath
Ray Animesh
Sedova Mayya
Publication venue: Oxford University Press
Publication date: 01/01/2006
Field of study

Systems level investigation of genomic scale information requires the development of truly integrated databases dealing with heterogeneous data, which can be queried for simple properties of genes or other database objects as well as for complex network level properties, for the analysis and modelling of complex biological processes. Towards that goal, we recently constructed PathSys, a data integration platform for systems biology, which provides dynamic integration over a diverse set of databases [Baitaluk et al. (2006) BMC Bioinformatics 7, 55]. Here we describe a server, BiologicalNetworks, which provides visualization, analysis services and an information management framework over PathSys. The server allows easy retrieval, construction and visualization of complex biological networks, including genome-scale integrated networks of protein–protein, protein–DNA and genetic interactions. Most importantly, BiologicalNetworks addresses the need for systematic presentation and analysis of high-throughput expression data by mapping and analysis of expression profiles of genes or proteins simultaneously on to regulatory, metabolic and cellular networks. BiologicalNetworks Server is available at

CiteSeerX

Crossref

PubMed Central

GeneNarrator: Mining the Literaturome for Relations Among Genes

Author: Berleant Daniel
Ding Jing
Fulmer Andy
Juhlin Kenton
Wurtele Eve
Wurtele Eve
Xu Jun
Publication venue: Iowa State University Digital Repository
Publication date: 01/08/2009
Field of study

The rapid development of microarray and other genomic technologies now enables biologists to monitor the expression of hundreds, even thousands of genes in a single experiment. Interpreting the biological meaning of the expression patterns still relies largely on biologist\u27s domain knowledge, as well as on information collected from the literature and various public databases. Yet individual experts’ domain knowledge is insufficient for large data sets, and collecting and analyzing this information manually from the literature and/or public databases is tedious and time-consuming. Computer-aided functional analysis tools are therefore highly desirable. We describe the architecture of GeneNarrator, a text mining system for functional analysis of microarray data. This system’s primary purpose is to test the feasibility of a more general system architecture based on a two-stage clustering strategy that is explained in detail. Given a list of genes, GeneNarrator collects abstracts about them from PubMed, then clusters the abstracts into functional topics in a first clustering stage. In the second clustering stage, the genes are clustered into groups based on similarities in their distributions of occurrence across topics. This novel two-stage architecture, the primary contribution of this project, has benefits not easily provided by onestage clustering

Digital Repository @ Iowa State University (ISU)

Co-occurrence based meta-analysis of scientific texts: retrieving biological relationships between genes

Author: Dorssers L.C.J. (Lambert)
Eijk C.C. (Christiaan) van der
Jelier R. (Rob)
Jenster G.W. (Guido)
Kors J.A. (Jan)
Mons B. (Barend)
Mulligen E.M. (Erik) van
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2005
Field of study

MOTIVATION: The advent of high-throughput experiments in molecular biology creates a need for methods to efficiently extract and use information for large numbers of genes. Recently, the associative concept space (ACS) has been developed for the representation of information extracted from biomedical literature. The ACS is a Euclidean space in which thesaurus concepts are positioned and the distances between concepts indicates their relatedness. The ACS uses co-occurrence of concepts as a source of information. In this paper we evaluate how well the system can retrieve functionally related genes and we compare its performance with a simple gene co-occurrence method. RESULTS: To assess the performance of the ACS we composed a test set of five groups of functionally related genes. With the ACS good scores were obtained for four of the five groups. When compared to the gene co-occurrence method, the ACS is capable of revealing more functional biological relations and can achieve results with less literature available per gene. Hierarchical clustering was performed on the ACS output, as a potential aid to users, and was found to provide useful clusters. Our results suggest that the algorithm can be of value for researchers studying large numbers of genes. AVAILABILITY: The ACS program is available upon request from the authors

EUR Research Repository

Erasmus University Digital Repository

CoPub Mapper: mining MEDLINE based on search term co-publication

Author: Alako Blaise TF
Jelier Rob
Jenster Guido
Polman Jan
Rullmann Ton
van Baal Sjozef
Veldhoven Antoine
Verhoeven Stefan
Publication venue: BioMed Central
Publication date: 01/01/2005
Field of study

BACKGROUND: High throughput microarray analyses result in many differentially expressed genes that are potentially responsible for the biological process of interest. In order to identify biological similarities between genes, publications from MEDLINE were identified in which pairs of gene names and combinations of gene name with specific keywords were co-mentioned. RESULTS: MEDLINE search strings for 15,621 known genes and 3,731 keywords were generated and validated. PubMed IDs were retrieved from MEDLINE and relative probability of co-occurrences of all gene-gene and gene-keyword pairs determined. To assess gene clustering according to literature co-publication, 150 genes consisting of 8 sets with known connections (same pathway, same protein complex, or same cellular localization, etc.) were run through the program. Receiver operator characteristics (ROC) analyses showed that most gene sets were clustered much better than expected by random chance. To test grouping of genes from real microarray data, 221 differentially expressed genes from a microarray experiment were analyzed with CoPub Mapper, which resulted in several relevant clusters of genes with biological process and disease keywords. In addition, all genes versus keywords were hierarchical clustered to reveal a complete grouping of published genes based on co-occurrence. CONCLUSION: The CoPub Mapper program allows for quick and versatile querying of co-published genes and keywords and can be successfully used to cluster predefined groups of genes and microarray data

Lirias

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

EUR Research Repository

Wageningen University & Research Publications

Erasmus University Digital Repository

Functional Interpretation of Omics Data by Profiling Genes and Diseases Using MeSH–Controlled Vocabulary

Author: Hidemasa Bono
Takeru Nakazato
Toshihisa Takagi
Publication venue: 'IntechOpen'
Publication date: 21/11/2011
Field of study

IntechOpen

A web services choreography scenario for interoperating bioinformatics applications

Author: Cheung David W
Cheung Kei-Hoi
de Knikker Remko
Guo Youjun
Kwan Albert KH
Li Jin-long
Yip Kevin Y
Publication venue: BioMed Central
Publication date: 01/01/2004
Field of study

BACKGROUND: Very often genome-wide data analysis requires the interoperation of multiple databases and analytic tools. A large number of genome databases and bioinformatics applications are available through the web, but it is difficult to automate interoperation because: 1) the platforms on which the applications run are heterogeneous, 2) their web interface is not machine-friendly, 3) they use a non-standard format for data input and output, 4) they do not exploit standards to define application interface and message exchange, and 5) existing protocols for remote messaging are often not firewall-friendly. To overcome these issues, web services have emerged as a standard XML-based model for message exchange between heterogeneous applications. Web services engines have been developed to manage the configuration and execution of a web services workflow. RESULTS: To demonstrate the benefit of using web services over traditional web interfaces, we compare the two implementations of HAPI, a gene expression analysis utility developed by the University of California San Diego (UCSD) that allows visual characterization of groups or clusters of genes based on the biomedical literature. This utility takes a set of microarray spot IDs as input and outputs a hierarchy of MeSH Keywords that correlates to the input and is grouped by Medical Subject Heading (MeSH) category. While the HTML output is easy for humans to visualize, it is difficult for computer applications to interpret semantically. To facilitate the capability of machine processing, we have created a workflow of three web services that replicates the HAPI functionality. These web services use document-style messages, which means that messages are encoded in an XML-based format. We compared three approaches to the implementation of an XML-based workflow: a hard coded Java application, Collaxa BPEL Server and Taverna Workbench. The Java program functions as a web services engine and interoperates with these web services using a web services choreography language (BPEL4WS). CONCLUSION: While it is relatively straightforward to implement and publish web services, the use of web services choreography engines is still in its infancy. However, industry-wide support and push for web services standards is quickly increasing the chance of success in using web services to unify heterogeneous bioinformatics applications. Due to the immaturity of currently available web services engines, it is still most practical to implement a simple, ad-hoc XML-based workflow by hard coding the workflow as a Java application. For advanced web service users the Collaxa BPEL engine facilitates a configuration and management environment that can fully handle XML-based workflow

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

HKU Scholars Hub

Interpreting microarray experiments via co-expressed gene groups analysis

Author: A. Riva
D. Gibbons
D. Masys
G. Sung
J. DeRisi
J. Quackenbush
M. Robinson
R. Breitling
R. Chuaqui
S. Draghici
S. Kim
T. Attwood
V. Mootha
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006
Field of study

International audienceMicroarray technology produces vast amounts of data by measuring simultaneously the expression levels of thousands of genes under hundreds of biological conditions. Nowadays, one of the principal challenges in bioinformatics is the interpretation of huge data using different sources of information. We propose a novel data analysis method named CGGA (Co-expressed Gene Groups Analysis) that automatically finds groups of genes that are functionally enriched, i.e. have the same functional annotations, and are co- expressed. CGGA automatically integrates the information of microarrays, i.e. gene expression profiles, with the functional annotations of the genes obtained by the genome-wide information sources such as Gene Ontology (GO)1. By applying CGGA to well-known microarray experiments, we have identified the principal functionally enriched and co-expressed gene groups, and we have shown that this approach enhances and accelerates the interpretation of DNA microarray experiments

Crossref

INRIA a CCSD electronic archive server

HAL-Ecole des Ponts ParisTech