Knowledge Rich Natural Language Queries over Structured Biological Databases

Author: Chu W. W.
Goldsmith E. J.
InterProlog
Kossmann D.
Lawrence C.
Maio C. D.
Mir S.
Mou X.
Nandi A.
Novik L.
Safran M.
Swofford D. L.
Publication date: 30/03/2017

Increasingly, keyword, natural language and NoSQL queries are being used for information retrieval from traditional as well as non-traditional databases such as web, document, image, GIS, legal, and health databases. While their popularity is undeniable, their engineering is far from simple. For the most part, a semantics- and intent-preserving mapping of a well-understood natural language query expressed over a structured database schema to a structured query language is still a difficult task, and research to tame the complexity is intense. In this paper, we propose a multi-level knowledge-based middleware to facilitate such mappings that separate the conceptual level from the physical level. We augment these multi-level abstractions with a concept reasoner and a query strategy engine to dynamically link arbitrary natural language querying to well-defined structured queries. We demonstrate the feasibility of our approach by presenting a Datalog-based prototype system, called BioSmart, that can compute responses to arbitrary natural language queries over arbitrary databases once a syntactic classification of the natural language query is made.

arXiv.org e-Print Archive

Crossref

An introduction to Graph Data Management

Author: A Dries
A Gutiérrez
A Iosup
A Morari
A Poulovassilis
AD Zhu
AO Mendelzon
B Amann
B Elser
C Berge
C Vicknair
C Watters
C Weiss
CS Chang
D Conte
D Dominguez-Sal
D Theodoratos
DC Faye
DW Shipman
EF Codd
FW Tompa
G Malewicz
GM Kuper
H He
HS Kunii
IF Cruz
J Hidders
J Paredaens
J Peckham
J. Hidders
Jonathan Hayes
K Zeng
L Kowalik
L Zou
M Atre
M Ciglan
M Consens
M Gemis
M Gyssens
M Han
M Levene
M Mainguenaud
M Schmidt
M Yannakakis
MA Bornea
MA Rodriguez
Marc Andries
MP Consens
N Kiesel
N Roussopoulos
O Erling
P Barceló Baeza
P Buneman
P Yuan
Philippe Cudré-Mauroux
PPS Chen
PT Wood
R Agrawal
R Angles
R Brijder
R Ronen
RH Güting
RS Xin
S Abiteboul
T Neumann
W Fan
W Kim
Y Guo
Y Low
Y Papakonstantinou
Y Tian
Y Zhao
YA Liu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/12/2017

A graph database is a database where the data structures for the schema and/or instances are modeled as a (labeled) (directed) graph or generalizations of it, and where querying is expressed by graph-oriented operations and type constructors. In this article we present the basic notions of graph databases, give a historical overview of their main development, and study the main current systems that implement them.
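
As a rough illustration of the definition in this abstract, the sketch below stores instances as a labeled, directed graph and answers a query with a graph-oriented operation (label-restricted reachability). It is a minimal in-memory toy, not any system surveyed in the article; the class name LabeledGraph, its methods, and the sample nodes and labels are all hypothetical.

```python
from collections import defaultdict


class LabeledGraph:
    """Tiny in-memory labeled, directed graph (illustrative only)."""

    def __init__(self):
        # Adjacency list: source node -> list of (edge label, target node).
        self.edges = defaultdict(list)

    def add_edge(self, source, label, target):
        self.edges[source].append((label, target))

    def neighbors(self, node, label):
        """Targets reached from `node` by a single edge carrying `label`."""
        return [t for (lbl, t) in self.edges[node] if lbl == label]

    def reachable(self, start, label):
        """Nodes reachable from `start` via one or more `label` edges."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in self.neighbors(node, label):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


# Hypothetical sample data.
g = LabeledGraph()
g.add_edge("alice", "knows", "bob")
g.add_edge("bob", "knows", "carol")
g.add_edge("alice", "works_at", "acme")
```

Here the query "whom does alice know, directly or transitively" is a traversal over edges labeled knows, rather than a join over tables.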

arXiv.org e-Print Archive

Crossref

A Data Transformation System for Biological Data Sources

Author: Buneman Peter
Davidson Susan
Hart Kyle
Overton Chris
Wong L.
Publication date: 01/01/1995

Scientific data of importance to biologists in the Human Genome Project resides not only in conventional databases, but also in structured files maintained in a number of different formats (e.g. ASN.1 and ACE) as well as in sequence analysis packages (e.g. BLAST and FASTA). These formats and packages contain a number of data types not found in conventional databases, such as lists and variants, and may be deeply nested. We present in this paper techniques for querying and transforming such data, and illustrate their use in a prototype system developed in conjunction with the Human Genome Center for Chromosome 22. We also describe optimizations performed by the system, a crucial issue for bulk data.

CiteSeerX

Edinburgh Research Explorer

ScholarlyCommons@Penn

Integration of Biological Sources: Exploring the Case of Protein Homology

Author: Boerman Tjeerd W.
Keulen Maurice van
Severing Edouard I.
Vet Paul van der
Publication venue: University of Twente, Centre for Telematics and Information Technology
Publication date: 01/01/2011

Data integration is a key issue in the domain of bioinformatics, which deals with huge amounts of heterogeneous biological data that grows and changes rapidly. This paper serves as an introduction to the field of bioinformatics and the biological concepts it deals with, and an exploration of the integration problems a bioinformatics scientist faces. We examine ProGMap, an integrated protein homology system used by bioinformatics scientists at Wageningen University, and several use cases related to protein homology. A key issue we identify is the huge manual effort required to unify source databases into a single resource. Uncertain databases are able to contain several possible worlds, and it has been proposed that they can be used to significantly reduce initial integration efforts. We propose several directions for future work where uncertain databases can be applied to bioinformatics, with the goal of furthering the cause of bioinformatics integration.

University of Twente Research Information

Heterogeneous biomedical database integration using a hybrid strategy: a p53 cancer research database.

Author: Bichutskiy Vadim Y
Brachmann Rainer K
Colman Richard
Lathrop Richard H
Publication venue: eScholarship, University of California
Publication date: 01/01/2006

Complex problems in life science research give rise to multidisciplinary collaboration, and hence, to the need for heterogeneous database integration. The tumor suppressor p53 is mutated in close to 50% of human cancers, and a small drug-like molecule with the ability to restore native function to cancerous p53 mutants is a long-held medical goal of cancer treatment. The Cancer Research DataBase (CRDB) was designed in support of a project to find such small molecules. As a cancer informatics project, the CRDB involved small molecule data, computational docking results, functional assays, and protein structure data. As an example of the hybrid strategy for data integration, it combined the mediation and data warehousing approaches. This paper uses the CRDB to illustrate the hybrid strategy as a viable approach to heterogeneous data integration in biomedicine, and provides a design method for those considering similar systems. More efficient data sharing implies increased productivity, and, hopefully, improved chances of success in cancer research. (Code and database schemas are freely downloadable, http://www.igb.uci.edu/research/research.html.)

Directory of Open Access Journals

eScholarship - University of California

Expanding sensor networks to automate knowledge acquisition

Author: Conroy Kenneth
May Gregory
Roantree Mark
Warrington Giles
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

The availability of accurate, low-cost sensors to scientists has resulted in widespread deployment in a variety of sporting and health environments. The sensor data output is often in a raw, proprietary or unstructured format. As a result, it is often difficult to query multiple sensors for complex properties or actions. In our research, we deploy a heterogeneous sensor network to detect the various biological and physiological properties in athletes during training activities. The goal for exercise physiologists is to quickly identify key intervals in exercise such as moments of stress or fatigue. This is not currently possible because of low-level sensors and a lack of query language support. Thus, our motivation is to expand the sensor network with a contextual layer that enriches raw sensor data, so that it can be exploited by a high-level query language. To achieve this, the domain expert specifies events in a traditional event-condition-action format to deliver the required contextual enrichment.
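
The event-condition-action format mentioned in this abstract can be pictured with a small sketch: a rule names an event type, a condition over the raw reading, and an action that produces a contextual label. This is only an assumed shape for such rules, not the authors' system; the names EcaRule and enrich, the heart_rate event, the 180 bpm threshold, and the possible_stress label are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Reading = Dict[str, float]


@dataclass
class EcaRule:
    event: str                           # event type the rule listens for
    condition: Callable[[Reading], bool]  # test applied to the raw reading
    action: Callable[[Reading], str]      # yields a contextual label


def enrich(readings: List[Tuple[str, Reading]], rules: List[EcaRule]) -> List[dict]:
    """Attach the labels of every matching rule to each raw reading."""
    annotated = []
    for event, reading in readings:
        labels = [
            rule.action(reading)
            for rule in rules
            if rule.event == event and rule.condition(reading)
        ]
        annotated.append({**reading, "labels": labels})
    return annotated


# Hypothetical rule: a heart-rate reading above 180 bpm is flagged.
rules = [
    EcaRule("heart_rate", lambda r: r["bpm"] > 180, lambda r: "possible_stress"),
]
stream = [
    ("heart_rate", {"t": 0.0, "bpm": 150.0}),
    ("heart_rate", {"t": 1.0, "bpm": 185.0}),
]
result = enrich(stream, rules)
```

Once readings carry labels like this, a higher-level query can select intervals by label instead of by raw sensor values.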

CiteSeerX

Irish Universities

DCU Online Research Access Service

Representing and analysing molecular and cellular function in the computer

Author: Eldridge M
Gilbert D
Helden JV
Mancuso R
Naim A
Wernisch L
Wodak SJ
Publication venue: 'American Society for Biochemistry & Molecular Biology (ASBMB)'
Publication date: 01/01/2000

Determining the biological function of a myriad of genes, and understanding how they interact to yield a living cell, is the major challenge of the post genome-sequencing era. The complexity of biological systems is such that this cannot be envisaged without the help of powerful computer systems capable of representing and analysing the intricate networks of physical and functional interactions between the different cellular components. In this review we try to provide the reader with an appreciation of where we stand in this regard. We discuss some of the inherent problems in describing the different facets of biological function, give an overview of how information on function is currently represented in the major biological databases, and describe different systems for organising and categorising the functions of gene products. In the second part, we present a new general data model, currently under development, which describes information on molecular function and cellular processes in a rigorous manner. The model is capable of representing a large variety of biochemical processes, including metabolic pathways, regulation of gene expression and signal transduction. It also incorporates taxonomies for categorising molecular entities, interactions and processes, and it offers means of viewing the information at different levels of resolution, and dealing with incomplete knowledge. The data model has been implemented in the database on protein function and cellular processes 'aMAZE' (http://www.ebi.ac.uk/research/pfbp/), which presently covers metabolic pathways and their regulation. Several tools for querying, displaying, and performing analyses on such pathways are briefly described in order to illustrate the practical applications enabled by the model.

HAL AMU

DI-fusion

Brunel University Research Archive

A simple and robust method for connecting small-molecule drugs using gene-expression signatures

Author: B Efron
CC Ting
J Frasor
J Lamb
JD Storey
JJ Chen
JS Warrington
KB Glaser
L Tian
M Lee
N Fujimoto
N Sathyamoorthy
PA Horwitz
R Januchowski
S McHugh
SD Zhang
Shu-Dong Zhang
ST Matalon
Timothy W Gant
VW Armstrong
Y Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

Interaction of a drug or chemical with a biological system can result in a gene-expression profile or signature characteristic of the event. Using a suitably robust algorithm, these signatures can potentially be used to connect molecules with similar pharmacological or toxicological properties. The Connectivity Map was a novel concept and innovative tool first introduced by Lamb et al to connect small molecules, genes, and diseases using genomic signatures [Lamb et al (2006), Science 313, 1929-1935]. However, the Connectivity Map had some limitations; in particular, there was no effective safeguard against false connections if the observed connections were considered on an individual-by-individual basis. Further, when several connections to the same small-molecule compound were viewed as a set, the implicit null hypothesis tested was not the most relevant one for the discovery of real connections. Here we propose a simple and robust method for constructing the reference gene-expression profiles and a new connection scoring scheme, which importantly allows the valuation of statistical significance of all the connections observed. We tested the new method with the two example gene-signatures (HDAC inhibitors and Estrogens) used by Lamb et al and also a new gene signature of immunosuppressive drugs. Our testing with this new method shows that it achieves a higher level of specificity and sensitivity than the original method. For example, our method successfully identified raloxifene and tamoxifen as having significant anti-estrogen effects, while Lamb et al's Connectivity Map failed to identify these. With these properties our new method has potential use in drug development for the recognition of pharmacological and toxicological properties in new drug candidates. Comment: 8 pages, 2 figures, and 2 tables; supplementary data supplied as a ZIP file.

arXiv.org e-Print Archive

Queen's University Belfast Research Portal

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Ulster University's Research Portal

Leicester Research Archive