2,676 research outputs found

Compositional Mining of Multi-Relational Biological Datasets

Authors: Jin Ying; Murali T.M.; Ramakrishnan Naren
Publication date: 01/01/2007

High-throughput biological screens are yielding ever-growing streams of information about multiple aspects of cellular activity. As more and more categories of datasets come online, there is a corresponding multitude of ways in which inferences can be chained across them, motivating the need for compositional data mining algorithms. In this paper, we argue that such compositional data mining can be effectively realized by functionally cascading redescription mining and biclustering algorithms as primitives. Both of these primitives mirror shifts of vocabulary that can be composed in arbitrary ways to create rich chains of inferences. Given a relational database and its schema, we show how the schema can be automatically compiled into a compositional data mining program, and how different domains in the schema can be related through logical sequences of biclustering and redescription invocations. This feature allows us to rapidly prototype new data mining applications, yielding greater understanding of scientific datasets. We describe two applications of compositional data mining: (i) matching terms across categories of the Gene Ontology and (ii) understanding the molecular mechanisms underlying stress response in human cells.

Computer Science Technical Reports @Virginia Tech

CiteSeerX

Data mining the yeast genome in a lazy functional language

Authors: Clare Amanda; King Ross Donald
Publication date: 01/01/2003

Aberystwyth Research Portal

Knowledge Rich Natural Language Queries over Structured Biological Databases

Authors: Chu W. W.; Goldsmith E. J.; InterProlog; Kossmann D.; Lawrence C.; Maio C. D.; Mir S.; Mou X.; Nandi A.; Novik L.; Safran M.; Swofford D. L.
Publication date: 30/03/2017

Increasingly, keyword, natural language, and NoSQL queries are being used for information retrieval from traditional as well as non-traditional databases such as web, document, image, GIS, legal, and health databases. While their popularity is undeniable for obvious reasons, their engineering is far from simple. For the most part, a semantics- and intent-preserving mapping of a well-understood natural language query expressed over a structured database schema to a structured query language is still a difficult task, and research to tame the complexity is intense. In this paper, we propose a multi-level knowledge-based middleware to facilitate such mappings that separate the conceptual level from the physical level. We augment these multi-level abstractions with a concept reasoner and a query strategy engine to dynamically link arbitrary natural language querying to well-defined structured queries. We demonstrate the feasibility of our approach by presenting a Datalog-based prototype system, called BioSmart, that can compute responses to arbitrary natural language queries over arbitrary databases once a syntactic classification of the natural language query is made.

arXiv.org e-Print Archive

Crossref

Significant Pattern Discovery in Gene Location and Phylogeny

Author: Riley Michael Charles
Publication date: 07/04/2009

Aberystwyth Research Portal

Data mining the yeast genome in a lazy functional language

Authors: Clare Amanda; King Ross Donald
Publication date: 01/01/2003

Aberystwyth Research Portal

The University of Manchester - Institutional Repository

Biomedical Literature Mining for Biological Databases Annotation

Authors: Donato Malerba; Gaetano Scioscia; Marcella Attimonelli; Margherita Berardi; Pietro Leo; Roberta Piredda
Publication venue: 'IntechOpen'
Publication date: 01/01/2008

IntechOpen

Crossref

Archivio istituzionale della ricerca - Università di Bari

Using ILP to Identify Pathway Activation Patterns in Systems Biology

Authors: A Subramanian; AL Tarca; C Perlich; D Croft; D Gamberger; JJ Tyson; K Rhrissorrakrai; K Whelan; L Danon; L Dehaspe; L Raedt De; M Holec; MN McCall; MVM França; N Lavrač; O Kuželka; P Ristoski; PA Flach; R Edgar; R-S Wang; S Draghici; W Kim; W Rongrong; X Robin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

We show a logical aggregation method that, combined with propositionalization methods, can construct novel structured biological features from gene expression data. We do this to gain understanding of pathway mechanisms, for instance, those associated with a particular disease. We illustrate this method on the task of distinguishing between two types of lung cancer: squamous cell carcinoma (SCC) and adenocarcinoma (AC). We identify pathway activation patterns in pathways previously implicated in the development of cancers. Our method identified a model with comparable predictive performance to the winning algorithm of a recent challenge, while providing biologically relevant explanations that may be useful to a biologist.

Crossref

PubMed Central

King's Research Portal

Explore Bristol Research

Text Mining for Drug Discovery

Author: Piliouras Dimitrios
Publication date: 01/05/2014

The University of Manchester - Institutional Repository