2,676 research outputs found
Compositional Mining of Multi-Relational Biological Datasets
High-throughput biological screens are yielding ever-growing streams of
information about multiple aspects of cellular activity. As more and more
categories of datasets come online, there is a corresponding multitude of ways
in which inferences can be chained across them, motivating the need for
compositional data mining algorithms. In this paper, we argue that such
compositional data mining can be effectively realized by functionally cascading
redescription mining and biclustering algorithms as primitives. Both these
primitives mirror shifts of vocabulary that can be composed in arbitrary ways
to create rich chains of inferences. Given a relational database and its
schema, we show how the schema can be automatically compiled into a
compositional data mining program, and how different domains in the schema can
be related through logical sequences of biclustering and redescription
invocations. This feature allows us to rapidly prototype new data mining
applications, yielding greater understanding of scientific datasets. We
describe two applications of compositional data mining: (i) matching terms
across categories of the Gene Ontology and (ii) understanding the molecular
mechanisms underlying stress response in human cells
Knowledge Rich Natural Language Queries over Structured Biological Databases
Increasingly, keyword, natural language and NoSQL queries are being used for
information retrieval from traditional as well as non-traditional databases
such as web, document, image, GIS, legal, and health databases. While their
popularity are undeniable for obvious reasons, their engineering is far from
simple. In most part, semantics and intent preserving mapping of a well
understood natural language query expressed over a structured database schema
to a structured query language is still a difficult task, and research to tame
the complexity is intense. In this paper, we propose a multi-level
knowledge-based middleware to facilitate such mappings that separate the
conceptual level from the physical level. We augment these multi-level
abstractions with a concept reasoner and a query strategy engine to dynamically
link arbitrary natural language querying to well defined structured queries. We
demonstrate the feasibility of our approach by presenting a Datalog based
prototype system, called BioSmart, that can compute responses to arbitrary
natural language queries over arbitrary databases once a syntactic
classification of the natural language query is made
Using ILP to Identify Pathway Activation Patterns in Systems Biology
We show a logical aggregation method that, combined with propositionalization methods, can construct novel structured biological features from gene expression data. We do this to gain understanding of pathway mechanisms, for instance, those associated with a particular disease. We illustrate this method on the task of distinguishing between two types of lung cancer; Squamous Cell Carcinoma (SCC) and Adenocarcinoma (AC). We identify pathway activation patterns in pathways previously implicated in the development of cancers. Our method identified a model with comparable predictive performance to the winning algorithm of a recent challenge, while providing biologically relevant explanations that may be useful to a biologist
- …