Optimized Seamless Integration of Biomolecular Data
Today, scientific data is inevitably digitized, stored in a wide
variety of heterogeneous formats, and accessible over the Internet.
Scientists need to access an integrated view of multiple remote or
local heterogeneous data sources. They then integrate the results
of complex queries and apply further analysis and visualization
to support the task of scientific discovery. Building such a digital
library for scientific discovery requires accessing and manipulating
data extracted from flat files or databases, documents retrieved from
the Web, as well as data that is locally materialized in warehouses
or is generated by software. We consider several tasks to provide optimized and seamless integration of biomolecular data. Challenges
to be addressed include capturing and representing source capabilities;
developing a methodology to acquire and represent semantic knowledge
and metadata about source contents, overlap in source contents,
and access costs; and providing decision support to select sources
and capabilities using cost-based and semantic knowledge and to
generate low-cost query evaluation plans.
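The cost-based source-selection task described above can be illustrated as a small greedy weighted set-cover sketch. This is a minimal illustration under invented assumptions, not the method proposed in the work: the source names, costs, and covered items below are hypothetical.

```python
# Hypothetical sketch: choose a low-cost set of overlapping data sources
# that together cover all requested data items (greedy weighted set cover).

def select_sources(sources, needed):
    """sources: {name: (access_cost, set_of_items_provided)}.
    Returns (plan, total_cost) covering all items in `needed`."""
    remaining = set(needed)
    plan, total = [], 0.0
    while remaining:
        # Pick the source with the lowest cost per newly covered item.
        best = min(
            (s for s in sources if sources[s][1] & remaining),
            key=lambda s: sources[s][0] / len(sources[s][1] & remaining),
        )
        cost, items = sources[best]
        plan.append(best)
        total += cost
        remaining -= items
    return plan, total

# Invented example sources and costs for illustration only.
sources = {
    "GenBank":   (3.0, {"seq", "annot"}),
    "SwissProt": (2.0, {"annot", "struct"}),
    "PDB":       (1.0, {"struct"}),
}
print(select_sources(sources, {"seq", "annot", "struct"}))
```

A real mediator would also weigh semantic knowledge about content overlap and source capabilities, not just access cost; the greedy heuristic stands in for that decision-support step.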
(Also referenced as UMIACS-TR-2001-51.)
A planner/optimizer/executioner for content mediated queries
Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997. Includes bibliographical references (leaves 56-57). By Kofi Duodu Fynn, M.Eng.
The Divide-and-Conquer Subgoal-Ordering Algorithm for Speeding up Logic Inference
It is common to view programs as a combination of logic and control: the
logic part defines what the program must do, the control part -- how to do it.
The Logic Programming paradigm was developed with the intention of separating
the logic from the control. Recently, extensive research has been conducted on
automatic generation of control for logic programs. Only a few of these works
considered the issue of automatic generation of control for improving the
efficiency of logic programs. In this paper we present a novel algorithm for
automatically finding lowest-cost subgoal orderings. The algorithm follows
a divide-and-conquer strategy. The given set of subgoals is partitioned into
smaller sets, based on co-occurrence of free variables. The subsets are ordered
recursively and merged, yielding a provably optimal order. We experimentally
demonstrate the utility of the algorithm by testing it in several domains, and
discuss possibilities for combining it with other existing methods.
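The partition-and-merge idea can be sketched briefly. This is a minimal Python illustration under assumed per-subgoal cost estimates; it groups subgoals by co-occurring free variables and orders cheapest-first, but does not reproduce the paper's recursive ordering step or its optimality argument.

```python
# Sketch: partition subgoals into independent groups by shared free
# variables, order each group by estimated cost, then merge the groups.

def shares_variable(a, b):
    """Two subgoals are connected if they share a free variable."""
    return bool(set(a["vars"]) & set(b["vars"]))

def partition(subgoals):
    """Group subgoals into connected components of co-occurring variables."""
    groups = []
    for sg in subgoals:
        linked = [g for g in groups if any(shares_variable(sg, m) for m in g)]
        merged = [sg]
        for g in linked:
            merged.extend(g)
            groups.remove(g)
        groups.append(merged)
    return groups

def order_group(group):
    """Cheapest-first ordering within a group (placeholder for the
    recursive divide-and-conquer ordering of the paper)."""
    return sorted(group, key=lambda sg: sg["cost"])

def order_subgoals(subgoals):
    groups = [order_group(g) for g in partition(subgoals)]
    groups.sort(key=lambda g: sum(sg["cost"] for sg in g))  # cheap groups first
    return [sg for g in groups for sg in g]

# Invented subgoals with hypothetical cost estimates.
subgoals = [
    {"name": "p(X,Y)", "vars": {"X", "Y"}, "cost": 5},
    {"name": "q(Y,Z)", "vars": {"Y", "Z"}, "cost": 2},
    {"name": "r(W)",   "vars": {"W"},      "cost": 1},
]
print([sg["name"] for sg in order_subgoals(subgoals)])
```

Here p and q land in one group (they share Y) while r forms its own, so the two groups can be ordered independently before being merged.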
Yavaa: supporting data workflows from discovery to visualization
Recent years have witnessed an increasing number of data silos being opened up, both within organizations and to the general public: Scientists publish their raw data as supplements to articles or even as standalone artifacts to enable others to verify and extend their work. Governments pass laws to open up formerly protected data treasures to improve accountability and transparency, as well as to enable new business ideas based on this public good. Even companies share structured information about their products and services to advertise their use and thus increase revenue.

Exploiting this wealth of information holds many challenges for users, though. Oftentimes data is provided as tables whose seemingly endless rows of daunting numbers are barely accessible. Information visualization (InfoVis) can bridge this gap. However, the visualization options offered are generally very limited, and next to no support is given in applying them. The same holds true for data wrangling: only very few options exist to adjust the data to the current needs, and barely any safeguards are in place to prevent even the most obvious mistakes. When it comes to data from multiple providers, the situation gets even bleaker. Only recently have tools emerged that allow reasonable search for datasets across institutional borders. Easy-to-use ways to combine these datasets are still missing, though. Finally, results generally lack proper documentation of their provenance, so even the most compelling visualizations can be called into question when their origin remains unclear.

The foundations for a vivid exchange and exploitation of open data are set, but the barrier to entry remains relatively high, especially for non-expert users. This thesis aims to lower that barrier by providing tools and assistance that reduce the amount of prior experience and skills required. It covers the whole workflow, ranging from identifying proper datasets, through possible transformations, up to the export of the result in the form of suitable visualizations.