Sparse cross-products of metadata in scientific simulation management
Managing scientific data is by no means a trivial task even in a single site environment
with a small number of researchers involved. We discuss some issues concerned with posing
well-specified experiments in terms of parameters or instrument settings and the metadata
framework that arises from doing so. We are particularly interested in parallel computer
simulation experiments, where very large quantities of warehouse-able data are involved. We
consider SQL databases and other framework technologies for manipulating experimental data.
Our framework manages the outputs from parallel runs that arise from large cross-products
of parameter combinations. Considerable useful experiment planning and analysis can be done
with the sparse metadata without fully expanding the parameter cross-products. Extra value
can be obtained from simulation output that can subsequently be data-mined. We have
particular interests in running large scale Monte-Carlo physics model simulations. Finding
ourselves overwhelmed by the problems of managing data and compute resources, we have
built a prototype tool using Java and MySQL that addresses these issues. We use this example
to discuss type-space management and other fundamental ideas for implementing a laboratory
information management system.
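
The idea of planning over a parameter cross-product without fully expanding it can be illustrated with a minimal Python sketch. It is not the authors' Java and MySQL tool; the parameter names and values below are hypothetical. The sparse metadata is just one list of values per parameter, from which the number of runs can be computed directly and individual combinations generated lazily.

```python
from itertools import product
from math import prod

# Hypothetical parameter axes for a Monte Carlo simulation campaign.
# Only these per-axis lists (the "sparse" metadata) are stored.
axes = {
    "lattice_size": [32, 64, 128],
    "temperature": [2.0, 2.2, 2.4, 2.6],
    "seed": range(1000),
}

def count_runs(axes):
    """Number of runs in the full cross-product, computed without expanding it."""
    return prod(len(values) for values in axes.values())

def iter_runs(axes):
    """Yield one parameter combination at a time as a dict."""
    names = list(axes)
    for combo in product(*axes.values()):
        yield dict(zip(names, combo))

total = count_runs(axes)       # planning figure from the metadata alone
first = next(iter_runs(axes))  # materialise a single run description on demand
```

Here `total` is available for scheduling or storage estimates before any run description exists, and `iter_runs` produces combinations only as they are consumed.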
Towards a Holistic Integration of Spreadsheets with Databases: A Scalable Storage Engine for Presentational Data Management
Spreadsheet software is the tool of choice for interactive ad-hoc data
management, with adoption by billions of users. However, spreadsheets are not
scalable, unlike database systems. On the other hand, database systems, while
highly scalable, do not support interactivity as a first-class primitive. We
are developing DataSpread, to holistically integrate spreadsheets as a
front-end interface with databases as a back-end datastore, providing
scalability to spreadsheets, and interactivity to databases, an integration we
term presentational data management (PDM). In this paper, we make a first step
towards this vision: developing a storage engine for PDM, studying how to
flexibly represent spreadsheet data within a database and how to support and
maintain access by position. We first conduct an extensive survey of
spreadsheet use to motivate our functional requirements for a storage engine
for PDM. We develop a natural set of mechanisms for flexibly representing
spreadsheet data and demonstrate that identifying the optimal representation is
NP-Hard; however, we develop an efficient approach to identify the optimal
representation from an important and intuitive subclass of representations. We
extend our mechanisms with positional access mechanisms that don't suffer from
cascading update issues, leading to constant time access and modification
performance. We evaluate these representations on a workload of typical
spreadsheets and spreadsheet operations, providing up to 20% reduction in
storage, and up to 50% reduction in formula evaluation time.
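
To illustrate why positional access can avoid cascading updates, here is a minimal Python sketch of one simple approach: an indirection layer from position to a stable row identifier. This is an assumption made for illustration, not the data structure described in the paper, and the class and method names are hypothetical. Note that the plain list used for the ordering still costs linear time per insert; the sketch only shows that stored rows never need renumbering.

```python
class Sheet:
    """Rows are stored under stable ids; a separate ordering maps position to id."""

    def __init__(self):
        self._order = []    # position -> stable row id
        self._rows = {}     # stable row id -> list of cell values
        self._next_id = 0

    def insert_row(self, position, values):
        """Insert a row at `position`; existing rows keep their stored ids."""
        row_id = self._next_id
        self._next_id += 1
        self._rows[row_id] = list(values)
        self._order.insert(position, row_id)  # only the ordering changes
        return row_id

    def row_at(self, position):
        """Look up a row by its current position."""
        return self._rows[self._order[position]]

sheet = Sheet()
id_alpha = sheet.insert_row(0, ["alpha"])
id_gamma = sheet.insert_row(1, ["gamma"])
id_beta = sheet.insert_row(1, ["beta"])  # lands between the two existing rows
```

After the middle insert, the row holding "gamma" has moved from position 1 to position 2, yet its stored key is unchanged, so no stored tuple had to be rewritten.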
An introduction to Graph Data Management
A graph database is a database where the data structures for the schema
and/or instances are modeled as a (labeled) (directed) graph or generalizations
of it, and where querying is expressed by graph-oriented operations and type
constructors. In this article we present the basic notions of graph databases,
give a historical overview of their main development, and study the main current
systems that implement them.
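
As a rough illustration of the definition above, the following Python sketch models instances as a labeled directed graph and expresses two graph-oriented queries: following edges by label, and reachability along a label. It is a toy in-memory structure with hypothetical names and data, not the interface of any particular graph database system.

```python
from collections import defaultdict

class LabeledGraph:
    """A directed graph whose edges carry a label."""

    def __init__(self):
        self._out = defaultdict(list)  # source node -> list of (label, target)

    def add_edge(self, source, label, target):
        self._out[source].append((label, target))

    def neighbors(self, source, label):
        """Targets reached from `source` by one edge with the given label."""
        return [t for (edge_label, t) in self._out.get(source, []) if edge_label == label]

    def reachable(self, start, label):
        """All nodes reachable from `start` via one or more `label` edges."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in self.neighbors(node, label):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

g = LabeledGraph()
g.add_edge("alice", "knows", "bob")
g.add_edge("bob", "knows", "carol")
g.add_edge("alice", "works_at", "acme")
```

With this data, `g.neighbors("alice", "knows")` follows a single edge, while `g.reachable("alice", "knows")` is a path query that a purely relational formulation would typically need recursion to express.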