ArrayBridge: Interweaving declarative array processing with high-performance computing
Scientists are increasingly turning to datacenter-scale computers to produce
and analyze massive arrays. Despite decades of database research that extols
the virtues of declarative query processing, scientists still write, debug and
parallelize imperative HPC kernels even for the most mundane queries. This
impedance mismatch has been partly attributed to the cumbersome data loading
process; in response, the database community has proposed in situ mechanisms to
access data in scientific file formats. Scientists, however, desire more than a
passive access method that reads arrays from files.
This paper describes ArrayBridge, a bi-directional array view mechanism for
scientific file formats that aims to make declarative array manipulations
interoperable with imperative file-centric analyses. Our prototype
implementation of ArrayBridge uses HDF5 as the underlying array storage library
and seamlessly integrates into the SciDB open-source array database system. In
addition to fast querying over external array objects, ArrayBridge produces
arrays in the HDF5 file format just as easily as it can read from it.
ArrayBridge also supports time travel queries from imperative kernels through
the unmodified HDF5 API, and automatically deduplicates between array versions
for space efficiency. Our extensive performance evaluation in NERSC, a
large-scale scientific computing facility, shows that ArrayBridge exhibits
statistically indistinguishable performance and I/O scalability from those of
the native SciDB storage engine.
Comment: 12 pages, 13 figures
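The version deduplication the abstract mentions can be pictured as content-hashing array chunks and writing only those not already present in an earlier version. This is a minimal sketch of that general idea, with names of my own choosing, not ArrayBridge's actual storage scheme:

```python
import hashlib

def chunk_digest(chunk: bytes) -> str:
    # A content hash identifies identical chunks across array versions.
    return hashlib.sha256(chunk).hexdigest()

def store_version(store: dict, version_chunks: list) -> list:
    """Record a new array version; only chunks not already present are
    written. Returns the manifest of chunk digests for this version."""
    manifest = []
    for chunk in version_chunks:
        d = chunk_digest(chunk)
        if d not in store:          # deduplicate: skip chunks seen before
            store[d] = chunk
        manifest.append(d)
    return manifest

# Two versions that share their first chunk: the shared chunk is stored once.
store = {}
v1 = store_version(store, [b"aaaa", b"bbbb"])
v2 = store_version(store, [b"aaaa", b"cccc"])
```

Here the second version only adds one new chunk to the store; reading a version back means resolving its manifest against the chunk store.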
A Survey on Array Storage, Query Languages, and Systems
Since scientific investigation is one of the most important providers of
massive amounts of ordered data, there is a renewed interest in array data
processing in the context of Big Data. To the best of our knowledge, a unified
resource that summarizes and analyzes array processing research over its long
existence is currently missing. In this survey, we provide a guide for past,
present, and future research in array processing. The survey is organized along
three main topics. Array storage discusses all the aspects related to array
partitioning into chunks. The identification of a reduced set of array
operators to form the foundation for an array query language is analyzed across
multiple such proposals. Lastly, we survey real systems for array processing.
The result is a thorough survey on array data storage and processing that
should be consulted by anyone interested in this research topic, independent of
experience level. The survey is not complete though. We greatly appreciate
pointers towards any work we might have forgotten to mention.
Comment: 44 pages
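Array partitioning into chunks, the survey's first topic, comes down to mapping each cell coordinate to the chunk that holds it. A minimal sketch for regular (uniform) chunking, with illustrative function names:

```python
def chunk_of(cell, chunk_shape):
    """Return the index of the chunk holding a cell under regular chunking,
    where the array is tiled by equal-sized chunks along each dimension."""
    return tuple(c // s for c, s in zip(cell, chunk_shape))

def cells_per_chunk(chunk_shape):
    """Number of cells in one chunk (the unit of I/O in chunked storage)."""
    n = 1
    for s in chunk_shape:
        n *= s
    return n

# A 100x100 array split into 10x20 chunks: cell (35, 47) lives in chunk (3, 2).
assert chunk_of((35, 47), (10, 20)) == (3, 2)
assert cells_per_chunk((10, 20)) == 200
```

Irregular and overlapping chunking strategies discussed in the literature generalize this mapping, but the cell-to-chunk function remains the core primitive.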
Theory and Practice of Data Citation
Citations are the cornerstone of knowledge propagation and the primary means
of assessing the quality of research, as well as directing investments in
science. Science is increasingly becoming "data-intensive", where large volumes
of data are collected and analyzed to discover complex patterns through
simulations and experiments, and most scientific reference works have been
replaced by online curated datasets. Yet, given a dataset, there is no
quantitative, consistent and established way of knowing how it has been used
over time, who contributed to its curation, what results have been yielded or
what value it has.
The development of a theory and practice of data citation is fundamental for
considering data as first-class research objects with the same relevance and
centrality of traditional scientific products. Many works in recent years have
discussed data citation from different viewpoints: illustrating why data
citation is needed, defining the principles and outlining recommendations for
data citation systems, and providing computational methods for addressing
specific issues of data citation.
The current panorama is many-faceted and an overall view that brings together
diverse aspects of this topic is still missing. Therefore, this paper aims to
describe the lay of the land for data citation, both from the theoretical (the
why and what) and the practical (the how) angle.
Comment: 24 pages, 2 tables, pre-print accepted in the Journal of the
Association for Information Science and Technology (JASIST), 201
The representation and management of evolving features in geospatial databases
Geographic features change over time, and this change is the result of some
kind of event or occurrence. It has been a research challenge to represent
this data in a manner that reflects human perception. Most database systems
used in geographic information systems (GIS) are relational, and change is
either captured by exhaustively storing all versions of the data, or updates
replace previous versions. This stems from the inherent difficulty of
modelling geographic objects in relational tables, a difficulty that is
compounded when the necessary time dimension is introduced to model how those
objects evolve. There is little doubt that the object-oriented (OO) paradigm
holds significant advantages over the relational model when it comes to
modelling real-world entities and spatial data, and it is argued that this
contention is particularly true for spatio-temporal data. This thesis
describes an object-oriented approach to the design of a conceptual model for
representing spatio-temporal geographic data, called the Feature Evolution
Model (FEM), based on states and events. The model was used to implement a
spatio-temporal database management system in Oracle Spatial, and an interface
prototype is described that was used to evaluate the system by enabling
querying and visualisation.
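The states-and-events view of evolving features can be sketched with a pair of small classes: a feature holds a sequence of states, and each event closes the current state and opens a new one. The names below are my own illustration, not FEM's actual schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class State:
    geometry: str                    # placeholder for a real geometry value
    valid_from: int                  # time of the event that opened this state
    valid_to: Optional[int] = None   # None means "current state"

@dataclass
class Feature:
    name: str
    states: list = field(default_factory=list)

    def apply_event(self, time: int, new_geometry: str) -> None:
        # An event closes the current state and opens a new one.
        if self.states:
            self.states[-1].valid_to = time
        self.states.append(State(new_geometry, time))

    def state_at(self, time: int):
        # Timeslice query: which state was valid at the given instant?
        for s in self.states:
            if s.valid_from <= time and (s.valid_to is None or time < s.valid_to):
                return s
        return None

parcel = Feature("parcel-42")
parcel.apply_event(2000, "POLYGON A")
parcel.apply_event(2010, "POLYGON B")
```

With this shape, querying the feature's history reduces to scanning its state intervals, which is the kind of operation the thesis's interface prototype exposes for querying and visualisation.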
An Approach to Conceptual Schema Evolution
In this work we will analyse the conceptual foundations of user-centric content management. Content management often involves the integration of content that was created from different points of view. Current modeling techniques, and especially current systems, lack sufficient support for handling these situations. Although schema integration is undecidable in general, we will introduce a conceptual model together with a modeling and maintenance methodology that simplifies content integration in many practical situations. We will define a conceptual model based on the Higher-Order Entity Relationship Model that combines the advantages of schema-oriented modeling techniques such as ER modeling with element-driven paradigms such as approaches for semistructured data management. This model is ready to support contextual reasoning based on local model semantics. For the special case of schema evolution based on schema versioning, we will derive the compatibility relation between local models by tracking the dependencies of schema revisions. Additionally, we will discuss implementation facets, such as storage aspects for structurally flexible content and the generation of adaptive user interfaces based on a conceptual interaction model.
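Deriving compatibility between local models from schema revision dependencies can be pictured as a reachability test over a revision graph. This is a deliberate simplification, taking two revisions as compatible when one is an ancestor of the other, which is far coarser than the paper's actual methodology:

```python
def is_compatible(revisions: dict, a: str, b: str) -> bool:
    """Simplified compatibility: two schema revisions are compatible when
    one is an ancestor of the other in the revision graph
    (revision -> list of parent revisions)."""
    def ancestors(r):
        seen, stack = set(), [r]
        while stack:
            for parent in revisions.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen
    return a == b or a in ancestors(b) or b in ancestors(a)

# r1 -> r2 -> r3 is one line of evolution; r2b branches off r1.
revisions = {"r2": ["r1"], "r3": ["r2"], "r2b": ["r1"]}
assert is_compatible(revisions, "r1", "r3")       # same line of evolution
assert not is_compatible(revisions, "r3", "r2b")  # divergent branches
```

Tracking dependencies at this granularity is what lets a system decide, per pair of local models, whether content can be integrated without further mediation.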
Extending ROOT through Modules
The ROOT software framework is foundational for the HEP ecosystem, providing
capabilities such as IO, a C++ interpreter, GUI, and math libraries. It uses
object-oriented concepts and build-time components to layer between them. We
believe additional layering formalisms will benefit ROOT and its users. We
present the modularization strategy for ROOT which aims to formalize the
description of existing source components, making available the dependencies
and other metadata externally from the build system, and allow post-install
additions of functionality in the runtime environment. Components can then be
grouped into packages, installable from external repositories, to deliver
missing packages as a post-install step. This provides a mechanism for the
wider software ecosystem to interact with a minimalistic install. Reducing
intra-component dependencies improves maintainability and code hygiene. We
believe helping maintain the smallest "base install" possible will help
embedding use cases. The modularization effort draws inspiration from the Java,
Python, and Swift ecosystems. To keep aligned with modern C++, this strategy
relies on forthcoming features such as C++ modules. We hope formalizing the
component layer will provide simpler ROOT installs, improve extensibility, and
decrease the complexity of embedding ROOT in other ecosystems.
Comment: 8 pages, 2 figures, 1 listing, CHEP 2018 - 23rd International
Conference on Computing in High Energy and Nuclear Physics
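The layering described above rests on dependency metadata being available outside the build system; given such metadata, a topological sort yields a valid order for installing or loading components. A minimal sketch, with component names chosen for illustration rather than taken from ROOT's actual package set:

```python
from graphlib import TopologicalSorter

# Dependency metadata as it might be exported from the build system:
# component -> the components it depends on (its predecessors).
components = {
    "Core": [],
    "IO":   ["Core"],
    "Hist": ["Core", "IO"],
    "Gui":  ["Core"],
}

# static_order() emits each component only after all of its dependencies,
# giving an install/load order for a minimal base plus requested add-ons.
order = list(TopologicalSorter(components).static_order())
```

A package manager working over such metadata can compute the closure of a requested package and install only the missing components, which is what makes a small "base install" practical.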
Temporal data, temporal data models, temporal data languages and temporal database systems.
The study of temporal database systems is relatively new in the field of
computer science. Two developments have prompted our study of temporal
databases: advances in storage technology for large amounts of data, and
applications' requirements for time-dependent
data. This thesis conducts a survey of the
major research areas concerning temporal databases. Temporal data, taxonomies of
temporal data models, temporal data languages, and temporal database systems are
presented. It is argued here that future database systems should handle the temporal
domain by an integrated temporal database system.
By understanding the present technology and the need of temporal database
systems, our research in the area of real-time temporal database systems can begin. It is
the purpose of this thesis to provide the background information and research references
of temporal database systems as a first step towards the real-time database system
research. Real-time database systems are time-constrained and temporally constituted.
Solutions in temporal database systems can contribute to the design of real-time military
applications using temporal database computers.
http://archive.org/details/temporaldatatemp00homd
Captain, United States Marine Corps
Approved for public release; distribution is unlimited.
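A valid-time relation of the kind these temporal data models capture can be sketched as tuples carrying a half-open [start, end) interval, with a timeslice query restricting the relation to one instant. The table and names are illustrative, not drawn from any specific temporal data model:

```python
# Each row carries its value plus a half-open valid-time interval [start, end).
# 9999 stands in for "until changed" (a common "now/forever" sentinel).
salaries = [
    ("alice", 50000, 2015, 2018),
    ("alice", 60000, 2018, 9999),
    ("bob",   55000, 2016, 9999),
]

def timeslice(rows, year):
    """Valid-time timeslice: the rows whose interval contains the instant."""
    return [(name, val) for name, val, start, end in rows if start <= year < end]

# The relation as it was valid in 2016 vs. 2019:
assert timeslice(salaries, 2016) == [("alice", 50000), ("bob", 55000)]
assert timeslice(salaries, 2019) == [("alice", 60000), ("bob", 55000)]
```

An integrated temporal database system, as argued for above, would push this interval bookkeeping into the storage and query layers instead of leaving it to the application.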