1,752 research outputs found
Join Execution Using Fragmented Columnar Indices on GPU and MIC
The paper describes an approach to the parallel natural join execution on computing clusters with GPU and MIC Coprocessors. This approach is based on a decomposition of natural join relational operator using the column indices and domain-interval fragmentation. This decomposition admits parallel executing the resource-intensive relational operators without data transfers. All column index fragments are stored in main memory. To process the join of two relations, each pair of index fragments corresponding to particular domain interval is joined on a separate processor core. Described approach allows efficient parallel query processing for very large databases on modern computing cluster systems with many-core accelerators. A prototype of the DBMS coprocessor system was implemented using this technique. The results of computational experiments for GPU and Xeon Phi are presented. These results confirm the efficiency of proposed approach
Finite Boolean Algebras for Solid Geometry using Julia's Sparse Arrays
The goal of this paper is to introduce a new method in computer-aided
geometry of solid modeling. We put forth a novel algebraic technique to
evaluate any variadic expression between polyhedral d-solids (d = 2, 3) with
regularized operators of union, intersection, and difference, i.e., any CSG
tree. The result is obtained in three steps: first, by computing an independent
set of generators for the d-space partition induced by the input; then, by
reducing the solid expression to an equivalent logical formula between Boolean
terms made by zeros and ones; and, finally, by evaluating this expression using
bitwise operators. This method is implemented in Julia using sparse arrays. The
computational evaluation of every possible solid expression, usually denoted as
CSG (Constructive Solid Geometry), is reduced to an equivalent logical
expression of a finite set algebra over the cells of a space partition, and
solved by native bitwise operators.Comment: revised version submitted to Computer-Aided Geometric Desig
A Multi-Code Analysis Toolkit for Astrophysical Simulation Data
The analysis of complex multiphysics astrophysical simulations presents a
unique and rapidly growing set of challenges: reproducibility, parallelization,
and vast increases in data size and complexity chief among them. In order to
meet these challenges, and in order to open up new avenues for collaboration
between users of multiple simulation platforms, we present yt (available at
http://yt.enzotools.org/), an open source, community-developed astrophysical
analysis and visualization toolkit. Analysis and visualization with yt are
oriented around physically relevant quantities rather than quantities native to
astrophysical simulation codes. While originally designed for handling Enzo's
structure adaptive mesh refinement (AMR) data, yt has been extended to work
with several different simulation methods and simulation codes including Orion,
RAMSES, and FLASH. We report on its methods for reading, handling, and
visualizing data, including projections, multivariate volume rendering,
multi-dimensional histograms, halo finding, light cone generation and
topologically-connected isocontour identification. Furthermore, we discuss the
underlying algorithms yt uses for processing and visualizing data, and its
mechanisms for parallelization of analysis tasks.Comment: 18 pages, 6 figures, emulateapj format. Resubmitted to Astrophysical
Journal Supplement Series with revisions from referee. yt can be found at
http://yt.enzotools.org
Query processing in temporal object-oriented databases
This PhD thesis is concerned with historical data management in the context of objectoriented
databases. An extensible approach has been explored to processing temporal object queries within a uniform query framework. By the uniform framework, we mean
temporal queries can be processed within the existing object-oriented framework that is extended from relational framework, by extending the existing query processing
techniques and strategies developed for OODBs and RDBs.
The unified model of OODBs and RDBs in UmSQL/X has been adopted as a basis for this purpose. A temporal object data model is thereby defined by incorporating a time
dimension into this unified model of OODBs and RDBs to form temporal relational-like cubes but with the addition of aggregation and inheritance hierarchies. A query algebra,
that accesses objects through these associations of aggregation, inheritance and timereference, is then defined as a general query model /language. Due to the extensive
features of our data model and reducibility of the algebra, a layered structure of query processor is presented that provides a uniforrn framework for processing temporal object
queries. Within the uniform framework, query transformation is carried out based on a set of transformation rules identified that includes the known relational and object rules plus those pertaining to the time dimension. To evaluate a temporal query involving a path with timereference, a strategy of decomposition is proposed. That is, evaluation of an enhanced path, which is defined to extend a path with time-reference, is decomposed by initially dividing the path into two sub-paths: one containing the time-stamped class that can be optimized by
making use of the ordering information of temporal data and another an ordinary sub-path (without time-stamped classes) which can be further decomposed and evaluated using
different algorithms. The intermediate results of traversing the two sub-paths are then joined together to create the query output. Algorithms for processing the decomposed query components, i. e., time-related operation algorithms, four join algorithms (nested-loop forward join, sort-merge forward join, nested-loop reverse join and sort-merge reverse join) and their modifications, have been presented with cost analysis and implemented with stream processing techniques using C++. Simulation results are also provided. Both cost analysis and simulation show the effects of time on the query processing algorithms: the join time cost is linearly increased with the expansion in the number of time-epochs (time-dimension in the case of a regular TS). It is also shown that using heuristics that make use of time information can lead to a significant time cost saving. Query processing with incomplete temporal data has also been discussed
A Survey on Mapping Semi-Structured Data and Graph Data to Relational Data
The data produced by various services should be stored and managed in an appropriate format for gaining valuable knowledge conveniently. This leads to the emergence of various data models, including relational, semi-structured, and graph models, and so on. Considering the fact that the mature relational databases established on relational data models are still predominant in today's market, it has fueled interest in storing and processing semi-structured data and graph data in relational databases so that mature and powerful relational databases' capabilities can all be applied to these various data. In this survey, we review existing methods on mapping semi-structured data and graph data into relational tables, analyze their major features, and give a detailed classification of those methods. We also summarize the merits and demerits of each method, introduce open research challenges, and present future research directions. With this comprehensive investigation of existing methods and open problems, we hope this survey can motivate new mapping approaches through drawing lessons from eachmodel's mapping strategies, aswell as a newresearch topic - mapping multi-model data into relational tables.Peer reviewe
Recommended from our members
End-to-end deep reinforcement learning in computer systems
Abstract
The growing complexity of data processing systems has long led systems designers to imagine systems (e.g. databases, schedulers) which can self-configure and adapt based on environmental cues. In this context, reinforcement learning (RL) methods have since their inception appealed to systems developers. They promise to acquire complex decision policies from raw feedback signals. Despite their conceptual popularity, RL methods are scarcely found in real-world data processing systems. Recently, RL has seen explosive growth in interest due to high profile successes when utilising large neural networks (deep reinforcement learning). Newly emerging machine learning frameworks and powerful hardware accelerators have given rise to a plethora of new potential applications.
In this dissertation, I first argue that in order to design and execute deep RL algorithms efficiently, novel software abstractions are required which can accommodate the distinct computational patterns of communication-intensive and fast-evolving algorithms. I propose an architecture which decouples logical algorithm construction from local and distributed execution semantics. I further present RLgraph, my proof-of-concept implementation of this architecture. In RLgraph, algorithm developers can explore novel designs by constructing a high-level data flow graph through combination of logical components. This dataflow graph is independent of specific backend frameworks or notions of execution, and is only later mapped to execution semantics via a staged build process. RLgraph enables high-performing algorithm implementations while maintaining flexibility for rapid prototyping.
Second, I investigate reasons for the scarcity of RL applications in systems themselves. I argue that progress in applied RL is hindered by a lack of tools for task model design which bridge the gap between systems and algorithms, and also by missing shared standards for evaluation of model capabilities. I introduce Wield, a first-of-its-kind tool for incremental model design in applied RL. Wield provides a small set of primitives which decouple systems interfaces and deployment-specific configuration from representation. Core to Wield is a novel instructive experiment protocol called progressive randomisation which helps practitioners to incrementally evaluate different dimensions of non-determinism. I demonstrate how Wield and progressive randomisation can be used to reproduce and assess prior work, and to guide implementation of novel RL applications
- …