6,566 research outputs found
Searchable Sky Coverage of Astronomical Observations: Footprints and Exposures
Sky coverage is one of the most important pieces of information about
astronomical observations. We discuss possible representations, and present
algorithms to create and manipulate shapes consisting of generalized spherical
polygons with arbitrary complexity and size on the celestial sphere. This shape
specification integrates well with our Hierarchical Triangular Mesh indexing
toolbox, whose performance and capabilities are enhanced by the advanced
features presented here. Our portable implementation of the relevant spherical
geometry routines comes with wrapper functions for database queries, which are
currently being used within several scientific catalog archives including the
Sloan Digital Sky Survey, the Galaxy Evolution Explorer and the Hubble Legacy
Archive projects as well as the Footprint Service of the Virtual Observatory.Comment: 11 pages, 7 figures, submitted to PAS
A Multi-Code Analysis Toolkit for Astrophysical Simulation Data
The analysis of complex multiphysics astrophysical simulations presents a
unique and rapidly growing set of challenges: reproducibility, parallelization,
and vast increases in data size and complexity chief among them. In order to
meet these challenges, and in order to open up new avenues for collaboration
between users of multiple simulation platforms, we present yt (available at
http://yt.enzotools.org/), an open source, community-developed astrophysical
analysis and visualization toolkit. Analysis and visualization with yt are
oriented around physically relevant quantities rather than quantities native to
astrophysical simulation codes. While originally designed for handling Enzo's
structure adaptive mesh refinement (AMR) data, yt has been extended to work
with several different simulation methods and simulation codes including Orion,
RAMSES, and FLASH. We report on its methods for reading, handling, and
visualizing data, including projections, multivariate volume rendering,
multi-dimensional histograms, halo finding, light cone generation and
topologically-connected isocontour identification. Furthermore, we discuss the
underlying algorithms yt uses for processing and visualizing data, and its
mechanisms for parallelization of analysis tasks.Comment: 18 pages, 6 figures, emulateapj format. Resubmitted to Astrophysical
Journal Supplement Series with revisions from referee. yt can be found at
http://yt.enzotools.org
Learning Models over Relational Data using Sparse Tensors and Functional Dependencies
Integrated solutions for analytics over relational databases are of great
practical importance as they avoid the costly repeated loop data scientists
have to deal with on a daily basis: select features from data residing in
relational databases using feature extraction queries involving joins,
projections, and aggregations; export the training dataset defined by such
queries; convert this dataset into the format of an external learning tool; and
train the desired model using this tool. These integrated solutions are also a
fertile ground of theoretically fundamental and challenging problems at the
intersection of relational and statistical data models.
This article introduces a unified framework for training and evaluating a
class of statistical learning models over relational databases. This class
includes ridge linear regression, polynomial regression, factorization
machines, and principal component analysis. We show that, by synergizing key
tools from database theory such as schema information, query structure,
functional dependencies, recent advances in query evaluation algorithms, and
from linear algebra such as tensor and matrix operations, one can formulate
relational analytics problems and design efficient (query and data)
structure-aware algorithms to solve them.
This theoretical development informed the design and implementation of the
AC/DC system for structure-aware learning. We benchmark the performance of
AC/DC against R, MADlib, libFM, and TensorFlow. For typical retail forecasting
and advertisement planning applications, AC/DC can learn polynomial regression
models and factorization machines with at least the same accuracy as its
competitors and up to three orders of magnitude faster than its competitors
whenever they do not run out of memory, exceed 24-hour timeout, or encounter
internal design limitations.Comment: 61 pages, 9 figures, 2 table
Compressed Representations of Conjunctive Query Results
Relational queries, and in particular join queries, often generate large
output results when executed over a huge dataset. In such cases, it is often
infeasible to store the whole materialized output if we plan to reuse it
further down a data processing pipeline. Motivated by this problem, we study
the construction of space-efficient compressed representations of the output of
conjunctive queries, with the goal of supporting the efficient access of the
intermediate compressed result for a given access pattern. In particular, we
initiate the study of an important tradeoff: minimizing the space necessary to
store the compressed result, versus minimizing the answer time and delay for an
access request over the result. Our main contribution is a novel parameterized
data structure, which can be tuned to trade off space for answer time. The
tradeoff allows us to control the space requirement of the data structure
precisely, and depends both on the structure of the query and the access
pattern. We show how we can use the data structure in conjunction with query
decomposition techniques, in order to efficiently represent the outputs for
several classes of conjunctive queries.Comment: To appear in PODS'18; 35 pages; comments welcom
- …