A Survey on Array Storage, Query Languages, and Systems
Since scientific investigation is one of the most important providers of
massive amounts of ordered data, there is a renewed interest in array data
processing in the context of Big Data. To the best of our knowledge, a unified
resource that summarizes and analyzes array processing research over its long
existence is currently missing. In this survey, we provide a guide for past,
present, and future research in array processing. The survey is organized along
three main topics. Array storage discusses all the aspects related to array
partitioning into chunks. The identification of a reduced set of array
operators to form the foundation for an array query language is analyzed across
multiple such proposals. Lastly, we survey real systems for array processing.
The result is a thorough survey on array data storage and processing that
should be consulted by anyone interested in this research topic, independent of
experience level. The survey is not complete though. We greatly appreciate
pointers towards any work we might have forgotten to mention.
Comment: 44 pages
Automated Quantitative Description of Spiral Galaxy Arm-Segment Structure
We describe a system for the automatic quantification of structure in spiral
galaxies. This enables translation of sky survey images into data needed to
help address fundamental astrophysical questions such as the origin of spiral
structure---a phenomenon that has eluded theoretical description despite 150
years of study (Sellwood 2010). The difficulty of automated measurement is
underscored by the fact that, to date, only manual efforts (such as the citizen
science project Galaxy Zoo) have been able to extract information about large
samples of spiral galaxies. An automated approach will be needed to eliminate
measurement subjectivity and handle the otherwise-overwhelming image quantities
(up to billions of images) from near-future surveys. Our approach automatically
describes spiral galaxy structure as a set of arcs, precisely describing spiral
arm segment arrangement while retaining the flexibility needed to accommodate
the observed wide variety of spiral galaxy structure. The largest existing
quantitative measurements were manually-guided and encompassed fewer than 100
galaxies, while we have already applied our method to more than 29,000
galaxies. Our output matches previous information, both quantitatively over
small existing samples, and qualitatively against human classifications from
Galaxy Zoo.
Comment: 9 pages; 4 figures; 2 tables; accepted to CVPR (Computer Vision and
Pattern Recognition), Providence, Rhode Island, June 16-21, 2012
Formal Representation of the SS-DB Benchmark and Experimental Evaluation in EXTASCID
Evaluating the performance of scientific data processing systems is a
difficult task considering the plethora of application-specific solutions
available in this landscape and the lack of a generally-accepted benchmark. The
dual structure of scientific data coupled with the complex nature of processing
complicates the evaluation procedure further. SS-DB is the first attempt to
define a general benchmark for complex scientific processing over raw and
derived data. It fails to draw sufficient attention, though, because of the
ambiguous plain-language specification and the extraordinary SciDB results. In
this paper, we remedy the shortcomings of the original SS-DB specification by
providing a formal representation in terms of ArrayQL algebra operators and
ArrayQL/SciQL constructs. These are the first formal representations of the
SS-DB benchmark. Starting from the formal representation, we give a reference
implementation and present benchmark results in EXTASCID, a novel system for
scientific data processing. EXTASCID is complete in providing native support
both for array and relational data and extensible in executing any user code
inside the system by means of a configurable metaoperator. These features
result in an order of magnitude improvement over SciDB at data loading,
extracting derived data, and operations over derived data.
Comment: 32 pages, 3 figures