
    A Survey on Array Storage, Query Languages, and Systems

    Since scientific investigation is one of the most important providers of massive amounts of ordered data, there is a renewed interest in array data processing in the context of Big Data. To the best of our knowledge, a unified resource that summarizes and analyzes array processing research over its long existence is currently missing. In this survey, we provide a guide for past, present, and future research in array processing. The survey is organized along three main topics. Array storage discusses all the aspects related to array partitioning into chunks. The identification of a reduced set of array operators to form the foundation for an array query language is analyzed across multiple such proposals. Lastly, we survey real systems for array processing. The result is a thorough survey on array data storage and processing that should be consulted by anyone interested in this research topic, independent of experience level. The survey is not complete though. We greatly appreciate pointers towards any work we might have forgotten to mention. Comment: 44 pages

    Automated Quantitative Description of Spiral Galaxy Arm-Segment Structure

    We describe a system for the automatic quantification of structure in spiral galaxies. This enables translation of sky survey images into data needed to help address fundamental astrophysical questions such as the origin of spiral structure, a phenomenon that has eluded theoretical description despite 150 years of study (Sellwood 2010). The difficulty of automated measurement is underscored by the fact that, to date, only manual efforts (such as the citizen science project Galaxy Zoo) have been able to extract information about large samples of spiral galaxies. An automated approach will be needed to eliminate measurement subjectivity and handle the otherwise-overwhelming image quantities (up to billions of images) from near-future surveys. Our approach automatically describes spiral galaxy structure as a set of arcs, precisely describing spiral arm segment arrangement while retaining the flexibility needed to accommodate the observed wide variety of spiral galaxy structure. The largest existing quantitative measurements were manually guided and encompassed fewer than 100 galaxies, while we have already applied our method to more than 29,000 galaxies. Our output matches previous information, both quantitatively over small existing samples, and qualitatively against human classifications from Galaxy Zoo. Comment: 9 pages; 4 figures; 2 tables; accepted to CVPR (Computer Vision and Pattern Recognition), Providence, Rhode Island, June 16-21, 2012

    Formal Representation of the SS-DB Benchmark and Experimental Evaluation in EXTASCID

    Evaluating the performance of scientific data processing systems is a difficult task considering the plethora of application-specific solutions available in this landscape and the lack of a generally accepted benchmark. The dual structure of scientific data, coupled with the complex nature of processing, complicates the evaluation procedure further. SS-DB is the first attempt to define a general benchmark for complex scientific processing over raw and derived data. It fails to draw sufficient attention though because of the ambiguous plain language specification and the extraordinary SciDB results. In this paper, we remedy the shortcomings of the original SS-DB specification by providing a formal representation in terms of ArrayQL algebra operators and ArrayQL/SciQL constructs. These are the first formal representations of the SS-DB benchmark. Starting from the formal representation, we give a reference implementation and present benchmark results in EXTASCID, a novel system for scientific data processing. EXTASCID is complete in providing native support both for array and relational data and extensible in executing any user code inside the system by means of a configurable metaoperator. These features result in an order of magnitude improvement over SciDB at data loading, extracting derived data, and operations over derived data. Comment: 32 pages, 3 figures