DataSpread: Unifying Databases and Spreadsheets.

Author: Aditya Parameswaran
Bofan Sun
Ding Zhang
Kevin Chang
Mangesh Bendre
Shy-yauer Lin
Xinyan Zhou
Publication venue: eScholarship, University of California
Publication date: 01/08/2015

Spreadsheet software is often the tool of choice for ad-hoc tabular data management, processing, and visualization, especially on tiny data sets. On the other hand, relational database systems offer significant power, expressivity, and efficiency over spreadsheet software for data management, while lacking in the ease of use and ad-hoc analysis capabilities. We demonstrate DataSpread, a data exploration tool that holistically unifies databases and spreadsheets. It continues to offer a Microsoft Excel-based spreadsheet front-end, while in parallel managing all the data in a back-end database, specifically, PostgreSQL. DataSpread retains all the advantages of spreadsheets, including ease of use, ad-hoc analysis and visualization capabilities, and a schema-free nature, while also adding the advantages of traditional relational databases, such as scalability and the ability to use arbitrary SQL to import, filter, or join external or internal tables and have the results appear in the spreadsheet. DataSpread needs to reason about and reconcile differences in the notions of schema, addressing of cells and tuples, and the current pane (which exists in spreadsheets but not in traditional databases), and support data modifications at both the front-end and the back-end. Our demonstration will center on our first, early prototype of DataSpread, and will give attendees a sense of the enormous data exploration capabilities offered by unifying spreadsheets and databases.

CiteSeerX

PubMed Central

eScholarship - University of California

Enabling On-Demand Database Computing with MIT SuperCloud Database Management System

Author: Arcand William
Bergeron Bill
Bestor David
Byun Chansup
Edwards Lauren
Gadepally Vijay
Hubbell Matthew
Kepner Jeremy
Michaleas Peter
Mullen Julie
Prout Andrew
Reuther Albert
Rosa Antonio
Yee Charles
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 29/06/2015

The MIT SuperCloud database management system allows for rapid creation and flexible execution of a variety of the latest scientific databases, including Apache Accumulo and SciDB. It is designed to permit these databases to run on a High Performance Computing Cluster (HPCC) platform as seamlessly as any other HPCC job. It ensures the seamless migration of the databases to the resources assigned by the HPCC scheduler and centralized storage of the database files when not running. It also permits snapshotting of databases to allow researchers to experiment and push the limits of the technology without concerns for data or productivity loss if the database becomes unstable.

Comment: 6 pages; accepted to IEEE High Performance Extreme Computing (HPEC) conference 2015. arXiv admin note: text overlap with arXiv:1406.492

arXiv.org e-Print Archive

Crossref

SURGE: Continuous Detection of Bursty Regions Over a Stream of Spatial Objects

Author: Bhowmick Sourav S.
Cong Gao
Feng Kaiyu
Guo Tao
Ma Shuai
Publication date: 28/09/2017

With the proliferation of mobile devices and location-based services, the continuous generation of massive volumes of streaming spatial objects (i.e., geo-tagged data) opens up new opportunities to address real-world problems by analyzing them. In this paper, we present a novel continuous bursty region detection problem that aims to continuously detect a bursty region of a given size in a specified geographical area from a stream of spatial objects. Specifically, a bursty region shows the maximum spike in the number of spatial objects in a given time window. The problem is useful in addressing several real-world challenges, such as the surge pricing problem in online transportation and disease outbreak detection. To solve the problem, we propose an exact solution and two approximate solutions, and the approximation ratio is $\frac{1-\alpha}{4}$ in terms of the burst score, where $\alpha$ is a parameter to control the burst score. We further extend these solutions to support detection of top-$k$ bursty regions. Extensive experiments with real-world data are conducted to demonstrate the efficiency and effectiveness of our solutions.

arXiv.org e-Print Archive

Crossref

DR-NTU (Digital Repository of NTU)