Search CORE

6,700 research outputs found

DataSpread: Unifying Databases and Spreadsheets.

Author: Aditya Parameswaran
Bofan Sun
Ding Zhang
Kevin Chang
Mangesh Bendre
Shy-yauer Lin
Xinyan Zhou
Publication venue: eScholarship, University of California
Publication date: 01/08/2015
Field of study

Spreadsheet software is often the tool of choice for ad-hoc tabular data management, processing, and visualization, especially on tiny data sets. On the other hand, relational database systems offer significant power, expressivity, and efficiency over spreadsheet software for data management, while lacking in the ease of use and ad-hoc analysis capabilities. We demonstrate DataSpread, a data exploration tool that holistically unifies databases and spreadsheets. It continues to offer a Microsoft Excel-based spreadsheet front-end, while in parallel managing all the data in a back-end database, specifically, PostgreSQL. DataSpread retains all the advantages of spreadsheets, including ease of use, ad-hoc analysis and visualization capabilities, and a schema-free nature, while also adding the advantages of traditional relational databases, such as scalability and the ability to use arbitrary SQL to import, filter, or join external or internal tables and have the results appear in the spreadsheet. DataSpread needs to reason about and reconcile differences in the notions of schema, addressing of cells and tuples, and the current pane (which exists in spreadsheets but not in traditional databases), and support data modifications at both the front-end and the back-end. Our demonstration will center on our first and early prototype of the DataSpread, and will give the attendees a sense for the enormous data exploration capabilities offered by unifying spreadsheets and databases

CiteSeerX

Crossref

PubMed Central

eScholarship - University of California

SURGE: Continuous Detection of Bursty Regions Over a Stream of Spatial Objects

Author: Bhowmicks Sourav S.
Cong Gao
Feng Kaiyu
Guo Tao
Ma Shuai
Publication venue
Publication date: 28/09/2017
Field of study

With the proliferation of mobile devices and location-based services, continuous generation of massive volume of streaming spatial objects (i.e., geo-tagged data) opens up new opportunities to address real-world problems by analyzing them. In this paper, we present a novel continuous bursty region detection problem that aims to continuously detect a bursty region of a given size in a specified geographical area from a stream of spatial objects. Specifically, a bursty region shows maximum spike in the number of spatial objects in a given time window. The problem is useful in addressing several real-world challenges such as surge pricing problem in online transportation and disease outbreak detection. To solve the problem, we propose an exact solution and two approximate solutions, and the approximation ratio is

\frac{1-\alpha}{4}

in terms of the burst score, where

\alpha

is a parameter to control the burst score. We further extend these solutions to support detection of top-

k

bursty regions. Extensive experiments with real-world data are conducted to demonstrate the efficiency and effectiveness of our solutions

arXiv.org e-Print Archive

Crossref

DR-NTU (Digital Repository of NTU)

Infinite Probabilistic Databases

Author: Grohe Martin
Lindner Peter
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 23rd International Conference on Database Theory (ICDT 2020)
Publication date: 01/01/2020
Field of study

Probabilistic databases (PDBs) are used to model uncertainty in data in a quantitative way. In the standard formal framework, PDBs are finite probability spaces over relational database instances. It has been argued convincingly that this is not compatible with an open-world semantics (Ceylan et al., KR 2016) and with application scenarios that are modeled by continuous probability distributions (Dalvi et al., CACM 2009). We recently introduced a model of PDBs as infinite probability spaces that addresses these issues (Grohe and Lindner, PODS 2019). While that work was mainly concerned with countably infinite probability spaces, our focus here is on uncountable spaces. Such an extension is necessary to model typical continuous probability distributions that appear in many applications. However, an extension beyond countable probability spaces raises nontrivial foundational issues concerned with the measurability of events and queries and ultimately with the question whether queries have a well-defined semantics. It turns out that so-called finite point processes are the appropriate model from probability theory for dealing with probabilistic databases. This model allows us to construct suitable (uncountable) probability spaces of database instances in a systematic way. Our main technical results are measurability statements for relational algebra queries as well as aggregate queries and Datalog queries

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server