
    Fast and Reliable Missing Data Contingency Analysis with Predicate-Constraints

    Today, data analysts largely rely on intuition to determine whether missing or withheld rows of a dataset significantly affect their analyses. We propose a framework that automates contingency analysis, i.e., computes the range of values an aggregate SQL query could take, under formal constraints describing the variation and frequency of missing data tuples. We describe how to process SUM, COUNT, AVG, MIN, and MAX queries under these conditions, yielding hard error bounds from testable constraints. We propose an optimization algorithm, based on an integer program, that reconciles a set of such constraints into bounds even when the constraints are overlapping, conflicting, or unsatisfiable. Our experiments on real-world datasets against several statistical imputation and inference baselines show that statistical techniques can have a deceptively high error rate that is often unpredictable. In contrast, our framework offers hard bounds that are guaranteed to hold as long as the constraints are not violated, while achieving accuracy competitive with the statistical baselines.
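    The bound computation is straightforward once a single constraint is fixed; the integer program is needed to reconcile several overlapping or conflicting constraints. Below is a minimal sketch (ours, not the paper's code), assuming one predicate-constraint: at most `max_missing` rows are absent, each with an aggregate value in [lo, hi].

```python
# Minimal sketch of predicate-constrained contingency bounds for SUM/COUNT/AVG
# under a single constraint. (The paper reconciles sets of overlapping or
# conflicting constraints via an integer program; this illustrates one.)

def sum_bounds(observed_sum, max_missing, lo, hi):
    # Missing rows can only lower the SUM if lo < 0, and only raise it if hi > 0.
    low = observed_sum + max_missing * min(lo, 0)
    high = observed_sum + max_missing * max(hi, 0)
    return low, high

def count_bounds(observed_count, max_missing):
    return observed_count, observed_count + max_missing

def avg_bounds(observed_sum, observed_count, max_missing, lo, hi):
    # (S + m*lo) / (C + m) is monotone in m, so the extremes occur at
    # m = 0 (no rows missing) or m = max_missing (all allowed rows missing).
    low = min(observed_sum / observed_count,
              (observed_sum + max_missing * lo) / (observed_count + max_missing))
    high = max(observed_sum / observed_count,
               (observed_sum + max_missing * hi) / (observed_count + max_missing))
    return low, high

print(sum_bounds(observed_sum=1000.0, max_missing=5, lo=0.0, hi=40.0))
# (1000.0, 1200.0): the true SUM is guaranteed to lie in this range
# as long as the constraint itself holds.
```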

    Efficient All Top-k Computation - A Unified Solution for All Top-k, Reverse Top-k and Top-m Influential Queries


    Z2SAL: a translation-based model checker for Z

    Despite being widely known and accepted in industry, the Z formal specification language has so far not been well supported by automated verification tools, largely because of the challenges in handling the abstraction of the language. In this paper we discuss a novel approach to building a model checker for Z, which involves implementing a translation from Z into SAL, the input language of the Symbolic Analysis Laboratory, a toolset that includes a number of model checkers and a simulator. The Z2SAL translation deals with a number of important issues, including: mapping unbounded, abstract specifications into bounded, finite models amenable to a BDD-based symbolic checker; converting a non-constructive and piecemeal style of functional specification into a deterministic, automaton-based style of specification; and supporting the rich set-based vocabulary of the Z mathematical toolkit. This paper discusses progress made towards implementing as complete and faithful a translation as possible, while highlighting certain assumptions, respecting certain limitations, and making use of available optimisations. The translation is illustrated throughout with examples, and a complete working example is presented together with performance data.
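    To illustrate the first of these issues, the sketch below (our own construction, not Z2SAL output) shows the standard finitisation trick such translations rely on: an unbounded integer state component is folded into a small window plus a distinguished out-of-range value, so a BDD-based checker has a finite state space to explore.

```python
# Illustrative finitisation sketch (not Z2SAL's actual translation): values
# outside a user-chosen window collapse to a single OUT_OF_RANGE token,
# making the state space finite at the cost of lost precision.

LOW, HIGH = 0, 7            # chosen finite window
OUT_OF_RANGE = object()     # distinguished "outside the window" value

def finitise(n):
    return n if LOW <= n <= HIGH else OUT_OF_RANGE

def increment(state):
    # A Z operation such as counter' = counter + 1 becomes a total,
    # deterministic transition over the finite domain.
    if state is OUT_OF_RANGE:
        return OUT_OF_RANGE
    return finitise(state + 1)

# Exhaustively enumerable state space: {0..7} plus OUT_OF_RANGE.
states = {finitise(n) for n in range(LOW - 1, HIGH + 2)}
print(len(states))  # 9 abstract states
```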

    Incremental Processing and Optimization of Update Streams

    In recent years, we have seen an increasing number of applications in networking, sensor networks, cloud computing, and environmental monitoring that monitor, plan, control, and make decisions over data streams from multiple sources. We are interested in extending traditional stream processing techniques to meet the new challenges of these applications. In general, to support genuine continuous query optimization and processing over data streams, we need to systematically understand how to address incremental optimization and processing of update streams for a rich class of queries commonly used in these applications. Our general thesis is that efficient incremental processing and re-optimization of update streams can be achieved by various incremental view maintenance techniques if we cast the problems as incremental view maintenance problems over data streams. We focus on two challenges in the incremental processing of update streams that are not addressed by existing work on stream query processing: incremental processing of transitive closure queries over data streams, and incremental re-optimization of queries. In addition to addressing these specific challenges, we also develop Aspen, a working end-to-end stream processing prototype that has been deployed as the foundation for a case study of our SmartCIS application. We validate our solutions both analytically and empirically on top of Aspen, over a variety of benchmark workloads such as the TPC-H and LinearRoad benchmarks.
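    To make the first challenge concrete, here is a minimal sketch (ours, not Aspen's implementation) of incremental view maintenance for a transitive-closure view over an insert-only edge stream: each insertion produces only the delta pairs, which are merged into the materialized closure instead of recomputing reachability from scratch.

```python
# Minimal sketch of incremental transitive-closure maintenance over an
# insert-only edge stream (illustrative; not the Aspen system's code).
# `closure` is the materialized view: the set of reachable (x, y) pairs.

closure = set()

def insert_edge(u, v):
    # Everything that reaches u (plus u itself) now reaches
    # everything v reaches (plus v itself).
    sources = {x for (x, y) in closure if y == u} | {u}
    targets = {y for (x, y) in closure if x == v} | {v}
    delta = {(x, y) for x in sources for y in targets} - closure
    closure.update(delta)
    return delta  # the update stream this view emits downstream

for edge in [("a", "b"), ("b", "c"), ("c", "d")]:
    print(edge, "->", sorted(insert_edge(*edge)))
# Each insertion emits only the newly reachable pairs, e.g. ("c", "d")
# yields {(a,d), (b,d), (c,d)} without recomputing the full closure.
```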

    Modeling, Annotating, and Querying Geo-Semantic Data Warehouses


    Interactive Multidimensional Modeling of Linked Data for Exploratory OLAP

    Exploratory OLAP aims at coupling the precision and detail of corporate data with the information wealth of Linked Open Data (LOD). While some techniques to create, publish, and query RDF cubes are already available, little has been said about how to contextualize these cubes with situational data in an on-demand fashion. In this paper we describe an approach, called iMOLD, that enables non-technical users to enrich an RDF cube with multidimensional knowledge by discovering aggregation hierarchies in LOD. This is done through a user-guided process that recognizes in the LOD the recurring modeling patterns that express roll-up relationships between RDF concepts, then translates these patterns into aggregation hierarchies to enrich the RDF cube. Two families of aggregation patterns are identified, based on associations and generalization respectively, and the algorithms for recognizing them are described. To evaluate iMOLD in terms of efficiency and effectiveness, we compare it with a related approach from the literature, present a case study based on DBpedia, and discuss the results of a test with real users.
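    The association-based family of patterns can be pictured with a small sketch (ours; the DBpedia-style property names are only for illustration, and this is not iMOLD's algorithm). A many-to-one RDF property between two concepts, e.g. each city linked to exactly one country, behaves like a roll-up relationship and can seed an aggregation hierarchy.

```python
# Illustrative check for an association-based aggregation pattern:
# a property is a roll-up candidate if it maps each subject to at most
# one object, i.e. the association is many-to-one.

from collections import defaultdict

triples = [
    ("dbr:Turin", "dbo:country", "dbr:Italy"),
    ("dbr:Milan", "dbo:country", "dbr:Italy"),
    ("dbr:Lyon",  "dbo:country", "dbr:France"),
]

def is_rollup(triples, prop):
    objects_per_subject = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            objects_per_subject[s].add(o)
    # Many-to-one: every subject has exactly one object for this property.
    return all(len(objs) == 1 for objs in objects_per_subject.values())

print(is_rollup(triples, "dbo:country"))  # True: city -> country rolls up
```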