Incremental View Maintenance For Collection Programming
In the context of incremental view maintenance (IVM), delta query derivation
is an essential technique for speeding up the processing of large, dynamic
datasets. The goal is to generate delta queries that, given a small change in
the input, can update the materialized view more efficiently than via
recomputation. In this work we propose the first solution for the efficient
incrementalization of positive nested relational calculus (NRC+) on bags (with
integer multiplicities). More precisely, we model the cost of NRC+ operators
and classify queries as efficiently incrementalizable if their delta has a
strictly lower cost than full re-evaluation. Then, we identify IncNRC+, a large
fragment of NRC+ that is efficiently incrementalizable and we provide a
semantics-preserving translation that takes any NRC+ query to a collection of
IncNRC+ queries. Furthermore, we prove that incremental maintenance for NRC+ is
within the complexity class NC0 and we showcase how recursive IVM, a technique
that has provided significant speedups over traditional IVM in the case of flat
queries [25], can also be applied to IncNRC+.
Comment: 24 pages (12 pages plus appendix)
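To make the idea of a delta query concrete, here is a minimal sketch in Python. It is not the paper's NRC+ construction: it assumes flat bags stored as dictionaries from tuples to integer multiplicities, and it uses the standard delta rule for a join, d(R ⋈ S) = dR ⋈ S + R ⋈ dS + dR ⋈ dS. The names bag_add, join, and delta_join are hypothetical helpers introduced only for this illustration.

```python
from collections import defaultdict


def bag_add(a, b):
    """Bag union with integer multiplicities; entries that cancel to zero are dropped."""
    out = defaultdict(int)
    for bag in (a, b):
        for item, mult in bag.items():
            out[item] += mult
    return {item: mult for item, mult in out.items() if mult != 0}


def join(r, s):
    """Join two bags of (key, value) pairs on key; multiplicities multiply."""
    out = defaultdict(int)
    for (k1, v1), m1 in r.items():
        for (k2, v2), m2 in s.items():
            if k1 == k2:
                out[(k1, v1, v2)] += m1 * m2
    return {item: mult for item, mult in out.items() if mult != 0}


def delta_join(r, s, d_r, d_s):
    """Delta of the join view for updates d_r and d_s (negative multiplicities are deletions)."""
    return bag_add(bag_add(join(d_r, s), join(r, d_s)), join(d_r, d_s))


# Hypothetical example data: one insertion and one deletion against R.
r = {("a", 1): 1, ("b", 2): 2}
s = {("a", "x"): 1, ("b", "y"): 1}
d_r = {("a", 3): 1, ("b", 2): -1}
d_s = {}

view = join(r, s)                                     # materialized view
view = bag_add(view, delta_join(r, s, d_r, d_s))      # incremental update
assert view == join(bag_add(r, d_r), bag_add(s, d_s)) # agrees with recomputation
```

When the updates are small, the delta touches only tuples that match the changed keys, which is the source of the speedup over recomputation. This toy join still scans the whole of the other relation, so a real implementation would add an index on the join key.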
Sets and indices in linear programming modelling and their integration with relational data models
LP models are usually constructed using index sets and data tables which are closely related to the attributes and relations of relational database (RDB) systems. We extend the syntax of MPL, an existing LP modelling language, in order to connect it to a given RDB system. This approach reuses existing modelling and database software, provides a rich modelling environment and achieves model and data independence. This integrated software enables Mathematical Programming to be widely used as a decision support tool by unlocking the data residing in corporate databases.
Advanced Probabilistic Couplings for Differential Privacy
Differential privacy is a promising formal approach to data privacy, which
provides a quantitative bound on the privacy cost of an algorithm that operates
on sensitive information. Several tools have been developed for the formal
verification of differentially private algorithms, including program logics and
type systems. However, these tools do not capture fundamental techniques that
have emerged in recent years, and cannot be used for reasoning about
cutting-edge differentially private algorithms. Existing techniques fail to
handle three broad classes of algorithms: 1) algorithms where privacy depends
on accuracy guarantees, 2) algorithms that are analyzed with the advanced
composition theorem, which shows slower growth in the privacy cost, 3)
algorithms that interactively accept adaptive inputs.
We address these limitations with a new formalism extending apRHL, a
relational program logic that has been used for proving differential privacy of
non-interactive algorithms, and incorporating aHL, a (non-relational) program
logic for accuracy properties. We illustrate our approach through a single
running example, which exemplifies the three classes of algorithms and explores
new variants of the Sparse Vector technique, a well-studied algorithm from the
privacy literature. We implement our logic in EasyCrypt, and formally verify
privacy. We also introduce a novel coupling technique called optimal
subset coupling that may be of independent interest.
kLog: A Language for Logical and Relational Learning with Kernels
We introduce kLog, a novel approach to statistical relational learning.
Unlike standard approaches, kLog does not represent a probability distribution
directly. It is rather a language to perform kernel-based learning on
expressive logical and relational representations. kLog allows users to specify
learning problems declaratively. It builds on simple but powerful concepts:
learning from interpretations, entity/relationship data modeling, logic
programming, and deductive databases. Access by the kernel to the rich
representation is mediated by a technique we call graphicalization: the
relational representation is first transformed into a graph --- in particular,
a grounded entity/relationship diagram. Subsequently, a choice of graph kernel
defines the feature space. kLog supports mixed numerical and symbolic data, as
well as background knowledge in the form of Prolog or Datalog programs as in
inductive logic programming systems. The kLog framework can be applied to
tackle the same range of tasks that has made statistical relational learning so
popular, including classification, regression, multitask learning, and
collective classification. We also report on empirical comparisons, showing
that kLog can be either more accurate, or much faster at the same level of
accuracy, than Tilde and Alchemy. kLog is GPLv3 licensed and is available at
http://klog.dinfo.unifi.it along with tutorials.
Computing Multi-Relational Sufficient Statistics for Large Databases
Databases contain information about which relationships do and do not hold
among entities. To make this information accessible for statistical analysis
requires computing sufficient statistics that combine information from
different database tables. Such statistics may involve any number of
positive and negative relationships. With a naive enumeration approach,
computing sufficient statistics for negative relationships is feasible only for
small databases. We solve this problem with a new dynamic programming algorithm
that performs a virtual join, where the requisite counts are computed without
materializing join tables. Contingency table algebra is a new extension of
relational algebra that facilitates the efficient implementation of this
Möbius virtual join operation. The Möbius Join scales to large datasets
(over 1M tuples) with complex schemas. Empirical evaluation with seven
benchmark datasets showed that information about the presence and absence of
links can be exploited in feature selection, association rule mining, and
Bayesian network learning.
Comment: 11 pages, 8 figures, 8 tables, CIKM'14, November 3-7, 2014, Shanghai,
China
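The counting idea behind computing statistics for negative relationships without materializing a join can be illustrated for a single relationship. The sketch below is not the paper's Möbius Join algorithm; it only shows the inclusion-exclusion identity count(R = false) = count(R unspecified) - count(R = true). The schema (students, courses, a registered relationship) and the function name counts_with_negation are hypothetical.

```python
from collections import Counter

# Hypothetical toy database.
students = {"s1": "high", "s2": "low", "s3": "low"}   # student -> attribute value
courses = ["c1", "c2"]
registered = {("s1", "c1"), ("s2", "c1"), ("s2", "c2")}


def counts_with_negation(students, courses, registered):
    """Contingency table over (student attribute, registered?) built by subtraction."""
    # Pairs per attribute value when the relationship is left unspecified:
    # every student pairs with every course, so no cross product is enumerated.
    unspecified = Counter()
    for attr in students.values():
        unspecified[attr] += len(courses)

    # Pairs where the relationship holds, read directly from the stored table.
    positive = Counter(students[s] for s, _course in registered)

    table = {}
    for attr, total in unspecified.items():
        table[(attr, True)] = positive[attr]
        table[(attr, False)] = total - positive[attr]  # negative count by subtraction
    return table


table = counts_with_negation(students, courses, registered)
```

The negative counts come from two cheap aggregates rather than from enumerating every unregistered (student, course) pair, which is what makes the naive approach infeasible on large databases.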
A Rule-Based Approach to Analyzing Database Schema Objects with Datalog
Database schema elements such as tables, views, triggers and functions are
typically defined with many interrelationships. In order to support database
users in understanding a given schema, a rule-based approach for analyzing the
respective dependencies is proposed using Datalog expressions. We show that
many interesting properties of schema elements can be systematically determined
this way. The expressiveness of the proposed analysis is illustrated with
the problem of computing induced functional dependencies for derived relations.
The propagation of functional dependencies plays an important role in data
integration and query optimization but represents an undecidable problem in
general. And yet, our rule-based analysis covers all relational operators as
well as linear recursive expressions in a systematic way showing the depth of
analysis possible by our proposal. The analysis of functional dependencies is
well-integrated in a uniform approach to analyzing dependencies between schema
elements in general.
Comment: Pre-proceedings paper presented at the 27th International Symposium
on Logic-Based Program Synthesis and Transformation (LOPSTR 2017), Namur,
Belgium, 10-12 October 2017 (arXiv:1708.07854)
Answer Set Programming Modulo `Space-Time'
We present ASP Modulo `Space-Time', a declarative representational and
computational framework to perform commonsense reasoning about regions with
both spatial and temporal components. Supported are capabilities for mixed
qualitative-quantitative reasoning, consistency checking, and inferring
compositions of space-time relations; these capabilities combine and synergise
for applications in a range of AI application areas where the processing and
interpretation of spatio-temporal data is crucial. The framework and resulting
system is the only general KR-based method for declaratively reasoning about
the dynamics of `space-time' regions as first-class objects. We present an
empirical evaluation (with scalability and robustness results), and include
diverse application examples involving interpretation and control tasks.
From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This
profoundly limits our ability to give instructions to computers, the ability of
computers to explain their actions to us, and the ability of computers to
analyse and process text. Vector space models (VSMs) of semantics are beginning
to address these limits. This paper surveys the use of VSMs for semantic
processing of text. We organize the literature on VSMs according to the
structure of the matrix in a VSM. There are currently three broad classes of
VSMs, based on term-document, word-context, and pair-pattern matrices, yielding
three classes of applications. We survey a broad range of applications in these
three categories and we take a detailed look at a specific open source project
in each category. Our goal in this survey is to show the breadth of
applications of VSMs for semantics, to provide a new perspective on VSMs for
those who are already familiar with the area, and to provide pointers into the
literature for those who are less familiar with the field.
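As a small illustration of the first class of VSMs mentioned above, the following Python sketch builds a term-document matrix from raw counts and compares documents by cosine similarity. It is a minimal sketch under simplifying assumptions (whitespace tokenization, no weighting such as tf-idf, no smoothing), and the function names term_document_matrix and cosine are hypothetical.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Return (vocabulary, columns): one raw-count column vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    columns = []
    for tokens in tokenized:
        counts = Counter(tokens)
        columns.append([counts[word] for word in vocab])
    return vocab, columns


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus.
docs = ["the cat sat on the mat", "the dog sat on the log", "stock markets fell"]
vocab, columns = term_document_matrix(docs)
sim_01 = cosine(columns[0], columns[1])  # documents sharing several words
sim_02 = cosine(columns[0], columns[2])  # documents sharing no words
```

Documents that share vocabulary get a similarity above zero, while documents with disjoint vocabularies get exactly zero, which is the basic signal a term-document VSM exploits.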