Matching Dependencies with Arbitrary Attribute Values: Semantics, Query Answering and Integrity Constraints
Matching dependencies (MDs) were introduced to specify the identification or
matching of certain attribute values in pairs of database tuples when some
similarity conditions are satisfied. Their enforcement can be seen as a natural
generalization of entity resolution. In what we call the "pure case" of MDs,
any value from the underlying data domain can be used for the value in common
that does the matching. We investigate the semantics and properties of data
cleaning through the enforcement of matching dependencies for the pure case. We
characterize the intended clean instances and also the "clean answers" to
queries as those that are invariant under the cleaning process. The complexity
of computing clean instances and clean answers to queries is investigated.
Tractable and intractable cases depending on the MDs and queries are
identified. Finally, we establish connections with database "repairs" under
integrity constraints. Comment: 13 pages, double column, 2 figures
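As a rough illustration of what enforcing a matching dependency means, the sketch below applies a single enforcement pass to a small list of tuples: whenever two tuples are similar on one attribute, their values on another attribute are made equal. The function name `enforce_md_once`, the similarity test, and the rule for choosing the common value are all assumptions made for this example; the abstract does not prescribe them, and one pass is not the full cleaning semantics studied in the paper.

```python
def enforce_md_once(tuples, similar, cond_attr, match_attr, choose):
    """Apply one pass of a single matching dependency (illustrative only).

    tuples:     list of dicts, one per database tuple
    similar:    hypothetical similarity predicate on cond_attr values
    cond_attr:  attribute compared for similarity
    match_attr: attribute whose values are identified (made equal)
    choose:     hypothetical rule picking the common value; in the "pure
                case" any value from the domain would be allowed
    """
    result = [dict(t) for t in tuples]  # leave the input untouched
    for i in range(len(result)):
        for j in range(i + 1, len(result)):
            a, b = result[i], result[j]
            if similar(a[cond_attr], b[cond_attr]) and a[match_attr] != b[match_attr]:
                common = choose(a[match_attr], b[match_attr])
                a[match_attr] = common
                b[match_attr] = common
    return result


people = [
    {"name": "J. Smith", "addr": "10 Main St"},
    {"name": "j. smith", "addr": "10 Main Street"},
    {"name": "A. Jones", "addr": "5 Oak Ave"},
]
cleaned = enforce_md_once(
    people,
    similar=lambda x, y: x.lower() == y.lower(),  # assumed similarity relation
    cond_attr="name",
    match_attr="addr",
    choose=lambda x, y: x,  # assumed choice: keep the first value
)
```

Different choices of common value lead to different clean instances, which is why the abstract speaks of answers that are invariant under the cleaning process.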
Access Control Synthesis for Physical Spaces
Access-control requirements for physical spaces, like office buildings and
airports, are best formulated from a global viewpoint in terms of system-wide
requirements. For example, "there is an authorized path to exit the building
from every room." In contrast, individual access-control components, such as
doors and turnstiles, can only enforce local policies, specifying when the
component may open. In practice, the gap between the system-wide, global
requirements and the many local policies is bridged manually, which is tedious,
error-prone, and scales poorly.
We propose a framework to automatically synthesize local access control
policies from a set of global requirements for physical spaces. Our framework
consists of an expressive language to specify both global requirements and
physical spaces, and an algorithm for synthesizing local, attribute-based
policies from the global specification. We empirically demonstrate the
framework's effectiveness on three substantial case studies. The studies
demonstrate that access control synthesis is practical even for complex
physical spaces, such as airports, with many interrelated security
requirements.
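The example global requirement quoted above, "there is an authorized path to exit the building from every room", can be checked against a given set of local door policies with a graph search. The sketch below only checks the requirement; it does not synthesize policies and is not the paper's algorithm. The function `rooms_without_exit_path`, the door list, and the `authorized` predicate are hypothetical names introduced for illustration.

```python
from collections import deque


def rooms_without_exit_path(doors, exits, authorized):
    """Return rooms from which no exit is reachable via authorized doors.

    doors:      iterable of (src, dst) pairs, one per directed door
    exits:      set of rooms that count as exits
    authorized: hypothetical local policy, (src, dst) -> bool
    """
    rooms = set(exits)
    reverse = {}  # dst -> list of src, restricted to authorized doors
    for src, dst in doors:
        rooms.update((src, dst))
        if authorized(src, dst):
            reverse.setdefault(dst, []).append(src)

    # Breadth-first search backwards from the exits.
    reached = set(exits)
    queue = deque(exits)
    while queue:
        room = queue.popleft()
        for prev in reverse.get(room, []):
            if prev not in reached:
                reached.add(prev)
                queue.append(prev)
    return rooms - reached


doors = [
    ("office", "corridor"),
    ("lab", "corridor"),
    ("corridor", "lobby"),
    ("lobby", "outside"),
]
all_open = rooms_without_exit_path(doors, {"outside"}, lambda s, d: True)
lab_locked = rooms_without_exit_path(
    doors, {"outside"}, lambda s, d: (s, d) != ("lab", "corridor")
)
```

With every door authorized, no room violates the requirement; denying the lab door leaves the lab without an exit path.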
A System for Induction of Oblique Decision Trees
This article describes a new system for induction of oblique decision trees.
This system, OC1, combines deterministic hill-climbing with two forms of
randomization to find a good oblique split (in the form of a hyperplane) at
each node of a decision tree. Oblique decision tree methods are tuned
especially for domains in which the attributes are numeric, although they can
be adapted to symbolic or mixed symbolic/numeric attributes. We present
extensive empirical studies, using both real and artificial data, that analyze
OC1's ability to construct oblique trees that are smaller and more accurate
than their axis-parallel counterparts. We also examine the benefits of
randomization for the construction of oblique decision trees. Comment: See http://www.jair.org/ for an online appendix and other files
accompanying this article
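To make the distinction between axis-parallel and oblique splits concrete, the sketch below contrasts the two tests at a single tree node. An axis-parallel split compares one attribute with a threshold, while an oblique split tests which side of a hyperplane a point lies on. The function names and the example weights are assumptions for illustration; the sketch does not reproduce OC1's hill-climbing or randomization steps.

```python
def axis_parallel_split(x, feature, threshold):
    """Test a single attribute against a threshold."""
    return x[feature] > threshold


def oblique_split(x, weights, bias):
    """Test which side of the hyperplane sum(w_i * x_i) + bias = 0 x lies on."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias > 0


# Two points separated by the line x0 + x1 = 1 (an assumed example boundary).
below = (0.2, 0.3)
above = (0.9, 0.4)

side_below = oblique_split(below, weights=(1.0, 1.0), bias=-1.0)
side_above = oblique_split(above, weights=(1.0, 1.0), bias=-1.0)
axis_below = axis_parallel_split(below, feature=0, threshold=0.5)
```

A diagonal boundary like this one needs a single oblique test, whereas an axis-parallel tree would have to approximate it with a staircase of splits, which is one reason oblique trees can be smaller.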
Queries with Guarded Negation (full version)
A well-established and fundamental insight in database theory is that
negation (also known as complementation) tends to make queries difficult to
process and difficult to reason about. Many basic problems are decidable and
admit practical algorithms in the case of unions of conjunctive queries, but
become difficult or even undecidable when queries are allowed to contain
negation. Inspired by recent results in finite model theory, we consider a
restricted form of negation, guarded negation. We introduce a fragment of SQL,
called GN-SQL, as well as a fragment of Datalog with stratified negation,
called GN-Datalog, that allow only guarded negation, and we show that these
query languages are computationally well behaved, in terms of testing query
containment, query evaluation, open-world query answering, and boundedness.
GN-SQL and GN-Datalog subsume a number of well-known query languages and
constraint languages, such as unions of conjunctive queries, monadic Datalog,
and frontier-guarded tgds. In addition, an analysis of standard benchmark
workloads shows that most usage of negation in SQL in practice is guarded
negation.
Boosting Applied to Word Sense Disambiguation
In this paper Schapire and Singer's AdaBoost.MH boosting algorithm is applied
to the Word Sense Disambiguation (WSD) problem. Initial experiments on a set of
15 selected polysemous words show that the boosting approach surpasses Naive
Bayes and Exemplar-based approaches, which represent state-of-the-art accuracy
on supervised WSD. In order to make boosting practical for a real learning
domain of thousands of words, several ways of accelerating the algorithm by
reducing the feature space are studied. The best variant, which we call
LazyBoosting, is tested on the largest sense-tagged corpus available containing
192,800 examples of the 191 most frequent and ambiguous English words. Again,
boosting compares favourably to the other benchmark algorithms. Comment: 12 pages