Search CORE

52,500 research outputs found

Towards Large-Scale Knowledge Discovery in Databases (KDD) by Exploiting Parallelism in Generic KDD Primitives

Author: Freitas Alex A.
Publication venue
Publication date: 01/07/1997
Field of study

Matching Dependencies with Arbitrary Attribute Values: Semantics, Query Answering and Integrity Constraints

Author: Bertossi Leopoldo
Gardezi Jaffer
Kiringa Iluju
Publication venue
Publication date: 26/08/2010
Field of study

Matching dependencies (MDs) were introduced to specify the identification or matching of certain attribute values in pairs of database tuples when some similarity conditions are satisfied. Their enforcement can be seen as a natural generalization of entity resolution. In what we call the "pure case" of MDs, any value from the underlying data domain can be used for the value in common that does the matching. We investigate the semantics and properties of data cleaning through the enforcement of matching dependencies for the pure case. We characterize the intended clean instances and also the "clean answers" to queries as those that are invariant under the cleaning process. The complexity of computing clean instances and clean answers to queries is investigated. Tractable and intractable cases depending on the MDs and queries are identified. Finally, we establish connections with database "repairs" under integrity constraints.Comment: 13 pages, double column, 2 figure

arXiv.org e-Print Archive

CiteSeerX

Carleton University's Institutional Repository

Access Control Synthesis for Physical Spaces

Author: Basin David
Dashti Mohammad Torabi
Tsankov Petar
Publication venue
Publication date: 01/01/2016
Field of study

Access-control requirements for physical spaces, like office buildings and airports, are best formulated from a global viewpoint in terms of system-wide requirements. For example, "there is an authorized path to exit the building from every room." In contrast, individual access-control components, such as doors and turnstiles, can only enforce local policies, specifying when the component may open. In practice, the gap between the system-wide, global requirements and the many local policies is bridged manually, which is tedious, error-prone, and scales poorly. We propose a framework to automatically synthesize local access control policies from a set of global requirements for physical spaces. Our framework consists of an expressive language to specify both global requirements and physical spaces, and an algorithm for synthesizing local, attribute-based policies from the global specification. We empirically demonstrate the framework's effectiveness on three substantial case studies. The studies demonstrate that access control synthesis is practical even for complex physical spaces, such as airports, with many interrelated security requirements

arXiv.org e-Print Archive

Repository for Publications and Research Data

A System for Induction of Oblique Decision Trees

Author: Kasif S.
Murthy S. K.
Salzberg S.
Publication venue
Publication date: 01/01/1994
Field of study

This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned especially for domains in which the attributes are numeric, although they can be adapted to symbolic or mixed symbolic/numeric attributes. We present extensive empirical studies, using both real and artificial data, that analyze OC1's ability to construct oblique trees that are smaller and more accurate than their axis-parallel counterparts. We also examine the benefits of randomization for the construction of oblique decision trees.Comment: See http://www.jair.org/ for an online appendix and other files accompanying this articl

arXiv.org e-Print Archive

CiteSeerX

Queries with Guarded Negation (full version)

Author: Barany Vince
Cate Balder ten
Otto Martin
Publication venue
Publication date: 01/01/2012
Field of study

A well-established and fundamental insight in database theory is that negation (also known as complementation) tends to make queries difficult to process and difficult to reason about. Many basic problems are decidable and admit practical algorithms in the case of unions of conjunctive queries, but become difficult or even undecidable when queries are allowed to contain negation. Inspired by recent results in finite model theory, we consider a restricted form of negation, guarded negation. We introduce a fragment of SQL, called GN-SQL, as well as a fragment of Datalog with stratified negation, called GN-Datalog, that allow only guarded negation, and we show that these query languages are computationally well behaved, in terms of testing query containment, query evaluation, open-world query answering, and boundedness. GN-SQL and GN-Datalog subsume a number of well known query languages and constraint languages, such as unions of conjunctive queries, monadic Datalog, and frontier-guarded tgds. In addition, an analysis of standard benchmark workloads shows that most usage of negation in SQL in practice is guarded negation

arXiv.org e-Print Archive

CiteSeerX

Boosting Applied to Word Sense Disambiguation

Author: Escudero Gerard
Marquez Lluis
Rigau German
Publication venue
Publication date: 01/01/2000
Field of study

In this paper Schapire and Singer's AdaBoost.MH boosting algorithm is applied to the Word Sense Disambiguation (WSD) problem. Initial experiments on a set of 15 selected polysemous words show that the boosting approach surpasses Naive Bayes and Exemplar-based approaches, which represent state-of-the-art accuracy on supervised WSD. In order to make boosting practical for a real learning domain of thousands of words, several ways of accelerating the algorithm by reducing the feature space are studied. The best variant, which we call LazyBoosting, is tested on the largest sense-tagged corpus available containing 192,800 examples of the 191 most frequent and ambiguous English words. Again, boosting compares favourably to the other benchmark algorithms.Comment: 12 page

arXiv.org e-Print Archive

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC