77 research outputs found

    Analyses and Validation of Conditional Dependencies with Built-in Predicates

    This paper proposes a natural extension of conditional functional dependencies (CFDs [14]) and conditional inclusion dependencies (CINDs [8]), denoted by CFD(p)s and CIND(p)s, respectively, by specifying patterns of data values with ≠, <, ≤, > and ≥ predicates. As data quality rules, CFD(p)s and CIND(p)s are able to capture errors that commonly arise in practice but cannot be detected by CFDs and CINDs. We establish two sets of results for central technical problems associated with CFD(p)s and CIND(p)s. (a) One concerns the satisfiability and implication problems for CFD(p)s and CIND(p)s, taken separately or together. These are important for, e.g., deciding whether data quality rules are dirty themselves, and for removing redundant rules. We show that despite the increased expressive power, the static analyses of CFD(p)s and CIND(p)s retain the same complexity as their CFD and CIND counterparts. (b) The other concerns validation of CFD(p)s and CIND(p)s. We show that given a set Σ of CFD(p)s and CIND(p)s on a database D, a set of SQL queries can be automatically generated that, when evaluated against D, return all tuples in D that violate some dependencies in Σ. This provides commercial DBMS with an immediate capability to detect errors based on CFD(p)s and CIND(p)s.
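    To make the idea of a dependency with built-in predicates concrete, the following is a minimal in-memory sketch of violation detection for a single CFD(p)-style rule. It is not the SQL generation described in the abstract; the rule encoding (one `(operator, constant)` cell per attribute), the function names, and the sample `price`/`shipping` data are all assumptions made for illustration.

```python
import operator
from itertools import combinations

# Built-in predicates allowed in a pattern cell.
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}

WILDCARD = None  # hypothetical marker: a pattern cell that matches any value


def matches(row, pattern):
    """True if every attribute in `pattern` satisfies its (op, constant) cell."""
    return all(cell is WILDCARD or OPS[cell[0]](row[attr], cell[1])
               for attr, cell in pattern.items())


def violations(rows, lhs, rhs):
    """Return sorted indices of rows that violate the rule (lhs -> rhs)."""
    bad = set()
    selected = [(i, r) for i, r in enumerate(rows) if matches(r, lhs)]
    # Single-tuple violations: the LHS pattern holds but the RHS pattern does not.
    for i, r in selected:
        if not matches(r, rhs):
            bad.add(i)
    # Pair violations: two selected rows agree on all LHS attributes
    # but differ on some RHS attribute.
    for (i, r), (j, s) in combinations(selected, 2):
        if all(r[a] == s[a] for a in lhs) and any(r[a] != s[a] for a in rhs):
            bad.update((i, j))
    return sorted(bad)


# Hypothetical rule: items priced at 100 or more must ship for free.
rows = [
    {"item": "a", "price": 120, "shipping": 0},
    {"item": "b", "price": 150, "shipping": 5},
    {"item": "c", "price": 30, "shipping": 5},
]
lhs = {"price": (">=", 100)}
rhs = {"shipping": ("=", 0)}
print(violations(rows, lhs, rhs))
```

    Only item "b" matches the left-hand pattern while failing the right-hand one, so it is the only row reported. A plain CFD, which allows only equality against constants, could not express the `price >= 100` condition.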

    Design of a web-based LBS framework addressing usability, cost, and implementation constraints

    This research investigates barriers that prevent Location Based Services (LBS) from reaching their full potential. The different constraints, including poor usability, lack of positioning support, costs, and integration difficulties, are highlighted. A framework was designed incorporating components based on existing and new technologies that could help address the constraints of LBS and increase end-user acceptance. This research proposes that usability constraints can be addressed by adapting a system to user characteristics which are inferred on the basis of captured user context and interaction data. A prototype LBS system was developed to prove the feasibility and benefit of the framework design, demonstrating that constraints of positioning, cost, and integration can be overcome. Volunteers were asked to use the system and to answer questions in relation to their proficiency and experience. User feedback showed that the proposed combination of functionality was well received, and the prototype was appealing to many users. Ground truths from the survey were related back to data captured with a user monitoring component in order to investigate whether users can be classified according to their context and how they interact. The results show that statistically significant relationships exist, and that by using the C4.5 decision tree, computer proficiency can be estimated within one class-width in 76.7% of the cases. These results suggest that it may be possible to build a user model to estimate computer proficiency on the basis of user-interaction data. The user model could then be used to improve usability through adaptive user-specific customisations.

    Conditional Dependencies: A Principled Approach to Improving Data Quality

    Real-life data is often dirty and costs businesses worldwide billions of pounds each year. This paper presents a promising approach to improving data quality. It effectively detects and fixes inconsistencies in real-life data based on conditional dependencies, an extension of database dependencies by enforcing bindings of semantically related data values. It accurately identifies records from unreliable data sources by leveraging relative candidate keys, an extension of keys for relations by supporting similarity and matching operators across relations. In contrast to traditional dependencies that were developed for improving the quality of schema, the revised constraints are proposed to improve the quality of data. These constraints yield practical techniques for data repairing and record matching in a uniform framework.

    Ground state properties of a Tonks-Girardeau Gas in a periodic potential

    In this paper, we investigate the ground-state properties of a bosonic Tonks-Girardeau (TG) gas confined in a one-dimensional periodic potential. The single-particle reduced density matrix is computed numerically for systems of up to N=265 bosons. Scaling analysis of the occupation number of the lowest orbital shows that there is no Bose-Einstein condensation (BEC) for the periodically trapped TG gas in both the commensurate and incommensurate cases. We find that, in the commensurate case, the scaling exponents of the occupation number of the lowest orbital, the amplitude of the lowest orbital and the zero-momentum peak height with the particle number are 0, -0.5 and 1, respectively, while in the incommensurate case, they are 0.5, -0.5 and 1.5, respectively. These exponents are related to each other in a universal relation. Comment: 9 pages, 10 figures

    Self-labeling techniques for semi-supervised time series classification: an empirical study

    The increasing amount of unlabeled time series data available renders the semi-supervised paradigm a suitable approach to tackle classification problems with a reduced quantity of labeled data. Self-labeled techniques stand out from semi-supervised classification methods due to their simplicity and the lack of strong assumptions about the distribution of the labeled and unlabeled data. This paper addresses the relevance of these techniques in the time series classification context by means of an empirical study that compares successful self-labeled methods in conjunction with various learning schemes and dissimilarity measures. Our experiments involve 35 time series datasets with different ratios of labeled data, aiming to measure the transductive and inductive classification capabilities of the self-labeled methods studied. The results show that the nearest-neighbor rule is a robust choice for the base classifier. In addition, the amending and multi-classifier self-labeled approaches prove to be a promising way to perform semi-supervised classification in the time series context.
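    As a rough illustration of what a self-labeled technique with a nearest-neighbor base classifier can look like, the sketch below implements plain self-training with 1-NN and Euclidean distance. It is not claimed to be any of the specific methods compared in the study; the function names, the choice of distance, and the toy series are assumptions for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def self_train_1nn(labeled, unlabeled, dist=euclidean):
    """Self-training with a 1-NN base classifier.

    At each step, the unlabeled series closest to any labeled series is
    given the label of that nearest neighbor and moved into the labeled set.
    """
    labeled = list(labeled)  # list of (series, label) pairs
    pool = list(unlabeled)   # list of series without labels
    while pool:
        _, idx, label = min(
            ((dist(u, s), i, lab)
             for i, u in enumerate(pool)
             for s, lab in labeled),
            key=lambda t: t[0],
        )
        labeled.append((pool.pop(idx), label))
    return labeled


# Toy data (hypothetical): two labeled series and three unlabeled ones.
seed = [([0, 0, 0], "flat"), ([5, 5, 5], "high")]
pool = [[1, 1, 1], [4.5, 4.5, 4.5], [2, 2, 2]]
result = self_train_1nn(seed, pool)
labels = {tuple(series): label for series, label in result}
print(labels)
```

    Because `dist` is a parameter, a different dissimilarity measure can be swapped in without changing the labeling loop. This version labels every series in the pool; practical variants usually stop early or only accept confident predictions.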

    A Probabilistic Kleene Theorem

    We provide a Kleene Theorem for (Rabin) probabilistic automata over finite words. Probabilistic automata generalize deterministic finite automata and assign to a word an acceptance probability. We provide probabilistic expressions with probabilistic choice, guarded choice, concatenation, and a star operator. We prove that probabilistic expressions and probabilistic automata are expressively equivalent. Our result actually extends to two-way probabilistic automata with pebbles and corresponding expressions.
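    The notion of an acceptance probability can be sketched in a few lines: a probabilistic automaton carries a distribution over states, and each letter pushes that distribution through a stochastic transition table. The code below is a minimal sketch under assumed encodings (nested dictionaries for transitions, a two-state example automaton); it illustrates the automaton side only, not the probabilistic expressions from the paper.

```python
def acceptance_probability(word, initial, transitions, accepting):
    """Probability that the automaton ends in an accepting state after `word`.

    initial:     dict state -> probability (sums to 1)
    transitions: dict state -> letter -> dict target -> probability
    accepting:   set of accepting states
    """
    dist = dict(initial)
    for letter in word:
        nxt = {}
        for state, p in dist.items():
            for target, q in transitions[state][letter].items():
                nxt[target] = nxt.get(target, 0.0) + p * q
        dist = nxt
    return sum(p for state, p in dist.items() if state in accepting)


# Hypothetical automaton: each 'a' moves q0 to q1 with probability 1/2;
# q1 is absorbing and accepting.
initial = {"q0": 1.0}
transitions = {
    "q0": {"a": {"q0": 0.5, "q1": 0.5}},
    "q1": {"a": {"q1": 1.0}},
}
accepting = {"q1"}

for w in ["", "a", "aa"]:
    print(repr(w), acceptance_probability(w, initial, transitions, accepting))
```

    For this automaton the word of n letters `a` is accepted with probability 1 - (1/2)^n, so the three words above yield 0, 0.5 and 0.75. A deterministic finite automaton is the special case in which every probability is 0 or 1.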

    Integrated Mediterranean programmes. Commission communication, 20 February 1983. Reproduced from the Bulletin of the European Communities, No. 2/1985

    We present the Lixto project, which is both a research project in database theory and a commercial enterprise that develops Web data extraction (wrapping) and Web service definition software. We discuss the project's main motivations and ideas, in particular the use of a logic-based framework for wrapping. Then we present theoretical results on monadic datalog over trees and on Elog, its close relative which is used as the internal wrapper language in the Lixto system. These results include a characterization of both the expressive power and the complexity of these languages. We describe the visual wrapper specification process in Lixto and various practical aspects of wrapping. We discuss work on the complexity of query languages for trees that was inspired by our theoretical study of logic-based languages for wrapping. Then we return to the practice of wrapping and the Lixto Transformation Server, which allows for streaming integration of data extracted from Web pages. This is a natural requirement in complex services based on Web wrapping. Finally, we discuss industrial applications of Lixto and point to open problems for future study.

    Group SAX: Extending the Notion of Contrast Sets to Time Series and Multimedia Data

    No full text

    Computing Repairs for Inconsistent XML Document Using Chase

    No full text