
    Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms

    The reliable fraction of information is an attractive score for quantifying (functional) dependencies in high-dimensional data. In this paper, we systematically explore the algorithmic implications of using this measure for optimization. We show that the problem is NP-hard, which justifies the use of worst-case exponential-time as well as heuristic search methods. We then substantially improve the practical performance of both optimization styles by deriving a novel admissible bounding function that has an unbounded potential for additional pruning over the previously proposed one. Finally, we empirically investigate the approximation ratio of the greedy algorithm and show that it produces highly competitive results in a fraction of the time needed for complete branch-and-bound style search. Comment: Accepted to Proceedings of the IEEE International Conference on Data Mining (ICDM'18)
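The abstract above does not define its score. As a rough illustration only, the sketch below assumes the common plug-in definition of the plain fraction of information, F(X;Y) = (H(Y) - H(Y|X)) / H(Y), and omits any reliability correction. The function names `entropy` and `fraction_of_information` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Empirical Shannon entropy (in bits) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def fraction_of_information(xs, ys):
    """Plug-in estimate of F(X;Y) = (H(Y) - H(Y|X)) / H(Y).

    Assumption of this sketch: no bias or reliability correction is applied.
    """
    h_y = entropy(ys)
    if h_y == 0:
        return 0.0  # Y is constant; a convention chosen for this sketch
    # Chain rule: H(Y|X) = H(X,Y) - H(X)
    h_y_given_x = entropy(list(zip(xs, ys))) - entropy(xs)
    return (h_y - h_y_given_x) / h_y
```

Under these assumptions, a score near 1 means X determines Y on the sample, and a score near 0 means X carries no information about Y.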

    A framework for the definition of metrics for actor-dependency models

    Actor-dependency models are a formalism aimed at providing intentional descriptions of processes as a network of dependency relationships among actors. This kind of model is currently widely used in the early phase of requirements engineering as well as in other contexts such as organizational analysis and business process reengineering. In this paper, we are interested in the definition of a framework for the formulation of metrics over these models. These metrics are used to analyse the models with respect to some properties that are interesting for the system being modelled, such as security, efficiency or accuracy. The metrics are defined in terms of the actors and dependencies of the model. We distinguish three different kinds of metrics that are formally defined, and then we apply the framework at two different layers of a meeting scheduler system. Postprint (published version)

    Efficient Discovery of Ontology Functional Dependencies

    Poor data quality has become a pervasive issue due to the increasing complexity and size of modern datasets. Constraint-based data cleaning techniques rely on integrity constraints as a benchmark to identify and correct errors. Data values that do not satisfy the given set of constraints are flagged as dirty, and data updates are made to re-align the data and the constraints. However, many errors require user input to resolve because domain expertise defines specific terminology and relationships. For example, in pharmaceuticals, 'Advil' is-a brand name for 'ibuprofen', a relationship that can be captured in a pharmaceutical ontology. While functional dependencies (FDs) have traditionally been used in existing data cleaning solutions to model syntactic equivalence, they are not able to model broader relationships (e.g., is-a) defined by an ontology. In this paper, we take a first step towards extending the set of data quality constraints used in data cleaning by defining and discovering Ontology Functional Dependencies (OFDs). We lay out theoretical and practical foundations for OFDs, including a set of sound and complete axioms, and a linear inference procedure. We then develop effective algorithms for discovering OFDs, and a set of optimizations that efficiently prune the search space. Our experimental evaluation using real data shows the scalability and accuracy of our algorithms. Comment: 12 pages
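To make the contrast between syntactic and ontology-aware dependencies concrete, here is a minimal sketch under a simplified reading of the abstract above. It assumes each value maps to a set of "senses" (a hypothetical stand-in for an ontology lookup) and treats a dependency as satisfied when the right-hand-side values within each left-hand-side group share at least one sense. The function `holds_fd` and the `senses` mapping are illustrative assumptions, not the paper's definitions or algorithms.

```python
from collections import defaultdict


def holds_fd(rows, lhs, rhs, senses=None):
    """Check whether the dependency lhs -> rhs holds over a list of dict rows.

    With senses=None this is a plain FD check (exact value equality).
    With a senses mapping (value -> set of senses; hypothetical ontology
    lookup), values in a group only need to share one common sense.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in lhs)].append(row[rhs])

    for values in groups.values():
        if senses is None:
            if len(set(values)) > 1:
                return False
        else:
            # Values missing from the mapping are treated as their own sense.
            common = set.intersection(*(set(senses.get(v, {v})) for v in values))
            if not common:
                return False
    return True


rows = [
    {"symptom": "headache", "drug": "Advil"},
    {"symptom": "headache", "drug": "ibuprofen"},
]
senses = {"Advil": {"ibuprofen"}, "ibuprofen": {"ibuprofen"}}
```

In this toy data, `holds_fd(rows, ["symptom"], "drug")` fails as a plain FD because the two drug strings differ, while passing `senses` lets the check succeed because both values resolve to the same sense.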

    An extensible manufacturing resource model for process integration

    Driven by industrial needs and enabled by process technology and information technology, enterprise integration is rapidly shifting from information integration to process integration to improve the overall performance of enterprises. Traditional resource models are established based on the needs of individual applications. They cannot effectively serve process integration, which needs resources to be represented in a unified, comprehensive and flexible way to meet the needs of various applications for different business processes. This paper looks into this issue and presents a configurable and extensible resource model that can be rapidly reconfigured and extended to serve different applications. To achieve generality, the presented resource model is established at both the macro level and the micro level. A semantic representation method is developed to improve the flexibility and extensibility of the model.

    Knowledge data discovery and data mining in a design environment

    Designers, in the process of satisfying design requirements, generally encounter difficulties in, firstly, understanding the problem and, secondly, finding a solution [Cross 1998]. Often, understanding the problem and developing a feasible solution proceed simultaneously: a solution is proposed to gauge the extent to which it satisfies the specific requirements. Support for future design activities has long been recognised to exist in the form of past design cases; however, the varying degrees of similarity and dissimilarity found between previous and current design requirements and solutions have restrained the effectiveness of utilising past design solutions. The knowledge embedded within past designs provides a source of experience with the potential to be utilised in future developments, provided that the ability to structure and manipulate that knowledge can be made a reality. Providing the ability to manipulate past design knowledge allows the ranging viewpoints experienced by a designer during a design process to be reflected and supported. Data Mining systems are gaining acceptance in several domains but to date remain largely unrecognised in terms of their potential to support design activities. It is the focus of this paper to introduce the functionality possessed within the realm of Data Mining tools, and to evaluate the level of support that may be achieved in manipulating and utilising experiential knowledge to satisfy designers' ranging perspectives throughout a product's development.

    Long-lived non-classical correlations for scalable quantum repeaters at room temperature

    Heralded single-photon sources with on-demand readout are promising candidates for quantum repeaters enabling long-distance quantum communication. The need for scalability of such systems requires simple experimental solutions, thus favouring room-temperature systems. For quantum repeater applications, long delays between heralding and single-photon readout are crucial. Until now, this has been prevented in room-temperature atomic systems by fast decoherence due to thermal motion. Here we demonstrate efficient heralding and readout of single collective excitations created in warm caesium vapour. Using the principle of motional averaging we achieve a collective excitation lifetime of 0.27 ± 0.04 ms, two orders of magnitude larger than previously achieved for single excitations in room-temperature sources. We experimentally verify non-classicality of the light-matter correlations by observing a violation of the Cauchy-Schwarz inequality with R = 1.4 ± 0.1 > 1. Through spectral and temporal analysis we identify intrinsic four-wave mixing noise as the main contribution compromising single-photon operation of the source. Comment: 21 pages total, the first 17 pages are the main article and the remaining pages are supplemental material