2,247 research outputs found

    Missing values estimation for skylines in incomplete database

    Get PDF
    Incompleteness of data is a common problem in many databases including web heterogeneous databases, multi-relational databases, spatial and temporal databases and data integration. The incompleteness of data introduces challenges in processing queries as providing accurate results that best meet the query conditions over incomplete database is not a trivial task. Several techniques have been proposed to process queries in incomplete database. Some of these techniques retrieve the query results based on the existing values rather than estimating the missing values. Such techniques are undesirable in many cases as the dimensions with missing values might be the important dimensions of the user’s query. Besides, the output is incomplete and might not satisfy the user preferences. In this paper we propose an approach that estimates missing values in skylines to guide users in selecting the most appropriate skylines from the several candidate skylines. The approach utilizes the concept of mining attribute correlations to generate an Approximate Functional Dependencies (AFDs) that captured the relationships between the dimensions. Besides, identifying the strength of probability correlations to estimate the values. Then, the skylines with estimated values are ranked. By doing so, we ensure that the retrieved skylines are in the order of their estimated precision

    The data-exchange chase under the microscope

    Full text link
    In this paper we take closer look at recent developments for the chase procedure, and provide additional results. Our analysis allows us create a taxonomy of the chase variations and the properties they satisfy. Two of the most central problems regarding the chase is termination, and discovery of restricted classes of sets of dependencies that guarantee termination of the chase. The search for the restricted classes has been motivated by a fairly recent result that shows that it is undecidable to determine whether the chase with a given dependency set will terminate on a given instance. There is a small dissonance here, since the quest has been for classes of sets of dependencies guaranteeing termination of the chase on all instances, even though the latter problem was not known to be undecidable. We resolve the dissonance in this paper by showing that determining whether the chase with a given set of dependencies terminates on all instances is coRE-complete. For the hardness proof we use a reduction from word rewriting systems, thereby also showing the close connection between the chase and word rewriting. The same reduction also gives us the aforementioned instance-dependent RE-completeness result as a byproduct. For one of the restricted classes guaranteeing termination on all instances, the stratified sets dependencies, we provide new complexity results for the problem of testing whether a given set of dependencies belongs to it. These results rectify some previous claims that have occurred in the literature.Comment: arXiv admin note: substantial text overlap with arXiv:1303.668

    A model and framework for reliable build systems

    Full text link
    Reliable and fast builds are essential for rapid turnaround during development and testing. Popular existing build systems rely on correct manual specification of build dependencies, which can lead to invalid build outputs and nondeterminism. We outline the challenges of developing reliable build systems and explore the design space for their implementation, with a focus on non-distributed, incremental, parallel build systems. We define a general model for resources accessed by build tasks and show its correspondence to the implementation technique of minimum information libraries, APIs that return no information that the application doesn't plan to use. We also summarize preliminary experimental results from several prototype build managers

    Social Collaborative Retrieval

    Full text link
    Socially-based recommendation systems have recently attracted significant interest, and a number of studies have shown that social information can dramatically improve a system's predictions of user interests. Meanwhile, there are now many potential applications that involve aspects of both recommendation and information retrieval, and the task of collaborative retrieval---a combination of these two traditional problems---has recently been introduced. Successful collaborative retrieval requires overcoming severe data sparsity, making additional sources of information, such as social graphs, particularly valuable. In this paper we propose a new model for collaborative retrieval, and show that our algorithm outperforms current state-of-the-art approaches by incorporating information from social networks. We also provide empirical analyses of the ways in which cultural interests propagate along a social graph using a real-world music dataset.Comment: 10 page

    Proceedings of the 2008 Oxford University Computing Laboratory student conference.

    Get PDF
    This conference serves two purposes. First, the event is a useful pedagogical exercise for all participants, from the conference committee and referees, to the presenters and the audience. For some presenters, the conference may be the first time their work has been subjected to peer-review. For others, the conference is a testing ground for announcing work, which will be later presented at international conferences, workshops, and symposia. This leads to the conference's second purpose: an opportunity to expose the latest-and-greatest research findings within the laboratory. The fourteen abstracts within these proceedings were selected by the programme and conference committee after a round of peer-reviewing, by both students and staff within this department

    Explanation-Based Auditing

    Full text link
    To comply with emerging privacy laws and regulations, it has become common for applications like electronic health records systems (EHRs) to collect access logs, which record each time a user (e.g., a hospital employee) accesses a piece of sensitive data (e.g., a patient record). Using the access log, it is easy to answer simple queries (e.g., Who accessed Alice's medical record?), but this often does not provide enough information. In addition to learning who accessed their medical records, patients will likely want to understand why each access occurred. In this paper, we introduce the problem of generating explanations for individual records in an access log. The problem is motivated by user-centric auditing applications, and it also provides a novel approach to misuse detection. We develop a framework for modeling explanations which is based on a fundamental observation: For certain classes of databases, including EHRs, the reason for most data accesses can be inferred from data stored elsewhere in the database. For example, if Alice has an appointment with Dr. Dave, this information is stored in the database, and it explains why Dr. Dave looked at Alice's record. Large numbers of data accesses can be explained using general forms called explanation templates. Rather than requiring an administrator to manually specify explanation templates, we propose a set of algorithms for automatically discovering frequent templates from the database (i.e., those that explain a large number of accesses). We also propose techniques for inferring collaborative user groups, which can be used to enhance the quality of the discovered explanations. Finally, we have evaluated our proposed techniques using an access log and data from the University of Michigan Health System. Our results demonstrate that in practice we can provide explanations for over 94% of data accesses in the log.Comment: VLDB201
    corecore