5,928 research outputs found

    Incremental View Maintenance For Collection Programming

    Get PDF
    In the context of incremental view maintenance (IVM), delta query derivation is an essential technique for speeding up the processing of large, dynamic datasets. The goal is to generate delta queries that, given a small change in the input, can update the materialized view more efficiently than via recomputation. In this work we propose the first solution for the efficient incrementalization of positive nested relational calculus (NRC+) on bags (with integer multiplicities). More precisely, we model the cost of NRC+ operators and classify queries as efficiently incrementalizable if their delta has a strictly lower cost than full re-evaluation. Then, we identify IncNRC+; a large fragment of NRC+ that is efficiently incrementalizable and we provide a semantics-preserving translation that takes any NRC+ query to a collection of IncNRC+ queries. Furthermore, we prove that incremental maintenance for NRC+ is within the complexity class NC0 and we showcase how recursive IVM, a technique that has provided significant speedups over traditional IVM in the case of flat queries [25], can also be applied to IncNRC+.Comment: 24 pages (12 pages plus appendix

    A Unified Approach for Resilience and Causal Responsibility with Integer Linear Programming (ILP) and LP Relaxations

    Full text link
    Resilience is one of the key algorithmic problems underlying various forms of reverse data management (such as view maintenance, deletion propagation, and various interventions for fairness): What is the minimal number of tuples to delete from a database in order to remove all answers from a query? A long-open question is determining those conjunctive queries (CQs) for which this problem can be solved in guaranteed PTIME. We shed new light on this and the related problem of causal responsibility by proposing a unified Integer Linear Programming (ILP) formulation. It is unified in that it can solve both prior studied restrictions (e.g., self-join-free CQs under set semantics that allow a PTIME solution) and new cases (e.g., all CQs under set or bag semantics It is also unified in that all queries and all instances are treated with the same approach, and the algorithm is guaranteed to terminate in PTIME for the easy cases. We prove that, for all easy self-join-free CQs, the Linear Programming (LP) relaxation of our encoding is identical to the ILP solution and thus standard ILP solvers are guaranteed to return the solution in PTIME. Our approach opens up the door to new variants and new fine-grained analysis: 1) It also works under bag semantics and we give the first dichotomy result for bags semantics in the problem space. 2) We give a more fine-grained analysis of the complexity of causal responsibility. 3) We recover easy instances for generally hard queries, such as instances with read-once provenance and instances that become easy because of Functional Dependencies in the data. 4) We solve an open conjecture from PODS 2020. 5) Experiments confirm that our results indeed predict the asymptotic running times, and that our universal ILP encoding is at times even faster to solve for the PTIME cases than a prior proposed dedicated flow algorithm.Comment: 25 pages, 16 figure

    Image classification by visual bag-of-words refinement and reduction

    Full text link
    This paper presents a new framework for visual bag-of-words (BOW) refinement and reduction to overcome the drawbacks associated with the visual BOW model which has been widely used for image classification. Although very influential in the literature, the traditional visual BOW model has two distinct drawbacks. Firstly, for efficiency purposes, the visual vocabulary is commonly constructed by directly clustering the low-level visual feature vectors extracted from local keypoints, without considering the high-level semantics of images. That is, the visual BOW model still suffers from the semantic gap, and thus may lead to significant performance degradation in more challenging tasks (e.g. social image classification). Secondly, typically thousands of visual words are generated to obtain better performance on a relatively large image dataset. Due to such large vocabulary size, the subsequent image classification may take sheer amount of time. To overcome the first drawback, we develop a graph-based method for visual BOW refinement by exploiting the tags (easy to access although noisy) of social images. More notably, for efficient image classification, we further reduce the refined visual BOW model to a much smaller size through semantic spectral clustering. Extensive experimental results show the promising performance of the proposed framework for visual BOW refinement and reduction

    Ranking-based Deep Cross-modal Hashing

    Full text link
    Cross-modal hashing has been receiving increasing interests for its low storage cost and fast query speed in multi-modal data retrievals. However, most existing hashing methods are based on hand-crafted or raw level features of objects, which may not be optimally compatible with the coding process. Besides, these hashing methods are mainly designed to handle simple pairwise similarity. The complex multilevel ranking semantic structure of instances associated with multiple labels has not been well explored yet. In this paper, we propose a ranking-based deep cross-modal hashing approach (RDCMH). RDCMH firstly uses the feature and label information of data to derive a semi-supervised semantic ranking list. Next, to expand the semantic representation power of hand-crafted features, RDCMH integrates the semantic ranking information into deep cross-modal hashing and jointly optimizes the compatible parameters of deep feature representations and of hashing functions. Experiments on real multi-modal datasets show that RDCMH outperforms other competitive baselines and achieves the state-of-the-art performance in cross-modal retrieval applications

    Knowledge Discovery and Management within Service Centers

    Get PDF
    These days, most enterprise service centers deploy Knowledge Discovery and Management (KDM) systems to address the challenge of timely delivery of a resourceful service request resolution while efficiently utilizing the huge amount of data. These KDM systems facilitate prompt response to the critical service requests and if possible then try to prevent the service requests getting triggered in the first place. Nevertheless, in most cases, information required for a request resolution is dispersed and suppressed under the mountain of irrelevant information over the Internet in unstructured and heterogeneous formats. These heterogeneous data sources and formats complicate the access to reusable knowledge and increase the response time required to reach a resolution. Moreover, the state-of-the art methods neither support effective integration of domain knowledge with the KDM systems nor promote the assimilation of reusable knowledge or Intellectual Capital (IC). With the goal of providing an improved service request resolution within the shortest possible time, this research proposes an IC Management System. The proposed tool efficiently utilizes domain knowledge in the form of semantic web technology to extract the most valuable information from those raw unstructured data and uses that knowledge to formulate service resolution model as a combination of efficient data search, classification, clustering, and recommendation methods. Our proposed solution also handles the technology categorization of a service request which is very crucial in the request resolution process. The system has been extensively evaluated with several experiments and has been used in a real enterprise customer service center

    Βελτιστοποίηση ερωτημάτων χρησιμοποιώντας σημασιολογία πολυσυνόλου και συνόλου-πολυσυνόλου σε περιβάλλον ετερογενών πηγών πληροφόρησης

    Get PDF
    184 σ.Στην συγκεκριμένη διατριβή, μελετάμε ανάπτυξη τεχνικών βελτιστοποίησης ερωτημάτων με την χρήση όψεων, σε σχεσιακές και XML βάσεις δεδομένων. Ειδικότερα, επικεντρωνόμαστε στα ακόλουθα βασικά προβλήματα βελτιστοποίησης ερωτημάτων: την περιεκτικότητα ερωτημάτων, την αναδιατύπωση ερωτημάτων και την επιλογή όψεων. Στις σχεσιακές βάσεις δεδομένων, επικεντρωνόμαστε στα συζευκτικά ερωτήματα (εν συντομία CQs), που αντιστοιχούν σε SQL ερωτήματα με χρήση των τελεστών select, project και join. Επίσης, χρησιμοποιούμε σημασιολογίες πολυσυνόλου (οι βασικές σχέσεις και οι απαντήσεις των ερωτημάτων είναι πολυσύνολα) και συνόλου-πολυσυνόλου (οι βασικές σχέσεις είναι σύνολα, ενώ οι απαντήσεις είναι πολυσύνολα) για να περιγράψουμε, θεωρητικά, την σημασιολογία της SQL. Για ερωτήματα σε XML δεδομένα χρησιμοποιούμε την γλώσσα XPath, και ειδικότερα επικεντρωνόμαστε στις τρεις βασικές υποκλάσεις της γλώσσας, που σχηματίζεται από την χρήση δύο από τα τρία βασικά συστατικά: wildcard ετικέτες (*), ακμές απογόνου (//) και κλαδιά ([ ]). Στο πλαίσιο της περιεκτικότητας ερωτημάτων μελετάμε το πρόβλημα, καθώς και την πολυπλοκότητα του, για βασικές υποκλάσεις των CQs. Για την γενική κλάση των CQs το πρόβλημα παραμένει ανοικτό εδώ και μια δεκαετία. Επιπλέον, μελετάμε τα προβλήματα περιεκτικότητας και ισοδυναμίας για ενώσεις XPath ερωτημάτων. Για την αναδιατύπωση CQ ερωτημάτων, περιγράφουμε βασικές συνθήκες που πρέπει να πληρούν οι όψεις έτσι ώστε να υπάρχει μία ισοδύναμη αναδιατύπωση. Για τα XPath ερωτήματα που σχηματίζονται από // και *, δείχνουμε ότι η χρήση του τελεστή ένωσης απαιτείται για την εύρεση ισοδύναμης αναδιατύπωσης. Το πρόβλημα επιλογής όψεων μελετάται για CQ ερωτήματα, όπου επικεντρωνόμαστε στον περιορισμό του χώρου αναζήτησης βέλτιστων λύσεων. Ειδικότερα, δείχνομαι ότι εάν η επιλογή του συνόλου όψεων γίνεται βάσει συγκεκριμένων συνθηκών (ως προς την μορφή των όψεων), τότε εξασφαλίζεται η εύρεση τουλάχιστον μίας βέλτιστης λύσης για το πρόβλημα. Έπειτα, επικεντρωνόμενοι σε υποκλάσεις των CQ ερωτημάτων, δείχνουμε ότι για ένα σύνολο ερωτημάτων αλυσίδας, και για τις δύο σημασιολογίες, όψεις που ορίζονται, και αυτές, από ερωτήματα αλυσίδας δεν επαρκούν, πάντα, για την εύρεση βέλτιστης λύσης. Στην περίπτωση, όμως, των ερωτημάτων μονοπατιού, και θεωρώντας σημασιολογία πολυσυνόλου, δείχνουμε ότι οι όψεις που ορίζονται από ερωτήματα μονοπατιού μας εξασφαλίζουν την εύρεση τουλάχιστον μίας βέλτιστης λύσης για το πρόβλημα επιλογής όψεων.In this thesis, we investigate techniques for query optimization using a set of views, considering both relational and XML databases. In particular, we focus on three fundamental problems of query optimization; which are the query containment, the query rewriting and the view selection. For relational databases we focus on the class of select-project-join SQL queries with equality comparisons, a.k.a. conjunctive queries (CQs for short). We consider two kinds of semantics to theoretically approximate the SQL semantics: the bag (multiple occurrences of the same tuple are allowed in both base relations and answers of queries) and bag-set semantics (the base relations are sets and the operators are liable for bag-results). For XML databases, we focus on XPath. Especially, we focus on the major fragments of XPath which contain two of the constructs: wildcard, descendant edge and branches. Query containment under both bag and bag-set semantics is investigated through a detailed analysis of special cases of CQs. The complexity in each case is given, as well. For the general case, the problem remains open for more than a decade. Moreover, we give necessary and sufficient conditions for deciding both containment and equivalence for unions of XPath queries; a problem which was not investigated in depth, in the past. The problem of finding an equivalent rewriting is also investigated for both relational and XPath queries. In particular, for relational queries, we describe the requirements that a set of views have to satisfy in order to give an equivalent rewriting of a CQ under both bag and bag-set semantics. In the case of XML databases, we investigate the problem of rewriting an XPath query using multiple views, and prove that in the case that the query contains both descendant edges and wildcards, the union operator may be required for finding an equivalent rewriting. The view selection is investigated for workloads of CQs under both bag and bag-set semantics. Especially, we aim to limit the search space of candidate viewsets. We start with the general case, where we give a tight condition that candidate views can satisfy and still the search space does contain at least one optimal solution. Then we study special cases. We show that for chain query workloads under both bag and bag-set semantics, taking only chain views may miss optimal solution, whereas, if we further limit the queries to be path queries, then under bag semantics, path views suffice.Ματθαίος Γ. Δαμίγο
    corecore