Incremental View Maintenance For Collection Programming
In the context of incremental view maintenance (IVM), delta query derivation
is an essential technique for speeding up the processing of large, dynamic
datasets. The goal is to generate delta queries that, given a small change in
the input, can update the materialized view more efficiently than via
recomputation. In this work we propose the first solution for the efficient
incrementalization of positive nested relational calculus (NRC+) on bags (with
integer multiplicities). More precisely, we model the cost of NRC+ operators
and classify queries as efficiently incrementalizable if their delta has a
strictly lower cost than full re-evaluation. Then, we identify IncNRC+, a large
fragment of NRC+ that is efficiently incrementalizable and we provide a
semantics-preserving translation that takes any NRC+ query to a collection of
IncNRC+ queries. Furthermore, we prove that incremental maintenance for NRC+ is
within the complexity class NC0 and we showcase how recursive IVM, a technique
that has provided significant speedups over traditional IVM in the case of flat
queries [25], can also be applied to IncNRC+.
Comment: 24 pages (12 pages plus appendix)
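The delta-query idea in the abstract above can be illustrated on a much simpler setting than the paper's. The following is a minimal sketch, assuming flat (not nested) relations stored as Python dictionaries from tuples to integer multiplicities, and a single join view. The function names are hypothetical and this is not the paper's IncNRC+ translation; it only shows that applying a delta to a materialized view can give the same result as recomputation.

```python
from collections import defaultdict

def bag_add(a, b):
    """Bag union with integer multiplicities; entries that cancel to 0 are dropped."""
    out = defaultdict(int)
    for bag in (a, b):
        for t, m in bag.items():
            out[t] += m
    return {t: m for t, m in out.items() if m != 0}

def join(r, s):
    """Join of R(a, b) and S(b, c) on b; multiplicities multiply."""
    out = defaultdict(int)
    for (a, b), m in r.items():
        for (b2, c), n in s.items():
            if b == b2:
                out[(a, b, c)] += m * n
    return {t: m for t, m in out.items() if m != 0}

def delta_join_wrt_r(delta_r, s):
    """Delta of the join for an update to R only.

    Because the join is linear in R:
    join(R + dR, S) = join(R, S) + join(dR, S).
    """
    return join(delta_r, s)

R = {(1, "x"): 1, (2, "y"): 2}
S = {("x", 10): 1, ("y", 20): 1, ("y", 30): 1}
view = join(R, S)                      # materialized view

delta_R = {(3, "x"): 1, (2, "y"): -1}  # one insertion, one deletion
view_incremental = bag_add(view, delta_join_wrt_r(delta_R, S))
view_recomputed = join(bag_add(R, delta_R), S)
```

The delta only touches tuples that join with the small update, which is why it can be cheaper than re-evaluating the join over all of R. Negative multiplicities model deletions.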
A Unified Approach for Resilience and Causal Responsibility with Integer Linear Programming (ILP) and LP Relaxations
Resilience is one of the key algorithmic problems underlying various forms of
reverse data management (such as view maintenance, deletion propagation, and
various interventions for fairness): What is the minimal number of tuples to
delete from a database in order to remove all answers from a query? A long-open
question is determining those conjunctive queries (CQs) for which this problem
can be solved in guaranteed PTIME. We shed new light on this and the related
problem of causal responsibility by proposing a unified Integer Linear
Programming (ILP) formulation. It is unified in that it can solve both prior
studied restrictions (e.g., self-join-free CQs under set semantics that allow a
PTIME solution) and new cases (e.g., all CQs under set or bag semantics). It is
also unified in that all queries and all instances are treated with the same
approach, and the algorithm is guaranteed to terminate in PTIME for the easy
cases. We prove that, for all easy self-join-free CQs, the Linear Programming
(LP) relaxation of our encoding is identical to the ILP solution and thus
standard ILP solvers are guaranteed to return the solution in PTIME. Our
approach opens up the door to new variants and new fine-grained analysis: 1) It
also works under bag semantics and we give the first dichotomy result for bag
semantics in the problem space. 2) We give a more fine-grained analysis of the
complexity of causal responsibility. 3) We recover easy instances for generally
hard queries, such as instances with read-once provenance and instances that
become easy because of Functional Dependencies in the data. 4) We solve an open
conjecture from PODS 2020. 5) Experiments confirm that our results indeed
predict the asymptotic running times, and that our universal ILP encoding is at
times even faster to solve for the PTIME cases than a prior proposed dedicated
flow algorithm.
Comment: 25 pages, 16 figures
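The definition of resilience in the abstract above can be made concrete with a small sketch. It assumes a hypothetical query Q :- R(x), S(x, y), T(y) under set semantics and finds the answer by brute force over tuple subsets. This is not the paper's ILP encoding or its LP relaxation; it solves the same kind of covering problem (delete at least one tuple from every witness, minimizing the number of deletions) by exhaustive search, which is only feasible on tiny instances.

```python
from itertools import combinations

def witnesses(R, S, T):
    """Witnesses of Q :- R(x), S(x, y), T(y): tuple sets that jointly produce an answer."""
    return [
        frozenset({("R", x), ("S", x, y), ("T", y)})
        for (x, y) in S
        if x in R and y in T
    ]

def resilience(R, S, T):
    """Smallest number of tuples whose deletion destroys every witness (brute force)."""
    ws = witnesses(R, S, T)
    tuples = sorted(set().union(*ws)) if ws else []
    for k in range(len(tuples) + 1):
        for chosen in combinations(tuples, k):
            deleted = set(chosen)
            # Covering constraint: every witness loses at least one tuple.
            if all(w & deleted for w in ws):
                return k, deleted

R = {1, 2}
S = {(1, 10), (1, 20), (2, 10)}
T = {10, 20}
k, deleted = resilience(R, S, T)
```

On this instance no single tuple appears in all three witnesses, so at least two deletions are needed.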
Image classification by visual bag-of-words refinement and reduction
This paper presents a new framework for visual bag-of-words (BOW) refinement
and reduction to overcome the drawbacks associated with the visual BOW model
which has been widely used for image classification. Although very influential
in the literature, the traditional visual BOW model has two distinct drawbacks.
Firstly, for efficiency purposes, the visual vocabulary is commonly constructed
by directly clustering the low-level visual feature vectors extracted from
local keypoints, without considering the high-level semantics of images. That
is, the visual BOW model still suffers from the semantic gap, and thus may lead
to significant performance degradation in more challenging tasks (e.g. social
image classification). Secondly, typically thousands of visual words are
generated to obtain better performance on a relatively large image dataset. Due
to such a large vocabulary size, the subsequent image classification may take
a considerable amount of time. To overcome the first drawback, we develop a graph-based
method for visual BOW refinement by exploiting the tags (easy to access
although noisy) of social images. More notably, for efficient image
classification, we further reduce the refined visual BOW model to a much
smaller size through semantic spectral clustering. Extensive experimental
results show the promising performance of the proposed framework for visual BOW
refinement and reduction.
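For readers unfamiliar with the visual BOW model the abstract above builds on, here is a minimal sketch of the basic representation and of what reducing the vocabulary means. It assumes a tiny hand-made vocabulary and a hand-specified grouping of visual words. The paper derives its grouping through semantic spectral clustering, which is not reproduced here, and all names are hypothetical.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to the descriptor (squared Euclidean distance)."""
    return min(
        range(len(vocabulary)),
        key=lambda i: sum((d - v) ** 2 for d, v in zip(descriptor, vocabulary[i])),
    )

def bow_histogram(descriptors, vocabulary):
    """Count how many local descriptors of one image fall on each visual word."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1
    return hist

def reduce_histogram(hist, word_to_group, n_groups):
    """Merge visual words into fewer groups by summing their counts."""
    reduced = [0] * n_groups
    for word, count in enumerate(hist):
        reduced[word_to_group[word]] += count
    return reduced

vocabulary = [(0, 0), (1, 1), (10, 10), (11, 11)]       # assumed visual words
descriptors = [(0.1, 0.2), (0.9, 1.2), (10.5, 10.4), (11.2, 10.9), (0, 0.1)]
hist = bow_histogram(descriptors, vocabulary)
reduced = reduce_histogram(hist, word_to_group=[0, 0, 1, 1], n_groups=2)
```

A shorter histogram means fewer features for the downstream classifier, which is the efficiency motivation given for the second drawback.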
Ranking-based Deep Cross-modal Hashing
Cross-modal hashing has been receiving increasing interest for its low
storage cost and fast query speed in multi-modal data retrieval. However, most
existing hashing methods are based on hand-crafted or raw level features of
objects, which may not be optimally compatible with the coding process.
Besides, these hashing methods are mainly designed to handle simple pairwise
similarity. The complex multilevel ranking semantic structure of instances
associated with multiple labels has not been well explored yet. In this paper,
we propose a ranking-based deep cross-modal hashing approach (RDCMH). RDCMH
firstly uses the feature and label information of data to derive a
semi-supervised semantic ranking list. Next, to expand the semantic
representation power of hand-crafted features, RDCMH integrates the semantic
ranking information into deep cross-modal hashing and jointly optimizes the
compatible parameters of deep feature representations and of hashing functions.
Experiments on real multi-modal datasets show that RDCMH outperforms other
competitive baselines and achieves state-of-the-art performance in
cross-modal retrieval applications.
Knowledge Discovery and Management within Service Centers
These days, most enterprise service centers deploy Knowledge Discovery and Management (KDM) systems to address the challenge of timely delivery of a resourceful service request resolution while efficiently utilizing the huge amount of available data. These KDM systems facilitate prompt responses to critical service requests and, where possible, try to prevent service requests from being triggered in the first place. Nevertheless, in most cases, the information required for a request resolution is dispersed and buried under a mountain of irrelevant information on the Internet, in unstructured and heterogeneous formats. These heterogeneous data sources and formats complicate access to reusable knowledge and increase the response time required to reach a resolution. Moreover, state-of-the-art methods neither support effective integration of domain knowledge with KDM systems nor promote the assimilation of reusable knowledge or Intellectual Capital (IC). With the goal of providing an improved service request resolution within the shortest possible time, this research proposes an IC Management System. The proposed tool efficiently utilizes domain knowledge in the form of semantic web technology to extract the most valuable information from the raw unstructured data and uses that knowledge to formulate a service resolution model as a combination of efficient data search, classification, clustering, and recommendation methods. Our proposed solution also handles the technology categorization of a service request, which is crucial in the request resolution process. The system has been extensively evaluated with several experiments and has been used in a real enterprise customer service center.
Query optimization using bag and bag-set semantics in an environment of heterogeneous information sources
184 pp. In this thesis, we investigate techniques for query optimization using views, considering both relational and XML databases. In particular, we focus on three fundamental problems of query optimization: query containment, query rewriting, and view selection. For relational databases, we focus on the class of select-project-join SQL queries with equality comparisons, a.k.a. conjunctive queries (CQs for short). We consider two kinds of semantics to theoretically approximate the SQL semantics: bag semantics (multiple occurrences of the same tuple are allowed in both base relations and query answers) and bag-set semantics (the base relations are sets, while query answers are bags). For XML databases, we focus on XPath, and especially on the three major fragments of XPath, each formed by using two of the three basic constructs: wildcard labels (*), descendant edges (//), and branches ([ ]). Query containment under both bag and bag-set semantics is investigated through a detailed analysis of special cases of CQs, and the complexity in each case is given. For the general case, the problem has remained open for more than a decade. Moreover, we give necessary and sufficient conditions for deciding both containment and equivalence for unions of XPath queries, a problem which had not been investigated in depth in the past. The problem of finding an equivalent rewriting is also investigated for both relational and XPath queries.
In particular, for relational queries, we describe the requirements that a set of views has to satisfy in order to give an equivalent rewriting of a CQ under both bag and bag-set semantics. In the case of XML databases, we investigate the problem of rewriting an XPath query using multiple views, and prove that when the query contains both descendant edges and wildcards, the union operator may be required for finding an equivalent rewriting. View selection is investigated for workloads of CQs under both bag and bag-set semantics. In particular, we aim to limit the search space of candidate viewsets. We start with the general case, where we give a tight condition that candidate views can satisfy such that the search space still contains at least one optimal solution. Then we study special cases. We show that for chain query workloads under both bag and bag-set semantics, taking only chain views may miss optimal solutions, whereas, if we further limit the queries to be path queries, then under bag semantics, path views suffice.
Ματθαίος Γ. Δαμίγο
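The difference between the two semantics used in the thesis abstract above can be shown with a small sketch. It assumes a hypothetical conjunctive query q(x) :- R(x, y), S(y) and relations stored as `Counter` objects mapping tuples to multiplicities. The function names are illustrative and do not come from the thesis.

```python
from collections import Counter

def eval_bag(R, S):
    """q(x) :- R(x, y), S(y) under bag semantics.

    Each satisfying assignment contributes the product of the
    multiplicities of the base tuples it uses.
    """
    out = Counter()
    for (x, y), m in R.items():
        if y in S:
            out[x] += m * S[y]
    return out

def eval_bag_set(R, S):
    """Same query under bag-set semantics.

    Base relations are first reduced to sets (multiplicity 1); the answer
    is still a bag that counts satisfying assignments.
    """
    return eval_bag(Counter(set(R)), Counter(set(S)))

R = Counter({(1, "a"): 2, (1, "b"): 1, (2, "a"): 1})
S = Counter({"a": 3, "b": 1})
bag_answer = eval_bag(R, S)
bag_set_answer = eval_bag_set(R, S)
```

Here the answer x = 1 has multiplicity 7 under bag semantics (2·3 + 1·1) but 2 under bag-set semantics, one per satisfying assignment. Containment and rewriting questions have to account for such counts, which set semantics ignores.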
- …