10 research outputs found

    One-dimensional and multi-dimensional substring selectivity estimation

    With the increasing importance of XML, LDAP directories, and text-based information sources on the Internet, there is an ever-greater need to evaluate queries involving (sub)string matching. In many cases, matches need to be on multiple attributes/dimensions, with correlations between the multiple dimensions. Effective query optimization in this context requires good selectivity estimates. In this paper, we use pruned count-suffix trees (PSTs) as the basic data structure for substring selectivity estimation. For the 1-D problem, we present a novel technique called MO (Maximal Overlap). We then develop and analyze two 1-D estimation algorithms, MOC and MOLC, based on MO and a constraint-based characterization of all possible completions of a given PST. For the k-D problem, we first generalize PSTs to multiple dimensions and develop a space- and time-efficient probabilistic algorithm to construct k-D PSTs directly. We then show how to extend MO to multiple dimensions. Finally, we demonstrate, both analytically and experimentally, that MO is both practical and substantially superior to competing algorithms. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/42330/1/778-9-3-214_00090214.pd
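
The 1-D idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: a plain dictionary of substring counts stands in for the pruned count-suffix tree, and the function names, the `threshold` parameter, and the `missing_count` fallback for pruned substrings are all assumptions made for illustration. The estimate multiplies the probabilities of the maximal substrings of the query that survive pruning, and divides by the probability of each overlap between consecutive pieces.

```python
from collections import Counter


def build_pruned_counts(strings, threshold):
    """Count, for every substring, how many input strings contain it,
    then keep only substrings whose count reaches the threshold.
    (A dictionary stand-in for a pruned count-suffix tree; hypothetical.)"""
    counts = Counter()
    for s in strings:
        # A set, so each string contributes at most once per substring.
        subs = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
        counts.update(subs)
    return {sub: c for sub, c in counts.items() if c >= threshold}


def maximal_pieces(query, counts):
    """Return (start, end) spans of the maximal substrings of the query
    that are present in counts. A character that was pruned even on its
    own becomes a one-character span."""
    pieces = []
    last_end = 0
    for start in range(len(query)):
        end = start
        while end < len(query) and query[start:end + 1] in counts:
            end += 1
        if end == start:
            end = start + 1  # pruned character, handled by the fallback count
        if end > last_end:  # skip spans contained in the previous one
            pieces.append((start, end))
            last_end = end
    return pieces


def mo_estimate(query, counts, total, missing_count):
    """Maximal-overlap style estimate of the fraction of strings that
    contain the query. missing_count is an assumed count for pruned
    substrings, not something taken from the paper."""
    def prob(sub):
        return counts.get(sub, missing_count) / total

    estimate = 1.0
    prev_end = 0
    for start, end in maximal_pieces(query, counts):
        estimate *= prob(query[start:end])
        if start < prev_end:
            # Condition on the part shared with the previous piece.
            estimate /= prob(query[start:prev_end])
        prev_end = end
    return estimate


if __name__ == "__main__":
    data = ["abcd", "abce", "xbcd", "abzz"]
    pruned = build_pruned_counts(data, threshold=2)
    print(mo_estimate("abcd", pruned, total=len(data), missing_count=1))
```

Enumerating all substrings is quadratic per string and only suitable for toy inputs; it is used here to keep the sketch short.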

    Beyond Green Regulations: Achieving True Sustainability through Engagement in a Forced Adoption Context

    The article discusses a study which looked at the key elements related to sustainability, behavior and forced adoption of technology. These elements include the consumers' engagement ranging from desired behavior change on the positive side to active disengagement due to psychological reactance on the negative side, and the role of marketing in encouraging engagement when adoption is forced. The study enrolled a total of 50 participants between September and November 201

    Dynamic Memory Allocation for Large Query Execution

    ABSTRACT. The execution time of a large query depends mainly on memory utilization, which should avoid disk accesses for intermediate results. Poor memory management can hurt performance and even lead to system thrashing because of paging. However, memory management optimization is hard to incorporate in a query optimizer because of cost estimate errors. In this paper, we address the problem of efficient memory management for large query execution. We propose a static memory allocation scheme applied at start-up time, and a more efficient dynamic execution model which performs memory-adaptive scheduling of the query. Our execution model gracefully handles memory overflow by dynamically choosing the best schedule among several candidates using a simple cost model. The model is robust to cost estimate errors. We describe a performance evaluation using a prototype implementation. The experiments with many queries show significant gains over static strategies. KEY WORDS: Memory Management, Query Execution, System Paging
    SUMMARY. The execution time of complex queries depends mainly on memory management, which can avoid disk accesses, particularly for intermediate results. Poor memory management can cause performance to collapse because of system swapping. However, it is difficult to incorporate memory-usage optimization into the query optimizer. In this article, we propose solutions for efficient memory management during the execution of complex queries. Several static memory allocation methods are proposed first, then a more efficient dynamic mechanism is presented. The latter changes the execution order of the queries so as to optimize memory management. It can thus adapt to possible capacity overflows or to errors in the cost model. We carry out a performance evaluation on a prototype, showing significant gains over static strategies. KEYWORDS: memory management, query execution, system, swap.

    Dealing with Discrepancies in Wrapper Functionality

    Much of the world's information is stored electronically in data sources. The data sources can be full-fledged databases, simple files, HTML pages or specialized data sources that possess diverse query processing capabilities. The common architecture to integrate such sources consists of mediators that give a global view over the content of all sources, and wrappers that give a local view of each source. Answering queries in this architecture is a difficult problem due to the wide range of capabilities of data sources. This paper presents a solution to this problem in the context of the Disco query processor. We provide a tool to the wrapper implementor to describe the capabilities of the wrapper in fine detail. When a wrapper is registered with the mediator, the mediator uploads the capabilities of the wrapper, and smoothly integrates these capabilities into query processing. Our solution is novel both in the level of detail permitted by the tool and its easy incorporation into exis..

    Evolution and Revolutions in LDAP Directory Caches

    LDAP directories have recently proliferated with the growth of the Internet, and are being used in a wide variety of network-based applications. In this paper, we propose the use of generalized queries, referred to as query templates, obtained by generalizing individual user queries, as the semantic basis for low overhead, high benefit LDAP directory caches for handling declarative queries. We present efficient incremental algorithms that, given a sequence of user queries, maintain a set of potentially beneficial candidate query templates, and select a subset of these candidates for admission into the directory cache. A novel feature of our algorithms is their ability to deal with overlapping query templates. Finally, we demonstrate the advantages of template caches over query caches, with an experimental study based on real data and a prototype implementation of the LDAP directory cache

    Using LDAP Directory Caches

    LDAP (Lightweight Directory Access Protocol) directories have recently proliferated with the growth of the Internet, and are being used in a wide variety of network-based applications to store data such as personal profiles, address books, and network and service policies. These systems provide a means for managing heterogeneity in a way far superior to what conventional relational or object-oriented databases can offer. To achieve fast performance for declarative query answering, it is desirable to use client caching based on semantic information (instead of individual directory entries). We formally consider the problem of reusing cached LDAP directory entries for answering declarative LDAP queries. A semantic LDAP directory cache contains directory entries, which are semantically described by a set of query templates. We show that, for conjunctive queries and LDAP directory caches with positive templates, the complexity of cache-answerability is NP-complete in the size of the query...
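
To make the notion of cache-answerability more concrete, here is a deliberately simplified sketch. It assumes that queries and cached templates are conjunctions of prefix-match conditions on attributes. The `Atom` type and all function names are hypothetical and do not come from the paper. The check is only a sufficient condition for this restricted filter form, so it does not reflect the general problem whose complexity the abstract discusses.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    """One condition: the attribute has a value starting with prefix.
    (Hypothetical model of a filter component, for illustration only.)"""
    attr: str
    prefix: str


def atom_implies(q: Atom, t: Atom) -> bool:
    """True if any entry satisfying q must also satisfy t."""
    return q.attr == t.attr and q.prefix.startswith(t.prefix)


def contained_in(query: List[Atom], template: List[Atom]) -> bool:
    """True if every condition of the template is implied by some
    condition of the conjunctive query, so all answers to the query
    lie inside the set of entries described by the template."""
    return all(any(atom_implies(q, t) for q in query) for t in template)


def answerable_from_cache(query: List[Atom], templates: List[List[Atom]]) -> bool:
    """Sufficient test: the query is covered by at least one cached template."""
    return any(contained_in(query, t) for t in templates)


if __name__ == "__main__":
    cached = [
        [Atom("dept", "research")],
        [Atom("tel", "973"), Atom("ou", "eng")],
    ]
    q = [Atom("tel", "973360"), Atom("ou", "eng"), Atom("cn", "j")]
    print(answerable_from_cache(q, cached))
```

A query that passes this test can be evaluated against the cached entries alone; a query that fails it may still be answerable by combining several templates, which this sketch does not attempt.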

    Multi-Dimensional Substring Selectivity Estimation

    With the explosion of the Internet, LDAP directories and XML, there is an ever greater need to evaluate queries involving (sub)string matching. In many cases, matches need to be on multiple attributes/dimensions, with correlations between the dimensions. Effective query optimization in this context requires good selectivity estimates. In this paper, we use multi-dimensional count-suffix trees as the basic framework for substring selectivity estimation. Given the enormous size of these trees for large databases, we develop a space- and time-efficient probabilistic algorithm to construct multi-dimensional pruned count-suffix trees directly. We then present two techniques to obtain good estimates for a given multi-dimensional substring matching query, using a pruned count-suffix tree. The first one, called GNO (for Greedy Non-Overlap), generalizes the greedy parsing suggested by Krishnan et al. [9] for one-dimensional substring selectivity estimation. The second one, called MO (for Maximal ..
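
As a rough illustration of the one-dimensional greedy parsing mentioned above, the following sketch splits a query into consecutive, non-overlapping pieces, each the longest substring still present after pruning, and multiplies their probabilities as if they were independent. The multi-dimensional generalization (GNO) is not shown. The dictionary of counts, the function name, and the `missing_count` fallback for pruned characters are assumptions made for this example.

```python
def greedy_estimate(query, counts, total, missing_count):
    """Greedy non-overlapping parse of the query.

    counts maps each retained substring to the number of strings that
    contain it (a stand-in for a pruned count-suffix tree). It is assumed
    to contain every substring of each key, as a pruned tree would.
    missing_count is an assumed count for characters that were pruned.
    """
    estimate = 1.0
    pos = 0
    while pos < len(query):
        end = pos
        # Extend the current piece while it is still in the structure.
        while end < len(query) and query[pos:end + 1] in counts:
            end += 1
        if end == pos:
            piece_count = missing_count  # pruned character
            end = pos + 1
        else:
            piece_count = counts[query[pos:end]]
        estimate *= piece_count / total
        pos = end  # the next piece starts where this one stopped
    return estimate


if __name__ == "__main__":
    # Toy counts for a hypothetical database of 4 strings.
    counts = {"a": 3, "b": 4, "c": 3, "d": 2,
              "ab": 3, "bc": 3, "cd": 2, "abc": 2, "bcd": 2}
    print(greedy_estimate("abcd", counts, total=4, missing_count=1))
```

Because the pieces never overlap, this parse ignores the correlation between the end of one piece and the start of the next.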