451 research outputs found

    Optimising Sargable Conjunctive Predicate Queries in the Context of Big Data

    Get PDF
    With the continued increase in the volume of data, the volume dimension of big data has become a significant factor in estimating query time. When all other factors are held constant, query time increases as the volume of data increases and vice versa. To enhance query time, several techniques have come out of research efforts in this direction. One of such techniques is factorisation of query predicates. Factorisation has been used as a query optimization technique for the general class of predicates but has been found inapplicable to the subclass of sargable conjunctive equality predicates. Experiments performed exposed a peculiar nature of sargable conjunctive equality predicates based on which insight, the concatenated predicate model was formulated as capable of optimising sargable conjunctive equality predicates. Equations from research results were combined in a way that theorems describing the application and optimality of the concatenated predicate model were derived and proved

    Adaptive content mapping for internet navigation

    Get PDF
    The Internet as the biggest human library ever assembled keeps on growing. Although all kinds of information carriers (e.g. audio/video/hybrid file formats) are available, text based documents dominate. It is estimated that about 80% of all information worldwide stored electronically exists in (or can be converted into) text form. More and more, all kinds of documents are generated by means of a text processing system and are therefore available electronically. Nowadays, many printed journals are also published online and may even discontinue to appear in print form tomorrow. This development has many convincing advantages: the documents are both available faster (cf. prepress services) and cheaper, they can be searched more easily, the physical storage only needs a fraction of the space previously necessary and the medium will not age. For most people, fast and easy access is the most interesting feature of the new age; computer-aided search for specific documents or Web pages becomes the basic tool for information-oriented work. But this tool has problems. The current keyword based search machines available on the Internet are not really appropriate for such a task; either there are (way) too many documents matching the specified keywords are presented or none at all. The problem lies in the fact that it is often very difficult to choose appropriate terms describing the desired topic in the first place. This contribution discusses the current state-of-the-art techniques in content-based searching (along with common visualization/browsing approaches) and proposes a particular adaptive solution for intuitive Internet document navigation, which not only enables the user to provide full texts instead of manually selected keywords (if available), but also allows him/her to explore the whole database

    An incremental database access method for autonomous interoperable databases

    Get PDF
    We investigated a number of design and performance issues of interoperable database management systems (DBMS's). The major results of our investigation were obtained in the areas of client-server database architectures for heterogeneous DBMS's, incremental computation models, buffer management techniques, and query optimization. We finished a prototype of an advanced client-server workstation-based DBMS which allows access to multiple heterogeneous commercial DBMS's. Experiments and simulations were then run to compare its performance with the standard client-server architectures. The focus of this research was on adaptive optimization methods of heterogeneous database systems. Adaptive buffer management accounts for the random and object-oriented access methods for which no known characterization of the access patterns exists. Adaptive query optimization means that value distributions and selectives, which play the most significant role in query plan evaluation, are continuously refined to reflect the actual values as opposed to static ones that are computed off-line. Query feedback is a concept that was first introduced to the literature by our group. We employed query feedback for both adaptive buffer management and for computing value distributions and selectivities. For adaptive buffer management, we use the page faults of prior executions to achieve more 'informed' management decisions. For the estimation of the distributions of the selectivities, we use curve-fitting techniques, such as least squares and splines, for regressing on these values

    Proximity Full-Text Search with a Response Time Guarantee by Means of Additional Indexes

    Full text link
    Full-text search engines are important tools for information retrieval. Term proximity is an important factor in relevance score measurement. In a proximity full-text search, we assume that a relevant document contains query terms near each other, especially if the query terms are frequently occurring words. A methodology for high-performance full-text query execution is discussed. We build additional indexes to achieve better efficiency. For a word that occurs in the text, we include in the indexes some information about nearby words. What types of additional indexes do we use? How do we use them? These questions are discussed in this work. We present the results of experiments showing that the average time of search query execution is 44-45 times less than that required when using ordinary inverted indexes. This is a pre-print of a contribution "Veretennikov A.B. Proximity Full-Text Search with a Response Time Guarantee by Means of Additional Indexes" published in "Arai K., Kapoor S., Bhatia R. (eds) Intelligent Systems and Applications. IntelliSys 2018. Advances in Intelligent Systems and Computing, vol 868" published by Springer, Cham. The final authenticated version is available online at: https://doi.org/10.1007/978-3-030-01054-6_66. The work was supported by Act 211 Government of the Russian Federation, contract no 02.A03.21.0006.Comment: Alexander B. Veretennikov. Chair of Calculation Mathematics and Computer Science, INSM. Ural Federal Universit

    A graphical environment for change detection in structured documents

    Get PDF
    Change detection in structured documents (e.g. SGML is important in data warehousing, digital libraries and Internet databases. This thesis presents a graphical environment for detecting changes in the structured documents. We represent. each document by alp ordered labeled tree based on the underlying markup language. We then compare two documents by invoking previously developed algorithms for approximate pattern matching and pattern discovery in trees. Several operators are developed to support. the comparison of the documents; graphical devices are provided to facilitate the use of the operators. We believe the proposed tool is useful for not only document management, but also software maintenance, particularly configuration management and version control, where programs aro represented as parse trees and detecting changes in the trees provides a way to find the syntactic differences of two program versions

    An architecture and methodology for the design and development of Technical Information Systems

    Get PDF
    In order to meet demands in the context of Technical Information Systems (TIS) pertaining to reliability, extensibility, maintainability, etc., we have developed an architectural framework with accompanying methodological guidelines for designing such systems. With the framework, we aim at complex multiapplication information systems using a repository to share data among applications. The framework proposes to keep a strict separation between Man-Machine-Interface and Model data, and provides design and implementation support to do this effectively.\ud The framework and methodological guidelines have been developed in the context of the ESPRIT project IMPRESS. The project also provided for ldquotesting groundsrdquo in the form of a TIS for the Spanish Electricity company Iberdrola.\ud This work has been conducted within the ESPRIT project IMPRESS (Integrated, Multi-Paradigm, Reliable and Extensible Storage System), ESPRIT No. 635

    Distribution Policies for Datalog

    Get PDF
    Modern data management systems extensively use parallelism to speed up query processing over massive volumes of data. This trend has inspired a rich line of research on how to formally reason about the parallel complexity of join computation. In this paper, we go beyond joins and study the parallel evaluation of recursive queries. We introduce a novel framework to reason about multi-round evaluation of Datalog programs, which combines implicit predicate restriction with distribution policies to allow expressing a combination of data-parallel and query-parallel evaluation strategies. Using our framework, we reason about key properties of distributed Datalog evaluation, including parallel-correctness of the evaluation strategy, disjointness of the computation effort, and bounds on the number of communication rounds
    corecore