67 research outputs found

    Query-driven indexing in large-scale distributed systems

    Get PDF
    Efficient and effective search in large-scale data repositories requires complex indexing solutions deployed on a large number of servers. Web search engines such as Google and Yahoo! already rely upon complex systems to be able to return relevant query results and keep processing times within the comfortable sub-second limit. Nevertheless, the exponential growth of the amount of content on the Web poses serious challenges with respect to scalability. Coping with these challenges requires novel indexing solutions that not only remain scalable but also preserve the search accuracy. In this thesis we introduce and explore the concept of query-driven indexing – an index construction strategy that uses caching techniques to adapt to the querying patterns expressed by users. We suggest to abandon the strict difference between indexing and caching, and to build a distributed indexing structure, or a distributed cache, such that it is optimized for the current query load. Our experimental and theoretical analysis shows that employing query-driven indexing is especially beneficial when the content is (geographically) distributed in a Peer-to-Peer network. In such a setting extensive bandwidth consumption has been identified as one of the major obstacles for efficient large-scale search. Our indexing mechanisms combat this problem by maintaining the query popularity statistics and by indexing (caching) intermediate query results that are requested frequently. We present several indexing strategies for processing multi-keyword and XPath queries over distributed collections of textual and XML documents respectively. Experimental evaluations show significant overall traffic reduction compared to the state-of-the-art approaches. We also study possible query-driven optimizations for Web search engine architectures. Contrary to the Peer-to-Peer setting, Web search engines use centralized caching of query results to reduce the processing load on the main index. We analyze real search engine query logs and show that the changes in query traffic that such a results cache induces fundamentally affect indexing performance. In particular, we study its impact on index pruning efficiency. We show that combination of both techniques enables efficient reduction of the query processing costs and thus is practical to use in Web search engines

    A Formal Concept Analysis Approach to Association Rule Mining: The QuICL Algorithms

    Get PDF
    Association rule mining (ARM) is the task of identifying meaningful implication rules exhibited in a data set. Most research has focused on extracting frequent item (FI) sets and thus fallen short of the overall ARM objective. The FI miners fail to identify the upper covers that are needed to generate a set of association rules whose size can be exploited by an end user. An alternative to FI mining can be found in formal concept analysis (FCA), a branch of applied mathematics. FCA derives a concept lattice whose concepts identify closed FI sets and connections identify the upper covers. However, most FCA algorithms construct a complete lattice and therefore include item sets that are not frequent. An iceberg lattice, on the other hand, is a concept lattice whose concepts contain only FI sets. Only three algorithms to construct an iceberg lattice were found in literature. Given that an iceberg concept lattice provides an analysis tool to succinctly identify association rules, this study investigated additional algorithms to construct an iceberg concept lattice. This report presents the development and analysis of the Quick Iceberg Concept Lattice (QuICL) algorithms. These algorithms provide incremental construction of an iceberg lattice. QuICL uses recursion instead of iteration to navigate the lattice and establish connections, thereby eliminating costly processing incurred by past algorithms. The QuICL algorithms were evaluated against leading FI miners and FCA construction algorithms using benchmarks cited in literature. Results demonstrate that QuICL provides performance on the order of FI miners yet additionally derive the upper covers. QuICL, when combined with known algorithms to extract a basis of association rules from a lattice, offer a best known ARM solution. Beyond this, the QuICL algorithms have proved to be very efficient, providing an order of magnitude gains over other incremental lattice construction algorithms. For example, on the Mushroom data set, QuICL completes in less than 3 seconds. Past algorithms exceed 200 seconds. On T10I4D100k, QuICL completes in less than 120 seconds. Past algorithms approach 10,000 seconds. QuICL is proved to be the best known all around incremental lattice construction algorithm. Runtime complexity is shown to be O(l d i) where l is the cardinality of the lattice, d is the average degree of the lattice, and i is a mean function on the frequent item extents

    Building Logic Toolboxes

    Get PDF

    Reasoning in description logics using resolution and deductive databases

    Get PDF

    How To Efficiently Implement An OSHL-Based Automatic Theorem Prover

    Get PDF
    Ordered Semantic Hyper-linking (OSHL) is a general-purpose instance-based first-order automated theorem proving algorithm. Although OSHL has many useful properties, previous implementations of OSHL were not very efficient. The implementation of such a theorem prover differs from other more traditional programs in that a lot of its subroutines are more mathematical than procedural. The low performance of previous implementations prevents us from evaluating how the proof strategy used in OSHL matches up against other theorem proving strategies. This dissertation addresses this problem on three levels. First, an abstract, generalized version genOSHL is defined which captures the essential features of OSHL and for which the soundness and completeness are proved. This gives genOSHL the flexibility to be tweaked while still preserving soundness and completeness. A type inference algorithm is introduced which allows genOSHL to possibly reduce its search space while still preserving the soundness and completeness. Second, incOSHL, a specialized version of genOSHL, which differs from the original OSHL algorithm, is defined by specializing genOSHL. Its soundness of completeness follows from that of genOSHL. Third, an embedded programming language called STACK EL, which allows managing program states and their dependencies on global mutable data, is designed and implemented. STACK EL allows our prover to generate instances incrementally. We also study the performance of our incremental theorem prover that implements incOSHL.Doctor of Philosoph

    On applying Or-Parallelism and Tabling to logic programs

    Get PDF
    Dissertação de Doutoramento em Ciência de Computadores apresentada à Faculdade de Ciências da Universidade do Port
    • …
    corecore