Search CORE

67 research outputs found

Query-driven indexing in large-scale distributed systems

Author: Skobeltsyn Gleb
Publication venue: Lausanne, EPFL
Publication date: 06/11/2008
Field of study

Efficient and effective search in large-scale data repositories requires complex indexing solutions deployed on a large number of servers. Web search engines such as Google and Yahoo! already rely upon complex systems to be able to return relevant query results and keep processing times within the comfortable sub-second limit. Nevertheless, the exponential growth of the amount of content on the Web poses serious challenges with respect to scalability. Coping with these challenges requires novel indexing solutions that not only remain scalable but also preserve the search accuracy. In this thesis we introduce and explore the concept of query-driven indexing – an index construction strategy that uses caching techniques to adapt to the querying patterns expressed by users. We suggest to abandon the strict difference between indexing and caching, and to build a distributed indexing structure, or a distributed cache, such that it is optimized for the current query load. Our experimental and theoretical analysis shows that employing query-driven indexing is especially beneficial when the content is (geographically) distributed in a Peer-to-Peer network. In such a setting extensive bandwidth consumption has been identified as one of the major obstacles for efficient large-scale search. Our indexing mechanisms combat this problem by maintaining the query popularity statistics and by indexing (caching) intermediate query results that are requested frequently. We present several indexing strategies for processing multi-keyword and XPath queries over distributed collections of textual and XML documents respectively. Experimental evaluations show significant overall traffic reduction compared to the state-of-the-art approaches. We also study possible query-driven optimizations for Web search engine architectures. Contrary to the Peer-to-Peer setting, Web search engines use centralized caching of query results to reduce the processing load on the main index. We analyze real search engine query logs and show that the changes in query traffic that such a results cache induces fundamentally affect indexing performance. In particular, we study its impact on index pruning efficiency. We show that combination of both techniques enables efficient reduction of the query processing costs and thus is practical to use in Web search engines

Infoscience - École polytechnique fédérale de Lausanne

A Formal Concept Analysis Approach to Association Rule Mining: The QuICL Algorithms

Author: Smith David T.
Publication venue: NSUWorks
Publication date: 01/01/2009
Field of study

Association rule mining (ARM) is the task of identifying meaningful implication rules exhibited in a data set. Most research has focused on extracting frequent item (FI) sets and thus fallen short of the overall ARM objective. The FI miners fail to identify the upper covers that are needed to generate a set of association rules whose size can be exploited by an end user. An alternative to FI mining can be found in formal concept analysis (FCA), a branch of applied mathematics. FCA derives a concept lattice whose concepts identify closed FI sets and connections identify the upper covers. However, most FCA algorithms construct a complete lattice and therefore include item sets that are not frequent. An iceberg lattice, on the other hand, is a concept lattice whose concepts contain only FI sets. Only three algorithms to construct an iceberg lattice were found in literature. Given that an iceberg concept lattice provides an analysis tool to succinctly identify association rules, this study investigated additional algorithms to construct an iceberg concept lattice. This report presents the development and analysis of the Quick Iceberg Concept Lattice (QuICL) algorithms. These algorithms provide incremental construction of an iceberg lattice. QuICL uses recursion instead of iteration to navigate the lattice and establish connections, thereby eliminating costly processing incurred by past algorithms. The QuICL algorithms were evaluated against leading FI miners and FCA construction algorithms using benchmarks cited in literature. Results demonstrate that QuICL provides performance on the order of FI miners yet additionally derive the upper covers. QuICL, when combined with known algorithms to extract a basis of association rules from a lattice, offer a best known ARM solution. Beyond this, the QuICL algorithms have proved to be very efficient, providing an order of magnitude gains over other incremental lattice construction algorithms. For example, on the Mushroom data set, QuICL completes in less than 3 seconds. Past algorithms exceed 200 seconds. On T10I4D100k, QuICL completes in less than 120 seconds. Past algorithms approach 10,000 seconds. QuICL is proved to be the best known all around incremental lattice construction algorithm. Runtime complexity is shown to be O(l d i) where l is the cardinality of the lattice, d is the average degree of the lattice, and i is a mean function on the frequent item extents

NSU Works

Multithreaded Tabling for Logic Programming

Author: Miguel João Gonçalves Areias
Publication venue
Publication date: 08/05/2015
Field of study

Repositório Aberto da Universidade do Porto

Building Logic Toolboxes

Author: Heguiabehere J.M.
Publication venue
Publication date: 04/12/2003
Field of study

CWI's Institutional Repository

Reasoning in description logics using resolution and deductive databases

Author: Motik Boris
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2006
Field of study

KITopen

How To Efficiently Implement An OSHL-Based Automatic Theorem Prover

Author: Xu Hao
Publication venue: University of North Carolina at Chapel Hill
Publication date: 01/01/2012
Field of study

Ordered Semantic Hyper-linking (OSHL) is a general-purpose instance-based first-order automated theorem proving algorithm. Although OSHL has many useful properties, previous implementations of OSHL were not very efficient. The implementation of such a theorem prover differs from other more traditional programs in that a lot of its subroutines are more mathematical than procedural. The low performance of previous implementations prevents us from evaluating how the proof strategy used in OSHL matches up against other theorem proving strategies. This dissertation addresses this problem on three levels. First, an abstract, generalized version genOSHL is defined which captures the essential features of OSHL and for which the soundness and completeness are proved. This gives genOSHL the flexibility to be tweaked while still preserving soundness and completeness. A type inference algorithm is introduced which allows genOSHL to possibly reduce its search space while still preserving the soundness and completeness. Second, incOSHL, a specialized version of genOSHL, which differs from the original OSHL algorithm, is defined by specializing genOSHL. Its soundness of completeness follows from that of genOSHL. Third, an embedded programming language called STACK EL, which allows managing program states and their dependencies on global mutable data, is designed and implemented. STACK EL allows our prover to generate instances incrementally. We also study the performance of our incremental theorem prover that implements incOSHL.Doctor of Philosoph

Carolina Digital Repository