298 research outputs found

    An LSH Index for Computing Kendall's Tau over Top-k Lists

    We consider the problem of similarity search within a set of top-k lists under the Kendall's Tau distance function. This distance describes how related two rankings are in terms of concordantly and discordantly ordered items. As top-k lists are usually very short compared to the global domain of possible items to be ranked, creating an inverted index to look up overlapping lists is possible but does not capture the similarity measure tightly enough. In this work, we investigate locality sensitive hashing schemes for the Kendall's Tau distance and evaluate the proposed methods using two real-world datasets. Comment: 6 pages, 8 subfigures, presented at the Seventeenth International Workshop on the Web and Databases (WebDB 2014) co-located with ACM SIGMOD201
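    To illustrate the distance this abstract refers to, here is a minimal sketch that counts discordantly ordered pairs between two rankings. It assumes both rankings contain exactly the same items. Top-k lists with differing item sets need an extra convention for missing items, which is not covered here, and the function name is hypothetical rather than taken from the paper.

```python
from itertools import combinations


def kendall_tau_distance(a, b):
    """Count the item pairs that rankings a and b order differently.

    Assumption: a and b are permutations of the same set of items.
    """
    if len(set(a)) != len(a) or len(set(b)) != len(b) or set(a) != set(b):
        raise ValueError("rankings must be permutations of the same items")
    # Position of every item in the second ranking.
    pos_b = {item: rank for rank, item in enumerate(b)}
    # combinations(a, 2) yields pairs (x, y) with x ranked before y in a;
    # the pair is discordant when b ranks y before x.
    return sum(1 for x, y in combinations(a, 2) if pos_b[x] > pos_b[y])
```

    Identical rankings give a distance of 0, and a fully reversed ranking of n items gives n(n-1)/2.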

    EquiX---A Search and Query Language for XML

    EquiX is a search language for XML that combines the power of querying with the simplicity of searching. Requirements for such languages are discussed and it is shown that EquiX meets the necessary criteria. Both a graphical abstract syntax and a formal concrete syntax are presented for EquiX queries. In addition, the semantics is defined and an evaluation algorithm is presented. The evaluation algorithm is polynomial under combined complexity. EquiX combines pattern matching, quantification and logical expressions to query both the data and meta-data of XML documents. The result of a query in EquiX is a set of XML documents. A DTD describing the result documents is derived automatically from the query. Comment: technical report of Hebrew University Jerusalem Israe

    Taxonomy and clustering in collaborative systems: the case of the on-line encyclopedia Wikipedia

    In this paper we investigate the nature and structure of the relation between imposed classifications and real clustering in a particular case of a scale-free network given by the on-line encyclopedia Wikipedia. We find a statistical similarity in the distributions of community sizes both by using the top-down approach of the categories division present in the archive and in the bottom-up procedure of community detection given by an algorithm based on the spectral properties of the graph. Regardless of the statistically similar behaviour, the two methods provide a rather different division of the articles, thereby signaling that the nature and presence of power laws is a general feature for these systems and cannot be used as a benchmark to evaluate the suitability of a clustering method. Comment: 5 pages, 3 figures, epl2 styl

    A Look Back on the XML Benchmark Project

    The XML Benchmark Project was started to provide a framework for evaluating the interplay of XML technologies and Database Management Systems. The benchmark lays emphasis on engineering aspects as well as on performance of the query processor. In this chapter the authors present a quick overview of the benchmark and point to some of the experience they gathered during the design of the benchmark and while running it on a variety of platforms. Since the benchmark was designed early in the evolution of XML, our experiences also reflect how the perception of XML changed during the three years that have passed since we started working on the subject. The chapter comprises an overview of the benchmark as well as discussions of some lessons learned.

    ENTITY EXTRACTION USING STATISTICAL METHODS USING INTERACTIVE KNOWLEDGE MINING FRAMEWORK

    There are various kinds of valuable semantic information about real-world entities embedded in web pages and databases. Extracting and integrating this entity information from the Web is of great significance. Compared to traditional information extraction problems, web entity extraction needs to solve several new challenges to fully take advantage of the unique characteristics of the Web. In this paper, we introduce our recent work on statistical extraction of structured entities, named entities, entity facts and relations from the Web. We also briefly introduce iKnoweb, an interactive knowledge mining framework for entity information integration. We will use two novel web applications, Microsoft Academic Search (aka Libra) and EntityCube, as working examples.