298 research outputs found

An LSH Index for Computing Kendall's Tau over Top-k Lists

Author: Michel Sebastian
Pal Koninika
Publication date: 01/01/2014

We consider the problem of similarity search within a set of top-k lists under the Kendall's Tau distance function. This distance describes how related two rankings are in terms of concordantly and discordantly ordered items. As top-k lists are usually very short compared to the global domain of possible items to be ranked, creating an inverted index to look up overlapping lists is possible but does not capture the similarity measure tightly enough. In this work, we investigate locality sensitive hashing schemes for the Kendall's Tau distance and evaluate the proposed methods using two real-world datasets. Comment: 6 pages, 8 subfigures, presented at the Seventeenth International Workshop on the Web and Databases (WebDB 2014) co-located with ACM SIGMOD201
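To illustrate the distance the abstract refers to, here is a minimal sketch that counts discordantly ordered pairs between two rankings. It assumes both rankings contain exactly the same items, which is a simplification: the top-k setting in the paper also has to deal with lists that only partially overlap, and that case is not handled here. The function name `kendall_tau_distance` is hypothetical and not taken from the paper.

```python
from itertools import combinations


def kendall_tau_distance(ranking_a, ranking_b):
    """Count item pairs that the two rankings order differently.

    Assumption (not from the paper): both rankings are permutations
    of the same set of distinct items.
    """
    if len(set(ranking_a)) != len(ranking_a) or len(set(ranking_b)) != len(ranking_b):
        raise ValueError("rankings must not contain duplicate items")
    if set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must contain the same items")

    # Position of every item in the second ranking.
    pos_b = {item: index for index, item in enumerate(ranking_b)}

    discordant = 0
    # combinations() yields (x, y) with x ranked before y in ranking_a.
    for x, y in combinations(ranking_a, 2):
        # The pair is discordant if ranking_b puts y before x.
        if pos_b[x] > pos_b[y]:
            discordant += 1
    return discordant
```

Identical rankings give a distance of 0, and a fully reversed ranking of n items gives n(n-1)/2, the number of item pairs. This quadratic pairwise count is only meant to show what is being measured; it is not the indexing scheme the paper investigates.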

arXiv.org e-Print Archive

MPG.PuRe

EquiX---A Search and Query Language for XML

Author: Abiteboul
Bar-Yossef
Baru
Bradley
Bray
Bray
Cohen
Cohen
Cormen
Deutsch
Fallside
Goldman
Goldman
Kanza
Ludäscher
Papakonstantinou
Robie
Publication date: 01/01/2000

EquiX is a search language for XML that combines the power of querying with the simplicity of searching. Requirements for such languages are discussed and it is shown that EquiX meets the necessary criteria. Both a graphical abstract syntax and a formal concrete syntax are presented for EquiX queries. In addition, the semantics is defined and an evaluation algorithm is presented. The evaluation algorithm is polynomial under combined complexity. EquiX combines pattern matching, quantification and logical expressions to query both the data and meta-data of XML documents. The result of a query in EquiX is a set of XML documents. A DTD describing the result documents is derived automatically from the query. Comment: technical report of Hebrew University Jerusalem Israe

arXiv.org e-Print Archive

Taxonomy and clustering in collaborative systems: the case of the on-line encyclopedia Wikipedia

Author: Caldarelli G.
Capocci A.
Rao F.
Publication venue: 'IOP Publishing'
Publication date: 16/10/2007

In this paper we investigate the nature and structure of the relation between imposed classifications and real clustering in a particular case of a scale-free network given by the on-line encyclopedia Wikipedia. We find a statistical similarity in the distributions of community sizes both by using the top-down approach of the categories division present in the archive and in the bottom-up procedure of community detection given by an algorithm based on the spectral properties of the graph. Regardless of the statistically similar behaviour, the two methods provide a rather different division of the articles, thereby signaling that the nature and presence of power laws is a general feature for these systems and cannot be used as a benchmark to evaluate the suitability of a clustering method. Comment: 5 pages, 3 figures, epl2 styl

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

IMT Institutional Repository

A Look Back on the XML Benchmark Project

Author: Kersten M.L. (Martin)
Manegold S. (Stefan)
Schmidt A.R.
Waas F.
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2003

The XML Benchmark Project was started to provide a framework for evaluating the interplay of XML technologies and Database Management Systems. The benchmark lays emphasis on engineering aspects as well as on performance of the query processor. In this chapter the authors present a quick overview of the benchmark and point to some of the experience they gathered during the design of the benchmark and while running it on a variety of platforms. Since the benchmark was designed early in the evolution of XML, our experiences also reflect how the perception of XML changed during the three years that have passed since we started working on the subject. The chapter comprises an overview of the benchmark as well as discussions of some lessons learned.

CWI's Institutional Repository

ENTITY EXTRACTION USING STATISTICAL METHODS USING INTERACTIVE KNOWLEDGE MINING FRAMEWORK

Author: AHAMED SHAIK MUNEEB
AHMAD SD.AFZAL
BABU P.
Publication venue: Institute for Project Management Pvt. Ltd
Publication date: 05/08/2020

There are various kinds of valuable semantic information about real-world entities embedded in web pages and databases. Extracting and integrating this entity information from the Web is of great significance. Compared to traditional information extraction problems, web entity extraction needs to solve several new challenges to fully take advantage of the unique characteristics of the Web. In this paper, we introduce our recent work on statistical extraction of structured entities, named entities, entity facts and relations from the Web. We also briefly introduce iKnoweb, an interactive knowledge mining framework for entity information integration. We will use two novel web applications, Microsoft Academic Search (aka Libra) and EntityCube, as working examples.

Interscience Research Network