Search CORE

3,870 research outputs found

INEX 2007 Evaluation Measures

Author: Kamps Jaap
Kazai Gabriella
Lalmas Mounia
Ogilvie Paul
Pehcevski Jovan
Piwowarski Benjamin
Robertson Stephen
Publication venue: HAL CCSD
Publication date: 19/12/2007
Field of study

International audienceThis paper describes the official measures of retrieval effectiveness that are planned to be employed for the ad hoc track of INEX 2007

INRIA a CCSD electronic archive server

HAL Descartes

Use of Wikipedia Categories in Entity Ranking

Author: Pehcevski Jovan
Thom James A.
Vercoustre Anne-Marie
Publication venue
Publication date: 01/01/2007
Field of study

Wikipedia is a useful source of knowledge that has many applications in language processing and knowledge representation. The Wikipedia category graph can be compared with the class hierarchy in an ontology; it has some characteristics in common as well as some differences. In this paper, we present our approach for answering entity ranking queries from the Wikipedia. In particular, we explore how to make use of Wikipedia categories to improve entity ranking effectiveness. Our experiments show that using categories of example entities works significantly better than using loosely defined target categories

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

The State-of-the-arts in Focused Search

Author: Li Rongmei
Publication venue: University of Twente, Centre for Telematics and Information Technology
Publication date: 01/01/2009
Field of study

The continuous influx of various text data on the Web requires search engines to improve their retrieval abilities for more specific information. The need for relevant results to a user’s topic of interest has gone beyond search for domain or type specific documents to more focused result (e.g. document fragments or answers to a query). The introduction of XML provides a format standard for data representation, storage, and exchange. It helps focused search to be carried out at different granularities of a structured document with XML markups. This report aims at reviewing the state-of-the-arts in focused search, particularly techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking. It is concluded with highlight of open problems

University of Twente Research Information

TopSig: Topology Preserving Document Signatures

Author: De Vries Christopher M.
Geva Shlomo
Publication venue
Publication date: 01/01/2011
Field of study

Performance comparisons between File Signatures and Inverted Files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these were not so far linked to general purpose, signature file based, search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature file based indexing and retrieval, performance that is comparable to that of state of the art inverted file based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings and from the theoretical perspective it positions the file signatures model in the class of Vector Space retrieval models.Comment: 12 pages, 8 figures, CIKM 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

Queensland University of Technology ePrints Archive

XML documents clustering using a tensor space model

Author: Kutty Sangeetha
Li Yuefeng
Nayak Richi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011
Field of study

The traditional Vector Space Model (VSM) is not able to represent both the structure and the content of XML documents. This paper introduces a novel method of representing XML documents in a Tensor Space Model (TSM) and then utilizing it for clustering. Empirical analysis shows that the proposed method is scalable for large-sized datasets; as well, the factorized matrices produced from the proposed method help to improve the quality of clusters through the enriched document representation of both structure and content information

CiteSeerX

Queensland University of Technology ePrints Archive

Document Clustering with K-tree

Author: De Vries Christopher M.
Geva Shlomo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009
Field of study

This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document clustering. Many large scale problems exist in document clustering. K-tree scales well with large inputs due to its low complexity. It offers promising results both in terms of efficiency and quality. Document classification was completed using Support Vector Machines.Comment: 12 pages, INEX 200

arXiv.org e-Print Archive

Queensland University of Technology ePrints Archive

Neural Architecture for Question Answering Using a Knowledge Graph and Web Corpus

Author: Sawant Uma
Garg Saurabh
Chakrabarti Soumen
Ramakrishnan Ganesh
Publication venue
Publication date: 06/12/2018
Field of study

In Web search, entity-seeking queries often trigger a special Question Answering (QA) system. It may use a parser to interpret the question to a structured query, execute that on a knowledge graph (KG), and return direct entity responses. QA systems based on precise parsing tend to be brittle: minor syntax variations may dramatically change the response. Moreover, KG coverage is patchy. At the other extreme, a large corpus may provide broader coverage, but in an unstructured, unreliable form. We present AQQUCN, a QA system that gracefully combines KG and corpus evidence. AQQUCN accepts a broad spectrum of query syntax, between well-formed questions to short `telegraphic' keyword sequences. In the face of inherent query ambiguities, AQQUCN aggregates signals from KGs and large corpora to directly rank KG entities, rather than commit to one semantic interpretation of the query. AQQUCN models the ideal interpretation as an unobservable or latent variable. Interpretations and candidate entity responses are scored as pairs, by combining signals from multiple convolutional networks that operate collectively on the query, KG and corpus. On four public query workloads, amounting to over 8,000 queries with diverse query syntax, we see 5--16% absolute improvement in mean average precision (MAP), compared to the entity ranking performance of recent systems. Our system is also competitive at entity set retrieval, almost doubling F1 scores for challenging short queries.Comment: Accepted to Information Retrieval Journa

arXiv.org e-Print Archive

Biblioteca Digital de la Comunidad de Madrid