Search CORE

114,913 research outputs found

TopSig: Topology Preserving Document Signatures

Author: De Vries Christopher M.
Geva Shlomo
Publication venue
Publication date: 01/01/2011
Field of study

Performance comparisons between File Signatures and Inverted Files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these were not so far linked to general purpose, signature file based, search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature file based indexing and retrieval, performance that is comparable to that of state of the art inverted file based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings and from the theoretical perspective it positions the file signatures model in the class of Vector Space retrieval models.Comment: 12 pages, 8 figures, CIKM 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

Queensland University of Technology ePrints Archive

INFORMATION RETRIEVAL USING LATENT SEMANTIC INDEXING

Author: Jasminka Dobša
Publication venue: Faculty of Organization and Informatics University of Zagreb
Publication date: 01/01/2002
Field of study

Our capabilities for collecting and storing data of all kinds are greater then ever. On the other side analyzing, summarizing and extracting information from this data is harder than ever. That’s why there is a growing need for the fast and efficient algorithms for information retrieval.In this paper we present some mathematical models based on linear algebra used to extract the relevant documents for some subjects out of a large set of text document. This is a typical problem of a search engine on the World Wide Web. We use vector space model, which is based on literal matching of terms in the documents and the queries. The vector space model is implemented by creating the term-document matrix. Literal matching of terms does not necessarily retrieve all relevant documents. Synonymy (multiple words having the same meaning) and polysemy (words having multiple meaning) are two major obstacles for efficient information retrieval. Latent Semantic Indexing represents documents by approximations and tends to cluster documents on similar topics even if their term profiles are somewhat different. This approximate representation is accomplished using a low-rank singular value decomposition (SVD) approximation of the term-document matrix. In this paper we compare the precision of information retrieval for different ranks of SVD representation of term-document matrix

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Random Indexing K-tree

Author: De Vine Lance
De Vries Christopher M.
Geva Shlomo
Publication venue
Publication date: 01/01/2009
Field of study

Random Indexing (RI) K-tree is the combination of two algorithms for clustering. Many large scale problems exist in document clustering. RI K-tree scales well with large inputs due to its low complexity. It also exhibits features that are useful for managing a changing collection. Furthermore, it solves previous issues with sparse document vectors when using K-tree. The algorithms and data structures are defined, explained and motivated. Specific modifications to K-tree are made for use with RI. Experiments have been executed to measure quality. The results indicate that RI K-tree improves document cluster quality over the original K-tree algorithm.Comment: 8 pages, ADCS 2009; Hyperref and cleveref LaTeX packages conflicted. Removed clevere

arXiv.org e-Print Archive

Queensland University of Technology ePrints Archive

The relationship between IR and multimedia databases

Author: Blanken H.M.
Vries A.P. de
Publication venue: British Computer Society (BCS)
Publication date: 01/01/1998
Field of study

Modern extensible database systems support multimedia data through ADTs. However, because of the problems with multimedia query formulation, this support is not sufficient.\ud \ud Multimedia querying requires an iterative search process involving many different representations of the objects in the database. The support that is needed is very similar to the processes in information retrieval.\ud \ud Based on this observation, we develop the miRRor architecture for multimedia query processing. We design a layered framework based on information retrieval techniques, to provide a usable query interface to the multimedia database.\ud \ud First, we introduce a concept layer to enable reasoning over low-level concepts in the database.\ud \ud Second, we add an evidential reasoning layer as an intermediate between the user and the concept layer.\ud \ud Third, we add the functionality to process the users' relevance feedback.\ud \ud We then adapt the inference network model from text retrieval to an evidential reasoning model for multimedia query processing.\ud \ud We conclude with an outline for implementation of miRRor on top of the Monet extensible database system

CiteSeerX

Crossref

CWI's Institutional Repository

University of Twente Research Information

Search Result Clustering via Randomized Partitioning of Query-Induced Subgraphs

Author: Bradic Aleksandar
Publication venue
Publication date: 25/11/2008
Field of study

In this paper, we present an approach to search result clustering, using partitioning of underlying link graph. We define the notion of "query-induced subgraph" and formulate the problem of search result clustering as a problem of efficient partitioning of given subgraph into topic-related clusters. Also, we propose a novel algorithm for approximative partitioning of such graph, which results in cluster quality comparable to the one obtained by deterministic algorithms, while operating in more efficient computation time, suitable for practical implementations. Finally, we present a practical clustering search engine developed as a part of this research and use it to get results about real-world performance of proposed concepts.Comment: 16th Telecommunications Forum TELFOR 200

arXiv.org e-Print Archive

Directory of Open Access Journals

Finding Your Literature Match -- A Recommender System

Author: Accomazzi Alberto
Bohlen Elizabeth
Di Milia Giovanni
Grant Carolyn
Henneken Edwin A.
Kurtz Michael J.
Luker Jay
Murray Stephen S.
Thompson Donna
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/05/2010
Field of study

The universe of potentially interesting, searchable literature is expanding continuously. Besides the normal expansion, there is an additional influx of literature because of interdisciplinary boundaries becoming more and more diffuse. Hence, the need for accurate, efficient and intelligent search tools is bigger than ever. Even with a sophisticated search engine, looking for information can still result in overwhelming results. An overload of information has the intrinsic danger of scaring visitors away, and any organization, for-profit or not-for-profit, in the business of providing scholarly information wants to capture and keep the attention of its target audience. Publishers and search engine engineers alike will benefit from a service that is able to provide visitors with recommendations that closely meet their interests. Providing visitors with special deals, new options and highlights may be interesting to a certain degree, but what makes more sense (especially from a commercial point of view) than to let visitors do most of the work by the mere action of making choices? Hiring psychics is not an option, so a technological solution is needed to recommend items that a visitor is likely to be looking for. In this presentation we will introduce such a solution and argue that it is practically feasible to incorporate this approach into a useful addition to any information retrieval system with enough usage.Comment: Contribution to the proceedings of the colloquium Future Professional Communication in Astronomy II, 13-14 April 2010, Cambridge, Massachusetts. 11 pages, 4 figures

arXiv.org e-Print Archive

Crossref