TopSig: Topology Preserving Document Signatures
Performance comparisons between File Signatures and Inverted Files for text
retrieval have previously shown several significant shortcomings of file
signatures relative to inverted files. The inverted file approach underpins
most state-of-the-art search engine algorithms, such as Language and
Probabilistic models. It has been widely accepted that traditional file
signatures are inferior alternatives to inverted files. This paper describes
TopSig, a new approach to the construction of file signatures. Many advances in
semantic hashing and dimensionality reduction have been made in recent times,
but these have not so far been linked to general-purpose, signature-file-based
search engines. This paper introduces a different signature file approach that
builds upon and extends these recent advances. We are able to demonstrate
significant improvements in the performance of signature file based indexing
and retrieval, performance that is comparable to that of state-of-the-art
inverted-file-based systems, including Language models and BM25. These findings
suggest that file signatures offer a viable alternative to inverted files in
suitable settings and, from a theoretical perspective, position the file
signatures model in the class of Vector Space retrieval models.
Comment: 12 pages, 8 figures, CIKM 201
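To make the idea of signature-based retrieval concrete, here is a minimal sketch of one generic way to build and search bit signatures: sign random projection with Hamming-distance ranking. The abstract does not spell out TopSig's actual construction, so this should not be read as the authors' method. The names `SIG_BITS`, `term_vector`, `signature`, `hamming`, and `search` are hypothetical, and raw term frequency is an assumed weighting.

```python
import hashlib
import random
from collections import Counter

SIG_BITS = 256  # hypothetical signature width, chosen only for illustration


def term_vector(term, bits=SIG_BITS):
    """Deterministic pseudo-random +1/-1 vector for a term, seeded by its hash."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [1 if rng.random() < 0.5 else -1 for _ in range(bits)]


def signature(text, bits=SIG_BITS):
    """Sum term vectors weighted by term frequency, then keep one sign bit per dimension."""
    counts = Counter(text.lower().split())
    acc = [0] * bits
    for term, tf in counts.items():
        vec = term_vector(term, bits)
        for i in range(bits):
            acc[i] += tf * vec[i]
    sig = 0
    for i, value in enumerate(acc):
        if value > 0:  # zero or negative sums map to a 0 bit
            sig |= 1 << i
    return sig


def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def search(query, docs, top_k=3):
    """Rank document ids by Hamming distance between query and document signatures."""
    q = signature(query)
    ranked = sorted(docs, key=lambda doc_id: hamming(q, signature(docs[doc_id])))
    return ranked[:top_k]
```

Because each document is reduced to a fixed-width bit string, ranking needs only XOR and bit counting. Treating signatures as points compared by a distance is also what places this kind of model alongside Vector Space retrieval.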
Efficient Spatial Keyword Search in Trajectory Databases
An increasing amount of trajectory data is being annotated with text
descriptions to better capture the semantics associated with locations. The
fusion of spatial locations and text descriptions in trajectories engenders a
new type of top-k queries that take into account both aspects. Each
trajectory in consideration consists of a sequence of geo-spatial locations
associated with text descriptions. Given a user location and a
keyword set, a top-k query returns the k trajectories whose text
descriptions cover the keywords and that have the shortest match
distance. To the best of our knowledge, previous research on querying
trajectory databases has focused on trajectory data without any text
description, and no existing work has studied this kind of top-k query on
trajectories. This paper proposes a novel method for efficiently computing
top-k trajectories. The method is developed based on a new hybrid index,
cell-keyword conscious B-tree, denoted by \cellbtree, which enables us to
exploit both text relevance and location proximity to facilitate efficient and
effective query processing. The results of our extensive empirical studies with
an implementation of the proposed algorithms on BerkeleyDB demonstrate that our
proposed methods are capable of achieving excellent performance and good
scalability.
Comment: 12 page
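The query semantics described in this abstract can be illustrated with a brute-force baseline that scans every trajectory. This is a sketch only and does not use the paper's hybrid index. The abstract does not define match distance, so the definition below is an assumption: the sum, over query keywords, of the distance from the user location to the nearest trajectory point whose description contains that keyword. The function names and the `(x, y, keywords)` point layout are hypothetical.

```python
import heapq
import math


def match_distance(location, keywords, trajectory):
    """Assumed stand-in for match distance (not taken from the paper).

    Returns None when the trajectory's descriptions do not cover every keyword.
    """
    total = 0.0
    for kw in keywords:
        dists = [
            math.dist(location, (x, y))
            for x, y, words in trajectory
            if kw in words
        ]
        if not dists:
            return None  # this keyword is not covered by any point
        total += min(dists)
    return total


def top_k_trajectories(location, keywords, trajectories, k):
    """Return up to k (distance, trajectory_id) pairs with the smallest match distance."""
    scored = []
    for tid, trajectory in trajectories.items():
        d = match_distance(location, keywords, trajectory)
        if d is not None:
            scored.append((d, tid))
    return heapq.nsmallest(k, scored)


# Hypothetical data: each point is (x, y, set of description keywords).
trajectories = {
    "t1": [(0, 0, {"cafe"}), (3, 4, {"museum"})],
    "t2": [(1, 0, {"cafe", "museum"})],
    "t3": [(0, 1, {"cafe"})],  # does not cover "museum", so it is skipped
}
result = top_k_trajectories((0, 0), {"cafe", "museum"}, trajectories, k=2)
```

A full scan like this costs time linear in the total number of trajectory points per query, which is the cost an index exploiting both text relevance and location proximity is meant to avoid.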