TopSig: Topology Preserving Document Signatures
Performance comparisons between File Signatures and Inverted Files for text
retrieval have previously shown several significant shortcomings of file
signatures relative to inverted files. The inverted file approach underpins
most state-of-the-art search engine algorithms, such as Language and
Probabilistic models. It has been widely accepted that traditional file
signatures are inferior alternatives to inverted files. This paper describes
TopSig, a new approach to the construction of file signatures. Many advances in
semantic hashing and dimensionality reduction have been made in recent times,
but these have not so far been linked to general-purpose, signature-file-based
search engines. This paper introduces a different signature file approach that
builds upon and extends these recent advances. We are able to demonstrate
significant improvements in the performance of signature file based indexing
and retrieval, performance that is comparable to that of state of the art
inverted file based systems, including Language models and BM25. These findings
suggest that file signatures offer a viable alternative to inverted files in
suitable settings and, from a theoretical perspective, place the file
signatures model in the class of Vector Space retrieval models.
Comment: 12 pages, 8 figures, CIKM 201
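To make the idea of a document signature concrete, here is a minimal sketch of one common construction: each term is hashed to a pseudo-random +1/-1 vector, the vectors are summed per document, and the sign of each component becomes one signature bit. This is an illustration only, not the TopSig algorithm itself; the function names (`term_vector`, `signature`, `hamming`), the 64-bit width, and the whitespace tokenizer are all assumptions made for the example.

```python
import hashlib


def term_vector(term, bits=64):
    """Deterministic pseudo-random +1/-1 vector for a term (hypothetical helper)."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    value = int.from_bytes(digest[: bits // 8], "big")
    return [1 if (value >> i) & 1 else -1 for i in range(bits)]


def signature(text, bits=64):
    """Sum the term vectors of a document and keep only the sign of each component."""
    totals = [0] * bits
    for term in text.lower().split():  # naive whitespace tokenizer (assumption)
        for i, v in enumerate(term_vector(term, bits)):
            totals[i] += v
    sig = 0
    for i, total in enumerate(totals):
        if total > 0:
            sig |= 1 << i
    return sig


def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")
```

Ranking then amounts to computing `hamming(signature(query), signature(doc))` for each document and sorting in ascending order; smaller distances indicate more similar term content under this scheme.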
Personalized Fuzzy Text Search Using Interest Prediction and Word Vectorization
In this paper we study the personalized text search problem. The keyword-based
search method in conventional algorithms is poor at capturing users' intent
because semantic meaning, user profile, and user interests are not always
considered. Firstly, we propose a novel text search algorithm using an inverse
filtering mechanism that is very efficient for label-based item search.
Secondly, we adopt a Bayesian network to implement user interest prediction
for an improved personalized search. Given the user input, it searches for
related items using both keyword information and the predicted user interest.
Thirdly, word vectorization is used to discover potential targets according to
their semantic meaning. Experimental results show that the proposed search
engine has improved efficiency and accuracy and that it can operate on
embedded devices with very limited computational resources.
Finding Support Documents with a Logistic Regression Approach
Entity retrieval finds the relevant results for a user’s information needs at a finer unit called an “entity”. To retrieve such entities, people usually first locate a small set of support documents that contain answer entities, and then detect the answer entities within this set. In the literature, support documents are viewed as relevant documents, and finding them is treated as a conventional document retrieval problem. In this paper, we argue that finding support documents and finding relevant documents, although they sound similar, differ in important ways. Further, we propose a logistic regression approach to find support documents. Our experimental results show that the logistic regression method performs significantly better than a baseline system that treats support document finding as a conventional document retrieval problem.
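As a rough illustration of how a logistic regression classifier could score candidate support documents, the sketch below trains a tiny model by batch gradient descent. The two features (a normalized retrieval score and the fraction of entities of the expected type) and the toy training data are hypothetical and are not taken from the paper; the abstract does not say which features the authors use.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(features, labels, lr=0.5, epochs=2000):
    """Fit logistic regression weights and bias with batch gradient descent."""
    n, d = len(features), len(features[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def support_probability(x, weights, bias):
    """Estimated probability that a document is a support document."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical toy data: [normalized retrieval score, fraction of typed entities]
X = [[1.0, 0.9], [0.8, 0.7], [0.2, 0.1], [0.1, 0.3]]
y = [1, 1, 0, 0]  # 1 = support document, 0 = not a support document
weights, bias = train(X, y)
```

Candidate documents can then be ranked by `support_probability(features, weights, bias)` rather than by the retrieval score alone, which is the kind of contrast with a conventional document retrieval baseline that the abstract describes.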
The SP theory of intelligence: benefits and applications
This article describes existing and expected benefits of the "SP theory of
intelligence", and some potential applications. The theory aims to simplify and
integrate ideas across artificial intelligence, mainstream computing, and human
perception and cognition, with information compression as a unifying theme. It
combines conceptual simplicity with descriptive and explanatory power across
several areas of computing and cognition. In the "SP machine" -- an expression
of the SP theory which is currently realized in the form of a computer model --
there is potential for an overall simplification of computing systems,
including software. The SP theory promises deeper insights and better solutions
in several areas of application including, most notably, unsupervised learning,
natural language processing, autonomous robots, computer vision, intelligent
databases, software engineering, information compression, medical diagnosis and
big data. There is also potential in areas such as the semantic web,
bioinformatics, structuring of documents, the detection of computer viruses,
data fusion, new kinds of computer, and the development of scientific theories.
The theory promises seamless integration of structures and functions within and
between different areas of application. The potential value, worldwide, of
these benefits and applications is at least $190 billion each year. Further
development would be facilitated by the creation of a high-parallel,
open-source version of the SP machine, available to researchers everywhere.
Comment: arXiv admin note: substantial text overlap with arXiv:1212.022