On the probabilistic logical modelling of quantum and geometrically-inspired IR
Information Retrieval approaches can mostly be classed as probabilistic, geometric or logic-based. Recently, a new unifying framework for IR has emerged that integrates a probabilistic description within a geometric framework, namely vectors in Hilbert spaces. The geometric model leads naturally to a predicate logic over linear subspaces, also known as quantum logic. In this paper we show the relation between this model and classic concepts such as the Generalised Vector Space Model, highlighting similarities and differences. We also show how some fundamental components of quantum-based IR can be modelled in a descriptive way using a well-established tool, i.e. Probabilistic Datalog.
Strengths and Weaknesses of Quantum Computing
Recently a great deal of attention has focused on quantum computation
following a sequence of results suggesting that quantum computers are more
powerful than classical probabilistic computers. Following Shor's result that
factoring and the extraction of discrete logarithms are both solvable in
quantum polynomial time, it is natural to ask whether all of NP can be
efficiently solved in quantum polynomial time. In this paper, we address this
question by proving that relative to an oracle chosen uniformly at random, with
probability 1, the class NP cannot be solved on a quantum Turing machine in
time $o(2^{n/2})$. We also show that relative to a permutation oracle chosen
uniformly at random, with probability 1, the class NP $\cap$ co-NP cannot be
solved on a quantum Turing machine in time $o(2^{n/3})$. The former bound is
tight since recent work of Grover shows how to accept the class NP relative to
any oracle on a quantum computer in time $O(2^{n/2})$.
Comment: 18 pages, latex, no figures, to appear in SIAM Journal on Computing
(special issue on quantum computing)
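To give a feel for the square-root scaling associated with Grover's search, the sketch below compares the worst-case number of classical oracle queries over 2^n candidates with the usual approximation of roughly (π/4)·2^(n/2) Grover iterations for a single marked item. The function names are hypothetical and the iteration count is an approximation, not a statement about the paper's proofs.

```python
import math


def classical_worst_case_queries(n_bits: int) -> int:
    """Worst case for exhaustive search: try every one of the 2**n_bits candidates."""
    return 2 ** n_bits


def grover_iterations(n_bits: int) -> int:
    """Approximate Grover iteration count for one marked item among 2**n_bits.

    Uses the common approximation floor(pi/4 * sqrt(N)); this is an
    illustrative estimate, not an exact optimum for every N.
    """
    return math.floor(math.pi / 4 * math.sqrt(2 ** n_bits))


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(n, classical_worst_case_queries(n), grover_iterations(n))
```

The gap between the two columns grows as 2^(n/2), which is still exponential in n.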
Crosslingual Document Embedding as Reduced-Rank Ridge Regression
There has recently been much interest in extending vector-based word
representations to multiple languages, such that words can be compared across
languages. In this paper, we shift the focus from words to documents and
introduce a method for embedding documents written in any language into a
single, language-independent vector space. For training, our approach leverages
a multilingual corpus where the same concept is covered in multiple languages
(but not necessarily via exact translations), such as Wikipedia. Our method,
Cr5 (Crosslingual reduced-rank ridge regression), starts by training a
ridge-regression-based classifier that uses language-specific bag-of-words
features in order to predict the concept that a given document is about. We
show that, when constraining the learned weight matrix to be of low rank, it
can be factored to obtain the desired mappings from language-specific
bags-of-words to language-independent embeddings. As opposed to most prior
methods, which use pretrained monolingual word vectors, postprocess them to
make them crosslingual, and finally average word vectors to obtain document
vectors, Cr5 is trained end-to-end and is thus natively crosslingual as well as
document-level. Moreover, since our algorithm uses the singular value
decomposition as its core operation, it is highly scalable. Experiments show
that our method achieves state-of-the-art performance on a crosslingual
document retrieval task. Finally, although not trained for embedding sentences
and words, it also achieves competitive performance on crosslingual sentence
and word retrieval tasks.
Comment: In The Twelfth ACM International Conference on Web Search and Data
Mining (WSDM '19)
A first step to accelerating fingerprint matching based on deformable minutiae clustering
Fingerprint recognition is one of the most used biometric
methods for authentication. The identification of a query fingerprint requires
matching its minutiae against every minutia of all the fingerprints
of the database. The state-of-the-art matching algorithms are costly, from
a computational point of view, and inefficient on large datasets. In this
work, we include faster methods to accelerate DMC (the most accurate
fingerprint matching algorithm based only on minutiae). In particular,
we translate into C++ the functions of the algorithm which represent the
most costly tasks of the code; we create a library with the new code and
we link the library to the original C# code using a CLR Class Library
project by means of a C++/CLI Wrapper. Our solution re-implements
critical functions, e.g., the bit population count including a fast C++
PopCount library and the use of the squared Euclidean distance for calculating
the minutiae neighborhood. The experimental results show a
significant reduction of the execution time in the optimized functions of
the matching algorithm. Finally, a novel approach to improve the matching
algorithm, considering cache memory blocking and parallel data processing,
is presented as future work.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
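The fingerprint-matching abstract names two of the re-implemented primitives: a bit population count and the squared Euclidean distance used to find a minutia's neighborhood. A minimal Python sketch of both ideas follows. It is not the authors' C++ code; the names `popcount`, `squared_distance` and `neighbors_within`, and the bare `(x, y)` minutia representation, are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical minutia representation: position only, as (x, y).
Minutia = Tuple[float, float]


def popcount(bits: int) -> int:
    """Count the set bits of a non-negative integer."""
    return bin(bits).count("1")


def squared_distance(a: Minutia, b: Minutia) -> float:
    """Squared Euclidean distance; no square root is taken."""
    dx = a[0] - b[0]
    dy = a[1] - b[1]
    return dx * dx + dy * dy


def neighbors_within(center: Minutia,
                     minutiae: Sequence[Minutia],
                     radius: float) -> List[Minutia]:
    """Return minutiae within `radius` of `center`, excluding equal points.

    The radius is squared once, so each comparison uses only
    multiplications and additions.
    """
    r2 = radius * radius
    return [m for m in minutiae
            if m != center and squared_distance(center, m) <= r2]


if __name__ == "__main__":
    points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
    print(popcount(0b101101))
    print(neighbors_within((0.0, 0.0), points, 5.0))
```

Comparing squared distances against a squared radius selects the same neighbors as comparing true distances, because both sides are non-negative.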
Ridge Regression, Hubness, and Zero-Shot Learning
This paper discusses the effect of hubness in zero-shot learning, when ridge
regression is used to find a mapping between the example space and the label
space. Contrary to the existing approach, which attempts to find a mapping from
the example space to the label space, we show that mapping labels into the
example space is desirable to suppress the emergence of hubs in the subsequent
nearest neighbor search step. Assuming a simple data model, we prove that the
proposed approach indeed reduces hubness. This was verified empirically on the
tasks of bilingual lexicon extraction and image labeling: hubness was reduced
in both of these tasks and the accuracy improved accordingly.
Comment: To be presented at ECML/PKDD 201
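Hubs are items that appear in the nearest-neighbor lists of unusually many queries. One common way to quantify this is the k-occurrence count N_k. The sketch below computes it in plain Python. Whether the paper uses exactly this measure is an assumption, and the names `k_occurrences` and `squared_distance` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_occurrences(queries: Sequence[Vector],
                  targets: Sequence[Vector],
                  k: int) -> List[int]:
    """For each target, count the queries that rank it among their k nearest."""
    counts: Counter = Counter()
    for q in queries:
        # Rank target indices by distance to this query (stable on ties).
        ranked = sorted(range(len(targets)),
                        key=lambda j: squared_distance(q, targets[j]))
        counts.update(ranked[:k])
    return [counts[j] for j in range(len(targets))]


if __name__ == "__main__":
    queries = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    targets = [(0.5, 0.5), (10.0, 10.0), (-10.0, -10.0)]
    # A target near the centre of the data is retrieved by every query.
    print(k_occurrences(queries, targets, k=1))
```

A strongly skewed list of counts, with a few targets collecting most of the retrievals, indicates hubness in the search step.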