11,024 research outputs found

    On the probabilistic logical modelling of quantum and geometrically-inspired IR

    Information Retrieval approaches can mostly be classified as probabilistic, geometric, or logic-based. Recently, a new unifying framework for IR has emerged that integrates a probabilistic description within a geometric framework, namely vectors in Hilbert spaces. The geometric model leads naturally to a predicate logic over linear subspaces, also known as quantum logic. In this paper we show the relation between this model and classic concepts such as the Generalised Vector Space Model, highlighting similarities and differences. We also show how some fundamental components of quantum-based IR can be modelled in a descriptive way using a well-established tool, namely Probabilistic Datalog.
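    A minimal sketch of the quantum-logic ingredient this line of work builds on (not the paper's Probabilistic Datalog encoding; the `projector` and `born_probability` helpers and the toy vectors are illustrative assumptions): relevance to a linear subspace spanned by term vectors is the squared norm of the projected, normalised query.

```python
# Illustrative sketch, not the paper's Probabilistic Datalog encoding:
# the Born-rule probability of a query with respect to a document subspace,
# i.e. the quantum-logic predicate "the query lies in this linear subspace".
import numpy as np

def projector(term_vectors):
    """Orthogonal projector onto the span of the given term vectors."""
    B = np.linalg.qr(np.column_stack(term_vectors))[0]  # orthonormal basis of the span
    return B @ B.T

def born_probability(query, P):
    """Squared norm of the projected, normalised query vector."""
    q = query / np.linalg.norm(query)
    return float(q @ P @ q)

# Toy 3-term vocabulary: a document subspace spanned by two term directions.
P_doc = projector([np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])])
query = np.array([1.0, 1.0, 1.0])
print(born_probability(query, P_doc))  # ~0.667: two thirds of the query's weight lies in the subspace
```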

    Strengths and Weaknesses of Quantum Computing

    Recently a great deal of attention has focused on quantum computation following a sequence of results suggesting that quantum computers are more powerful than classical probabilistic computers. Following Shor's result that factoring and the extraction of discrete logarithms are both solvable in quantum polynomial time, it is natural to ask whether all of NP can be efficiently solved in quantum polynomial time. In this paper, we address this question by proving that relative to an oracle chosen uniformly at random, with probability 1, the class NP cannot be solved on a quantum Turing machine in time o(2^{n/2}). We also show that relative to a permutation oracle chosen uniformly at random, with probability 1, the class NP ∩ coNP cannot be solved on a quantum Turing machine in time o(2^{n/3}). The former bound is tight since recent work of Grover shows how to accept the class NP relative to any oracle on a quantum computer in time O(2^{n/2}). Comment: 18 pages, LaTeX, no figures, to appear in SIAM Journal on Computing (special issue on quantum computing)
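    A back-of-the-envelope comparison of why the bound is tight (not code from the paper; the helper names are made up): Grover search over N = 2^n candidates with one marked item needs roughly (pi/4)·sqrt(N) oracle queries, a quadratic improvement over the ~N/2 expected classical queries, which is exactly the O(2^{n/2}) upper bound that matches the lower bound above.

```python
# Back-of-the-envelope comparison, not code from the paper: the quadratic gap
# between classical and Grover search over N = 2^n items with one marked item.
import math

def classical_expected_queries(n_bits):
    """Expected oracle queries for unstructured classical search."""
    return 2 ** n_bits / 2

def grover_queries(n_bits):
    """Approximate oracle queries for Grover search: (pi/4) * sqrt(2^n)."""
    return math.floor((math.pi / 4) * math.sqrt(2 ** n_bits))

for n in (20, 40, 60):
    print(n, classical_expected_queries(n), grover_queries(n))
```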

    Crosslingual Document Embedding as Reduced-Rank Ridge Regression

    There has recently been much interest in extending vector-based word representations to multiple languages, such that words can be compared across languages. In this paper, we shift the focus from words to documents and introduce a method for embedding documents written in any language into a single, language-independent vector space. For training, our approach leverages a multilingual corpus where the same concept is covered in multiple languages (but not necessarily via exact translations), such as Wikipedia. Our method, Cr5 (Crosslingual reduced-rank ridge regression), starts by training a ridge-regression-based classifier that uses language-specific bag-of-word features in order to predict the concept that a given document is about. We show that, when constraining the learned weight matrix to be of low rank, it can be factored to obtain the desired mappings from language-specific bags-of-words to language-independent embeddings. As opposed to most prior methods, which use pretrained monolingual word vectors, postprocess them to make them crosslingual, and finally average word vectors to obtain document vectors, Cr5 is trained end-to-end and is thus natively crosslingual as well as document-level. Moreover, since our algorithm uses the singular value decomposition as its core operation, it is highly scalable. Experiments show that our method achieves state-of-the-art performance on a crosslingual document retrieval task. Finally, although not trained for embedding sentences and words, it also achieves competitive performance on crosslingual sentence and word retrieval tasks. Comment: In The Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19)
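    A generic reduced-rank ridge regression sketch on toy data (an illustration of the low-rank factoring idea under assumptions, not the authors' Cr5 code; `reduced_rank_ridge` and the toy shapes are mine): fit a ridge classifier from bag-of-words features to concept labels, constrain it to rank k via the SVD of the fitted values, and read the resulting factor off as a feature-to-embedding map.

```python
# Generic reduced-rank ridge regression on toy data -- an assumption-laden
# illustration of the low-rank factoring idea, not the authors' Cr5 code.
import numpy as np

def reduced_rank_ridge(X, Y, lam=1.0, k=5):
    """X: (n_docs, n_features) bag-of-words; Y: (n_docs, n_concepts) one-hot labels."""
    d = X.shape[1]
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)  # full-rank ridge solution
    _, _, Vt = np.linalg.svd(X @ W, full_matrices=False)     # SVD of the fitted values
    Vk = Vt[:k].T                                            # top-k right singular vectors
    A = W @ Vk                                               # features -> k-dim embedding
    B = Vk.T                                                 # embedding -> concept scores
    return A, B                                              # rank-k weight matrix ~= A @ B

rng = np.random.default_rng(0)
X = rng.poisson(0.3, size=(200, 50)).astype(float)   # toy bag-of-words counts
Y = np.eye(10)[rng.integers(0, 10, size=200)]        # toy one-hot concept labels
A, B = reduced_rank_ridge(X, Y, lam=1.0, k=5)
print((X[0] @ A).shape)                              # (5,): a low-rank document embedding
```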

    A first step to accelerating fingerprint matching based on deformable minutiae clustering

    Fingerprint recognition is one of the most widely used biometric methods for authentication. The identification of a query fingerprint requires matching its minutiae against the minutiae of every fingerprint in the database. The state-of-the-art matching algorithms are computationally costly and inefficient on large datasets. In this work, we introduce faster methods to accelerate DMC (the most accurate fingerprint matching algorithm based only on minutiae). In particular, we translate into C++ the functions of the algorithm that represent the most costly tasks of the code; we create a library with the new code and link the library to the original C# code using a CLR Class Library project by means of a C++/CLI wrapper. Our solution re-implements critical functions, e.g., the bit population count (using a fast C++ PopCount library) and the minutiae-neighborhood computation (using the squared Euclidean distance). The experimental results show a significant reduction of the execution time in the optimized functions of the matching algorithm. Finally, a novel approach to improving the matching algorithm, considering cache memory blocking and parallel data processing, is presented as future work.

    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
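    The two micro-optimisations named above, illustrated in Python rather than the paper's C++ (the function names and toy values are assumptions, not the DMC code): a bit population count for comparing binary minutiae descriptors, and a squared Euclidean distance test that avoids the square root when only a radius comparison is needed.

```python
# Illustration in Python of the two micro-optimisations named above;
# the real speedups in the paper come from the C++ re-implementation.

def popcount_similarity(desc_a: int, desc_b: int) -> int:
    """Number of shared set bits between two binary minutiae descriptors."""
    return (desc_a & desc_b).bit_count()  # Python >= 3.10; otherwise bin(x).count("1")

def within_neighborhood(ax, ay, bx, by, radius):
    """Radius test on squared distances: no square root needed."""
    dx, dy = ax - bx, ay - by
    return dx * dx + dy * dy <= radius * radius

print(popcount_similarity(0b101101, 0b100111))  # 3 shared bits
print(within_neighborhood(10, 10, 13, 14, 5))   # True: the distance is exactly 5
```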

    Ridge Regression, Hubness, and Zero-Shot Learning

    This paper discusses the effect of hubness in zero-shot learning, when ridge regression is used to find a mapping from the example space to the label space. Contrary to the existing approach, which attempts to find a mapping from the example space to the label space, we show that mapping labels into the example space is desirable to suppress the emergence of hubs in the subsequent nearest neighbor search step. Assuming a simple data model, we prove that the proposed approach indeed reduces hubness. This was verified empirically on the tasks of bilingual lexicon extraction and image labeling: hubness was reduced in both tasks and accuracy improved accordingly. Comment: To be presented at ECML/PKDD 201
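    A minimal sketch of the proposed direction on toy data (not the paper's experiments; the array shapes and the `predict_label` helper are assumptions): fit a ridge map from label vectors into the example space and match each test example against the projected labels, rather than projecting examples into label space.

```python
# Toy sketch of the label-to-example-space direction; data and helper names
# are invented for illustration, not taken from the paper's experiments.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))   # training examples
Y = rng.normal(size=(100, 5))    # their label/attribute vectors
lam = 1.0

# Ridge map from label space to example space: minimises ||Y M - X||^2 + lam ||M||^2.
M = np.linalg.solve(Y.T @ Y + lam * np.eye(Y.shape[1]), Y.T @ X)

def predict_label(x_test, candidate_labels):
    """Nearest candidate label after projecting labels into the example space."""
    projected = candidate_labels @ M                  # (n_labels, n_features)
    dists = np.linalg.norm(projected - x_test, axis=1)
    return int(np.argmin(dists))

print(predict_label(rng.normal(size=20), rng.normal(size=(10, 5))))
```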