    TopSig: Topology Preserving Document Signatures

    Performance comparisons between file signatures and inverted files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these have not so far been linked to general-purpose, signature-file-based search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature-file-based indexing and retrieval, performance that is comparable to that of state-of-the-art inverted-file-based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings, and from a theoretical perspective they position the file signature model in the class of Vector Space retrieval models. Comment: 12 pages, 8 figures, CIKM 201

    Analyze Large Multidimensional Datasets Using Algebraic Topology

    This paper presents an efficient algorithm to extract knowledge from high-dimensionality, high-complexity datasets using algebraic topology, namely simplicial complexes. Based on the concept of isomorphism of relations, our method turns a relational table into a geometric object (a simplicial complex is a polyhedron). Conceptually, association rule searching thus becomes a geometric traversal problem. By leveraging the core concepts behind simplicial complexes, we use a technique (new in computer science) that improves performance over existing methods and uses far less memory. It was designed and developed with a strong emphasis on scalability, reliability, and extensibility. This paper also investigates the possibility of Hadoop integration and the challenges that come with the framework.

    A practical guide to computer simulations

    Practical aspects of conducting research via computer simulations are discussed. The following issues are addressed: software engineering, object-oriented software development, programming style, macros, make files, scripts, libraries, random numbers, testing, debugging, data plotting, curve fitting, finite-size scaling, information retrieval, and preparing presentations. Because of the limited space, usually only short introductions to the specific areas are given, and references to more extensive literature are cited. All examples of code are in C/C++. Comment: 69 pages, with permission of Wiley-VCH, see http://www.wiley-vch.de (some screenshots with poor quality due to arXiv size restrictions). A comprehensively extended version will appear in spring 2009 as a book from World Scientific, see http://www.worldscibooks.com/physics/6988.htm

    Bayes Merging of Multiple Vocabularies for Scalable Image Retrieval

    The Bag-of-Words (BoW) representation is widely applied in recent state-of-the-art image retrieval work. Typically, multiple vocabularies are generated to correct quantization artifacts and improve recall. However, this routine is corrupted by vocabulary correlation, i.e., overlap among different vocabularies. Vocabulary correlation leads to an over-counting of the indexed features in the overlapped area, or the intersection set, thus compromising retrieval accuracy. To address the correlation problem while preserving the benefit of high recall, this paper proposes a Bayes merging approach to down-weight the indexed features in the intersection set. By explicitly modeling the correlation problem from a probabilistic view, a joint similarity at both the image and feature level is estimated for the indexed features in the intersection set. We evaluate our method through extensive experiments on three benchmark datasets. Albeit simple, Bayes merging can be applied in various merging tasks, and consistently improves the baselines on multi-vocabulary merging. Moreover, Bayes merging is efficient in terms of both time and memory cost, and yields competitive performance compared with state-of-the-art methods. Comment: 8 pages, 7 figures, 6 tables, accepted to CVPR 201

    A Short Travel for Neutrinos in Large Extra Dimensions

    Neutrino oscillations successfully explain the flavor transitions observed in neutrinos produced in natural sources like the center of the Sun and the Earth's atmosphere, and also from man-made sources like reactors and accelerators. These oscillations are driven by two mass-squared differences, solar and atmospheric, at the sub-eV scale. However, longstanding anomalies at short baselines might imply the existence of new oscillation frequencies at the eV scale and the possibility that these sterile state(s) mix with the three active neutrinos. One of the many future neutrino programs that are expected to provide a final word on this issue is the Short-Baseline Neutrino Program (SBN) at Fermilab. In this letter, we consider a specific model of Large Extra Dimensions (LED) which provides interesting signatures of oscillation of extra sterile states. We started by re-creating sensitivity analyses for sterile neutrinos in the 3+1 scenario, previously done by the SBN collaboration, by simulating neutrino events in the three SBN detectors from both muon neutrino disappearance and electron neutrino appearance. Then, we implemented neutrino oscillations as predicted in the LED model and performed a sensitivity analysis of the LED parameters. Finally, we studied the power of SBN to discriminate between the two models, the 3+1 and the LED. We have found that SBN is sensitive to the oscillations predicted in the LED model and has the potential to constrain the LED parameter space better than any other oscillation experiment, for $m_{1}^{D} < 0.1\,\text{eV}$. In case SBN observes a departure from the three active neutrino framework, it also has the power to discriminate between sterile oscillations predicted in the 3+1 framework and the LED ones. Comment: 21 pages, 6 figures, 2 table

    Exploiting boundary states of imperfect spin chains for high-fidelity state transfer

    We study transfer of a quantum state through XX spin chains with static imperfections. We combine the two standard approaches for state transfer based on (i) modulated couplings between neighboring spins throughout the spin chain and (ii) weak coupling of the outermost spins to an unmodulated spin chain. The combined approach allows us to design spin chains with modulated couplings and localized boundary states, permitting high-fidelity state transfer in the presence of random static imperfections of the couplings. The modulated couplings are explicitly obtained from an exact algorithm using the close relation between tridiagonal matrices and orthogonal polynomials [Linear Algebr. Appl. 21, 245 (1978)]. The implemented algorithm and a graphical user interface for constructing spin chains with boundary states (spinGUIn) are provided as Supplemental Material. Comment: 7 pages, 3 figures + spinGUIn description and Matlab files iepsolve.m, spinGUIn.fig, spinGUIn.