17,676 research outputs found

    TopSig: Topology Preserving Document Signatures

    Get PDF
    Performance comparisons between File Signatures and Inverted Files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these were not so far linked to general purpose, signature file based, search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature file based indexing and retrieval, performance that is comparable to that of state of the art inverted file based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings and from the theoretical perspective it positions the file signatures model in the class of Vector Space retrieval models.Comment: 12 pages, 8 figures, CIKM 201

    FĂ­schlĂĄr-DiamondTouch: collaborative video searching on a table

    Get PDF
    In this paper we present the system we have developed for our participation in the annual TRECVid benchmarking activity, specically the system we have developed, FĂ­schlĂĄr-DT, for participation in the interactive search task of TRECVid 2005. Our back-end search engine uses a combination of a text search which operates over the automatic speech recognised text, and an image search which uses low-level image features matched against video keyframes. The two novel aspects of our work are the fact that we are evaluating collaborative, team-based search among groups of users working together, and that we are using a novel touch-sensitive tabletop interface and interaction device known as the DiamondTouch to support this collaborative search. The paper summarises the backend search systems as well as presenting the interface we have developed, in detail

    On An Improved Parallel Construction Of Suffix Arrays For Low Bandwidth Pc-Cluster.

    Get PDF
    An algorithm for the parallel construction of suffix arrays generation for any texts with larger alphabet size on distributed memory architecture is presente

    BRAHMS: Novel middleware for integrated systems computation

    Get PDF
    Biological computational modellers are becoming increasingly interested in building large, eclectic models, including components on many different computational substrates, both biological and non-biological. At the same time, the rise of the philosophy of embodied modelling is generating a need to deploy biological models as controllers for robots in real-world environments. Finally, robotics engineers are beginning to find value in seconding biomimetic control strategies for use on practical robots. Together with the ubiquitous desire to make good on past software development effort, these trends are throwing up new challenges of intellectual and technological integration (for example across scales, across disciplines, and even across time) - challenges that are unmet by existing software frameworks. Here, we outline these challenges in detail, and go on to describe a newly developed software framework, BRAHMS. that meets them. BRAHMS is a tool for integrating computational process modules into a viable, computable system: its generality and flexibility facilitate integration across barriers, such as those described above, in a coherent and effective way. We go on to describe several cases where BRAHMS has been successfully deployed in practical situations. We also show excellent performance in comparison with a monolithic development approach. Additional benefits of developing in the framework include source code self-documentation, automatic coarse-grained parallelisation, cross-language integration, data logging, performance monitoring, and will include dynamic load-balancing and 'pause and continue' execution. BRAHMS is built on the nascent, and similarly general purpose, model markup language, SystemML. This will, in future, also facilitate repeatability and accountability (same answers ten years from now), transparent automatic software distribution, and interfacing with other SystemML tools. (C) 2009 Elsevier Ltd. All rights reserved

    Medical image retrieval and automatic annotation: VPA-SABANCI at ImageCLEF 2009

    Get PDF
    Advances in the medical imaging technology has lead to an exponential growth in the number of digital images that needs to be acquired, analyzed, classified, stored and retrieved in medical centers. As a result, medical image classification and retrieval has recently gained high interest in the scientific community. Despite several attempts, such as the yearly-held ImageCLEF Medical Image Annotation Competition, the proposed solutions are still far from being su±ciently accurate for real-life implementations. In this paper we summarize the technical details of our experiments for the ImageCLEF 2009 medical image annotation task. We use a direct and two hierarchical classification schemes that employ support vector machines and local binary patterns, which are recently developed low-cost texture descriptors. The direct scheme employs a single SVM to automatically annotate X-ray images. The two proposed hierarchi-cal schemes divide the classification task into sub-problems. The first hierarchical scheme exploits ensemble SVMs trained on IRMA sub-codes. The second learns from subgroups of data defined by frequency of classes. Our experiments show that hier-archical annotation of images by training individual SVMs over each IRMA sub-code dominates its rivals in annotation accuracy with increased process time relative to the direct scheme

    Exhaustive and Efficient Constraint Propagation: A Semi-Supervised Learning Perspective and Its Applications

    Full text link
    This paper presents a novel pairwise constraint propagation approach by decomposing the challenging constraint propagation problem into a set of independent semi-supervised learning subproblems which can be solved in quadratic time using label propagation based on k-nearest neighbor graphs. Considering that this time cost is proportional to the number of all possible pairwise constraints, our approach actually provides an efficient solution for exhaustively propagating pairwise constraints throughout the entire dataset. The resulting exhaustive set of propagated pairwise constraints are further used to adjust the similarity matrix for constrained spectral clustering. Other than the traditional constraint propagation on single-source data, our approach is also extended to more challenging constraint propagation on multi-source data where each pairwise constraint is defined over a pair of data points from different sources. This multi-source constraint propagation has an important application to cross-modal multimedia retrieval. Extensive results have shown the superior performance of our approach.Comment: The short version of this paper appears as oral paper in ECCV 201

    Toward Entity-Aware Search

    Get PDF
    As the Web has evolved into a data-rich repository, with the standard "page view," current search engines are becoming increasingly inadequate for a wide range of query tasks. While we often search for various data "entities" (e.g., phone number, paper PDF, date), today's engines only take us indirectly to pages. In my Ph.D. study, we focus on a novel type of Web search that is aware of data entities inside pages, a significant departure from traditional document retrieval. We study the various essential aspects of supporting entity-aware Web search. To begin with, we tackle the core challenge of ranking entities, by distilling its underlying conceptual model Impression Model and developing a probabilistic ranking framework, EntityRank, that is able to seamlessly integrate both local and global information in ranking. We also report a prototype system built to show the initial promise of the proposal. Then, we aim at distilling and abstracting the essential computation requirements of entity search. From the dual views of reasoning--entity as input and entity as output, we propose a dual-inversion framework, with two indexing and partition schemes, towards efficient and scalable query processing. Further, to recognize more entity instances, we study the problem of entity synonym discovery through mining query log data. The results we obtained so far have shown clear promise of entity-aware search, in its usefulness, effectiveness, efficiency and scalability
    • 

    corecore