3,295 research outputs found

    Signature file access methodologies for text retrieval: a literature review with additional test cases

    Signature files are extremely compressed versions of text files that can be used as access or index files to facilitate searching documents for text strings. These access files, or signatures, are generated by storing hashed codes for individual words. Because the hashing or storing process can generate similar codes, the primary concern in researching signature files is the accuracy of retrieval. Inaccuracy always takes the form of a false drop, the false signaling of the presence of a text string. Two suggested ways to alter false drop rates are: 1) to determine whether either of the two methodologies for storing hashed codes, superimposing them or concatenating them, is more efficient; and 2) to determine whether a particular hashing algorithm has any impact. To assess these issues, the history of superimposed coding is traced from its development as a tool for compressing information onto punched cards in the 1950s to its incorporation into proposed signature file methodologies in the mid-1980s. Likewise, the concept of compressing individual words by various algorithms, or by hashing them, is traced through the research literature. Following this literature review, benchmark trials are performed using both superimposed and concatenated methodologies while varying hashing algorithms. It is determined that while one combination of hashing algorithm and storage methodology is better, all signature file methods can be considered viable.
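
The superimposed variant described in this abstract can be illustrated with a minimal sketch. It is not the thesis's benchmark code: the function names, the 64-bit signature width, the three bits per word, and the use of SHA-256 as the hash are all assumptions chosen for illustration.

```python
import hashlib


def word_bits(word, sig_bits=64, bits_per_word=3):
    """Return the set of bit positions one word sets in a signature.

    Assumption: positions come from SHA-256 digests of the lowercased word
    with a counter prefix. Requires bits_per_word <= sig_bits.
    """
    positions = set()
    counter = 0
    while len(positions) < bits_per_word:
        data = f"{counter}:{word.lower()}".encode("utf-8")
        digest = hashlib.sha256(data).digest()
        positions.add(int.from_bytes(digest[:4], "big") % sig_bits)
        counter += 1
    return positions


def superimposed_signature(words, sig_bits=64, bits_per_word=3):
    """OR the bit patterns of all words together into one integer signature."""
    signature = 0
    for word in words:
        for pos in word_bits(word, sig_bits, bits_per_word):
            signature |= 1 << pos
    return signature


def may_contain(block_signature, query_words, sig_bits=64, bits_per_word=3):
    """True if every bit of the query signature is set in the block signature.

    A True result can be a false drop (the bits were set by other words),
    but a block that really contains the words is never rejected.
    """
    query = superimposed_signature(query_words, sig_bits, bits_per_word)
    return block_signature & query == query


block = superimposed_signature(["signature", "files", "text", "retrieval"])
print(may_contain(block, ["retrieval"]))  # words in the block always match
```

A query for a word that is absent from the block usually returns False, but it can return True when other words happen to set the same bits. That is the false drop the abstract refers to, and any candidate block still has to be checked against the actual text.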

    Analysis of Multiterm Queries in Partitioned Signature File Environments

    This study concerns the signature files used for information storage and retrieval in both formatted and unformatted databases. The analysis combines the concerns of signature extraction and signature file organization, which have usually been treated as separate issues. Both the uniform frequency and single term query assumptions are relaxed, and a comprehensive analysis is presented for multiterm query environments where terms can be classified based on their query and database occurrence frequencies. The performance of three superimposed signature generation schemes is explored as they are applied to a dynamic signature file organization based on linear hashing: Linear Hashing with Superimposed Signatures (LHSS). The first scheme (SM) allows all terms to set the same number of bits regardless of their discriminatory power, whereas the second and third schemes (MMS and MMM) emphasize the terms with high query and low database occurrence frequencies. Of these three schemes, only MMM takes the probability distribution of the number of query terms into account in finding the optimal mapping strategy. The main contribution of the study is the derivation of the performance evaluation formulas, which is provided together with the analysis of various experimental settings. Results indicate that MMM outperforms the other methods as the gap between the discriminatory power of the terms gets larger. The absolute value of the savings provided by MMM reaches a maximum for the high query weight case. However, as the database size increases, the extra savings decline sharply for high weight queries and moderately for low weight queries. The applicability of the derivations to other partitioned signature organizations is discussed, and a detailed analysis of Fixed Prefix Partitioning (FPP) is provided as an example. An approximate formula that is shown to estimate the performance of both FPP and LHSS within an acceptable margin of error is also modified to account for the multiterm case.

    Special Libraries, July-August 1959

    Volume 50, Issue 6

    Automatic Analysis of Facial Expressions Based on Deep Covariance Trajectories

    In this paper, we propose a new approach for facial expression recognition using deep covariance descriptors. The solution is based on the idea of encoding local and global Deep Convolutional Neural Network (DCNN) features extracted from still images in compact local and global covariance descriptors. The space geometry of the covariance matrices is that of Symmetric Positive Definite (SPD) matrices. By conducting the classification of static facial expressions using Support Vector Machine (SVM) with a valid Gaussian kernel on the SPD manifold, we show that deep covariance descriptors are more effective than the standard classification with fully connected layers and softmax. Besides, we propose a completely new and original solution to model the temporal dynamics of facial expressions as deep trajectories on the SPD manifold. As an extension of the classification pipeline of covariance descriptors, we apply SVM with valid positive definite kernels derived from global alignment for deep covariance trajectories classification. By performing extensive experiments on the Oulu-CASIA, CK+, and SFEW datasets, we show that both the proposed static and dynamic approaches achieve state-of-the-art performance for facial expression recognition, outperforming many recent approaches. Comment: A preliminary version of this work appeared in "Otberdout N, Kacem A, Daoudi M, Ballihi L, Berretti S. Deep Covariance Descriptors for Facial Expression Recognition, in British Machine Vision Conference 2018, BMVC 2018, Northumbria University, Newcastle, UK, September 3-6, 2018. ; 2018 :159." arXiv admin note: substantial text overlap with arXiv:1805.0386

    Analysis of Signature Generation Schemes for Multiterm Queries In Linear Hashing with Superimposed Signatures

    Signature files provide efficient retrieval of data by reflecting the essence of the data objects into bit patterns. Our analysis explores the performance of three superimposed signature generation schemes as they are applied to a dynamic signature file organization based on linear hashing: Linear Hashing with Superimposed Signatures (LHSS). The first scheme (SM) allows all terms to set the same number of bits, whereas the second and third schemes (MMS and MMM) emphasize the terms with high discriminatory power. In addition, MMM considers the probability distribution of the number of query terms. The main contribution of the study is a detailed analysis of LHSS in multiterm query environments by incorporating the term discrimination values based on document and query frequencies. The approach of the study can also be extended to other signature file access methods based on partitioning. The derivation of the performance evaluation formulas, the simulation results based on these formulas for various experimental settings, and the implementation results based on INSPEC and NPL text databases are provided. Results indicate that MMM and MMS outperform SM in all cases in terms of access savings, especially when terms become more distinctive. MMM slightly outperforms MMS in high weight and low weight query cases. The performance gap among all three schemes decreases as the database size increases. As the signature size increases with the hashing level fixed, the performances of MMM and MMS decrease and converge to that of the SM scheme.

    Analysis of Signature Generation Schemes for Multiterm Queries In Partitioned Signature File Environments

    Our analysis explores the performance of three superimposed signature generation schemes as they are applied to a dynamic signature file organization based on linear hashing: Linear Hashing with Superimposed Signatures (LHSS). The first scheme (SM) allows all terms to set the same number of bits, whereas the second and third schemes (MMS and MMM) emphasize the terms with high discriminatory power. In addition, MMM considers the probability distribution of the number of query terms. The main contribution of the study is the combination of signature generation and signature file organization concepts together with the relaxation of the single term query and uniform frequency assumptions. The derivation of the performance evaluation formulas is provided, as well as the analysis of various experimental settings. Results indicate that MMM outperforms the others as terms become more distinctive in their discriminatory power. MMM accomplishes the highest savings in retrieval efficiency for the high query weight case. We also discuss the applicability of the derivations to other partitioned signature organizations, providing a detailed analysis of Fixed Prefix Partitioning (FPP) as an example. Finally, an approximate performance evaluation formula that works for both FPP and LHSS is modified to account for the multiterm case.

    Digital Image Access & Retrieval

    The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.