    Effect of Tunable Indexing on Term Distribution and Cluster-based Information Retrieval Performance

    This study investigates the effect of tunable indexing on the structure and information retrieval performance of a clustered document database. All cluster structures are generated, and all term discrimination values are calculated, using the Cover Coefficient-Based Clustering Methodology. Information retrieval performance is measured in terms of precision, recall, and E-measure. The relationship between term generality and term discrimination value is quantified using the Pearson Rank Correlation Coefficient Test. The effect of tunable indexing on index term distribution and on the number of target clusters is also examined.
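
    For readers unfamiliar with the three performance measures named in this abstract, the following is a minimal sketch of how they are commonly computed for a single query. It assumes the standard van Rijsbergen definition of the E-measure, E = 1 - (1 + b^2)PR / (b^2 P + R); the abstract does not state which weighting the study used, and the function name, the `beta` parameter, and the document identifiers are illustrative only.

```python
def precision_recall_e(retrieved, relevant, beta=1.0):
    """Return (precision, recall, E-measure) for one query.

    retrieved: iterable of document ids returned by the system
    relevant:  iterable of document ids judged relevant
    beta:      assumed weighting of recall relative to precision
               (beta = 1 weights them equally)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)

    # Guard against empty sets so the ratios are always defined.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0

    denom = beta ** 2 * precision + recall
    if denom == 0:
        # Nothing relevant was retrieved: worst possible E value.
        e_measure = 1.0
    else:
        e_measure = 1.0 - (1 + beta ** 2) * precision * recall / denom
    return precision, recall, e_measure


# Hypothetical example: 4 documents retrieved, 3 relevant, 2 in common.
p, r, e = precision_recall_e({"d1", "d2", "d3", "d4"}, {"d2", "d4", "d5"})
print(f"precision={p:.3f} recall={r:.3f} E={e:.3f}")
```

    Lower E values indicate better retrieval; with `beta = 1`, E is simply one minus the harmonic mean of precision and recall.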

    Il Nuovo soggettario in Aleph 500

    This document describes a project to adapt the ALEPH 500 software to the characteristics of the subject indexing language of "Nuovo soggettario", with respect to the international formats Unimarc/Authorities and Unimarc/Bibliographic. The base unit of the indexing language of "Nuovo soggettario", structured according to the method of GRIS (Gruppo di ricerca sull'indicizzazione per soggetto, Research Group on Subject Indexing), is the indexing term, which either represents a concept or refers to an individually named entity, in the form of a noun or noun phrase. The indexing terms do not coincide with the elements of the traditional subject headings, the so-called "descrittori" in SBN (the Italian national library service).

    Quantifying literature citations, index terms, and Gene Ontology annotations in the Saccharomyces Genome Database to assess results-set clustering utility

    A set of 37,325 unique literature citations was identified from 120,078 literature-based annotations in the Saccharomyces Genome Database (SGD). The citations, gene products, and related Gene Ontology (GO) annotations were analyzed to quantify unique articles, journals, and genes, and to rank them by publication year, language, and GO term frequency. GO terms, MeSH indexing terms, MeSH Journal Descriptors, and SGD Literature Topics were quantified and analyzed to assess their potential utility for results-set clustering. Results: Bradford’s Law of Scattering was shown to hold for the citations, journals, gene products, and GO annotations. Only the MeSH terms and article title/abstract pairs had significant numbers of term co-occurrences. Multiple term types may be useful for faceted searching and clustered results-set browsing if the strengths of each are leveraged.