    Measuring inter-indexer consistency using a thesaurus

    When professional indexers independently assign terms to a given document, the term sets generally differ between indexers. Studies of inter-indexer consistency measure the percentage of matching index terms, but none of them considers the semantic relationships that exist among these terms. We propose to represent multi-indexer data in a vector space and to use the cosine metric as a new consistency measure that can be extended with semantic relations between index terms. We believe that this new measure is more accurate and realistic than existing ones and therefore more suitable for evaluating automatically extracted index terms.
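    The vector-space idea above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the term sets, the `related` mapping, the `expand` helper and the 0.5 weight for thesaurus neighbours are all assumptions chosen for illustration. Each indexer's term set becomes a sparse weighted vector, and consistency is the cosine of the angle between two such vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors given as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for empty term sets
    return dot / (norm_u * norm_v)

def expand(terms, related, weight=0.5):
    """Hypothetical extension: assigned terms get weight 1.0, and terms the
    thesaurus lists as related get a reduced weight (0.5 is an arbitrary choice)."""
    weights = {t: 1.0 for t in terms}
    for t in terms:
        for r in related.get(t, ()):
            weights.setdefault(r, weight)  # never overrides an assigned term
    return weights

# Illustrative data, not taken from the paper.
indexer_a = {"forestry", "soil erosion", "agriculture"}
indexer_b = {"forestry", "land degradation", "agriculture"}
related = {
    "soil erosion": ["land degradation"],
    "land degradation": ["soil erosion"],
}

# Exact matching only: related terms count as a complete mismatch.
plain = cosine(expand(indexer_a, {}), expand(indexer_b, {}))
# With thesaurus relations: related terms earn partial credit.
semantic = cosine(expand(indexer_a, related), expand(indexer_b, related))

print(f"plain cosine:    {plain:.3f}")
print(f"semantic cosine: {semantic:.3f}")
```

    With these made-up inputs the two indexers share two of three terms, and the third pair is related in the thesaurus, so the semantically extended score comes out higher than the plain one.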

    Thesaurus based automatic keyphrase indexing

    We propose a new method that enhances automatic keyphrase extraction by using semantic information on terms and phrases gleaned from a domain-specific thesaurus. We evaluate the results against keyphrase sets assigned by a state-of-the-art keyphrase extraction system and those assigned by six professional indexers.

    An Intelligent Multi-Agent Recommender System for Human Capacity Building

    This paper presents a multi-agent approach to the problem of recommending training courses to engineering professionals. The recommendation system is built as a proof of concept and is limited to the electrical and mechanical engineering disciplines. Through user modelling and data collected from a survey, collaborative filtering recommendation is implemented using intelligent agents. The agents work together to recommend meaningful training courses and to update the course information. The system uses a user's profile and keywords from courses to rank courses. A course-ranking accuracy of 90% is achieved, while flexibility comes from an agent that autonomously retrieves information from websites using data mining techniques. This manner of recommendation is scalable and adaptable. Further improvements can be made using clustering and recording user feedback. Comment: Proceedings of the 14th IEEE Mediterranean Electrotechnical Conference, 2008, pages 909 to 91

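    Indexing of scientific articles with the automatic indexing system SISA compared with indexing in the Agricola, WoS and SCOPUS databases (La indización de artículos científicos con el sistema de indización automática SISA comparada con la indización en las Bases de datos Agricola, WoS y SCOPUS)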

    For some years now, digital documents have been generated in enormous volume and incorporated en masse into information systems, and both trends seem unstoppable. Likewise, there is no doubt that indexing is one of the fundamental processes carried out in documentation units. Although the first investigations into automatic indexing began decades ago, the subject continues to attract interest, and different proposals and methodologies have been presented since then. SISA is a multilingual automatic indexing system for scientific articles, based on heuristic and statistical principles and governed by rules derived from them. Objective. In this context of constant digital growth, the aim is to assess SISA's capabilities in the automatic indexing of articles relative to the indexing performed in the Agricola, WoS and SCOPUS databases. Material and method. One hundred articles published in different years by the journal Agronomy for Sustainable Development were randomly selected; the indexing assigned to them in the three databases was located; the documents were indexed with SISA; the different indexings were compared; and the consistency between Agricola and SISA was calculated.

    Content-Based Quality Estimation for Automatic Subject Indexing of Short Texts under Precision and Recall Constraints

    Semantic annotations have to satisfy quality constraints to be useful for digital libraries, which is particularly challenging on large and diverse datasets. Confidence scores of multi-label classification methods typically refer only to the relevance of particular subjects, disregarding indicators of insufficient content representation at the document level. Therefore, we propose a novel approach that detects documents, rather than concepts, where quality criteria are met. Our approach uses a deep, multi-layered regression architecture, which comprises a variety of content-based indicators. We evaluated multiple configurations using text collections from law and economics, where the available content is restricted to very short texts. Notably, we demonstrate that the proposed quality estimation technique can determine subsets of the previously unseen data where considerable gains in document-level recall can be achieved, while upholding precision at the same time. Hence, the approach effectively performs a filtering that ensures high data quality standards in operative information retrieval systems. Comment: authors' manuscript, paper submitted to TPDL-2018 conference, 12 pages

    Evaluation of controlled vocabularies by inter-indexer consistency

    Introduction. Several controlled vocabularies are used to index three journal articles, in order to check whether a list of descriptors achieves consistency rates better than or equal to those of a standard thesaurus and an augmented thesaurus. Method. A set of Library and Information Science terminology was used to build a list of descriptors with equivalence relations (USE and UF), a standard thesaurus and an augmented thesaurus (in which all descriptors have scope notes). Subsequently, three articles were indexed by selected indexers with varying degrees of experience: on the one hand Library and Information Science students and, on the other, professionals from various documentation centres. Hooper's measure was applied to find the consistency between pairs of novice indexers and experts. Analysis. Data were tabulated and analysed systematically by pairs of novice indexers and experts. Results. The tool with the best results is the list of descriptors (39.5% consistency), followed by the augmented thesaurus (29.8%) and, with an almost identical value, the standard thesaurus (27.5%). Conclusion. The list of descriptors returns better indexing consistency in both groups, but more research is needed.

    Building software for automated subject indexing: the case of Annif (Automaattisen sisällönkuvailun ohjelmiston rakentaminen – case Annif)

    Solutions for automating subject indexing have been much discussed in the library world in recent years, and various experiments have been carried out both in Finland and abroad. The Annif tool for automated subject indexing, developed at the National Library of Finland, has attracted considerable interest in many organisations, and experiences from the first deployments have been promising. What development choices were made in building Annif, and what challenges does automating subject indexing involve in general?

    Inter-indexing consistency in subject heading of electronic materials in Croatian public libraries’ WebPAC-s

    It is assumed that indexing consistency greatly increases users' chances of finding a required document. Indexing consistency is a well-researched and well-analysed topic in Western countries, but in Croatian library and information science it is relatively new. This paper presents the findings of a study of subject indexing consistency for electronic resources (or their analogue counterparts) in public libraries, conducted in three large Croatian union catalogues grouped by the library software they use (CROLIST, MetelWIN and ZaKi). Two methods were used to calculate inter-indexer consistency: one posited by Hooper (1965) and the other by Rolling (1981). Consistency was calculated for a sample of 44 bibliographic records held in common by libraries in all three systems. The research showed that the average consistency was extremely low: 8.72% using Hooper's method and 11.20% using Rolling's.
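    For readers unfamiliar with the two measures, the following Python sketch shows how they are commonly stated for a pair of indexers: Hooper's measure as C / (A + B - C) and Rolling's as 2C / (A + B), where A and B are the numbers of terms assigned by each indexer and C is the number of terms they share. The function names, the example terms and the handling of empty sets are assumptions made for illustration; the sketch does not reproduce the study's data or procedure.

```python
def hooper(terms_a, terms_b):
    """Hooper's consistency: shared terms over all distinct terms, C / (A + B - C)."""
    a, b = set(terms_a), set(terms_b)
    common = len(a & b)
    total = len(a) + len(b) - common
    return common / total if total else 0.0  # convention chosen here for two empty sets

def rolling(terms_a, terms_b):
    """Rolling's consistency: twice the shared terms over the summed set sizes, 2C / (A + B)."""
    a, b = set(terms_a), set(terms_b)
    common = len(a & b)
    total = len(a) + len(b)
    return 2 * common / total if total else 0.0  # same convention as above

# Illustrative subject headings, not taken from the study.
library_1 = {"libraries", "catalogues", "indexing"}
library_2 = {"indexing", "catalogues", "subject headings", "Croatia"}

h = hooper(library_1, library_2)   # C = 2, A = 3, B = 4 -> 2 / 5
r = rolling(library_1, library_2)  # 2 * 2 / (3 + 4) -> 4 / 7

print(f"Hooper:  {h:.1%}")
print(f"Rolling: {r:.1%}")
```

    For any given pair of term sets, Rolling's value is never lower than Hooper's, which matches the ordering of the two percentages reported above.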

