32 research outputs found
Measuring inter-indexer consistency using a thesaurus
When professional indexers independently assign terms to a given document, the term sets generally differ between indexers. Studies of inter-indexer consistency measure the percentage of matching index terms, but none of them considers the semantic relationships that exist among these terms. We propose to represent multi-indexer data in a vector space and to use the cosine metric as a new consistency measure that can be extended by semantic relations between index terms. We believe that this new measure is more accurate and realistic than existing ones and therefore more suitable for evaluating automatically extracted index terms.
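As a rough illustration of the vector-space idea in the abstract above, the following sketch treats each indexer's term set as a binary vector over the vocabulary and computes the cosine between two indexers. The binary weighting, the pairwise comparison, the function name `cosine_consistency` and the sample terms are all assumptions made for illustration; the paper's own representation and its extension by semantic relations are not reproduced here.

```python
import math


def cosine_consistency(terms_a, terms_b):
    """Cosine similarity of two term sets viewed as binary vectors.

    For binary vectors the dot product equals the number of shared terms
    and each vector norm equals the square root of the set size.
    """
    a, b = set(terms_a), set(terms_b)
    if not a or not b:
        # Assumption: an empty term set has no measurable agreement.
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical term sets assigned by two indexers to the same document.
indexer_1 = {"thesauri", "indexing", "consistency"}
indexer_2 = {"indexing", "consistency", "evaluation", "keyphrases"}

score = cosine_consistency(indexer_1, indexer_2)
print(f"cosine consistency: {score:.3f}")
```

With two shared terms and set sizes of 3 and 4, the score is 2 / sqrt(12). A semantically extended variant would presumably also give partial credit to related but non-identical terms, which this sketch does not attempt.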
Thesaurus based automatic keyphrase indexing
We propose a new method that enhances automatic keyphrase extraction by using semantic information on terms and phrases gleaned from a domain-specific thesaurus. We evaluate the results against keyphrase sets assigned by a state-of-the-art keyphrase extraction system and those assigned by six professional indexers.
An Intelligent Multi-Agent Recommender System for Human Capacity Building
This paper presents a Multi-Agent approach to the problem of recommending
training courses to engineering professionals. The recommendation system is
built as a proof of concept and limited to the electrical and mechanical
engineering disciplines. Through user modelling and data collection from a
survey, collaborative filtering recommendation is implemented using intelligent
agents. The agents work together in recommending meaningful training courses
and updating the course information. The system uses a user's profile and
keywords from courses to rank courses. A ranking accuracy for courses of 90% is
achieved while flexibility is achieved using an agent that retrieves
information autonomously using data mining techniques from websites. This
manner of recommendation is scalable and adaptable. Further improvements can be
made using clustering and recording user feedback.
Comment: Proceedings of the 14th IEEE Mediterranean Electrotechnical
Conference, 2008, pages 909 to 91
Indexing of scientific articles with the automatic indexing system SISA compared with indexing in the Agricola, WoS and SCOPUS databases
For some years the volume of digital documents being generated, and their mass incorporation into information systems, has been enormous, and both trends seem unstoppable. Likewise, indexing is undoubtedly one of the fundamental processes carried out in documentation units. Although the first investigations into automatic indexing began decades ago, the subject continues to attract interest, and different proposals and methodologies have been presented since then. SISA is a multilingual automatic indexing system for scientific articles based on heuristic and statistical principles and governed by rules derived from them. Objective. Against this background of constant digital growth, the study seeks to assess SISA's capabilities in the automatic indexing of articles relative to the indexing in the Agricola, WoS and SCOPUS databases. Material and method. One hundred articles published in different years by the journal Agronomy for Sustainable Development were randomly selected; the indexing assigned to the articles in those databases was located; the documents were indexed with SISA; the different indexings were compared; and the consistency between Agricola and SISA was calculated
Content-Based Quality Estimation for Automatic Subject Indexing of Short Texts under Precision and Recall Constraints
Semantic annotations have to satisfy quality constraints to be useful for
digital libraries, which is particularly challenging on large and diverse
datasets. Confidence scores of multi-label classification methods typically
refer only to the relevance of particular subjects, disregarding indicators of
insufficient content representation at the document-level. Therefore, we
propose a novel approach that detects documents rather than concepts where
quality criteria are met. Our approach uses a deep, multi-layered regression
architecture, which comprises a variety of content-based indicators. We
evaluated multiple configurations using text collections from law and
economics, where the available content is restricted to very short texts.
Notably, we demonstrate that the proposed quality estimation technique can
determine subsets of the previously unseen data where considerable gains in
document-level recall can be achieved, while upholding precision at the same
time. Hence, the approach effectively performs a filtering that ensures high
data quality standards in operative information retrieval systems.
Comment: authors' manuscript, paper submitted to TPDL-2018 conference, 12
pages
Evaluation of controlled vocabularies by inter-indexer consistency
Introduction. Several controlled vocabularies are used to index three journal articles in order to check whether a list of descriptors achieves consistency rates better than or equal to those of a standard thesaurus and an augmented thesaurus. Method. A set of Library and Information Science terminology was used to build a list of descriptors with equivalence relations (USE and UF), a standard thesaurus and an augmented thesaurus (in which all descriptors have scope notes). Subsequently, three articles were indexed by selected indexers with varying degrees of experience: Library and Information Science students on the one hand and professionals from various documentation centres on the other. Hooper’s measure was applied to find the consistency between pairs of novice indexers and experts. Analysis. Data were tabulated and analysed systematically by pairs of novice indexers and experts. Results. The tool with the best results is the list of descriptors (39.5% consistency), followed by the augmented thesaurus (29.8%) and, with an almost identical value, the standard thesaurus (27.5%). Conclusion. The list of descriptors returns better indexing consistency in both groups, but more research is needed.
Building software for automated subject indexing: the case of Annif
Solutions for automating subject indexing have been much discussed in the library world in recent years, and various experiments have been carried out both in Finland and elsewhere. Annif, the automated subject indexing tool developed at the National Library of Finland, has attracted considerable interest in many organisations, and experiences from the first deployments have been promising. What development choices were made in building Annif, and what kinds of challenges does the automation of subject indexing involve in general?
Inter-indexing consistency in subject heading of electronic materials in Croatian public libraries’ WebPAC-s
It is assumed that indexing consistency greatly increases users’ chances of finding a required document. Indexing consistency is a well-researched and well-analysed concept in Western countries, but in Croatian library and information science it is relatively new. This paper presents the findings of a study of subject indexing consistency for electronic resources (or their analogue counterparts) in public libraries, conducted in three Croatian union catalogues (CROLIST, MetelWIN, and ZaKi). Two methods were used to calculate inter-indexer consistency: one posited by Hooper (1965) and the other by Rolling (1981). Consistency was calculated for a sample of 44 bibliographic records contained in all three catalogues. The research showed that the average consistency was extremely low: 8.72% using Hooper’s method and 11.20% using Rolling’s.
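For readers unfamiliar with the two measures named above, here is a minimal sketch using their commonly cited forms: Hooper's measure as the number of shared terms divided by the number of distinct terms used by either indexer, and Rolling's measure as twice the number of shared terms divided by the total number of terms assigned by both. The function names and the sample subject headings are hypothetical, and the study's exact matching rules (for example, how partially matching headings are counted) are not known from the abstract.

```python
def hooper(terms_a, terms_b):
    """Hooper's measure: shared terms / all distinct terms used by either indexer."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    # Assumption: two empty term sets are scored as 0.0 rather than undefined.
    return len(a & b) / len(union) if union else 0.0


def rolling(terms_a, terms_b):
    """Rolling's measure: 2 * shared terms / (terms by A + terms by B)."""
    a, b = set(terms_a), set(terms_b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


# Hypothetical subject headings assigned to one record in two catalogues.
catalogue_1 = {"public libraries", "e-books", "Croatia"}
catalogue_2 = {"e-books", "Croatia", "digital lending", "licensing"}

h = hooper(catalogue_1, catalogue_2)   # 2 shared / 5 distinct terms
r = rolling(catalogue_1, catalogue_2)  # 2 * 2 shared / (3 + 4) terms
print(f"Hooper: {h:.2%}, Rolling: {r:.2%}")
```

Because Rolling's denominator counts shared terms twice, it is never lower than Hooper's value for the same pair of term sets, which is consistent with the ordering of the two percentages reported above.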
A framework for evaluating automatic indexing or classification in the context of retrieval
Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. While some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The paper reviews and discusses issues with existing evaluation approaches, such as problems of aboutness and relevance assessments, which imply the need to use more than a single “gold standard” method when evaluating indexing and retrieval, and it proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on indexing, classification and approaches to their evaluation: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard; evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow; and evaluating indexing quality indirectly through analyzing retrieval performance.