Search CORE

32 research outputs found

Measuring inter-indexer consistency using a thesaurus

Author: Medelyan Olena
Witten Ian H.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2006

When professional indexers independently assign terms to a given document, the term sets generally differ between indexers. Studies of inter-indexer consistency measure the percentage of matching index terms, but none of them consider the semantic relationships that exist amongst these terms. We propose to represent multiple indexers' data in a vector space and use the cosine metric as a new consistency measure that can be extended by semantic relations between index terms. We believe that this new measure is more accurate and realistic than existing ones and therefore more suitable for evaluating automatically extracted index terms.
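A minimal Python sketch of the idea, assuming each indexer's terms are treated as a binary vector over the union of all assigned terms. The plain cosine of two such vectors is then the number of shared terms divided by the geometric mean of the two set sizes. The `related` mapping and the partial `weight` of 0.5 for thesaurus-related terms are hypothetical illustrations of how semantic relations could extend the measure; they are not the weighting used in the paper.

```python
import math


def cosine_consistency(terms_a, terms_b, related=None, weight=0.5):
    """Cosine similarity between two indexers' term sets.

    related: hypothetical dict mapping a term to thesaurus-related terms.
    A term missing from one indexer's set receives the (assumed) partial
    `weight` if that indexer assigned one of its related terms.
    """
    related = related or {}
    a, b = set(terms_a), set(terms_b)
    vocab = sorted(a | b)  # shared vector dimensions

    def vector(terms):
        vec = []
        for term in vocab:
            if term in terms:
                vec.append(1.0)
            elif any(r in terms for r in related.get(term, ())):
                vec.append(weight)  # partial credit via a semantic relation
            else:
                vec.append(0.0)
        return vec

    va, vb = vector(a), vector(b)
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0
```

With `{"soil", "erosion", "maize"}` and `{"soil", "erosion", "corn"}` the plain cosine is 2/3; declaring maize and corn as related raises it to 3/3.25, about 0.92.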

Crossref

Research Commons@Waikato

Thesaurus based automatic keyphrase indexing

Author: Medelyan Olena
Witten Ian H.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2006

We propose a new method that enhances automatic keyphrase extraction by using semantic information on terms and phrases gleaned from a domain-specific thesaurus. We evaluate the results against keyphrase sets assigned by a state-of-the-art keyphrase extraction system and those assigned by six professional indexers.
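The abstract does not spell out how the thesaurus is used, so the following is only a hedged sketch of one plausible pattern: candidate phrases are mapped to preferred descriptors through USE references, phrases absent from the thesaurus are dropped, and descriptors are ranked by frequency and then by how many other candidates they are related to. The dictionaries `use_refs` and `thesaurus` and the ranking rule are assumptions for illustration.

```python
from collections import Counter


def rank_descriptors(phrases, use_refs, thesaurus):
    """Rank thesaurus descriptors matched by candidate phrases.

    use_refs:  hypothetical mapping non-descriptor -> preferred descriptor (USE)
    thesaurus: hypothetical mapping descriptor -> set of related descriptors
    """
    # Conflate synonyms to their preferred descriptor; drop unknown phrases
    matched = []
    for phrase in phrases:
        descriptor = use_refs.get(phrase, phrase)
        if descriptor in thesaurus:
            matched.append(descriptor)

    freq = Counter(matched)
    candidates = set(freq)

    def degree(d):
        # Number of other candidates linked to d in the thesaurus
        return len(thesaurus[d] & candidates)

    # Higher frequency first, then higher degree, then alphabetical order
    return sorted(candidates, key=lambda d: (-freq[d], -degree(d), d))
```

For example, with "corn" declared as USE "maize", the phrases `["maize", "corn", "soil erosion", "tillage", "weather"]` conflate to two occurrences of "maize", while "weather" is dropped because it is not in the toy thesaurus.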

Crossref

Research Commons@Waikato

An Intelligent Multi-Agent Recommender System for Human Capacity Building

Author: Marivate Vukosi N.
Marwala Tshilidzi
Ssali George
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/06/2008

This paper presents a multi-agent approach to the problem of recommending training courses to engineering professionals. The recommendation system is built as a proof of concept and is limited to the electrical and mechanical engineering disciplines. Through user modelling and data collected from a survey, collaborative filtering recommendation is implemented using intelligent agents. The agents work together to recommend meaningful training courses and to update the course information. The system uses a user's profile and keywords from courses to rank courses. A course-ranking accuracy of 90% is achieved, while flexibility comes from an agent that autonomously retrieves information from websites using data mining techniques. This manner of recommendation is scalable and adaptable. Further improvements can be made using clustering and recording user feedback. Comment: Proceedings of the 14th IEEE Mediterranean Electrotechnical Conference, 2008, pages 909 to 91
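The abstract states only that a user's profile and course keywords are used to rank courses. As a hedged illustration, the sketch below ranks courses by the Jaccard overlap between profile keywords and course keywords. The `Course` type, the Jaccard score and the tie-breaking rule are assumptions, and the collaborative filtering and agent layers described in the paper are not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Course:
    title: str
    keywords: set[str]


def rank_courses(profile_keywords, courses):
    """Rank courses by keyword overlap with a user profile (assumed rule)."""
    profile = {k.lower() for k in profile_keywords}
    scored = []
    for course in courses:
        kw = {k.lower() for k in course.keywords}
        union = profile | kw
        # Jaccard overlap between profile and course keywords
        score = len(profile & kw) / len(union) if union else 0.0
        scored.append((score, course.title))
    # Highest score first; ties broken alphabetically by title
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]
```

A content-based score like this only covers the keyword side; the survey-based collaborative filtering in the paper would add signals from similar users.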

arXiv.org e-Print Archive

Crossref

Indexing of scientific articles with the SISA automatic indexing system compared with indexing in the Agricola, WoS and SCOPUS databases

Author: Gil-Leiva Isidoro
Publication date: 23/11/2017

In recent years the volume of digital documents generated has become enormous, as has their massive incorporation into information systems, and both trends seem unstoppable. Likewise, there is no doubt that indexing is one of the fundamental processes carried out in documentation units. Although research on automatic indexing began decades ago, the subject continues to attract interest, and various proposals and methodologies have been presented since then. SISA is a multilingual automatic indexing system for scientific articles based on heuristic and statistical principles, governed by rules derived from these principles. Objective. Against this background of constant digital growth, the study seeks to assess SISA's capabilities for automatically indexing articles compared with the indexing in the Agricola, WoS and SCOPUS databases. Material and method. One hundred articles published in different years by the journal Agronomy for Sustainable Development were randomly selected; the indexing assigned to them in these databases was located, the documents were indexed with SISA, the different indexings were compared, and the consistency between Agricola and SISA was calculated.

E-LIS

Content-Based Quality Estimation for Automatic Subject Indexing of Short Texts under Precision and Recall Constraints

Author: D Trieschnigg
E Gibaja
E Loza Mencía
F Pedregosa
F Sebastiani
JH Friedman
LN Rolling
M Huang
N Tahmasebi
O Medelyan
P Geurts
Publication date: 01/01/2018

Semantic annotations have to satisfy quality constraints to be useful for digital libraries, which is particularly challenging on large and diverse datasets. Confidence scores of multi-label classification methods typically refer only to the relevance of particular subjects, disregarding indicators of insufficient content representation at the document level. Therefore, we propose a novel approach that detects documents, rather than concepts, for which quality criteria are met. Our approach uses a deep, multi-layered regression architecture that comprises a variety of content-based indicators. We evaluated multiple configurations using text collections from law and economics, where the available content is restricted to very short texts. Notably, we demonstrate that the proposed quality estimation technique can determine subsets of previously unseen data where considerable gains in document-level recall can be achieved while upholding precision. Hence, the approach effectively performs a filtering that ensures high data quality standards in operative information retrieval systems. Comment: authors' manuscript, paper submitted to TPDL-2018 conference, 12 pages
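To make the document-level filtering idea concrete, here is a minimal sketch that assumes each document already has an estimated quality score; in the paper this comes from a multi-layered regression model, which is not reproduced here. Documents below a chosen threshold are set aside, and precision and recall of the assigned subjects are averaged over the kept subset. The tuple layout and the function names are hypothetical.

```python
def precision_recall(predicted, gold):
    """Document-level precision and recall of assigned subjects."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


def filter_by_quality(docs, threshold):
    """Keep documents whose estimated quality meets the threshold.

    docs: list of (estimated_quality, predicted_subjects, gold_subjects).
    Returns (number kept, mean precision, mean recall) over kept documents.
    """
    kept = [(pred, gold) for quality, pred, gold in docs if quality >= threshold]
    if not kept:
        return 0, 0.0, 0.0
    scores = [precision_recall(pred, gold) for pred, gold in kept]
    mean_p = sum(p for p, _ in scores) / len(scores)
    mean_r = sum(r for _, r in scores) / len(scores)
    return len(kept), mean_p, mean_r
```

Raising the threshold trades coverage (how many documents are kept) for quality on the kept subset, which is the kind of trade-off the abstract describes.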

arXiv.org e-Print Archive

Crossref

University of Twente Research Information

Evaluation of controlled vocabularies by inter-indexer consistency

Author: Gil-Leiva Isidoro
Soler-Monreal Concha
Publication date: 01/01/2011

Introduction. Three journal articles were indexed with several controlled vocabularies to check whether a list of descriptors achieves better or equal consistency rates compared with a standard thesaurus and an augmented thesaurus. Method. A set of Library and Information Science terminology was used to build a list of descriptors with equivalence relations (USE and UF), a standard thesaurus and an augmented thesaurus (in which all descriptors have scope notes). Subsequently, the three articles were indexed by selected indexers with varying degrees of experience: on the one hand, Library and Information Science students and, on the other, professionals from various documentation centres. Hooper’s measure was applied to find the consistency between pairs of novice and expert indexers. Analysis. Data were tabulated and analysed systematically according to pairs of novice and expert indexers. Results. The tool with the best results is the list of descriptors (39.5% consistency), followed by the augmented thesaurus (29.8%) and, with an almost identical value, the standard thesaurus (27.5%). Conclusion. The list of descriptors yields better indexing consistency in both groups, but more research is needed.
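Hooper’s measure for a pair of indexers is commonly written as C = A / (A + M + N), where A is the number of terms both indexers assigned and M and N are the numbers of terms assigned by only one of them. A minimal Python sketch, averaging over all indexer pairs; the function names are illustrative, and the study's specific pairing of novice and expert indexers is not reproduced.

```python
from itertools import combinations


def hooper(terms_a, terms_b):
    """Hooper's consistency for one pair: A / (A + M + N)."""
    a, b = set(terms_a), set(terms_b)
    shared = len(a & b)   # A: terms assigned by both indexers
    only_a = len(a - b)   # M: terms assigned only by the first indexer
    only_b = len(b - a)   # N: terms assigned only by the second indexer
    total = shared + only_a + only_b
    return shared / total if total else 0.0


def mean_pairwise_hooper(indexings):
    """Average Hooper consistency over every pair of indexings."""
    pairs = list(combinations(indexings, 2))
    if not pairs:
        return 0.0
    return sum(hooper(x, y) for x, y in pairs) / len(pairs)
```

Two indexers who share 2 of 4 distinct descriptors score 0.5, i.e. 50% consistency.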

E-LIS

Building automated subject indexing software: the case of Annif

Author: Hulkkonen Juha
Inkinen Juho
Kallio Aleksi
Koskela Markus
Lappalainen Mikko
Lehtinen Mona
Sjöberg Mats
Suominen Osma
Yetukuri Laxmana
Publication venue: 'Signum'
Publication date: 17/01/2022

Solutions for automating subject description have been widely discussed in the library world in recent years, and various experiments have been carried out both in Finland and internationally. Annif, the automated subject indexing tool developed at the National Library of Finland, has attracted a great deal of interest in many organisations, and experiences from the first deployments have been promising. What development choices were made in building Annif, and what challenges does automating subject description involve in general?

Journal.fi

Inter-indexing consistency in subject heading of electronic materials in Croatian public libraries’ WebPAC-s

Author: Inge Majlinger Tanocki
Kornelija Petr Balog
Publication venue: 'Croatian Library Association'
Publication date: 01/01/2014

It is assumed that indexing consistency greatly increases users’ chances of finding a required document. Indexing consistency is a well-researched and analyzed concept in Western countries, but in Croatian library and information science it is relatively new. This paper presents the findings of research on the subject indexing consistency of electronic resources (or their analogue counterparts) in public libraries, conducted in three large Croatian union catalogs grouped by the library software they use (CROLIST, MetelWIN, and ZaKi). Two methods were used to calculate inter-indexer consistency: one posited by Hooper (1965), and the other by Rolling (1981). Inter-indexer consistency was calculated for a sample of 44 bibliographic records contained in all three catalogs. The research has shown that the average consistency was extremely low: 8.72% using Hooper’s method and 11.20% using Rolling’s.
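Rolling’s measure is usually given as 2C / (A + B), where C is the number of shared terms and A and B are the sizes of the two indexers' term sets; Hooper’s is C / (A + B - C). The sketch below computes both for one pair of records. It shows why, for the same pair, Rolling’s value is never lower than Hooper’s (since A + B >= 2C), which fits the 11.20% versus 8.72% averages in the abstract. The example headings are invented for illustration.

```python
def hooper(terms_a, terms_b):
    """Hooper: shared terms / all distinct terms, C / (A + B - C)."""
    a, b = set(terms_a), set(terms_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rolling(terms_a, terms_b):
    """Rolling: 2C / (A + B), i.e. the Dice coefficient of the two sets."""
    a, b = set(terms_a), set(terms_b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


record_1 = {"Croatian literature", "Novels", "Children's books"}
record_2 = {"Novels", "Children's books", "Fairy tales"}
h = hooper(record_1, record_2)   # 2 / 4
r = rolling(record_1, record_2)  # 4 / 6
```

For a single pair the two are linked by R = 2H / (1 + H); averages taken over many pairs need not obey that identity exactly.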

Repository of the City and University Library Osijek

Repository of Josip Juraj Strossmayer University of Osijek

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia


A framework for evaluating automatic indexing or classification in the context of retrieval

Author: Anderson
Aronson
Bainbridge
Beaulieu
Belkin
Blandford
Borlund
Braschler
Brenner
Buckley
Buckley
Chung
Cleverdon
Colosimo
Cooper
Davis
Fidel
Golub
Golub
Golub
Hersh
Hliaoutakis
Hosseini
Hripcsak
Huang
Iivonen
Ingwersen
Ingwersen
Kazai
Kekäläinen
Kim
Lalmas
Lancaster
Lancaster
Lewis
Liu
Lykke
Mai
Markey
Medelyan
Mladenic
Moens
Oard
Olson
Paynter
Plaunt
Purpura
Ribeiro-Neto
Roberts
Roitblat
Rolling
Rosenberg
Ruiz
Saracevic
Saracevic
Saracevic
Sebastiani
Silvester
Soergel
Soergel
Sormunen
Sparck Jones
Suomela
Svarre
Tonkin
Tsai
Venanzi
Voorhees
Publication venue: 'Wiley'
Publication date: 22/10/2015

Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources, and enhancing consistency. While some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The paper reviews and discusses issues with existing evaluation approaches, such as problems of aboutness and relevance assessments, which imply the need to use more than a single “gold standard” method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on indexing, classification and approaches to their evaluation: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard; evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow; and evaluating indexing quality indirectly through analyzing retrieval performance.
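One way to act on the point about not relying on a single gold standard is to score automatically assigned terms against each human indexer separately and look at the per-indexer scores as well as their mean. The sketch below does this with set-based F1; the function names and the choice of F1 are assumptions, not a metric prescribed by the framework.

```python
def f1(predicted, gold):
    """Set-based F1 between automatic terms and one human indexing."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def evaluate_against_indexers(auto_terms, human_indexings):
    """Score automatic terms against several human indexings.

    human_indexings: dict mapping an indexer label to that indexer's terms.
    Returns (per-indexer F1 scores, mean F1).
    """
    auto = set(auto_terms)
    scores = {name: f1(auto, set(terms)) for name, terms in human_indexings.items()}
    mean = sum(scores.values()) / len(scores) if scores else 0.0
    return scores, mean
```

A large spread between indexers suggests that disagreement among the humans, not only error in the tool, is shaping the result.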

City Research Online

Crossref

University of South Wales Research Explorer

VBN

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Linnéuniversitetets forskningsdatabas

Explore Bristol Research