    Theoretical evaluation of XML retrieval

    Representing Aboutness: Automatically Indexing 19th-Century Encyclopedia Britannica Entries

    Representing aboutness is a challenge for humanities documents, given the linguistic indeterminacy of the text. The challenge is even greater when applying automatic indexing to historical documents for a multidisciplinary collection, such as encyclopedias. The research presented in this paper explores this challenge through a comparative automatic indexing study examining topic relevance. The setting is the NEH-funded 19th-Century Knowledge Project, where researchers in the Digital Scholarship Center, Temple University, and the Metadata Research Center, Drexel University, are investigating the best way to index entries across four historical editions of the Encyclopedia Britannica (3rd, 7th, 9th, and 11th editions). Individual encyclopedia entries were processed using the Helping Interdisciplinary Vocabulary Engineering (HIVE) system, a linked-data automatic indexing terminology application that uses controlled vocabularies. Comparative topic relevance evaluation was performed for three separate keyword extraction algorithms: RAKE, Maui, and Kea++. Results show that RAKE performed best, with an average precision of 67%, compared with 28% for both Maui and Kea++. Additionally, the highest-ranked HIVE results with both RAKE and Kea++ demonstrated relevance across all sample entries, while Maui’s highest-ranked results returned zero relevant terms. The paper reports on background information, research objectives and methods, and results, and outlines future research prospects: further optimization of RAKE’s algorithm parameters to accommodate encyclopedia entries of different lengths, and evaluation of the indexing impact of correcting the historical long s.
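
    As a rough illustration of the comparison described above, the sketch below implements a minimal RAKE-style keyword extractor together with the precision measure used to judge topical relevance. The stopword list, sample entry text, and reference terms are invented for illustration; they are not taken from the Knowledge Project or the HIVE system.

```python
import re
from collections import defaultdict

# Illustrative stopword list; real RAKE setups use a much larger one.
STOPWORDS = {"the", "of", "and", "a", "an", "in", "is", "it", "to", "with",
             "by", "for", "on", "or", "as", "which", "that", "their", "its"}

def rake_keywords(text, top_k=10):
    """Score candidate phrases by summed word degree / word frequency."""
    # Candidate phrases are runs of words delimited by punctuation or stopwords.
    phrases = []
    for fragment in re.split(r"[^a-zA-Z' ]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)

    # Word score = degree / frequency; degree counts co-occurring phrase words.
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}

    phrase_scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(phrase_scores, key=phrase_scores.get, reverse=True)[:top_k]

def precision(extracted, relevant):
    """Fraction of extracted terms judged topically relevant."""
    return len(set(extracted) & set(relevant)) / len(extracted) if extracted else 0.0

# Invented sample entry and reference terms, for illustration only.
entry = ("Astronomy is the science which treats of the heavenly bodies, "
         "their motions, magnitudes, and distances.")
keywords = rake_keywords(entry, top_k=5)
print(keywords)
print(precision(keywords, {"astronomy", "heavenly bodies"}))
```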

    Using the Web Infrastructure for Real Time Recovery of Missing Web Pages

    Given the dynamic nature of the World Wide Web, missing web pages, or 404 Page Not Found responses, are part of our web browsing experience. It is our intuition that information on the web is rarely completely lost; it is just missing. In whole or in part, content often moves from one URI to another and hence just needs to be (re-)discovered. We evaluate several methods for a just-in-time approach to web page preservation. We investigate the suitability of lexical signatures and web page titles for rediscovering missing content. It is understood that web pages change over time, which implies that the performance of these two methods depends on the age of the content. We therefore conduct a temporal study of the decay of lexical signatures and titles and estimate their half-life. We further propose the use of tags that users have created to annotate pages, as well as the most salient terms derived from a page's link neighborhood. We utilize the Memento framework to discover previous versions of web pages and to execute the above methods. We provide a workflow, including a set of parameters, that is most promising for the (re-)discovery of missing web pages. We introduce Synchronicity, a web browser add-on that implements this workflow. It works while the user is browsing and detects the occurrence of 404 errors automatically. When activated by the user, Synchronicity offers a total of six methods to either rediscover the missing page at its new URI or discover an alternative page that satisfies the user's information need. Synchronicity depends on user interaction, which enables it to provide results in real time.
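
    As a rough sketch of the lexical-signature idea described above, the example below takes the signature to be the top-k TF-IDF terms of a cached copy of a missing page, which could then be submitted as a search-engine query. The background corpus, page text, and k=5 are illustrative assumptions, not the parameters evaluated in the study.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy background corpus and a cached copy of the "missing" page (both invented).
background_corpus = [
    "digital preservation of web archives and repositories",
    "search engines crawl and index pages on the world wide web",
    "metadata standards for libraries and cultural heritage collections",
]
cached_page = ("synchronicity browser add-on rediscovers missing web pages using "
               "lexical signatures titles tags and link neighborhood terms")

def lexical_signature(page_text, corpus, k=5):
    """Return the k terms of page_text with the highest TF-IDF weight."""
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(corpus + [page_text]).toarray()
    terms = vectorizer.get_feature_names_out()
    weights = tfidf[-1]                      # last row corresponds to page_text
    top = weights.argsort()[::-1][:k]
    return [terms[i] for i in top if weights[i] > 0]

signature = lexical_signature(cached_page, background_corpus)
print(" ".join(signature))  # candidate query string for a search engine
```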

    Faceted Thesauri

    Crowdsourcing for image metadata: a comparison between game-generated tags and professional descriptors

    One way to address the challenge of creating metadata for digitized image collections is to rely on user-created index terms, typically by harvesting tags from the collaborative information services known as folksonomies or by allowing users to tag directly in the catalog. An alternative method, only recently applied in cultural heritage institutions, is Human Computation Games, a crowdsourcing tool that relies on user agreement to create valid tags. This study contributes to the research by investigating tags (at various degrees of validation) generated by a Human Computation Game and comparing them to descriptors assigned to the same images by professional indexers. The analysis is done by classifying tags and descriptors by term category, as well as by measuring overlap between the tags and the descriptors at both the syntactic (matching on terms) and the semantic (matching on meaning) level. The findings show that validated tags tend to describe ‘artifacts/objects’ and that game-generated tags typically represent what is in the picture rather than what it is about. Descriptors also primarily belonged to this term category but included a substantial number of ‘proper nouns’, mainly named locations. Tags generated by the game but not validated by player agreement had a higher frequency of ‘subjective/narrative’ tags, but also more errors. It was determined that the exact (character-for-character) overlap, i.e. the number of common terms relative to the entire pool of tags and descriptors, was slightly less than 5% for all types of tags. Extending the analysis to include fuzzy (word-stem) matching more than doubled the overlap. The semantic overlap was established through thesaurus relations between a sample of tags and descriptors, and adopting this more inclusive view of overlap increased the percentage of tags that could be matched to descriptors. More than half of the validated tags had some thesaurus relation to a descriptor added by a professional indexer. Approximately 60% of the thesaurus relations between descriptors and valid tags were either ‘same’ or ‘equivalent’, roughly 20% were associative, and 20% were hierarchical. For the hierarchical relations, it was found that tags typically describe images at a less specific level than descriptors.
    Joint Master Degree in Digital Library Learning (DILL)
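
    A minimal sketch of the two lexical overlap measures mentioned above, exact character-for-character matching and fuzzy word-stem matching, follows; the tag and descriptor sets are invented examples, not data from the study.

```python
from nltk.stem import PorterStemmer

# Invented example sets; the study's own tags and descriptors are not reproduced here.
tags = {"churches", "painting", "horse", "winter landscape"}
descriptors = {"church", "paintings", "horses", "landscapes"}

stemmer = PorterStemmer()

def stem_phrase(phrase):
    """Reduce each word to its stem so 'churches' and 'church' collapse together."""
    return " ".join(stemmer.stem(word) for word in phrase.lower().split())

exact = tags & descriptors
stemmed_descriptors = {stem_phrase(d) for d in descriptors}
fuzzy = {t for t in tags if stem_phrase(t) in stemmed_descriptors}

pool = tags | descriptors  # the entire pool of tags and descriptors
print(f"exact overlap: {len(exact) / len(pool):.0%}")   # 0% for this toy example
print(f"fuzzy overlap: {len(fuzzy) / len(pool):.0%}")   # word-stem matching adds matches
```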

    Ontologies across disciplines

    Tags and Folksonomies as Artifacts of Meaning.

    The advent of the so-called Web 2.0 and the Semantic Web, for all their seemingly apparent differences (Floridi, 2009), was instrumental in a renewed interest in questions that used to be addressed solely by the philosophy of language. Amongst these, the problem of meaning is paramount to many a Web service. The fact that philosophical problématiques of such magnitude are brought up both in the design process and in the effective use of technical devices forces us to realize that some topics are already moving from the traditional realm of theoretical questions to an area packed with innumerable “artefacts of meaning”, tagging systems among them (we will focus on these for the time being). This shift, which we have chosen to dub “artefactualization” elsewhere (Monnin, 2009a), bespeaks the now common coupling between philosophical questions and technological realities. In other words, it is a process that exhibits continuity as well as radical change, and that deeply subverts established views about the relationships that symbolic forms and artefacts entertain with one another.