2,599 research outputs found
Recommended from our members
Comparing taxonomies for organising collections of documents
There is a demand for taxonomies to organise large collections of documents into categories for browsing and exploration. This paper examines four existing taxonomies that have been manually created, along with two methods for deriving taxonomies automatically from data items. We use these taxonomies to organise items from a large online cultural heritage collection. We then present two human evaluations of the taxonomies. The first measures the cohesion of the taxonomies to determine how well they group similar items under the same concept node. The second analyses the concept relations in the taxonomies. The results show that the manual taxonomies have high-quality, well-defined relations. However, the novel automatic method is found to generate very high cohesion.
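The cohesion evaluation described above can be sketched as follows. This is a minimal illustration, not the paper's exact protocol: the node structure, the item keyword sets, and the Jaccard similarity are all hypothetical choices made for the example.

```python
from itertools import combinations

def node_cohesion(items, similarity):
    """Mean pairwise similarity of the items grouped under one concept node."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 1.0  # a single-item node is trivially cohesive
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)

def taxonomy_cohesion(taxonomy, similarity):
    """Average node cohesion over all concept nodes in the taxonomy."""
    scores = [node_cohesion(items, similarity) for items in taxonomy.values()]
    return sum(scores) / len(scores)

def jaccard(a, b):
    """Toy similarity: Jaccard overlap of two items' keyword sets."""
    return len(a & b) / len(a | b)

# Hypothetical taxonomy: concept node -> items, each item a keyword set.
taxonomy = {
    "ceramics": [{"vase", "clay"}, {"vase", "glaze"}],
    "paintings": [{"oil", "canvas"}, {"fresco", "wall"}],
}
print(taxonomy_cohesion(taxonomy, jaccard))  # mean of 1/3 and 0
```

A taxonomy that places similar items under the same node scores closer to 1; scattering unrelated items under shared nodes drives the score toward 0.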
Evaluating hierarchical organisation structures for exploring digital libraries
Search boxes providing simple keyword-based search are insufficient when users have complex information needs or are unfamiliar with a collection, for example in large digital libraries. Browsing hierarchies can support these richer interactions, but many collections do not have a suitable hierarchy available. In this paper we present a number of approaches for automatically creating hierarchies and mapping items into them, including a novel technique which automatically adapts a Wikipedia-based taxonomy to the target collection. These approaches are applied to a large collection of cultural heritage items which is formed through the aggregation of other collections and for which no unified hierarchy is available. We investigate a number of novel user-evaluated metrics to quantify the hierarchies' quality and performance, showing that the proposed technique is preferred by users. From this we draw a number of conclusions as to what makes a hierarchy useful to the user.
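The step of mapping items into a hierarchy can be sketched roughly as below. This is an assumed term-overlap heuristic for illustration only; the node names and vocabularies are invented, and the paper's actual adaptation technique is more sophisticated.

```python
# Hypothetical concept nodes, each with an associated vocabulary of terms.
nodes = {
    "Sculpture": {"statue", "bronze", "marble", "carving"},
    "Manuscripts": {"codex", "parchment", "illumination", "script"},
}

def map_item(item_terms, nodes):
    """Map an item into the hierarchy node sharing the most vocabulary terms."""
    best, best_overlap = None, -1
    for name, vocab in nodes.items():
        overlap = len(item_terms & vocab)
        if overlap > best_overlap:
            best, best_overlap = name, overlap
    return best

print(map_item({"bronze", "statue", "horse"}, nodes))  # Sculpture
```

In practice, richer signals (category links, classifier scores, metadata fields) would replace the raw term overlap, but the shape of the mapping step is the same: score every candidate node and place the item under the best one.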
PATHS: A System for Accessing Cultural Heritage Collections
This paper describes a system for navigating large collections of information about cultural heritage which is applied to Europeana, the European Library. Europeana contains over 20 million artefacts with meta-data in a wide range of European languages. The system currently provides access to Europeana content with meta-data in English and Spanish. The paper describes how Natural Language Processing is used to enrich and organise this meta-data to assist navigation through Europeana and shows how this information is used within the system.
A Study on the Use of Ontologies to Represent Collective Knowledge
The development of ontologies has become an area of considerable research interest over the past several years. Domain ontologies are often developed to represent a shared understanding that in turn reflects cooperative effort by a user community. However, the structure and form that an ontology takes is predicated both on the approach of the developer and the cooperation of the user community. A shift has taken place in recent years from the use of highly specialised and expressive ontologies to simpler knowledge models, progressively developed by community contribution. It is within this context that this thesis investigates the use of ontologies as a means of representing collective knowledge. It investigates the impact of the community on the approach to and outcome of knowledge representation, and compares the use of simple terminological ontologies with highly structured expressive ontologies in community-based narrative environments.
Linking Textual Resources to Support Information Discovery
A vast amount of information is today stored in the form of textual documents, many of which are available online. These documents come from different sources and are of different types. They include newspaper articles, books, corporate reports, encyclopedia entries and research papers. At a semantic level, these documents contain knowledge, which was created by explicitly connecting information and expressing it in the form of a natural language. However, a significant amount of knowledge is not explicitly stated in a single document, yet can be derived or discovered by researching, i.e. accessing, comparing, contrasting and analysing, information from multiple documents. Carrying out this work using traditional search interfaces is tedious due to information overload and the difficulty of formulating queries that would help us to discover information we are not aware of.
In order to support this exploratory process, we need to be able to effectively navigate between related pieces of information across documents. While information can be connected using manually curated cross-document links, this approach not only does not scale, but cannot systematically assist us in the discovery of sometimes non-obvious (hidden) relationships. Consequently, there is a need for automatic approaches to link discovery.
This work studies how people link content, investigates the properties of different link types, presents new methods for automatic link discovery, and designs a system in which link discovery is applied to a collection of millions of documents to improve access to public knowledge.
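A common baseline for automatic link discovery is to propose a link between two documents when their lexical similarity exceeds a threshold. The sketch below uses TF-IDF vectors and cosine similarity; the documents and the 0.2 threshold are illustrative assumptions, not taken from this work.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF vectors (term -> weight) for a list of tokenised documents."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # document frequency of each term
    n = len(docs)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def discover_links(docs, threshold=0.2):
    """Propose links between document pairs whose similarity exceeds the threshold."""
    vecs = tfidf_vectors(docs)
    links = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            score = cosine(vecs[i], vecs[j])
            if score > threshold:
                links.append((i, j, score))
    return links

docs = [
    "cultural heritage collection archive".split(),
    "cultural heritage collection museum".split(),
    "protein folding simulation".split(),
]
print(discover_links(docs))  # only the two heritage documents are linked
```

Such a baseline cannot surface the hidden, non-obvious relationships the thesis targets, which is precisely why dedicated link-discovery methods that go beyond surface similarity are needed.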
Evaluation Methodologies for Visual Information Retrieval and Annotation
Performance assessment plays a major role in research on Information Retrieval (IR) systems. Starting with the Cranfield experiments in the early 1960s, methodologies for system-based performance assessment emerged and established themselves, resulting in an active research field with a number of successful benchmarking activities. With the rise of the digital age, procedures of text retrieval evaluation were often transferred to multimedia retrieval evaluation without questioning their direct applicability. This thesis investigates the problem of system-based performance assessment of annotation approaches in generic image collections. It addresses three important parts of annotation evaluation, namely user requirements for the retrieval of annotated visual media, performance measures for multi-label evaluation, and visual test collections. Using the example of multi-label image annotation evaluation, I discuss which concepts to employ for indexing, how to obtain a reliable ground truth at moderate cost, and which evaluation measures are appropriate. This is accompanied by a thorough analysis of related work on system-based performance assessment in Visual Information Retrieval (VIR). Traditional performance measures are classified into four dimensions and investigated according to their appropriateness for visual annotation evaluation. One of the main ideas in this thesis challenges the common assumption that the score prediction dimension in annotation evaluation is binary: evaluation measures usually assign binary costs to correct and incorrect annotations, yet the predicted concepts and the set of true indexed concepts interrelate with each other. This work shows how to utilise these semantic relationships for a fine-grained evaluation scenario. Outcomes of this thesis include a user model for concept-based image retrieval, a fully assessed image annotation test collection, and a number of novel performance measures for image annotation evaluation.
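The idea of replacing binary costs with graded semantic similarity can be sketched generically as below. The thesis's actual measures are not reproduced here; the similarity table and concept labels are invented, and the "best match" aggregation is one simple choice among several.

```python
def soft_score(predicted, truth, sim):
    """Soft precision/recall: each concept earns its best similarity to the other set."""
    def best(c, other):
        return max(sim(c, o) for o in other) if other else 0.0
    precision = sum(best(p, truth) for p in predicted) / len(predicted)
    recall = sum(best(t, predicted) for t in truth) / len(truth)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical concept-similarity table (identical concepts score 1.0).
SIM = {("car", "vehicle"): 0.8, ("dog", "cat"): 0.5}
def sim(a, b):
    if a == b:
        return 1.0
    return SIM.get((a, b), SIM.get((b, a), 0.0))

print(soft_score({"car", "dog"}, {"vehicle", "dog"}, sim))
```

Under a strictly binary measure, predicting "car" where the ground truth says "vehicle" costs as much as predicting "protein"; a similarity-aware measure grants the near-miss partial credit, which better reflects how image concepts interrelate.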
An integrating text retrieval framework for Digital Ecosystems Paradigm
The purpose of the research is to provide effective information retrieval services for digital 'organisms' in a digital ecosystem by leveraging the power of Web searching technology. A novel integrating digital ecosystem search framework (a new digital organism) is proposed which employs Web search technology and traditional database searching techniques to provide economic organisms with comprehensive, dynamic, and organization-oriented information retrieval ranging from the Internet to the personal (semantic) desktop.
Towards memory supporting personal information management tools
In this article we discuss re-retrieving personal information objects and relate the task to recovering from lapse(s) in memory. We propose that fundamentally it is lapses in memory that impede users from successfully re-finding the information they need. Our hypothesis is that by learning more about memory lapses in non-computing contexts and how people cope and recover from these lapses, we can better inform the design of PIM tools and improve the user's ability to re-access and re-use objects. We describe a diary study that investigates the everyday memory problems of 25 people from a wide range of backgrounds. Based on the findings, we present a series of principles that we hypothesize will improve the design of personal information management tools. This hypothesis is validated by an evaluation of a tool for managing personal photographs, which was designed with respect to our findings. The evaluation suggests that users' performance when re-finding objects can be improved by building personal information management tools to support characteristics of human memory.
- …