2,599 research outputs found
Recommended from our members
Comparing taxonomies for organising collections of documents
There is a demand for taxonomies to organise large collections of documents into categories for browsing and exploration. This paper examines four existing taxonomies that have been manually created, along with two methods for deriving taxonomies automatically from data items. We use these taxonomies to organise items from a large online cultural heritage collection. We then present two human evaluations of the taxonomies. The first measures the cohesion of the taxonomies to determine how well they group similar items under the same concept node. The second analyses the concept relations in the taxonomies. The results show that the manual taxonomies have high-quality, well-defined relations. However, the novel automatic method is found to generate very high cohesion.
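The cohesion evaluation described above can be sketched as follows. This is a minimal illustration, not the paper's exact protocol: the node structure, the item keyword sets, and the Jaccard similarity are all hypothetical choices made for the example.

```python
from itertools import combinations

def node_cohesion(items, similarity):
    """Mean pairwise similarity of the items grouped under one concept node."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 1.0  # a single-item node is trivially cohesive
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)

def taxonomy_cohesion(taxonomy, similarity):
    """Average node cohesion over all concept nodes in the taxonomy."""
    scores = [node_cohesion(items, similarity) for items in taxonomy.values()]
    return sum(scores) / len(scores)

def jaccard(a, b):
    """Toy similarity: Jaccard overlap of two items' keyword sets."""
    return len(a & b) / len(a | b)

# Hypothetical taxonomy: concept node -> items, each item a keyword set.
taxonomy = {
    "ceramics": [{"vase", "clay"}, {"vase", "glaze"}],
    "paintings": [{"oil", "canvas"}, {"fresco", "wall"}],
}
print(taxonomy_cohesion(taxonomy, jaccard))  # mean of 1/3 and 0
```

A taxonomy that places similar items under the same node scores closer to 1; scattering unrelated items under shared nodes drives the score toward 0.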
Evaluating hierarchical organisation structures for exploring digital libraries
Search boxes providing simple keyword-based search are insufficient when users have complex information needs or are unfamiliar with a collection, for example in large digital libraries. Browsing hierarchies can support these richer interactions, but many collections do not have a suitable hierarchy available. In this paper we present a number of approaches for automatically creating hierarchies and mapping items into them, including a novel technique which automatically adapts a Wikipedia-based taxonomy to the target collection. These approaches are applied to a large collection of cultural heritage items which is formed through the aggregation of other collections and for which no unified hierarchy is available. We investigate a number of novel user-evaluated metrics to quantify the hierarchies' quality and performance, showing that the proposed technique is preferred by users. From this we draw a number of conclusions as to what makes a hierarchy useful to the user.
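The step of mapping items into a hierarchy can be sketched roughly as below. This is an assumed term-overlap heuristic for illustration only; the node names and vocabularies are invented, and the paper's actual adaptation technique is more sophisticated.

```python
# Hypothetical concept nodes, each with an associated vocabulary of terms.
nodes = {
    "Sculpture": {"statue", "bronze", "marble", "carving"},
    "Manuscripts": {"codex", "parchment", "illumination", "script"},
}

def map_item(item_terms, nodes):
    """Map an item into the hierarchy node sharing the most vocabulary terms."""
    best, best_overlap = None, -1
    for name, vocab in nodes.items():
        overlap = len(item_terms & vocab)
        if overlap > best_overlap:
            best, best_overlap = name, overlap
    return best

print(map_item({"bronze", "statue", "horse"}, nodes))  # Sculpture
```

In practice, richer signals (category links, classifier scores, metadata fields) would replace the raw term overlap, but the shape of the mapping step is the same: score every candidate node and place the item under the best one.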
PATHS: A System for Accessing Cultural Heritage Collections
This paper describes a system for navigating large collections of information about cultural heritage which is applied to Europeana, the European Library. Europeana contains over 20 million artefacts with meta-data in a wide range of European languages. The system currently provides access to Europeana content with meta-data in English and Spanish. The paper describes how Natural Language Processing is used to enrich and organise this meta-data to assist navigation through Europeana and shows how this information is used within the system.
A Study on the Use of Ontologies to Represent Collective Knowledge
The development of ontologies has become an area of considerable research interest over the past several years. Domain ontologies are often developed to represent a shared understanding that in turn reflects cooperative effort by a user community. However, the structure and form that an ontology takes is predicated both on the approach of the developer and the cooperation of the user community. A shift has taken place in recent years from the use of highly specialised and expressive ontologies to simpler knowledge models, progressively developed by community contribution. It is within this context that this thesis investigates the use of ontologies as a means of representing collective knowledge. It investigates the impact of the community on the approach to and outcome of knowledge representation, and compares the use of simple terminological ontologies with highly structured expressive ontologies in community-based narrative environments.
Linking Textual Resources to Support Information Discovery
A vast amount of information is today stored in the form of textual documents, many of which are available online. These documents come from different sources and are of different types. They include newspaper articles, books, corporate reports, encyclopedia entries and research papers. At a semantic level, these documents contain knowledge, which was created by explicitly connecting information and expressing it in the form of a natural language. However, a significant amount of knowledge is not explicitly stated in a single document, yet can be derived or discovered by researching, i.e. accessing, comparing, contrasting and analysing, information from multiple documents. Carrying out this work using traditional search interfaces is tedious due to information overload and the difficulty of formulating queries that would help us to discover information we are not aware of.
In order to support this exploratory process, we need to be able to effectively navigate between related pieces of information across documents. While information can be connected using manually curated cross-document links, this approach not only does not scale, but cannot systematically assist us in the discovery of sometimes non-obvious (hidden) relationships. Consequently, there is a need for automatic approaches to link discovery.
This work studies how people link content, investigates the properties of different link types, presents new methods for automatic link discovery, and designs a system in which link discovery is applied to a collection of millions of documents to improve access to public knowledge.
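A common baseline for automatic link discovery is to propose a link between two documents when their lexical similarity exceeds a threshold. The sketch below uses TF-IDF vectors and cosine similarity; the documents and the 0.2 threshold are illustrative assumptions, not taken from this work.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF vectors (term -> weight) for a list of tokenised documents."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # document frequency of each term
    n = len(docs)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def discover_links(docs, threshold=0.2):
    """Propose links between document pairs whose similarity exceeds the threshold."""
    vecs = tfidf_vectors(docs)
    links = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            score = cosine(vecs[i], vecs[j])
            if score > threshold:
                links.append((i, j, score))
    return links

docs = [
    "cultural heritage collection archive".split(),
    "cultural heritage collection museum".split(),
    "protein folding simulation".split(),
]
print(discover_links(docs))  # only the two heritage documents are linked
```

Such a baseline cannot surface the hidden, non-obvious relationships the thesis targets, which is precisely why dedicated link-discovery methods that go beyond surface similarity are needed.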
Evaluation Methodologies for Visual Information Retrieval and Annotation
Performance assessment plays a major role in research on Information Retrieval (IR) systems. Starting with the Cranfield experiments in the early 1960s, methodologies for system-based performance assessment emerged and established themselves, resulting in an active research field with a number of successful benchmarking activities. With the rise of the digital age, procedures of text retrieval evaluation were often transferred to multimedia retrieval evaluation without questioning their direct applicability. This thesis investigates the problem of system-based performance assessment of annotation approaches in generic image collections. It addresses three important parts of annotation evaluation, namely user requirements for the retrieval of annotated visual media, performance measures for multi-label evaluation, and visual test collections. Using the example of multi-label image annotation evaluation, I discuss which concepts to employ for indexing, how to obtain a reliable ground truth at moderate cost, and which evaluation measures are appropriate. This is accompanied by a thorough analysis of related work on system-based performance assessment in Visual Information Retrieval (VIR). Traditional performance measures are classified into four dimensions and investigated according to their appropriateness for visual annotation evaluation. One of the main ideas in this thesis challenges the common assumption that the score prediction dimension in annotation evaluation is binary: evaluation measures usually assign binary costs to correct and incorrect annotations, yet the predicted concepts and the set of true indexed concepts interrelate with each other. This work shows how to utilise these semantic relationships for a fine-grained evaluation scenario. Outcomes of this thesis include a user model for concept-based image retrieval, a fully assessed image annotation test collection, and a number of novel performance measures for image annotation evaluation.
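The idea of replacing binary costs with graded semantic similarity can be sketched generically as below. The thesis's actual measures are not reproduced here; the similarity table and concept labels are invented, and the "best match" aggregation is one simple choice among several.

```python
def soft_score(predicted, truth, sim):
    """Soft precision/recall: each concept earns its best similarity to the other set."""
    def best(c, other):
        return max(sim(c, o) for o in other) if other else 0.0
    precision = sum(best(p, truth) for p in predicted) / len(predicted)
    recall = sum(best(t, predicted) for t in truth) / len(truth)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical concept-similarity table (identical concepts score 1.0).
SIM = {("car", "vehicle"): 0.8, ("dog", "cat"): 0.5}
def sim(a, b):
    if a == b:
        return 1.0
    return SIM.get((a, b), SIM.get((b, a), 0.0))

print(soft_score({"car", "dog"}, {"vehicle", "dog"}, sim))
```

Under a strictly binary measure, predicting "car" where the ground truth says "vehicle" costs as much as predicting "protein"; a similarity-aware measure grants the near-miss partial credit, which better reflects how image concepts interrelate.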
An integrating text retrieval framework for Digital Ecosystems Paradigm
The purpose of the research is to provide effective information retrieval services for digital 'organisms' in a digital ecosystem by leveraging the power of Web searching technology. A novel integrating digital ecosystem search framework (a new digital organism) is proposed which employs Web search technology and traditional database searching techniques to provide economic organisms with comprehensive, dynamic, and organization-oriented information retrieval ranging from the Internet to the personal (semantic) desktop.
Towards memory supporting personal information management tools
In this article we discuss re-retrieving personal information objects and relate the task to recovering from lapse(s) in memory. We propose that fundamentally it is lapses in memory that impede users from successfully re-finding the information they need. Our hypothesis is that by learning more about memory lapses in non-computing contexts and how people cope and recover from these lapses, we can better inform the design of PIM tools and improve the user's ability to re-access and re-use objects. We describe a diary study that investigates the everyday memory problems of 25 people from a wide range of backgrounds. Based on the findings, we present a series of principles that we hypothesize will improve the design of personal information management tools. This hypothesis is validated by an evaluation of a tool for managing personal photographs, which was designed with respect to our findings. The evaluation suggests that users' performance when re-finding objects can be improved by building personal information management tools to support characteristics of human memory.
- …