906 research outputs found
ImageSieve: Exploratory search of museum archives with named entity-based faceted browsing
Over the last few years, faceted search has emerged as an attractive alternative to the traditional "text box" search and has become one of the standard ways of interaction on many e-commerce sites. However, these applications of faceted search are limited to domains where the objects of interest have already been classified along several independent dimensions, such as price, year, or brand. While automatic approaches to generate faceted search interfaces have been proposed, it is not yet clear to what extent the automatically produced interfaces will be useful to real users, and whether their quality can match or surpass their manually produced predecessors. The goal of this paper is to introduce an exploratory search interface called ImageSieve, which shares many features with traditional faceted browsing, but can function without the use of traditional faceted metadata. ImageSieve uses automatically extracted and classified named entities, which play important roles in many domains (such as news collections, image archives, etc.). We describe one specific application of ImageSieve for image search. Here, named entities extracted from the descriptions of the retrieved images are used to organize a faceted browsing interface, which then helps users to make sense of and further explore the retrieved images. The results of a user study of ImageSieve demonstrate that a faceted search system based on named entities can help users explore large collections and find relevant information more effectively.
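The idea of organizing retrieved images into facets by named entity can be illustrated with a minimal sketch. This is not ImageSieve's implementation: the function names, the entity types, and the sample records are all hypothetical, and the entity annotations are written by hand where a real system would obtain them from a named-entity recognizer run over the image descriptions.

```python
from collections import defaultdict

# Hypothetical retrieved images, each with hand-annotated (type, entity) pairs.
# A real system would extract these from image descriptions automatically.
RESULTS = {
    "img1": [("PERSON", "Jane Doe"), ("LOCATION", "Springfield")],
    "img2": [("LOCATION", "Springfield")],
    "img3": [("PERSON", "Jane Doe"), ("ORGANIZATION", "Example Museum")],
}


def build_facets(results):
    """Return a mapping: entity type -> entity -> set of image ids."""
    facets = defaultdict(lambda: defaultdict(set))
    for image_id, entities in results.items():
        for entity_type, entity in entities:
            facets[entity_type][entity].add(image_id)
    return facets


def filter_by_facet(facets, entity_type, entity):
    """Return the sorted image ids that mention the given entity."""
    return sorted(facets.get(entity_type, {}).get(entity, set()))


if __name__ == "__main__":
    facets = build_facets(RESULTS)
    print(filter_by_facet(facets, "LOCATION", "Springfield"))
```

Each entity type acts as one facet, and selecting an entity narrows the result list to the images whose descriptions mention it.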
Virtual language observatory: The portal to the language resources and technology universe
Over the years, the field of Language Resources and Technology (LRT) has developed a tremendous amount of resources and tools. However, there is no ready-to-use map that researchers could use to gain a good overview and steadfast orientation when searching for, say corpora or software tools to support their studies. It is rather the case that information is scattered across project- or organisation-specific sites, which makes it hard if not impossible for less-experienced researchers to gather all relevant material. Clearly, the provision of metadata is central to resource and software exploration. However, in the LRT field, metadata comes in many forms, tastes and qualities, and therefore substantial harmonization and curation efforts are required to provide researchers with metadata-based guidance. To address this issue a broad alliance of LRT providers (CLARIN, the Linguist List, DOBES, DELAMAN, DFKI, ELRA) have initiated the Virtual Language Observatory portal to provide a low-barrier, easy-to-follow entry point to language resources and tools; it can be accessed via http://www.clarin.eu/vl
Search in the eye of the beholder: using the personal social dataset and ontology-guided input to improve web search efficiency
Proceedings of: Latin American Web Conference 2007 (LA-WEB 2007), 31 October-2 November 2007, Santiago (Chile). Among the challenges of searching the vast information source that the Web has become, improving Web search efficiency through strategies that use semantics and user-generated data from Web 2.0 applications remains a promising and interesting approach. In this paper, we present the Personal Social Dataset and Ontology-guided Input strategies and couple them together, providing a proof-of-concept implementation.
Theory and Practice of Data Citation
Citations are the cornerstone of knowledge propagation and the primary means
of assessing the quality of research, as well as directing investments in
science. Science is increasingly becoming "data-intensive", where large volumes
of data are collected and analyzed to discover complex patterns through
simulations and experiments, and most scientific reference works have been
replaced by online curated datasets. Yet, given a dataset, there is no
quantitative, consistent and established way of knowing how it has been used
over time, who contributed to its curation, what results have been yielded or
what value it has.
The development of a theory and practice of data citation is fundamental for
considering data as first-class research objects with the same relevance and
centrality as traditional scientific products. Many works in recent years have
discussed data citation from different viewpoints: illustrating why data
citation is needed, defining the principles and outlining recommendations for
data citation systems, and providing computational methods for addressing
specific issues of data citation.
The current panorama is many-faceted and an overall view that brings together
diverse aspects of this topic is still missing. Therefore, this paper aims to
describe the lay of the land for data citation, both from the theoretical (the
why and what) and the practical (the how) angle.
Comment: 24 pages, 2 tables, pre-print accepted in Journal of the Association
for Information Science and Technology (JASIST), 201
Lightweight Ontologies
Ontologies are explicit specifications of conceptualizations. They are often thought of as directed graphs whose nodes represent concepts and whose edges represent relations between concepts. The notion of concept is understood as defined in Knowledge Representation, i.e., as a set of objects or individuals. This set is called the concept extension or the concept interpretation. Concepts are often lexically defined, i.e., they have natural language names which are used to describe the concept extensions (e.g., concept mother denotes the set of all female parents). Therefore, when ontologies are visualized, their nodes are often shown with corresponding natural language concept names. The backbone structure of the ontology graph is a taxonomy in which the relations are "is-a", whereas the remaining structure of the graph supplies auxiliary information about the modeled domain and may include relations like "part-of", "located-in", "is-parent-of", and many others.
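The graph view described in this abstract can be sketched in a few lines, assuming a plain list of labeled edges. The concept "mother" and the relation names come from the abstract; the other concepts, the specific edges, and the function name `ancestors` are hypothetical and chosen only for illustration.

```python
# Each edge is (source concept, relation, target concept).
# Only the "is-a" edges form the taxonomy backbone; the rest is auxiliary.
EDGES = [
    ("mother", "is-a", "parent"),      # assumed edge, for illustration
    ("parent", "is-a", "person"),      # assumed edge, for illustration
    ("city", "located-in", "country"), # auxiliary, non-taxonomic relation
]


def ancestors(concept, edges):
    """Return every concept reachable from `concept` via "is-a" edges."""
    parents = {}
    for source, relation, target in edges:
        if relation == "is-a":
            parents.setdefault(source, set()).add(target)

    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("mother", EDGES)))
```

Because only "is-a" edges are followed, auxiliary relations such as "located-in" never contribute to a concept's place in the taxonomy.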
Facilitating design learning through faceted classification of in-service information
The maintenance and service records collected and maintained by engineering companies are a useful
resource for the ongoing support of products. Such records are typically semi-structured and contain
key information such as a description of the issue and the product affected. It is suggested that further
value can be realised from the collection of these records for indicating recurrent and systemic issues
which may not have been apparent previously. This paper presents a faceted classification approach to
organise the information collection that might enhance retrieval and also facilitate learning from in-service
experiences. The faceted classification may help to expedite responses to urgent in-service issues as
well as to allow for patterns and trends in the records to be analysed, either automatically using suitable
data mining algorithms or by manually browsing the classification tree. The paper describes the application
of the approach to aerospace in-service records, where the potential for knowledge discovery is
demonstrated.