Is Evaluating Visual Search Interfaces in Digital Libraries Still an Issue?
Although various visual interfaces for digital libraries have been developed
in prototypical systems, very few of these visual approaches have been
integrated into today's digital libraries. In this position paper, we argue that
this is most likely because the evaluation results of most visual
systems lack comparability. There is no fixed standard for evaluating visual
interactive user interfaces, so it is not possible to identify which
approach is more suitable for a given context. We believe that the comparability
of evaluation results could be improved by building a common evaluation setup
consisting of a reference system based on a standardized corpus, fixed
tasks, and a panel of potential participants.
Comment: 10 pages, 2 figures, LWA Workshop 201
Variation of word frequencies across genre classification tasks
This paper examines automated genre classification of text documents and its role in enabling digital libraries and other repositories to manage digital documents effectively. Genre classification, which narrows down the possible structure of a document, is a valuable step towards the general automatic extraction of semantic metadata, which is essential to the efficient management and use of digital objects. In this report, we analyse word frequencies in different genre classes in an effort to understand the distinction between independent classification tasks. In particular, we run automated experiments on thirty-one genre classes to determine the relationship between word frequency metrics and their significance for classification in varying environments.
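The abstract does not say which word frequency metrics the experiments use. As a minimal sketch only, assuming documents arrive as hypothetical `(genre, text)` pairs and are tokenised by lowercasing and splitting on whitespace, per-genre relative frequencies and a simple illustrative distinctiveness score (not the paper's metric) could be computed as:

```python
from collections import Counter, defaultdict


def genre_word_frequencies(docs):
    """Relative frequency of each word within each genre class.

    docs: iterable of (genre_label, text) pairs (hypothetical input format).
    Returns {genre: {word: count_in_genre / total_words_in_genre}}.
    """
    counts = defaultdict(Counter)
    for genre, text in docs:
        # Naive lowercase + whitespace tokenisation (an assumption).
        counts[genre].update(text.lower().split())
    freqs = {}
    for genre, counter in counts.items():
        total = sum(counter.values())
        freqs[genre] = {word: c / total for word, c in counter.items()}
    return freqs


def top_distinctive_words(freqs, genre, k=3):
    """Hypothetical score: a word's frequency in `genre` minus its mean
    frequency across all other genres; returns the k highest-scoring words."""
    others = [g for g in freqs if g != genre]

    def score(word):
        rest = sum(freqs[g].get(word, 0.0) for g in others)
        return freqs[genre][word] - rest / max(len(others), 1)

    return sorted(freqs[genre], key=score, reverse=True)[:k]
```

Words that are common in every genre (such as "the") score low under this difference, while words concentrated in one class rise to the top; a real study would substitute its own significance measure for the hypothetical `score` function.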
Feature Type Analysis in Automated Genre Classification
In this paper, we compare classifiers based on language model, image, and stylistic features for automated genre classification. Most previous studies in genre classification have built models on an amalgamated representation of a document using a multitude of features. In these models, the roles of the different features cannot be separated, which makes it difficult to determine how to improve the classifier when it performs poorly at detecting selected genres. By independently modeling and comparing classifiers based on three feature types, describing visual, stylistic, and topical properties, we demonstrate that different genres have distinctive feature strengths.
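The abstract does not name the classifiers used. The sketch below swaps in a simple nearest-centroid classifier purely for illustration, assuming each document is a hypothetical dictionary mapping a feature-type name (for example `"visual"` or `"stylistic"`) to a numeric vector. It trains one classifier per feature type and reports per-genre recall, which is one way to read off which feature type is strongest for each genre:

```python
import math
from collections import defaultdict


def train_centroids(samples, feature_type):
    """Nearest-centroid model on ONE feature type: the mean vector per genre.

    samples: list of (features, genre) pairs, where features maps a
    feature-type name (e.g. "visual") to a list of floats (hypothetical format).
    """
    sums, counts = {}, defaultdict(int)
    for features, genre in samples:
        vec = features[feature_type]
        if genre not in sums:
            sums[genre] = [0.0] * len(vec)
        sums[genre] = [s + v for s, v in zip(sums[genre], vec)]
        counts[genre] += 1
    return {g: [s / counts[g] for s in sums[g]] for g in sums}


def predict(centroids, vec):
    """Assign the genre whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda g: math.dist(centroids[g], vec))


def per_genre_recall(train, test, feature_types):
    """Train one classifier per feature type and report recall per genre,
    so the strongest feature type can be read off for each genre."""
    report = {}
    for ft in feature_types:
        centroids = train_centroids(train, ft)
        hits, totals = defaultdict(int), defaultdict(int)
        for features, genre in test:
            totals[genre] += 1
            if predict(centroids, features[ft]) == genre:
                hits[genre] += 1
        report[ft] = {g: hits[g] / totals[g] for g in totals}
    return report
```

Keeping the feature types in separate models, rather than concatenating them into one vector, is what makes a poor result attributable to a specific feature type. `math.dist` requires Python 3.8 or later.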
Building a document genre corpus: a profile of the KRYS I corpus
This paper describes the KRYS I corpus (http://www.krys-corpus.eu/Info.html), consisting of documents classified into 70 genre classes. It has
been constructed as part of an effort to automate document genre classification as distinct from topic
detection. Previously, there has been very little work on building corpora of texts that have been classified
using a non-topical genre palette. This is partly because genre as a concept is
rooted in philosophy, rhetoric and literature, and is highly complex and domain-dependent in its interpretation
([11]). The usefulness of genre in everyday information search is only now starting to be recognised, and
no genre classification schema has yet been consolidated to the point of having practical value in this direction.
By presenting our experiences in constructing the KRYS I corpus, we hope to shed light on
information gathering and seeking behaviour and the role of genre in these activities, as well as on a way
forward for creating a better corpus for testing automated genre classification tasks and for applying
these tasks to other domains.
Generating collaborative systems for digital libraries: A model-driven approach
This is an open access article shared under a Creative Commons Attribution 3.0 Licence (http://creativecommons.org/licenses/by/3.0/). Copyright © 2010 The Authors. The design and development of a digital library involves different stakeholders, such as information architects, librarians, and domain experts, who need to agree on a common language to describe, discuss, and negotiate the services the library has to offer. To this end, high-level, language-neutral models have to be devised. Metamodeling techniques favor the definition of domain-specific visual languages through which stakeholders can share their views and directly manipulate representations of the domain entities. This paper describes CRADLE (Cooperative-Relational Approach to Digital Library Environments), a metamodel-based framework and visual language for the definition of notions and services related to the development of digital libraries. A collection of tools allows the automatic generation of several services defined with the CRADLE visual language, as well as of the graphical user interfaces that give end users access to them. The effectiveness of the approach is illustrated by presenting digital libraries generated with CRADLE, and the CRADLE environment has been evaluated using the cognitive dimensions framework.
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems (text summarization, information extraction, information retrieval, etc.), including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.
Query Expansion for Survey Question Retrieval in the Social Sciences
In recent years, the importance of research data and the need to archive and
to share it in the scientific community have increased enormously. This
introduces a whole new set of challenges for digital libraries. In the social
sciences typical research data sets consist of surveys and questionnaires. In
this paper we focus on the use case of social science survey question reuse and
on mechanisms to support users in the query formulation for data sets. We
describe and evaluate thesaurus- and co-occurrence-based approaches for query
expansion to improve retrieval quality in digital libraries and research data
archives. The challenge here is to translate the information need and the
underlying sociological phenomena into proper queries. As we show, retrieval
quality can be improved by adding related terms to the queries. In a direct
comparison, automatically expanded queries using extracted co-occurring terms
can provide better results than queries manually reformulated by a domain
expert, and better results than a keyword-based BM25 baseline.
Comment: to appear in Proceedings of the 19th International Conference on Theory
and Practice of Digital Libraries 2015 (TPDL 2015)
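The abstract does not specify how co-occurring terms are extracted or weighted. As a minimal sketch, assuming document-level co-occurrence over whitespace tokens and raw counts (a simplification; the paper's actual term-association measure is not given here), co-occurrence-based query expansion could look like this:

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(documents):
    """Count how often two distinct terms appear in the same document.

    documents: list of strings, e.g. survey question texts (assumed input).
    """
    cooc = defaultdict(Counter)
    for doc in documents:
        terms = set(doc.lower().split())
        # Each unordered pair of distinct terms is counted once per document.
        for a, b in combinations(sorted(terms), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def expand_query(query, cooc, k=2):
    """Append up to k most frequently co-occurring terms for each query term."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        # .get avoids inserting unknown terms into the defaultdict.
        for related, _count in cooc.get(term, Counter()).most_common(k):
            if related not in expanded:
                expanded.append(related)
    return expanded
```

The expanded term list would then be handed to a ranking function such as the BM25 baseline mentioned in the abstract. A thesaurus-based variant would replace the `cooc` lookup with a lookup of related terms in a controlled vocabulary; that substitution is an illustration, not a description of the authors' implementation.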