Automatic Metadata Generation using Associative Networks
In spite of its tremendous value, metadata is generally sparse and
incomplete, thereby hampering the effectiveness of digital information
services. Many of the existing mechanisms for the automated creation of
metadata rely primarily on content analysis, which can be costly and
inefficient. The automatic metadata generation system proposed in this article
leverages resource relationships generated from existing metadata as a medium
for propagation from metadata-rich to metadata-poor resources. Because of its
independence from content analysis, it can be applied to a wide variety of
resource media types and is shown to be computationally inexpensive. The
proposed method operates through two distinct phases. Occurrence and
co-occurrence algorithms first generate an associative network of repository
resources leveraging existing repository metadata. Second, using the
associative network as a substrate, metadata associated with metadata-rich
resources is propagated to metadata-poor resources by means of a discrete-form
spreading activation algorithm. This article discusses the general framework
for building associative networks, an algorithm for disseminating metadata
through such networks, and the results of an experiment and validation of the
proposed method using a standard bibliographic dataset.
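The two phases described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the record layout, the function names `build_network` and `propagate`, the use of shared field values as the co-occurrence signal, and the `steps` and `decay` parameters are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_network(records, field):
    """Phase 1 (assumed form): link resources that share a value in `field`.

    The edge weight is the number of values the two resources have in common.
    """
    by_value = defaultdict(set)
    for rid, meta in records.items():
        for value in meta.get(field, []):
            by_value[value].add(rid)

    weights = defaultdict(lambda: defaultdict(int))
    for rids in by_value.values():
        for a, b in combinations(sorted(rids), 2):
            weights[a][b] += 1
            weights[b][a] += 1
    return weights


def propagate(weights, records, field, steps=2, decay=0.5):
    """Phase 2 (assumed form): discrete spreading activation.

    Each existing value of `field` starts with energy 1.0 at its resource and
    is passed to neighbours for a fixed number of hops, split in proportion to
    edge weight and damped by `decay`. Resources lacking the value accumulate
    a score for it.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for source, meta in records.items():
        for value in meta.get(field, []):
            frontier = {source: 1.0}
            for _ in range(steps):
                nxt = defaultdict(float)
                for node, energy in frontier.items():
                    neighbours = weights.get(node, {})
                    total = sum(neighbours.values())
                    for nb, w in neighbours.items():
                        nxt[nb] += decay * energy * w / total
                for node, energy in nxt.items():
                    if value not in records[node].get(field, []):
                        scores[node][value] += energy
                frontier = nxt
    return scores


# Hypothetical records: "b" shares an author with "a" but has no keywords.
records = {
    "a": {"authors": ["X", "Y"], "keywords": ["networks"]},
    "b": {"authors": ["X"], "keywords": []},
    "c": {"authors": ["Z"], "keywords": ["physics"]},
}
network = build_network(records, "authors")
suggested = propagate(network, records, "keywords")
```

In this toy input, only "b" receives a keyword suggestion, because it is the only metadata-poor resource connected to a metadata-rich one. No content analysis is involved, which matches the independence from resource media type that the abstract emphasises.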
Specimens as research objects: reconciliation across distributed repositories to enable metadata propagation
Botanical specimens are shared as long-term consultable research objects in a
global network of specimen repositories. Multiple specimens are generated from
a shared field collection event; generated specimens are then managed
individually in separate repositories and independently augmented with research
and management metadata which could be propagated to their duplicate peers.
Establishing a data-derived network for metadata propagation will enable the
reconciliation of closely related specimens which are currently dispersed,
unconnected and managed independently. Following a data mining exercise applied
to an aggregated dataset of 19,827,998 specimen records from 292 separate
specimen repositories, 36% or 7,102,710 specimens are assessed to participate
in duplication relationships, allowing the propagation of metadata among the
participants in these relationships, totalling: 93,044 type citations,
1,121,865 georeferences, 1,097,168 images and 2,191,179 scientific name
determinations. The results enable the creation of networks to identify which
repositories could work in collaboration. Some classes of annotation
(particularly those regarding scientific name determinations) represent units
of scientific work: appropriate management of this data would allow the
accumulation of scholarly credit to individual researchers. Potential further
work in this area is discussed.
Comment: 9 pages, 1 table, 3 figures
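As a rough illustration of the idea in this abstract, the sketch below groups specimen records that appear to stem from the same field collection event and reports which metadata values could be copied to duplicates that lack them. The abstract does not state the matching criteria used in the data mining exercise, so the key fields (`collector`, `number`, `date`), the record layout and both function names are hypothetical.

```python
from collections import defaultdict


def duplicate_groups(specimens, key_fields=("collector", "number", "date")):
    """Group records that share a normalised collection-event key (assumed key)."""
    groups = defaultdict(list)
    for s in specimens:
        key = tuple(str(s.get(f) or "").strip().lower() for f in key_fields)
        if all(key):  # skip records with an incomplete key
            groups[key].append(s)
    return [g for g in groups.values() if len(g) > 1]


def propagate_field(group, field):
    """Return {specimen id: value} for duplicates missing `field`.

    Nothing is propagated when the group holds conflicting values.
    """
    values = {s[field] for s in group if s.get(field)}
    if len(values) != 1:
        return {}
    value = next(iter(values))
    return {s["id"]: value for s in group if not s.get(field)}


# Hypothetical records held in three separate repositories.
specimens = [
    {"id": "K1", "collector": "Smith", "number": "12",
     "date": "1901-05-02", "georef": (1.0, 2.0)},
    {"id": "P7", "collector": "smith ", "number": "12", "date": "1901-05-02"},
    {"id": "E3", "collector": "Jones", "number": "4", "date": "1950-01-01"},
]
for group in duplicate_groups(specimens):
    print(propagate_field(group, "georef"))
```

Refusing to propagate when duplicates disagree is a deliberately conservative choice for the sketch; a real workflow would more likely route such conflicts to a curator.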
Protocols for Scholarly Communication
CERN, the European Organization for Nuclear Research, has operated an
institutional preprint repository for more than 10 years. The repository
contains over 850,000 records of which more than 450,000 are full-text OA
preprints, mostly in the field of particle physics, and it is integrated with
the library's holdings of books, conference proceedings, journals and other
grey literature. In order to encourage effective propagation and open access to
scholarly material, CERN is implementing a range of innovative library services
into its document repository: automatic keywording, reference extraction,
collaborative management tools and bibliometric tools. Some of these services,
such as user reviewing and automatic metadata extraction, could make up an
interesting testbed for future publishing solutions and certainly provide an
exciting environment for e-science possibilities. The future protocol for
scientific communication should naturally guide authors towards OA publication
and CERN wants to help reach a full open access publishing environment for the
particle physics community and the related sciences in the next few years.
Comment: 8 pages, to appear in Library and Information Systems in Astronomy
A Multi-Relational Network to Support the Scholarly Communication Process
The general purpose of the scholarly communication process is to support the
creation and dissemination of ideas within the scientific community. At a finer
granularity, there exist multiple stages which, when confronted by a member of
the community, have different requirements and therefore different solutions.
In order to take a researcher's idea from an initial inspiration to a community
resource, the scholarly communication infrastructure may be required to 1)
provide a scientist with initial seed ideas; 2) form a team of well-suited
collaborators; 3) locate the most appropriate venue to publish the formalized
idea; 4) determine the most appropriate peers to review the manuscript; and 5)
disseminate the end product to the most interested members of the community.
Through the various delineations of this process, the requirements of each
stage are tied solely to the multi-functional resources of the community: its
researchers, its journals, and its manuscripts. It is within the collection of
these resources and their inherent relationships that the solutions to
scholarly communication are to be found. This paper describes an associative
network composed of multiple scholarly artifacts that can be used as a medium
for supporting the scholarly communication process.
Comment: keywords: digital libraries and scholarly communication
Towards a Cloud-Based Service for Maintaining and Analyzing Data About Scientific Events
We propose the new cloud-based service OpenResearch for managing and
analyzing data about scientific events such as conferences and workshops in a
persistent and reliable way. This includes data about scientific articles,
participants, acceptance rates, submission numbers, impact values as well as
organizational details such as program committees, chairs, fees and sponsors.
OpenResearch is a centralized repository for scientific events and supports
researchers in collecting, organizing, sharing and disseminating information
about scientific events in a structured way. An additional feature currently
under development is the possibility to archive web pages along with the
extracted semantic data in order to lift the burden of maintaining new and old
conference web sites from public research institutions. However, the main
advantage is that this cloud-based repository enables a comprehensive analysis
of conference data. Based on extracted semantic data, it is possible to
determine quality estimations, scientific communities, research trends as well
as the development of acceptance rates, fees, and number of participants in a
continuous way complemented by projections into the future. Furthermore, data
about research articles can be systematically explored using a content-based
analysis as well as citation linkage. All data maintained in this
crowd-sourcing platform is made freely available through an open SPARQL
endpoint, which allows for analytical queries in a flexible and user-defined
way.
Comment: A completed version of this paper was accepted at the SAVE-SD
workshop 2017 at the WWW conference
Variation of word frequencies across genre classification tasks
This paper examines automated genre classification of text documents and its role in enabling the effective management of digital documents by digital libraries and other repositories. Genre classification, which narrows down the possible structure of a document, is a valuable step in
realising the general automatic extraction of semantic metadata essential to the efficient management and use of digital objects. In this report, we present an analysis of word frequencies in different genre classes in an effort to understand the distinction between independent classification tasks. In particular, we examine automated experiments on thirty-one genre classes to determine the relationship between word frequency metrics and the degree of their significance in carrying out classification in varying environments.
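To make the kind of analysis described here concrete, the sketch below computes relative word frequencies per genre class and a simple measure of how much one word's frequency varies across classes. The abstract does not specify which frequency metrics were used, so the tokenisation rule, the `frequency_spread` measure and the sample genres are assumptions chosen only for illustration.

```python
import re
from collections import Counter, defaultdict


def genre_word_frequencies(documents):
    """Map each genre to {word: relative frequency} over all its documents.

    `documents` is an iterable of (genre, text) pairs.
    """
    counts = defaultdict(Counter)
    for genre, text in documents:
        counts[genre].update(re.findall(r"[a-z']+", text.lower()))

    freqs = {}
    for genre, counter in counts.items():
        total = sum(counter.values())
        freqs[genre] = {word: n / total for word, n in counter.items()}
    return freqs


def frequency_spread(freqs, word):
    """Difference between the highest and lowest frequency of `word` across genres."""
    values = [by_word.get(word, 0.0) for by_word in freqs.values()]
    return max(values) - min(values)


# Hypothetical two-genre corpus.
documents = [
    ("abstract", "we propose a method"),
    ("faq", "how do we reset a password"),
]
freqs = genre_word_frequencies(documents)
print(frequency_spread(freqs, "propose"))
```

A word with a large spread separates the genres well in this toy sense, while a word such as "we", which occurs in both classes, contributes little.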
Personalized content retrieval in context using ontological knowledge
Personalized content retrieval aims at improving the retrieval process by taking into account the particular interests of individual users. However, not all user preferences are relevant in all situations. It is well known that human preferences are complex, multiple, heterogeneous, changing, even contradictory, and should be understood in the context of the user's goals and tasks at hand. In this paper, we propose a method to build a dynamic representation of the semantic context of ongoing retrieval tasks, which is used to activate different subsets of user interests at runtime, so that out-of-context preferences are discarded. Our approach is based on an ontology-driven representation of the domain of discourse, providing enriched descriptions of the semantics involved in retrieval actions and preferences, and enabling the definition of effective means to relate preferences and context.