24,335 research outputs found
On Constructing Persistent Identifiers with Persistent Resolution Targets
Persistent Identifiers (PID) are the foundation referencing digital assets in
scientific publications, books, and digital repositories. In its realization,
PIDs contain metadata and resolving targets in form of URLs that point to data
sets located on the network. In contrast to PIDs, the target URLs are typically
changing over time; thus, PIDs need continuous maintenance -- an effort that is
increasing tremendously with the advancement of e-Science and the advent of the
Internet-of-Things (IoT). Nowadays, billions of sensors and data sets are
subject of PID assignment. This paper presents a new approach of embedding
location independent targets into PIDs that allows the creation of
maintenance-free PIDs using content-centric network technology and overlay
networks. For proving the validity of the presented approach, the Handle PID
System is used in conjunction with Magnet Link access information encoding,
state-of-the-art decentralized data distribution with BitTorrent, and Named
Data Networking (NDN) as location-independent data access technology for
networks. Contrasting existing approaches, no green-field implementation of PID
or major modifications of the Handle System is required to enable
location-independent data dissemination with maintenance-free PIDs.Comment: Published IEEE paper of the FedCSIS 2016 (SoFAST-WS'16) conference,
11.-14. September 2016, Gdansk, Poland. Also available online:
http://ieeexplore.ieee.org/document/7733372
Identity in research infrastructure and scientific communication: Report from the 1st IRISC workshop, Helsinki Sep 12-13, 2011
Motivation for the IRISC workshop came from the observation that identity and digital identification are increasingly important factors in modern scientific research, especially with the now near-ubiquitous use of the Internet as a global medium for dissemination and debate of scientific knowledge and data, and as a platform for scientific collaborations and large-scale e-science activities.

The 1 1/2 day IRISC2011 workshop sought to explore a series of interrelated topics under two main themes: i) unambiguously identifying authors/creators & attributing their scholarly works, and ii) individual identification and access management in the context of identity federations. Specific aims of the workshop included:

• Raising overall awareness of key technical and non-technical challenges, opportunities and developments.
• Facilitating a dialogue, cross-pollination of ideas, collaboration and coordination between diverse – and largely unconnected – communities.
• Identifying & discussing existing/emerging technologies, best practices and requirements for researcher identification.

This report provides background information on key identification-related concepts & projects, describes workshop proceedings and summarizes key workshop findings
Linking Literature and Data: Status Report and Future Efforts
In the current era of data-intensive science, it is increasingly important
for researchers to be able to have access to published results, the supporting
data, and the processes used to produce them. Six years ago, recognizing this
need, the American Astronomical Society and the Astrophysics Data Centers
Executive Committee (ADEC) sponsored an effort to facilitate the annotation and
linking of datasets during the publishing process, with limited success. I will
review the status of this effort and describe a new, more general one now being
considered in the context of the Virtual Astronomical Observatory.Comment: 9 pages, 2 figures, to appear in: Future Professional Communication
in Astronomy II (FPCA-II
A Peer-to-Peer Middleware Framework for Resilient Persistent Programming
The persistent programming systems of the 1980s offered a programming model
that integrated computation and long-term storage. In these systems, reliable
applications could be engineered without requiring the programmer to write
translation code to manage the transfer of data to and from non-volatile
storage. More importantly, it simplified the programmer's conceptual model of
an application, and avoided the many coherency problems that result from
multiple cached copies of the same information. Although technically
innovative, persistent languages were not widely adopted, perhaps due in part
to their closed-world model. Each persistent store was located on a single
host, and there were no flexible mechanisms for communication or transfer of
data between separate stores. Here we re-open the work on persistence and
combine it with modern peer-to-peer techniques in order to provide support for
orthogonal persistence in resilient and potentially long-running distributed
applications. Our vision is of an infrastructure within which an application
can be developed and distributed with minimal modification, whereupon the
application becomes resilient to certain failure modes. If a node, or the
connection to it, fails during execution of the application, the objects are
re-instantiated from distributed replicas, without their reference holders
being aware of the failure. Furthermore, we believe that this can be achieved
within a spectrum of application programmer intervention, ranging from minimal
to totally prescriptive, as desired. The same mechanisms encompass an
orthogonally persistent programming model. We outline our approach to
implementing this vision, and describe current progress.Comment: Submitted to EuroSys 200
Biodiversity informatics: the challenge of linking data and the role of shared identifiers
A major challenge facing biodiversity informatics is integrating data stored in widely distributed databases. Initial efforts have relied on taxonomic names as the shared identifier linking records in different databases. However, taxonomic names have limitations as identifiers, being neither stable nor globally unique, and the pace of molecular taxonomic and phylogenetic research means that a lot of information in public sequence databases is not linked to formal taxonomic names. This review explores the use of other identifiers, such as specimen codes and GenBank accession numbers, to link otherwise disconnected facts in different databases. The structure of these links can also be exploited using the PageRank algorithm to rank the results of searches on biodiversity databases. The key to rich integration is a commitment to deploy and reuse globally unique, shared identifiers (such as DOIs and LSIDs), and the implementation of services that link those identifiers
Pathways: Augmenting interoperability across scholarly repositories
In the emerging eScience environment, repositories of papers, datasets,
software, etc., should be the foundation of a global and natively-digital
scholarly communications system. The current infrastructure falls far short of
this goal. Cross-repository interoperability must be augmented to support the
many workflows and value-chains involved in scholarly communication. This will
not be achieved through the promotion of single repository architecture or
content representation, but instead requires an interoperability framework to
connect the many heterogeneous systems that will exist.
We present a simple data model and service architecture that augments
repository interoperability to enable scholarly value-chains to be implemented.
We describe an experiment that demonstrates how the proposed infrastructure can
be deployed to implement the workflow involved in the creation of an overlay
journal over several different repository systems (Fedora, aDORe, DSpace and
arXiv).Comment: 18 pages. Accepted for International Journal on Digital Libraries
special issue on Digital Libraries and eScienc
ATLAS: A flexible and extensible architecture for linguistic annotation
We describe a formal model for annotating linguistic artifacts, from which we
derive an application programming interface (API) to a suite of tools for
manipulating these annotations. The abstract logical model provides for a range
of storage formats and promotes the reuse of tools that interact through this
API. We focus first on ``Annotation Graphs,'' a graph model for annotations on
linear signals (such as text and speech) indexed by intervals, for which
efficient database storage and querying techniques are applicable. We note how
a wide range of existing annotated corpora can be mapped to this annotation
graph model. This model is then generalized to encompass a wider variety of
linguistic ``signals,'' including both naturally occuring phenomena (as
recorded in images, video, multi-modal interactions, etc.), as well as the
derived resources that are increasingly important to the engineering of natural
language processing systems (such as word lists, dictionaries, aligned
bilingual corpora, etc.). We conclude with a review of the current efforts
towards implementing key pieces of this architecture.Comment: 8 pages, 9 figure
- …