
    Expressing the tacit knowledge of a digital library system as linked data

    Library organizations have enthusiastically undertaken semantic web initiatives, in particular the publication of data as linked data. Nevertheless, several surveys report the experimental nature of these initiatives and the difficulty consumers face in re-using the data. These barriers hinder the use of linked datasets as an infrastructure that enhances the library and related information services. This paper presents an approach for encoding, as a linked vocabulary, the "tacit" knowledge of the information system that manages the data source, with the aim of improving how consumers interpret the meaning of published linked datasets. We analyzed a digital library system as a case study for prototyping a "semantic data management" method, in which data and the knowledge about it are natively managed together, in keeping with the linked data pillars. The ultimate objective of semantic data management is to curate consumers' correct interpretation of the data and to facilitate its proper re-use. The prototype defines the ontological entities representing the knowledge of the digital library system that is stored neither in the data source nor in the existing ontologies related to the system's semantics. We present the local ontology and its matching with existing ontologies, Preservation Metadata Implementation Strategies (PREMIS) and Metadata Object Description Schema (MODS), and we discuss linked data triples prototyped from the legacy relational database using the local ontology. We show how semantic data management can deal with inconsistency in system data, and we conclude that a specific change in the system developer's mindset is necessary for extracting and "codifying" the tacit knowledge needed to improve the data interpretation process.
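
    A minimal sketch of the kind of mapping the paper describes, using Python's rdflib: a local ontology class is declared, aligned with an existing vocabulary, and used to prototype triples from a relational record. The dl namespace, the class and property names, and the sample record are hypothetical stand-ins, and the alignments shown are illustrative rather than the authors' actual matching.

        from rdflib import Graph, Literal, Namespace, URIRef
        from rdflib.namespace import OWL, RDF, RDFS

        # Hypothetical local ontology holding system knowledge that is not
        # stored in the data source itself.
        DL = Namespace("http://example.org/dl-ontology#")
        PREMIS = Namespace("http://www.loc.gov/premis/rdf/v1#")
        MODS = Namespace("http://www.loc.gov/mods/rdf/v1#")

        g = Graph()
        g.bind("dl", DL)
        g.bind("premis", PREMIS)
        g.bind("mods", MODS)

        # A local class, matched against an existing ontology so consumers
        # can interpret it (illustrative alignment, not the paper's).
        g.add((DL.DigitalResource, RDF.type, OWL.Class))
        g.add((DL.DigitalResource, RDFS.subClassOf, PREMIS.Object))

        # Triples prototyped from one (invented) row of a legacy relational table.
        item = URIRef("http://example.org/item/42")
        g.add((item, RDF.type, DL.DigitalResource))
        g.add((item, MODS.title, Literal("Sample digitized manuscript")))

        print(g.serialize(format="turtle"))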

    A Graph-structured Dataset for Wikipedia Research

    Wikipedia is a rich and invaluable source of information. Its central place on the Web makes it a particularly interesting object of study for scientists. Researchers from different domains have used various complex datasets related to Wikipedia to study language, social behavior, knowledge organization, and network theory. While it is a scientific treasure, the large size of the dataset hinders pre-processing and may be a challenging obstacle for potential new studies. This issue is particularly acute in scientific domains where researchers may not be savvy about technical matters and data processing. On the one hand, Wikipedia dumps are large, which makes parsing and extracting relevant information cumbersome. On the other hand, the API is straightforward to use but restricted to a relatively small number of requests. The middle ground is the mesoscopic scale, where researchers need a subset of Wikipedia ranging from thousands to hundreds of thousands of pages, but no efficient solution exists at this scale. In this work, we propose an efficient data structure for making requests against, and accessing, subnetworks of Wikipedia pages and categories. We provide convenient tools for accessing and filtering viewership statistics, or "pagecounts", of Wikipedia web pages. The dataset organization leverages principles of graph databases, allowing rapid and intuitive access to subgraphs of Wikipedia articles and categories. The dataset and deployment guidelines are available on the LTS2 website: https://lts2.epfl.ch/Datasets/Wikipedia/.
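
    The dataset itself and its deployment guidelines live at the URL above; as a rough illustration of the graph-database access pattern the abstract describes, here is a toy sketch with networkx, in which the node names, edge labels, and helper function are all invented for the example.

        import networkx as nx

        # Toy directed graph: "contains" edges from a category to its pages,
        # "links_to" edges between pages.
        G = nx.DiGraph()
        G.add_edge("Category:Graph_theory", "PageRank", kind="contains")
        G.add_edge("Category:Graph_theory", "Graph_coloring", kind="contains")
        G.add_edge("PageRank", "Graph_coloring", kind="links_to")

        def category_subgraph(graph, category):
            """Return the subgraph induced by the pages of one category."""
            pages = [n for n in graph.successors(category)
                     if graph.edges[category, n]["kind"] == "contains"]
            return graph.subgraph(pages)

        sub = category_subgraph(G, "Category:Graph_theory")
        print(sorted(sub.nodes()), sub.number_of_edges())  # two pages, one link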

    Physicists Thriving with Paperless Publishing

    The Stanford Linear Accelerator Center (SLAC) and Deutsches Elektronen-Synchrotron (DESY) libraries have been comprehensively cataloguing the High Energy Particle Physics (HEP) literature online since 1974. The core database, SPIRES-HEP, now indexes over 400,000 research articles, almost 50% of them linked to full-text electronic versions (the site now receives over 15,000 hits per day). This database motivated the creation of the first World Wide Web site in the United States, at SLAC. With this database and the invention of the Los Alamos e-print archives in 1991, the HEP community pioneered the trend toward "paperless publishing" and paperless access: in other words, the "virtual library." We examine the impact this has had both on the way scientists research and on paper-based publishing. The standard of work archived at Los Alamos is very high: 70% of the papers are eventually published in journals and another 20% appear in conference proceedings. As a service to authors, the SPIRES-HEP collaboration has been ensuring that as much information as possible is included with each bibliographic entry for a paper. Such metadata can include tables of the experimental data that researchers can easily use to perform their own analyses, as well as detailed descriptions of the experiment, citation tracking, and links to full-text documents. Comment: 17 pages; invited talk at the AAAS Meeting, February 2000, in Washington, DC.

    Statistical analysis of the owl:sameAs network for aligning concepts in the linking open data cloud

    The massively distributed publication of linked data has brought to the attention of the scientific community both the limitations of classic methods for achieving data integration and the opportunity to push the boundaries of the field by experimenting with that collective enterprise, the linking open data cloud. While reusing existing ontologies is the preferred choice, exploiting ontology alignments remains a necessary step for easing the burden of integrating heterogeneous data sets. Alignments, even between the most widely used vocabularies, are still poorly supported in today's systems, whereas links between instances are the most widely used means of bridging the gap between different data sets. In this paper we provide an account of our statistical and qualitative analysis of the network of instance-level equivalences in the Linking Open Data Cloud (i.e. the sameAs network), carried out in order to automatically compute alignments at the conceptual level. Moreover, we explore the effect of ontological information when adapting classical Jaccard methods to the ontology alignment task. Automating this task will in fact make it possible to achieve a clearer conceptual description of the data at the cloud level, while improving the level of integration between datasets.
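
    As a sketch of the Jaccard step described above: treat each class as the set of its instances, closed under the sameAs equivalences, and score candidate concept alignments by set overlap. The class names and instance sets below are invented for illustration.

        def jaccard(a, b):
            """Jaccard similarity |A & B| / |A | B| between two instance sets."""
            a, b = set(a), set(b)
            union = a | b
            return len(a & b) / len(union) if union else 0.0

        # Hypothetical instance extensions of two classes from different
        # datasets, after merging instances linked by owl:sameAs.
        class_a = {"ex:alice", "ex:bob", "ex:carol"}
        class_b = {"ex:alice", "ex:bob", "ex:dave"}

        # A high score suggests the two classes are candidates for alignment.
        print(jaccard(class_a, class_b))  # 0.5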

    Consuming Linked Closed Data

    The growth of the Linked Data corpus will eventually prevent all but the most determined of consumers from including every Linked Dataset in a single undertaking. In addition, we anticipate that the need for effective revenue models for Linked Data publishing will spur the rise of Linked Closed Data, where access to datasets is restricted. We argue that these impending changes necessitate an overhaul of our current practices for consuming Linked Data. To this end, we propose a model for consuming Linked Data, built on the notion of continuous Information Quality assessment, which brings together a range of existing research and highlights a number of avenues for future work.
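
    The paper is positional rather than technical, but the consumption model it argues for can be caricatured as quality-gated source selection: each dataset's quality is re-assessed continuously, and a consumer draws only on sources that currently clear its bar. Everything below (the sources, scores, threshold, and budget) is purely illustrative.

        # Hypothetical, continuously refreshed quality assessments per source.
        sources = {
            "http://example.org/open-dataset":   {"quality": 0.9, "cost": 0.0},
            "http://example.org/closed-dataset": {"quality": 0.7, "cost": 5.0},
        }

        def usable(assessment, min_quality=0.8, budget=0.0):
            """Keep sources whose current quality meets the bar within budget."""
            return [url for url, a in assessment.items()
                    if a["quality"] >= min_quality and a["cost"] <= budget]

        print(usable(sources))  # only the open dataset clears the illustrative bar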

    Local Type Checking for Linked Data Consumers

    The Web of Linked Data is the culmination of over a decade of work by the Web standards community in its effort to make data more Web-like. We provide an introduction to the Web of Linked Data from the perspective of a Web developer who would like to build an application using Linked Data. We identify a weakness in the development stack: the lack of domain-specific scripting languages for designing background processes that consume Linked Data. To address this weakness, we design a scripting language with a simple but appropriate type system. In our proposed architecture, some data is consumed from sources outside the control of the system and some data is held locally. Stronger type assumptions can be made about the local data than about external data, so our type system mixes static and dynamic typing. Throughout, we relate our work to the W3C recommendations that drive Linked Data, so our syntax is accessible to Web developers. Comment: In Proceedings WWV 2013, arXiv:1308.026
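
    A rough Python analogue of the mixed static/dynamic discipline the abstract describes (the record shape and the boundary check are invented for illustration): locally held data carries a declared type that is trusted, while records fetched from external sources are checked dynamically at the boundary before use.

        from dataclasses import dataclass

        @dataclass
        class Person:
            # Local data: the shape is known statically and can be trusted.
            name: str
            homepage: str

        def check_person(raw: dict) -> Person:
            """Dynamic check at the boundary for externally sourced records."""
            for field in ("name", "homepage"):
                if not isinstance(raw.get(field), str):
                    raise TypeError(f"external record lacks a string '{field}'")
            return Person(raw["name"], raw["homepage"])

        local = Person("Alice", "http://example.org/alice")  # trusted local record
        remote = check_person({"name": "Bob", "homepage": "http://example.org/bob"})
        print(local, remote)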

    Wikipedia as an encyclopaedia of life

    In his 2003 essay E O Wilson outlined his vision for an "encyclopaedia of life" comprising "an electronic page for each species of organism on Earth", each page containing "the scientific name of the species, a pictorial or genomic presentation of the primary type specimen on which its name is based, and a summary of its diagnostic traits." Although the "quiet revolution" in biodiversity informatics has generated numerous online resources, including some directly inspired by Wilson's essay (e.g., http://ispecies.org, http://www.eol.org), we are still some way from the goal of having available online all relevant information about a species, such as its taxonomy, evolutionary history, genomics, morphology, ecology, and behaviour. While the biodiversity community has been developing a plethora of databases, some with overlapping goals and duplicated content, Wikipedia has been slowly growing to the point where it now has over 100,000 pages on biological taxa. My goal in this essay is to explore the idea that, largely independent of the efforts of biodiversity informatics and well-funded international efforts, Wikipedia (http://en.wikipedia.org/wiki/Main_Page) has emerged as potentially the best platform for fulfilling E O Wilson's vision.

    Addressing the tacit knowledge of a digital library system

    Recent surveys of Linked Data initiatives in library organizations report the experimental nature of related projects and the difficulty of re-using data to improve library services. This paper presents an approach for managing data together with its "tacit" organizational knowledge, the context in which the data originates, so as to improve the interpretation of the data's meaning. By analyzing a Digital Library system, we prototyped a method for turning data management into "semantic data management", in which local system knowledge is managed as data and natively provided as Linked Data. Semantic data management aims to curate consumers' correct understanding of Linked Datasets, leading to proper re-use.