88,427 research outputs found

    Discovering Links for Metadata Enrichment on Computer Science Papers

    At the very beginning of compiling a bibliography, usually only basic information, such as the title, authors and publication date of an item, is known. In order to gather additional information about a specific item, one typically has to search the library catalog or use a web search engine. This look-up procedure implies a manual effort for every single item of a bibliography. In this technical report we present a proof of concept which utilizes Linked Data technology for the simple enrichment of sparse metadata sets. This is done by discovering owl:sameAs links between an initial set of computer science papers and resources from external data sources like DBLP, ACM and the Semantic Web Conference Corpus. In this report, we demonstrate how the link discovery tool Silk is used to detect additional information and to enrich an initial set of records in the computer science domain. The pros and cons of Silk as a link discovery tool are summarized at the end. Comment: 22 pages, 4 figures, 7 listings, presented at SWIB1

    The Hidden Web, XML and Semantic Web: A Scientific Data Management Perspective

    The World Wide Web no longer consists just of HTML pages. Our work sheds light on a number of trends on the Internet that go beyond simple Web pages. The hidden Web provides a wealth of data in semi-structured form, accessible through Web forms and Web services. These services, as well as numerous other applications on the Web, commonly use XML, the eXtensible Markup Language. XML has become the lingua franca of the Internet that allows customized markups to be defined for specific domains. On top of XML, the Semantic Web grows as a common structured data source. In this work, we first explain each of these developments in detail. Using real-world examples from scientific domains of great interest today, we then demonstrate how these new developments can assist the managing, harvesting, and organization of data on the Web. On the way, we also illustrate the current research avenues in these domains. We believe that this effort would help bridge multiple database tracks, thereby attracting researchers with a view to extending database technology. Comment: EDBT - Tutorial (2011)

    Creating Reusable Educational Components: Lessons from DLESE

    Reuse of educational materials is integral to many educator tasks, from designing a course to preparing for a lab or class. This article describes a study on the reuse of educational materials in the context of the Digital Library for Earth System Education (DLESE), a community-owned and governed facility offering high-quality teaching and learning resources for Earth system education. The study noted that educational resource designers often do not develop components with reuse in mind, making it more difficult or impossible for other educators to find and use their material, and that the 'findability' and reusability of community-created digital educational resources are highly dependent on the presentational and structural design of the resources themselves. The authors recommend that all resources clearly state the creator's name and contact information, relevant copyright restrictions, the most significant date for the resource (specifying creation or revision), and the intended grade level. Educational levels: Graduate or professional

    BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking

    Data generation is a key issue in big data benchmarking that aims to generate application-specific data sets to meet the 4V requirements of big data. Specifically, big data generators need to generate scalable data (Volume) of different types (Variety) under controllable generation rates (Velocity) while keeping the important characteristics of raw data (Veracity). This gives rise to various new challenges in designing generators efficiently and successfully. To date, most existing techniques can only generate limited types of data and support specific big data systems such as Hadoop. Hence we develop a tool, called Big Data Generator Suite (BDGS), to efficiently generate scalable big data while employing data models derived from real data to preserve data veracity. The effectiveness of BDGS is demonstrated by developing six data generators covering three representative data types (structured, semi-structured and unstructured) and three data sources (text, graph, and table data).

    STARGATE: Static Repository Gateway and Toolkit. Final Project Report

    STARGATE (Static Repository Gateway and Toolkit) was funded by the Joint Information Systems Committee (JISC) and is intended to demonstrate the ease of use of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) Static Repository technology, and the potential benefits offered to publishers in making their metadata available in this way. This technology offers a simpler method of participating in many information discovery services than creating fully-fledged OAI-compliant repositories. It does this by allowing the infrastructure and technical support required to participate in OAI-based services to be shifted from the data provider (the journal) to a third party, and allows a single third-party gateway provider to provide intermediation for many data providers (journals). Specifically, STARGATE has created a series of Static Repositories of publisher metadata provided by a selection of Library and Information Science journals. It has demonstrated the interoperability of these repositories by exposing their metadata via a Static Repository Gateway for harvesting and cross-searching by external service providers. The project has conducted a critical evaluation of the Static Repository approach in conjunction with the participating publishers and service providers. The technology works. The project has demonstrated that Static Repositories are easy to create and that the differences between fully-fledged and static OAI Repositories have no impact on the participation of small journal publishers in OAI-based services. The problems for a service that arise out of the use of Static Repositories are parallel to those created by any other repository dealing with journal articles. Problems arise from the diversity of metadata element sets provided by a given journal and the lack of specific metadata elements for the articles' volume and issue details. Another issue for the use of publishers' metadata arises as the collection policies of some existing services only allow Open Access materials to be included in them. The project recommends that the use of Static Repositories continues to be explored, in particular as a flexible way to expose existing sets of structured information to OAI services and to create the opportunity to enhance the metadata as part of the process. The project further recommends that the publishing community consider the creation or adoption of an application profile for journal articles to support information discovery that can search by volume and issue. Significant further use of the Static Repository technology by small journal publishers will require the future creation and maintenance of a community-specific Static Repository Gateway. Further use will also require advocacy within the publishing community but might initially be most effectively kick-started through the creation of OAI repositories based on metadata held by the commercial services which publish or mediate access to electronic copies of journals on behalf of small publishers.

    A review of the state of the art in Machine Learning on the Semantic Web: Technical Report CSTR-05-003
