Discovering Links for Metadata Enrichment on Computer Science Papers
At the very beginning of compiling a bibliography, usually only basic
information, such as title, authors and publication date of an item, is known.
In order to gather additional information about a specific item, one typically
has to search the library catalog or use a web search engine. This look-up
procedure implies a manual effort for every single item of a bibliography. In
this technical report we present a proof of concept which utilizes Linked Data
technology for the simple enrichment of sparse metadata sets. This is done by
discovering owl:sameAs links between an initial set of computer science
papers and resources from external data sources like DBLP, ACM and the Semantic
Web Conference Corpus. In this report, we demonstrate how the link discovery
tool Silk is used to detect additional information and to enrich an initial set
of records in the computer science domain. The pros and cons of Silk as a link
discovery tool are summarized at the end. Comment: 22 pages, 4 figures, 7 listings, presented at SWIB1
The Hidden Web, XML and Semantic Web: A Scientific Data Management Perspective
The World Wide Web no longer consists just of HTML pages. Our work sheds
light on a number of trends on the Internet that go beyond simple Web pages.
The hidden Web provides a wealth of data in semi-structured form, accessible
through Web forms and Web services. These services, as well as numerous other
applications on the Web, commonly use XML, the eXtensible Markup Language. XML
has become the lingua franca of the Internet that allows customized markups to
be defined for specific domains. On top of XML, the Semantic Web grows as a
common structured data source. In this work, we first explain each of these
developments in detail. Using real-world examples from scientific domains of
great interest today, we then demonstrate how these new developments can assist
the managing, harvesting, and organization of data on the Web. On the way, we
also illustrate the current research avenues in these domains. We believe that
this effort would help bridge multiple database tracks, thereby attracting
researchers with a view to extend database technology. Comment: EDBT - Tutorial (2011
Creating Reusable Educational Components: Lessons from DLESE
Reuse of educational materials is integral to many educator tasks, from designing a course to preparing for a lab or class. This article describes a study on the reuse of educational materials in the context of the Digital Library for Earth System Education (DLESE), a community-owned and governed facility offering high-quality teaching and learning resources for Earth system education. The study noted that educational resource designers often do not develop components with reuse in mind, making it more difficult or impossible for other educators to find and use their material, and that the 'findability' and reusability of community-created digital educational resources are highly dependent on the presentational and structural design of the resources themselves. The authors recommend that all resources clearly state the creator's name and contact information, relevant copyright restrictions, the most significant date for the resource (specifying creation or revision), and the intended grade level. Educational levels: Graduate or professional
BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking
Data generation is a key issue in big data benchmarking that aims to generate
application-specific data sets to meet the 4V requirements of big data.
Specifically, big data generators need to generate scalable data (Volume) of
different types (Variety) under controllable generation rates (Velocity) while
keeping the important characteristics of raw data (Veracity). This gives rise
to various new challenges about how we design generators efficiently and
successfully. To date, most existing techniques can only generate limited types
of data and support specific big data systems such as Hadoop. Hence we develop
a tool, called Big Data Generator Suite (BDGS), to efficiently generate
scalable big data while employing data models derived from real data to
preserve data veracity. The effectiveness of BDGS is demonstrated by developing
six data generators covering three representative data types (structured,
semi-structured and unstructured) and three data sources (text, graph, and
table data).
STARGATE : Static Repository Gateway and Toolkit. Final Project Report
STARGATE (Static Repository Gateway and Toolkit) was funded by the Joint Information Systems Committee (JISC) and is intended to demonstrate the ease of use of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) Static Repository technology, and the potential benefits offered to publishers in making their metadata available in this way. This technology offers a simpler method of participating in many information discovery services than creating fully-fledged OAI-compliant repositories. It does this by allowing the infrastructure and technical support required to participate in OAI-based services to be shifted from the data provider (the journal) to a third party and allows a single third party gateway provider to provide intermediation for many data providers (journals). Specifically, STARGATE has created a series of Static Repositories of publisher metadata provided by a selection of Library and Information Science journals. It has demonstrated the interoperability of these repositories by exposing their metadata via a Static Repository Gateway for harvesting and cross-searching by external service providers. The project has conducted a critical evaluation of the Static Repository approach in conjunction with the participating publishers and service providers. The technology works. The project has demonstrated that Static Repositories are easy to create and that the differences between fully-fledged and static OAI Repositories have no impact on the participation of small journal publishers in OAI-based services. The problems for a service that arise out of the use of Static Repositories are parallel to those created by any other repository dealing with journal articles. Problems arise from the diversity of metadata element sets provided by a given journal and the lack of specific metadata elements for the articles' volume and issue details.
Another issue for the use of publishers' metadata arises because the collection policies of some existing services only allow Open Access materials to be included in them. The project recommends that the use of Static Repositories continues to be explored - in particular as a flexible way to expose existing sets of structured information to OAI services and to create the opportunity to enhance the metadata as part of the process. The project further recommends that the publishing community consider the creation or adoption of an application profile for journal articles to support information discovery that can search by volume and issue. Significant further use of the Static Repository technology by small journal publishers will require the future creation and maintenance of a community-specific Static Repository Gateway. Further use will also require advocacy within the publishing community but might initially be most effectively kick-started through the creation of OAI repositories based on metadata held by the commercial services which publish or mediate access to electronic copies of journals on behalf of small publishers.