Linked Data - the story so far
The term “Linked Data” refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions: the Web of Data. In this article, the authors present the concept and technical principles of Linked Data, and situate these within the broader context of related technological developments. They describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.
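The best practices described above boil down to naming resources with HTTP URIs, describing them with triples, and linking descriptions to one another so that clients can navigate from one resource to the next. A minimal sketch in plain Python, with triples as tuples and illustrative `example.org` URIs standing in for real dereferenceable resources:

```python
# A minimal sketch of the Linked Data idea: (subject, predicate, object)
# triples whose subjects and objects are HTTP URIs. All URIs below are
# illustrative examples, not real dereferenceable resources.

triples = [
    ("http://example.org/person/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://example.org/person/alice", "http://xmlns.com/foaf/0.1/knows",
     "http://example.org/person/bob"),
    ("http://example.org/person/bob", "http://xmlns.com/foaf/0.1/name", "Bob"),
]

def describe(uri, graph):
    """Return all (predicate, object) pairs for a subject URI --
    roughly what dereferencing a Linked Data URI yields."""
    return [(p, o) for s, p, o in graph if s == uri]

def follow_links(start, graph):
    """Traverse object links between resources, the way a crawler
    moves through the Web of Data."""
    seen, frontier = set(), [start]
    while frontier:
        uri = frontier.pop()
        if uri in seen:
            continue
        seen.add(uri)
        for _, obj in describe(uri, graph):
            if obj.startswith("http://"):
                frontier.append(obj)
    return seen
```

Here `describe` plays the role of dereferencing a URI, and `follow_links` mimics how an agent discovers new resources by following the links embedded in the data it has already retrieved.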
Graph Summarization
The continuous and rapid growth of highly interconnected datasets, which are both voluminous and complex, calls for the development of adequate processing and analytical techniques. One method for condensing and simplifying such datasets is graph summarization. It denotes a series of application-specific algorithms designed to transform graphs into more compact representations while preserving structural patterns, query answers, or specific property distributions. As this problem is common to several areas studying graph topologies, different approaches, such as clustering, compression, sampling, or influence detection, have been proposed, primarily based on statistical and optimization methods. The focus of our chapter is to pinpoint the main graph summarization methods, and especially to cover the most recent approaches and novel research trends on this topic not yet addressed by previous surveys.
Comment: To appear in the Encyclopedia of Big Data Technologies
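To make the compaction idea concrete, here is a toy illustration of one family of summarization methods: building a quotient graph whose supernodes group vertices sharing a structural feature (here, simply their degree). Real techniques use richer equivalences such as bisimulation or clustering, but the principle of collapsing nodes and edges is the same.

```python
from collections import defaultdict

# Toy graph-summarization sketch: group vertices by degree into
# supernodes, and add a superedge between two groups whenever some
# original edge crosses them. This is an illustrative equivalence,
# not any specific algorithm from the survey.

edges = [(0, 1), (0, 2), (0, 3), (1, 2), (4, 0)]

def degree_summary(edges):
    # Compute vertex degrees.
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Supernode = set of vertices with the same degree.
    groups = defaultdict(set)
    for node, d in degree.items():
        groups[d].add(node)
    # Superedge between groups if any member edge crosses them.
    group_of = {n: d for d, members in groups.items() for n in members}
    superedges = {(group_of[u], group_of[v]) for u, v in edges}
    return dict(groups), superedges
```

On this example the 5-node, 5-edge graph condenses to 3 supernodes and 4 superedges, while the degree distribution (the "preserved property" here) remains recoverable from the summary.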
Linked Data Science Powered by Knowledge Graphs
In recent years, we have witnessed a growing interest in data science not only from academia but particularly from companies investing in data science platforms to analyze large amounts of data. In this process, a myriad of data science artifacts, such as datasets and pipeline scripts, are created. Yet, there has so far been no systematic attempt to holistically exploit the collected knowledge and experiences that are implicitly contained in the specification of these pipelines, e.g., compatible datasets, cleansing steps, ML algorithms, parameters, etc. Instead, data scientists still spend a considerable amount of their time trying to recover relevant information and experiences from colleagues, trial and error, lengthy exploration, etc. In this paper, we therefore propose a scalable system (KGLiDS) that employs machine learning to extract the semantics of data science pipelines and captures them in a knowledge graph, which can then be exploited to assist data scientists in various ways. This abstraction is the key to enabling Linked Data Science, since it allows us to share the essence of pipelines between platforms, companies, and institutions without revealing critical internal information, focusing instead on the semantics of what is being processed and how. Our comprehensive evaluation uses thousands of datasets and more than thirteen thousand pipeline scripts extracted from data discovery benchmarks and the Kaggle portal, and shows that KGLiDS significantly outperforms state-of-the-art systems on related tasks, such as dataset recommendation and pipeline classification.
Comment: 11 pages, 6 figures
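To illustrate the kind of graph such a system might build, the sketch below represents pipelines and datasets as nodes and extracted relationships as edges, then answers a simple dataset-recommendation query over them. The node identifiers, predicate names, and schema here are hypothetical stand-ins, not the actual KGLiDS vocabulary or recommendation method.

```python
# Hypothetical sketch of a pipeline knowledge graph: triples record
# which datasets a pipeline script reads and which algorithm it
# applies. All identifiers and predicates are illustrative only.

kg = [
    ("pipeline:p1", "reads", "dataset:sales"),
    ("pipeline:p1", "applies", "algo:RandomForest"),
    ("pipeline:p2", "reads", "dataset:sales"),
    ("pipeline:p2", "reads", "dataset:customers"),
]

def recommend_datasets(dataset, kg):
    """Suggest datasets that were co-used with `dataset` in some
    pipeline -- a simple stand-in for dataset recommendation
    over a pipeline knowledge graph."""
    pipelines = {s for s, p, o in kg if p == "reads" and o == dataset}
    return {o for s, p, o in kg
            if p == "reads" and s in pipelines and o != dataset}
```

The point of the abstraction is visible even at this scale: the query operates only on the graph of extracted semantics, never on the pipeline source code itself.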
Report of the Stanford Linked Data Workshop
The Stanford University Libraries and Academic Information Resources (SULAIR), with the Council on Library and Information Resources (CLIR), conducted a week-long workshop on the prospects for a large-scale, multi-national, multi-institutional prototype of a Linked Data environment for discovery of and navigation among the rapidly, chaotically expanding array of academic information resources. As preparation for the workshop, CLIR sponsored a survey by Jerry Persons, Chief Information Architect emeritus of SULAIR, that was published originally for workshop participants as background to the workshop and is now publicly available. The original intention of the workshop was to devise a plan for such a prototype. However, such was the diversity of knowledge, experience, and views of the potential of Linked Data approaches that the workshop participants turned to two more fundamental goals: building common understanding and enthusiasm on the one hand, and identifying opportunities and challenges to be confronted in the preparation of the intended prototype and its operation on the other. In pursuit of those objectives, the workshop participants produced:
1. a value statement addressing the question of why a Linked Data approach is worth prototyping;
2. a manifesto for Linked Libraries (and Museums and Archives and …);
3. an outline of the phases in a life cycle of Linked Data approaches;
4. a prioritized list of known issues in generating, harvesting & using Linked Data;
5. a workflow with notes for converting library bibliographic records and other academic metadata to URIs;
6. examples of potential “killer apps” using Linked Data; and
7. a list of next steps and potential projects.
This report includes a summary of the workshop agenda, a chart showing the use of Linked Data in cultural heritage venues, and short biographies and statements from each of the participants.
Privacy-Preserving Reengineering of Model-View-Controller Application Architectures Using Linked Data
When a legacy system’s software architecture cannot be redesigned, implementing additional privacy requirements is often complex, unreliable, and costly to maintain. This paper presents a privacy-by-design approach to reengineering web applications as Linked Data-enabled applications that implement access control and privacy-preservation properties. The method is based on knowledge of the application architecture, which for the Web of Data is commonly designed on the basis of a model-view-controller pattern. Whereas the wrapping techniques commonly used to link the data of web applications duplicate the security source code, the new approach allows for the controlled disclosure of an application’s data while preserving non-functional properties such as privacy preservation. The solution has been implemented and compared with existing Linked Data frameworks in terms of reliability, maintainability, and complexity.
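The controlled-disclosure idea can be pictured as follows: in a model-view-controller application, the controller applies a privacy policy before the view serializes model data as triples, so the security logic lives in one place rather than being duplicated in a wrapper. This is a sketch under assumed names, not the paper's actual implementation.

```python
# Illustrative sketch (not the paper's method): a controller that
# filters model attributes through a declarative privacy policy
# before exposing them as (subject, predicate, object) triples.

MODEL = {
    "user/1": {"name": "Alice", "email": "alice@example.org", "city": "Lyon"},
}

# Hypothetical policy: only these properties may be disclosed.
PUBLIC_PROPERTIES = {"name", "city"}

def controller_get(resource):
    """Controller action: apply the privacy policy, then hand the
    surviving attributes to the view layer as triples."""
    record = MODEL.get(resource)
    if record is None:
        return []
    return [(resource, prop, value)
            for prop, value in record.items()
            if prop in PUBLIC_PROPERTIES]
```

Because the filter sits in the controller, every view that serializes the resource (HTML, RDF, JSON) inherits the same disclosure policy without duplicating it.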