From Linked Data to Relevant Data -- Time is the Essence
The Semantic Web initiative puts emphasis not primarily on putting data on
the Web, but rather on creating links in a way that both humans and machines
can explore the Web of data. When such users access the Web, they leave a trail
as Web servers maintain a history of requests. Web usage mining approaches have
been studied since the beginning of the Web given the log's huge potential for
purposes such as resource annotation, personalization, forecasting, etc.
However, the impact of any such efforts has not really gone beyond generating
statistics detailing who, when, and how Web pages maintained by a Web server
were visited.
Comment: 1st International Workshop on Usage Analysis and the Web of Data
(USEWOD2011) in the 20th International World Wide Web Conference (WWW2011),
Hyderabad, India, March 28th, 201
Linked Data Entity Summarization
On the Web, the amount of structured and Linked Data about entities is constantly growing. Descriptions of single entities often include thousands of statements, and the data become difficult to comprehend unless a selection of the most relevant facts is provided. This doctoral thesis addresses the problem of Linked Data entity summarization. The contributions involve two entity summarization approaches, a common API for entity summarization, and an approach for entity data fusion.
Workload Matters: A Robust Approach to Physical RDF Database Design
Recent advances in Information Extraction, Linked Data Management and the Semantic Web have led to a rapid increase in both the volume and the variety of publicly available graph-structured data. As more and more businesses start to capitalize on graph-structured data, data management systems are being exposed to workloads that are far more diverse and dynamic than what they were designed to handle. In particular, most systems rely on a workload-oblivious physical layout with a fixed schema and are adaptive only if the changes in the schema are minor. Thus, they are unable to perform consistently well across different types of workloads.
This thesis introduces fundamental techniques for supporting diverse and dynamic workloads in RDF data management systems. Instead of assuming anything about the workload upfront, these techniques allow systems to adjust their physical designs as queries are executed. This includes changing the way (i) records are clustered in the storage system, (ii) data are organized and indexed, and (iii) queries are optimized, all at runtime. The thesis proceeds with a discussion of the challenges that have been encountered in implementing these ideas in a proof-of-concept prototype called chameleon-db, and it concludes with a thorough experimental evaluation.