
    Universal Workload-based Graph Partitioning and Storage Adaption for Distributed RDF Stores

    The publication of machine-readable information has been increasing significantly, both in magnitude and in the complexity of the embedded relations. The Resource Description Framework (RDF) plays a major role in modeling and linking web data and their relations. In line with that important role, dedicated systems were designed to store and query RDF data using a special query language called SPARQL, similar to classic SQL. However, due to the large size of the data, several federated working nodes are used to host a distributed RDF store. The data needs to be partitioned, assigned, and stored on each working node. After partitioning, some of the data needs to be replicated in order to avoid communication costs and to balance the loads for better system throughput. Since replication requires more storage space, two important questions arise: which data should be replicated, and how much? The answer to the second question is tied to the other storage-space requirements at each working node, such as indexes and caches. In order to answer SPARQL queries efficiently, each working node needs to put its share of the data into multiple indexes. Those indexes span the entire data and consume a considerable amount of storage space, so the same two questions raised about replication also apply to indexes. The third storage-consuming structure is the join cache: a special index in which frequent join results are cached, saving a considerable amount of running time at the cost of high storage-space consumption. Again, the same two questions apply to the join cache. In this thesis, we present a universal adaption approach to the storage of a distributed RDF store. The system aims to find optimal data assignments to the different indexes, replications, and the join cache within the limited storage space. To achieve this, we present a cost model based on the workload, which often contains frequent patterns. The workload is dynamically analyzed to evaluate predefined rules. Those rules tell the system about the benefits and costs of assigning which data to which structure. The objective is better query execution time. Besides the storage adaption, the system adapts its processing resources to the queries' arrival rate. The aim of this adaption is better parallelization per query while still providing high system throughput.
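The core trade-off described above, assigning indexes, replicas, and cache entries to a limited storage budget by weighing benefit against size, can be sketched as a simple greedy heuristic. The structure names, benefit scores, and sizes below are purely hypothetical illustrations, not the thesis's actual cost model:

```python
def adapt_storage(candidates, budget):
    """Greedy 0/1 knapsack heuristic: pick storage structures in order of
    benefit density (benefit per unit of storage) until the budget is spent.

    candidates: list of (name, benefit, size) tuples.
    Returns (chosen names, storage used).
    """
    chosen, used = [], 0
    # Sort by benefit/size ratio, best first
    for name, benefit, size in sorted(candidates,
                                      key=lambda c: c[1] / c[2],
                                      reverse=True):
        if used + size <= budget:
            chosen.append(name)
            used += size
    return chosen, used

# Hypothetical candidate structures on one working node
candidates = [
    ("index:SPO", 90, 30),           # subject-predicate-object index
    ("replica:partition-3", 40, 25), # replicated border data
    ("cache:join(q1,q2)", 60, 15),   # cached frequent join result
    ("index:OPS", 50, 30),           # object-predicate-subject index
]
picked, used = adapt_storage(candidates, budget=70)
```

A real workload-based cost model would derive the benefit scores from frequent query patterns rather than fixing them by hand, but the budget-constrained selection step has this general shape.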

    The Semantic Shadow: Combining User Interaction with Context Information for Semantic Web-Site Annotation

    This thesis develops the concept of the Semantic Shadow (SemS), a model for managing content-related and structural annotations on web page elements and their values. The model supports a contextual weighting of the annotated information, allowing annotation values to be specified in relation to the evaluation context. A procedure is presented for managing and processing this context-dependent meta-information on web page elements through a dedicated programming interface. Two distinct implementations of the model have been developed: one based on Java objects, the other using the Resource Description Framework (RDF) as the modeling backend. The RDF-based storage allows the annotations of the Semantic Shadow to be integrated with other information of the Semantic Web. To demonstrate the application of the Semantic Shadow concept, a procedure to optimize web-based user interfaces based on their structural semantics has been developed: assuming a mobile client, a requested web page is dynamically adapted by a proxy prototype, where the context-awareness of the adaptation can be modeled directly alongside the structural annotations. To overcome the drawback of missing annotations for existing web pages, this thesis introduces a concept to derive context-dependent meta-information on web pages from their usage: from observing users' interaction with a web page, certain context-dependent structural information about the concerned page elements can be derived and stored in the annotation model of the Semantic Shadow concept.
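The idea of context-weighted annotation lookup can be illustrated with a small sketch. The element ids, context keys, and weights below are invented for illustration and do not reflect the thesis's actual programming interface:

```python
class SemanticShadow:
    """Toy sketch of context-weighted annotations on page-element ids."""

    def __init__(self):
        # element_id -> list of (value, context, weight)
        self.annotations = {}

    def annotate(self, element_id, value, context, weight):
        self.annotations.setdefault(element_id, []).append(
            (value, context, weight))

    def lookup(self, element_id, context):
        """Return the highest-weighted value whose context matches the
        evaluation context (candidate context keys must all agree)."""
        best = None
        for value, ctx, weight in self.annotations.get(element_id, []):
            if all(context.get(k) == v for k, v in ctx.items()):
                if best is None or weight > best[1]:
                    best = (value, weight)
        return best[0] if best else None

# Hypothetical usage: the same element carries different roles per device
shadow = SemanticShadow()
shadow.annotate("nav-menu", "primary-navigation", {"device": "mobile"}, 0.9)
shadow.annotate("nav-menu", "sidebar", {"device": "desktop"}, 0.8)
```

A proxy adapting a page for a mobile client would then query the shadow with the current evaluation context and restructure the element accordingly.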

    Bench-Ranking: a prescriptive analysis method for querying large knowledge graphs

    Leveraging relational Big Data (BD) processing frameworks to process large knowledge graphs holds great interest for optimizing query performance. Modern BD systems are, however, complicated data systems, whose configurations notably affect performance. Benchmarking different frameworks and configurations provides the community with best practices for better performance. However, most of these benchmarking efforts can be classified as descriptive and diagnostic analytics, and there is no standard for comparing them based on quantitative ranking techniques. Furthermore, designing mature pipelines for processing big graphs entails additional design decisions that emerge with the non-native (relational) graph processing paradigm. Those design decisions cannot be made automatically, e.g., the choice of the relational schema, the partitioning technique, and the storage formats. In this thesis, we discuss how our work fills this timely research gap. First, we show the impact of those design decisions' trade-offs on the replicability of BD systems' performance when querying large knowledge graphs. We also show the limitations of descriptive and diagnostic analyses of BD frameworks' performance for querying large graphs. We then investigate how to enable prescriptive analytics via ranking functions and multi-dimensional optimization techniques (called "Bench-Ranking"). This approach abstracts away the complexity of descriptive performance analysis, guiding the practitioner directly to actionable, informed decisions.
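One way to read the ranking idea: score each configuration in every benchmark dimension, rank configurations within each dimension, and order them by mean rank. This is a minimal rank-aggregation sketch with invented configuration names and scores, not the thesis's actual Bench-Ranking formulation:

```python
def bench_rank(results):
    """Order configurations by mean per-dimension rank (1 = best).

    results: {config: {dimension: score}}, where a higher score is better.
    """
    dims = sorted({d for scores in results.values() for d in scores})
    mean_rank = {}
    for cfg in results:
        ranks = []
        for d in dims:
            # Rank of cfg within dimension d (best score gets rank 1)
            ordered = sorted(results, key=lambda c: results[c][d], reverse=True)
            ranks.append(ordered.index(cfg) + 1)
        mean_rank[cfg] = sum(ranks) / len(ranks)
    return sorted(results, key=lambda c: mean_rank[c])

# Hypothetical scores (higher = better) for three schema/format choices
scores = {
    "csv+horizontal":  {"latency": 1, "throughput": 2},
    "parquet+subject": {"latency": 3, "throughput": 1},
    "orc+predicate":   {"latency": 2, "throughput": 3},
}
ranking = bench_rank(scores)
```

The aggregation collapses a multi-dimensional comparison into a single actionable ordering, which is the prescriptive step missing from purely descriptive benchmark reports.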

    Affective graphs: the visual appeal of linked data

    The essence and value of Linked Data lie in the ability of humans and machines to query, access and reason upon highly structured and formalised data. Ontology structures provide an unambiguous description of the structure and content of data. While a multitude of software applications and visualization systems have been developed for Linked Data over the past years, a significant gap still exists between applications that consume Linked Data and interfaces that have been designed with a strong focus on aesthetics. Though the importance of aesthetics in affecting the usability, effectiveness and acceptability of user interfaces has long been recognised, little or no explicit attention has been paid to the aesthetics of Linked Data applications. In this paper, we introduce a formalised approach to developing aesthetically pleasing semantic web interfaces by following aesthetic principles and guidelines identified in the literature. We apply these principles to design and develop a generic approach to using visualizations to support exploration of Linked Data, in an interface that is pleasing to users. This provides users with the means to browse ontology structures, enriched with statistics of the underlying data, facilitating exploratory activities and enabling visual queries for highly precise information needs. We evaluated our approach in three ways: an initial objective evaluation comparing our approach with other well-known semantic web interfaces, and two user evaluations with semantic web researchers.

    Interactive graphics, graphical user interfaces and software interfaces for the analysis of biological experimental data and networks

    Biologists need to analyze and comprehend increasingly large and complex multivariate experimental data. Biological experiments often produce multiple data sets, each describing one aspect of the system, such as the transcriptome recorded by a microarray or the metabolome recorded using gas chromatography-mass spectrometry (GC-MS). A biochemical network model provides a conceptual system-level framework for integrating data from different sources. Effective use of graphics enhances the comprehension of data, and interactive graphics permit the analyst to actively explore data, check its integrity, satiate curiosities and reveal the unexpected. Interactive graphics have not been widely applied as a means for understanding data from biological experiments. This thesis addresses these needs by providing new methods and software that apply interactive graphics, in coordination with numerical methods, to the analysis of biological data, in a manner that is accessible to biologists.

    Designing an interface to provide new functionality for the post-processing of web-based annotations.

    Systems to annotate online content are becoming increasingly common on the World Wide Web. While much research and development has been done on interfaces that allow users to make and view annotations, few annotation systems provide functionality that extends beyond this and allows users to also manage and process collections of existing annotations. Siyavula Education is a social enterprise that publishes high school Maths and Science textbooks online. The company uses annotations to collate collaborator and volunteer feedback (corrections, opinions, suggestions) about its books at various phases in the book-writing life cycle. Currently the company captures annotations on PDF versions of its books. The web-based software it uses allows for some filtering and sorting of existing annotations, but the system is limited and not ideal for the company's rather specialised requirements. In an attempt to move away from a proprietary, PDF-based system, Siyavula implemented Annotator (http://okfnlabs.org/annotator/), software that allows for the annotation of HTML pages. However, this software was not coupled with a back-end interface that would allow users to interact with a database of saved annotations. To enable this kind of interaction, a prototype interface was designed and is presented here. The purpose of the interface was to give users new and improved functionality for querying and manipulating a collection of web-based annotations about Siyavula's online content. Usability tests demonstrated that the interface succeeded at giving users this new and necessary functionality (including filtering, sorting and searching) to process annotations. Once integrated with front-end software (such as Annotator) and issue-tracking software (such as GitHub), the interface could form part of a powerful new tool for making and managing annotations on the Web.
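The back-end functionality described, filtering, sorting, and searching a collection of annotations, can be sketched as a small query helper. The field names ("body", "status", "created") are hypothetical and do not reflect Siyavula's actual annotation schema:

```python
def query_annotations(annotations, text=None, status=None, sort_by=None):
    """Filter a list of annotation dicts by substring and status,
    then optionally sort by a field."""
    hits = [a for a in annotations
            if (text is None or text.lower() in a["body"].lower())
            and (status is None or a["status"] == status)]
    if sort_by is not None:
        hits.sort(key=lambda a: a[sort_by])
    return hits

# Hypothetical feedback annotations on a textbook chapter
sample = [
    {"body": "Typo in equation 3",    "status": "open",   "created": "2014-02-01"},
    {"body": "Wrong unit in table",   "status": "closed", "created": "2014-01-15"},
    {"body": "Equation 7 sign error", "status": "open",   "created": "2014-01-20"},
]
open_eqs = query_annotations(sample, text="equation", status="open",
                             sort_by="created")
```

In a real deployment these queries would run against the database of saved annotations behind the Annotator front end rather than an in-memory list.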

    CRC806-Database: A semantic e-Science infrastructure for an interdisciplinary research centre

    Well-designed information infrastructure improves the conduct of research and can connect researchers and projects across disciplines to facilitate collaboration. The topic of this thesis is the design and development of an information infrastructure for a large interdisciplinary research project, the DFG-funded Collaborative Research Centre 806 (CRC 806). Under the name CRC806-Database, the presented infrastructure was developed within the subproject "Z2: Data Management and Data Services", a so-called INF project, which is responsible for research data management within a DFG-funded CRC. During the design, development and implementation of the CRC806-Database, the complex requirements for sound data management in the context of a large interdisciplinary research project were considered both theoretically and practically. The presented infrastructure design is based mainly on the DFG's requirements for research data management in CRCs, chiefly the secure storage of primary research data for at least ten years, as well as on the DFG's further recommendations for information infrastructures, which concern the support and improvement of research and the facilitation of Web-based collaboration. The CRC806-Database semantic e-Science infrastructure consists of three main components: i.) the CRC806-RDM component, which implements the research data management, including a data catalog and a publication database; ii.) the CRC806-SDI component, which provides a Spatial Data Infrastructure (SDI) for Web-based management of spatial data; and iii.) the CRC806-KB component, which implements a collaborative virtual research environment and knowledge base. From a technical perspective, the infrastructure is based on existing Open Source Software (OSS) solutions that were customized to the specific requirements where necessary. The main OSS products applied in the development of the CRC806-Database are Typo3, CKAN, GeoNode and Semantic MediaWiki. As the integrative technical and theoretical basis of the infrastructure, the concept of Semantic e-Science was implemented. The term e-Science refers to a scientific paradigm of computationally intensive science carried out in networked environments; the prefix "Semantic" extends this concept with the application of Semantic Web technologies. A further conceptual basis for the development of the CRC806-Database is known as "Open Science", which includes the concepts of "Open Access", "Open Data" and "Open Methodology". These concepts have been implemented for the CRC806-Database semantic e-Science infrastructure, as described in the course of this thesis.

    Information scraps: understanding and design

    Thesis (S.M.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 139-148). By Michael Bernstein. In this thesis I investigate information scraps: personal information whose content has been scribbled on Post-it notes, scrawled on the corners of sheets of paper, stuck in our pockets, sent in e-mail messages to ourselves, and stashed in miscellaneous digital text files. Information scraps encode information ranging from ideas and sketches to notes, reminders, shipment tracking numbers, driving directions, and even poetry. I proceed by performing an in-depth ethnographic investigation of the nature and use of information scraps, and by designing and building two research systems for information scrap management. The first system, Jourknow, lowers the capture barrier for unstructured notes and for structured information such as calendar items and to-dos; captures contextual information surrounding note creation, such as location, documents viewed, and people corresponded with; and manages uncommon user-generated personal information such as restaurant reviews or this week's shopping list. The follow-up system, Pinky, further explores the lightweight-capture space by providing a command line interface that is tolerant to re-ordering, along with GUI affordances for quick and accurate entry. Reflecting on these tools' successes and failures, I characterize the design challenges inherent in designing and building information scrap tools.
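A command line that is "tolerant to re-ordering" can be sketched by recognizing tokens by type rather than by position. The @tag and !priority syntax below is an invented illustration, not Pinky's actual grammar:

```python
import re

def parse_capture(line):
    """Order-tolerant capture sketch: pull out @tags and !priority markers
    wherever they occur in the line; the remaining words form the note text."""
    tags = re.findall(r"@(\w+)", line)
    priority = re.findall(r"!(\w+)", line)
    # Strip the markers and collapse leftover whitespace
    text = " ".join(re.sub(r"[@!]\w+", "", line).split())
    return {"text": text,
            "tags": sorted(tags),
            "priority": priority[0] if priority else None}
```

Because markers are recognized by type, "buy milk @errands !high" and "@errands !high buy milk" parse to the same structured note, which is the property that makes such a capture interface forgiving of quick, out-of-order entry.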

    DSpace 6.0 manual
