122 research outputs found
Large-Scale Storage and Reasoning for Semantic Data Using Swarms
Scalable, adaptive and robust approaches to store and analyze the massive amounts of data expected from Semantic Web applications are needed to bring the Web of Data to its full potential. The solution at hand is to distribute both data and requests onto multiple computers. Apart from storage, the annotation of data with machine-processable semantics is essential for realizing the vision of the Semantic Web. Reasoning on webscale data faces the same requirements as storage. Swarm-based approaches have been shown to produce near-optimal solutions for hard problems in a completely decentralized way. We propose a novel concept for reasoning within a fully distributed and self-organized storage system that is based on the collective behavior of swarm individuals and does not require any schema replication. We show the general feasibility and efficiency of our approach with a proof-of-concept experiment of storage and reasoning performance. Thereby, we positively answer the research question of whether swarm-based approaches are useful in creating a large-scale distributed storage and reasoning system. © 2012 IEEE
A survey of large-scale reasoning on the Web of data
As more and more data is being generated by sensor networks, social media and organizations, the Webinterlinking this wealth of information becomes more complex. This is particularly true for the so-calledWeb of Data, in which data is semantically enriched and interlinked using ontologies. In this large anduncoordinated environment, reasoning can be used to check the consistency of the data and of asso-ciated ontologies, or to infer logical consequences which, in turn, can be used to obtain new insightsfrom the data. However, reasoning approaches need to be scalable in order to enable reasoning over theentire Web of Data. To address this problem, several high-performance reasoning systems, whichmainly implement distributed or parallel algorithms, have been proposed in the last few years. Thesesystems differ significantly; for instance in terms of reasoning expressivity, computational propertiessuch as completeness, or reasoning objectives. In order to provide afirst complete overview of thefield,this paper reports a systematic review of such scalable reasoning approaches over various ontologicallanguages, reporting details about the methods and over the conducted experiments. We highlight theshortcomings of these approaches and discuss some of the open problems related to performing scalablereasoning
Scalable discovery of networked data : Algorithms, Infrastructure, Applications
Harmelen, F.A.H. van [Promotor]Siebes, R.M. [Copromotor
On Distributed SPARQL Query Processing Using Triangles of RDF Triples
Knowledge Graphs are providing valuable functionalities, such as data integration and reasoning, to an increasing number of applications in all kinds of companies. These applications partly depend on the efficiency of a Knowledge Graph management system which is often based on the RDF data model and queried with SPARQL. In this context, query performance is preponderant and relies on an optimizer that usually makes an intensive usage of a large set of indexes. Generally, these indexes correspond to different re-orderings of the subject, predicate and object of a triple pattern. In this work, we present a novel approach that considers indexes formed by a frequently encountered basic graph pattern: triangle of triples. We propose dedicated data structures to store these triangles, provide distributed algorithms to discover and materialize them, including inferred triangles, and detail query optimization techniques, including a data partitioning approach for bias data. We provide an implementation that runs on top of Apache Spark and experiment on two real-world RDF data sets. This evaluation emphasizes the performance boost (up to 40x on query processing) that one can obtain by using our approach when facing triangles of triples
SDSF : social-networking trust based distributed data storage and co-operative information fusion.
As of 2014, about 2.5 quintillion bytes of data are created each day, and 90% of the data in the world was created in the last two years alone. The storage of this data can be on external hard drives, on unused space in peer-to-peer (P2P) networks or using the more currently popular approach of storing in the Cloud. When the users store their data in the Cloud, the entire data is exposed to the administrators of the services who can view and possibly misuse the data. With the growing popularity and usage of Cloud storage services like Google Drive, Dropbox etc., the concerns of privacy and security are increasing. Searching for content or documents, from this distributed stored data, given the rate of data generation, is a big challenge. Information fusion is used to extract information based on the query of the user, and combine the data and learn useful information. This problem is challenging if the data sources are distributed and heterogeneous in nature where the trustworthiness of the documents may be varied. This thesis proposes two innovative solutions to resolve both of these problems. Firstly, to remedy the situation of security and privacy of stored data, we propose an innovative Social-based Distributed Data Storage and Trust based co-operative Information Fusion Framework (SDSF). The main objective is to create a framework that assists in providing a secure storage system while not overloading a single system using a P2P like approach. This framework allows the users to share storage resources among friends and acquaintances without compromising the security or privacy and enjoying all the benefits that the Cloud storage offers. The system fragments the data and encodes it to securely store it on the unused storage capacity of the data owner\u27s friends\u27 resources. The system thus gives a centralized control to the user over the selection of peers to store the data. Secondly, to retrieve the stored distributed data, the proposed system performs the fusion also from distributed sources. The technique uses several algorithms to ensure the correctness of the query that is used to retrieve and combine the data to improve the information fusion accuracy and efficiency for combining the heterogeneous, distributed and massive data on the Cloud for time critical operations. We demonstrate that the retrieved documents are genuine when the trust scores are also used while retrieving the data sources. The thesis makes several research contributions. First, we implement Social Storage using erasure coding. Erasure coding fragments the data, encodes it, and through introduction of redundancy resolves issues resulting from devices failures. Second, we exploit the inherent concept of trust that is embedded in social networks to determine the nodes and build a secure net-work where the fragmented data should be stored since the social network consists of a network of friends, family and acquaintances. The trust between the friends, and availability of the devices allows the user to make an informed choice about where the information should be stored using `k\u27 optimal paths. Thirdly, for the purpose of retrieval of this distributed stored data, we propose information fusion on distributed data using a combination of Enhanced N-grams (to ensure correctness of the query), Semantic Machine Learning (to extract the documents based on the context and not just bag of words and also considering the trust score) and Map Reduce (NSM) Algorithms. Lastly we evaluate the performance of distributed storage of SDSF using era- sure coding and identify the social storage providers based on trust and evaluate their trustworthiness. We also evaluate the performance of our information fusion algorithms in distributed storage systems. Thus, the system using SDSF framework, implements the beneficial features of P2P networks and Cloud storage while avoiding the pitfalls of these systems. The multi-layered encrypting ensures that all other users, including the system administrators cannot decode the stored data. The application of NSM algorithm improves the effectiveness of fusion since large number of genuine documents are retrieved for fusion
Recommended from our members
Investigating elastic cloud based RDF processing
The Semantic Web was proposed as an extension of the traditional Web to give Web data context and meaning by using the Resource Description Framework (RDF) data model. The recent growth in the adoption of RDF in addition to the massive growth of RDF data, have led numerous efforts to focus on the challenges of processing this data. To this extent, many approaches have focused on vertical scalability by utilising powerful hardware, or horizontal scalability utilising always-on physical computer clusters or peer to peer networks. However, these approaches utilise fixed and high specification computer clusters that require considerable upfront and ongoing investments to deal with the data growth. In recent years cloud computing has seen wide adoption due to its unique elasticity and utility billing features.
This thesis addresses some of the issues related to the processing of large RDF datasets by utilising cloud computing. Initially, the thesis reviews the background literature of related distributed RDF processing work and issues, in particular distributed rulebased reasoning and dictionary encoding, followed by a review of the cloud computing paradigm and related literature. Then, in order to fully utilise features that are specific to cloud computing such as elasticity, the thesis designs and fully implements a Cloud-based Task Execution framework (CloudEx), a generic framework for efficiently
distributing and executing tasks on cloud environments. Subsequently, some of the large-scale RDF processing issues are addressed by using the CloudEx framework to develop algorithms for processing RDF using cloud computing. These algorithms perform efficient dictionary encoding and forward reasoning using cloud-based
columnar databases. The algorithms are collectively implemented as an Elastic Cost Aware Reasoning Framework (ECARF), a cloud-based RDF triple store. This thesis
presents original results and findings that advance the state of the art of performing distributed cloud-based RDF processing and forward reasoning
Efficient algorithms for agent-based semantic resource discovery
A semantic overlay network is a powerful mechanism for collaborative environments where multiple agents, managing several resources, can cooperate in pursuing common and individual goals while achieving good overall performance. However, building such a social structure dynamically from an unstructured peer-to-peer network is a lengthy process if appropriate algorithms and techniques are not used. In this paper, we analyse a set of network evolution techniques that improve the performance of classic approaches, such as the flooding search algorithm. We compare the efficiency of these enhanced classic algorithms with our previously proposed search algorithm, which has also been improved through the referred techniques. Evaluation tests show that the improved version of our algorithm outperforms the improved version of the classic search algorithm and efficiently creates a semantic overlay network for agent-based resource coordination.info:eu-repo/semantics/acceptedVersio
Κατανεμημένη αποτίμηση επερωτήσεων και συλλογιστική για το μοντέλο RDF σε δίκτυα ομοτίμων κόμβων
Με το ενδιαφέρον για τις εφαρμογές του Σημασιολογικού Ιστού να αυξάνεται
ραγδαία, το μοντέλο RDF και RDFS έχει γίνει ένα από τα πιο ευρέως
χρησιμοποιούμενα μοντέλα δεδομένων για την αναπαράσταση και την ενσωμάτωση
δομημένης πληροφορίας στον Ιστό. Το πλήθος των διαθέσιμων πηγών πληροφορίας RDF
συνεχώς αυξάνεται με αποτέλεσμα να υπάρχει μια επιτακτική ανάγκη για τη
διαχείριση RDF δεδομένων. Σε αυτή τη διατριβή επικεντρωνόμαστε στην
κατανεμημένη διαχείριση RDF δεδομένων σε δίκτυα ομότιμων κόμβων. Σχεδιάζουμε
και υλοποιούμε το σύστημα Atlas, ένα πλήρως κατανεμημένο σύστημα για την
αποθήκευση RDF και RDFS δεδομένων, την αποτίμηση και βελτιστοποίηση επερωτήσεων
στη γλώσσα SPARQL και τη συλλογιστική στο μοντέλο RDFS. Το σύστημα Atlas
χρησιμοποιεί κατανεμημένους πίνακες κατακερματισμού, μια δημοφιλή περίπτωση
δικτύων ομότιμων κόμβων. Αρχικά, αναλύουμε κατανεμημένους αλγόριθμους για
συλλογιστική RDFS χρησιμοποιώντας κατανεμημένους πίνακες κατακερματισμού.
Υλοποιηούμε διάφορες παραλλαγές των αλγορίθμων προς τα εμπρός αλυσίδα εκτέλεσης
και προς τα πίσω αλυσίδα εκτέλεσης καθώς και έναν αλγόριθμο που χρησιμοποιεί
την τεχνική μετασχηματισμού των κανόνων σε μαγικό σύνολο. Αποδεικνύουμε
θεωρητικά την ορθότητα των αλγορίθμων αυτών και προσφέρουμε μια συγκριτική
μελέτη τόσο αναλυτικά όσο και πειραματικά. Παράλληλα, προτείνουμε αλγορίθμους
και τεχνικές για την αποτίμηση και τη βελτιστοποίηση επερωτήσεων στη γλώσσα
SPARQL για RDF δεδομένα που είναι αποθηκευμένα σε κατανεμημένους πίνακες
κατακερματισμού. Οι τεχνικές βελτιστοποίησης βασίζονται σε εκτιμήσεις
επιλεκτικότητας και έχουν στόχο τη μείωση του χρόνου απόκρισης της επερώτησης
καθώς και της κατανάλωσης εύρους ζώνης του δικτύου. Η εκτεταμένη πειραματική
αξιολόγηση των μεθόδων βελτιστοποίησης γίνεται σε μια τοπική συστάδα
υπολογιστών χρησιμοποιώντας ένα ευρέως διαδεδομένο σημείο αναφοράς μετρήσεων.With the interest in Semantic Web applications rising rapidly, the Resource
Description Framework (RDF) and its accompanying vocabulary description
language, RDF Schema (RDFS), have become one of the most widely used data
models for representing and integrating structured information in the Web. With
the vast amount of available RDF data sources on the Web increasing rapidly,
there is an urgent need for RDF data management. In this thesis, we focus on
distributed RDF data management in peer-to-peer (P2P) networks. More
specifically, we present results that advance the state-of-the-art in the
research area of distributed RDF query processing and reasoning in P2P
networks. We fully design and implement a P2P system, called Atlas, for the
distributed query processing and reasoning of RDF and RDFS data. Atlas is built
on top of distributed hash tables (DHTs), a commonly-used case of P2P networks.
Initially, we study RDFS reasoning algorithms on top of DHTs. We design and
develop distributed forward and backward chaining algorithms, as well as an
algorithm which works in a bottom-up fashion using the magic sets
transformation technique. We study theoretically the correctness of our
reasoning algorithms and prove that they are sound and complete. We also
provide a comparative study of our algorithms both analytically and
experimentally. In the experimental part of our study, we obtain measurements
in the realistic large-scale distributed environment of PlanetLab as well as in
the more controlled environment of a local cluster. Moreover, we propose
algorithms for SPARQL query processing and optimization over RDF(S) databases
stored on top of distributed hash tables. We fully implement and evaluate a
DHT-based optimizer. The goal of the optimizer is to minimize the time for
answering a query as well as the bandwidth consumed during the query
evaluation. The optimization algorithms use selectivity estimates to determine
the chosen query plan. Our algorithms and techniques have been extensively
evaluated in a local cluster
- …