321 research outputs found

    Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

    Full text link
    Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Typically, fundamental design decisions involved in big data systems design include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies for optimized solution to a specific real world problem, big data system are not an exception to any such rule. As far as the storage aspect of any big data system is concerned, the primary facet in this regard is a storage infrastructure and NoSQL seems to be the right technology that fulfills its requirements. However, every big data application has variable data characteristics and thus, the corresponding data fits into a different data model. This paper presents feature and use case analysis and comparison of the four main data models namely document oriented, key value, graph and wide column. Moreover, a feature analysis of 80 NoSQL solutions has been provided, elaborating on the criteria and points that a developer must consider while making a possible choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution. This brings forth second facet of big data storage, big data file formats, into picture. The second half of the research paper compares the advantages, shortcomings and possible use cases of available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage and its challenges and future prospects have also been discussed

    An Interoperable Access Control System based on Self-Sovereign Identities

    Get PDF
    The extreme growth of the World Wide Web in the last decade together with recent scandals related to theft or abusive use of personal information have left users unsatisfied withtheir digital identity providers and concerned about their online privacy. Self-SovereignIdentity (SSI) is a new identity management paradigm which gives back control over personal information to its rightful owner - the individual. However, adoption of SSI on theWeb is complicated by the high overhead costs for the service providers due to the lackinginteroperability of the various emerging SSI solutions. In this work, we propose an AccessControl System based on Self-Sovereign Identities with a semantically modelled AccessControl Logic. Our system relies on the Web Access Control authorization rules usedin the Solid project and extends them to additionally express requirements on VerifiableCredentials, i.e., digital credentials adhering to a standardized data model. Moreover,the system achieves interoperability across multiple DID Methods and types of VerifiableCredentials allowing for incremental extensibility of the supported SSI technologies bydesign. A Proof-of-Concept prototype is implemented and its performance as well as multiple system design choices are evaluated: The End-to-End latency of the authorizationprocess takes between 2-5 seconds depending on the used DID Methods and can theoretically be further optimized to 1.5-3 seconds. Evaluating the potential interoperabilityachieved by the system shows that multiple DID Methods and different types of VerifiableCredentials can be supported. Lastly, multiple approaches for modelling required Verifiable Credentials are compared and the suitability of the SHACL language for describingthe RDF graphs represented by the required Linked Data credentials is shown

    Large-Scale Storage and Reasoning for Semantic Data Using Swarms

    Get PDF
    Scalable, adaptive and robust approaches to store and analyze the massive amounts of data expected from Semantic Web applications are needed to bring the Web of Data to its full potential. The solution at hand is to distribute both data and requests onto multiple computers. Apart from storage, the annotation of data with machine-processable semantics is essential for realizing the vision of the Semantic Web. Reasoning on webscale data faces the same requirements as storage. Swarm-based approaches have been shown to produce near-optimal solutions for hard problems in a completely decentralized way. We propose a novel concept for reasoning within a fully distributed and self-organized storage system that is based on the collective behavior of swarm individuals and does not require any schema replication. We show the general feasibility and efficiency of our approach with a proof-of-concept experiment of storage and reasoning performance. Thereby, we positively answer the research question of whether swarm-based approaches are useful in creating a large-scale distributed storage and reasoning system. © 2012 IEEE

    Migrating microservices to graph database

    Get PDF
    Microservice architecture is a popular approach to structuring web backend services. Another emerging trend, after a period of hibernation, is utilizing modern graph database management systems for managing complex, richly connected data. The two approaches have rarely been used in tandem, as microservices emphasize modularization and decoupling of services, while graph data models favor data integration. In this study, literature on microservices and graph databases is reviewed and a synthesis between the two paradigms is presented. Based on the theoretical discussion, a software architecture combining the two elements is formulated and implemented using microservices serving content metadata at Yleisradio, the Finnish national broadcasting company. The architecture design follows the Design Science Research Process model. Finally, the renewed system is evaluated using quantitative and qualitative metrics. The performance of the system is measured using automated API queries and load tests. The new system was compared to an earlier version based on a PostgreSQL database. The tests gave slight indication that the renewed system performed better for complex queries, where a large number of relations were traversed, but worse in terms of throughput under heavy load. Based on the these findings, a number of performance-enhancing optimizations to the system are introduced. Observations and perpectives are also gathered in a project retrospective session. It is concluded that the resulting architecture holds promise for managing complex data rich in relations in a safe manner. In it, the different domains of the knowledge graph are decoupled into distinct named graphs managed by different microservices

    Continuous Queries and Real-time Analysis of Social Semantic Data with C-SPARQL

    Get PDF
    Abstract. Social semantic data are becoming a reality, but apparently their streaming nature has been ignored so far. Streams, being unbounded sequences of time-varying data elements, should not be treated as persistent data to be stored “forever ” and queried on demand, but rather as transient data to be consumed on the fly by queries which are registered once and for all and keep analyzing such streams, producing answers triggered by the streaming data and not by explicit invocation. In this paper, we propose an approach to continuous queries and realtime analysis of social semantic data with C-SPARQL, an extension of SPARQL for querying RDF streams

    Decentralized Control and Adaptation in Distributed Applications via Web and Semantic Web Technologies

    Get PDF
    The presented work provides an approach and an implementation for enabling decentralized control in distributed applications composed of heterogeneous components by benefiting from the interoperability provided by the Web stack and relying on semantic technologies for enabling data integration. In particular, the concept of Smart Components enables adaptability at runtime through an adaptation layer and is complemented by a reference architecture as well as a prototypical implementation

    Decentralized linked open data in constrained wireless sensor networks

    Get PDF
    Data generated by sensors in Internet of Things ecosystems contains lots of valuable information, which is often not used to its full potential. This is mainly due to the fact that data is stored in proprietary storages and formats. Manufacturers of sensor devices often offer closed platforms to view and manage the data, which limits their reusability. Moreover, questions start to raise about true data ownership over data generated from monitoring our everyday lives. In order to overcome these issues several initiatives have emerged in the past to hand over data to the rightful owner. One of these initiatives is Solid, currently focusing on socially linked data. However, never before did one apply the Solid principles to Internet of Things data. Therefore, in this paper, a novel approach is presented where sensor data is handled from sensor to storage using open data formats and standards to ensure interoperability and reusability. It is shown that combining existing concepts can be helpful in designing decentralized Internet of Things data storages, on top of which data can be incorporated into the Linked Open Data cloud. This has been done by comparing the overhead of a regular approach, using Linked Open Data concepts on top of a sensor device, to an approach that was optimized for device management in constrained Internet of Things networks
    corecore