
    Building a P2P RDF Store for Edge Devices

    Semantic Web technologies have been used in the Internet of Things (IoT) to facilitate data interoperability and address data heterogeneity. The Resource Description Framework (RDF) model is employed in the integration of IoT data, with RDF engines serving as gateways for semantic integration. However, storing and querying RDF data obtained from distributed sources across a dynamic network of edge devices is challenging. The distributed nature of the edge shares similarities with Peer-to-Peer (P2P) systems, including node heterogeneity, limited availability, and constrained resources, with nodes primarily undertaking data storage and processing tasks. P2P models therefore appear to be an attractive approach for constructing distributed RDF stores. Based on P-Grid, a data indexing mechanism for load balancing and range query processing in P2P systems, this paper proposes a design for storing and sharing RDF data on P2P networks of low-cost edge devices. Our design integrates P-Grid with an edge-based RDF storage solution, RDF4Led, to build a P2P RDF engine. This integration maintains RDF data access and query processing while scaling with increasing data and network size. We demonstrate the scaling behavior of our implementation on a P2P network of up to 16 Raspberry Pi 4 nodes.
    Comment: Accepted to IoT Conference 202
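    The abstract leaves the triple-to-key mapping unspecified. As a rough illustration of how an RDF triple might be routed in a P-Grid-style trie, the Python sketch below serializes one triple permutation into an order-preserving bit string and forwards it to the peer owning the longest matching path prefix. The SPO layout, function names, and two-peer split are assumptions for illustration, not RDF4Led's actual scheme.

```python
def triple_key(s: str, p: str, o: str, order: str = "spo") -> str:
    """Build an order-preserving bit-string key from a triple permutation.

    Concatenating components in a fixed order (SPO here; real stores often
    keep several permutations such as POS and OSP) lets prefix searches
    answer patterns like (s, p, ?o), and keeps lexicographically close keys
    on nearby trie paths -- the property P-Grid's range queries rely on.
    """
    parts = {"s": s, "p": p, "o": o}
    text = "\x00".join(parts[c] for c in order)
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))

def responsible_peer(key_bits: str, peer_paths: dict) -> str:
    """Route to the peer whose trie path is the longest prefix of the key."""
    best, best_len = None, -1
    for peer, path in peer_paths.items():
        if key_bits.startswith(path) and len(path) > best_len:
            best, best_len = peer, len(path)
    return best

# Example: two hypothetical Raspberry Pi peers split the key space on the
# first bit of the key.
peers = {"pi-node-1": "0", "pi-node-2": "1"}
key = triple_key("ex:sensor42", "ex:hasReading", "23.5")
print(responsible_peer(key, peers))
```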

    Encrypted Distributed Dictionaries

    End-to-end encrypted databases have been heavily studied in the last two decades. A crucial aspect that previous work has neglected, however, is that real-world databases are distributed, in the sense that data is partitioned among a cluster of nodes rather than being stored on a single node. In this work, we initiate the study of encrypted distributed data structures, which are end-to-end encrypted variants of distributed data structures; themselves fundamental to the design of distributed databases. In particular, we design and analyze encrypted variants of distributed dictionaries (EDDX), an important building block in distributed system design with applications ranging from content delivery networks to off-chain storage networks for blockchains and smart contracts. We formalize the notion of an encrypted DDX and provide simulation-based security definitions that capture the security properties one would desire from such an object. We propose an EDDX construction that uses a distributed hash table (DHT) as a black box. Interestingly, we show that our construction leaks information probabilistically, where the probability is a function of how well the underlying DHT load balances its data. We also show that, in order to be securely used with our construction, a plaintext DHT needs to satisfy a form of programmability, a property that usually only emerges in the context of cryptographic primitives. To show that these properties are indeed achievable in practice, we study the balancing properties of the Chord DHT, arguably one of the most influential DHTs, and show that it is also programmable. Finally, we consider the problem of encrypted DDXs in the context of transient networks, where nodes can be arbitrarily added to or removed from the network.
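    The construction itself is only described at a high level in the abstract. The Python toy below shows the generic pattern of layering an encrypted dictionary over a black-box DHT: labels are pseudorandomized with a PRF so storage peers never learn them, and values are encrypted before insertion. The class names, key derivation, and stream cipher are illustrative assumptions, not the paper's EDDX scheme; in particular the cipher is a toy, and a real deployment would use authenticated encryption.

```python
import hmac, hashlib, os, secrets

class ToyDHT:
    """Stand-in for a real DHT (e.g. Chord); here just a keyed table."""
    def __init__(self):
        self.table = {}
    def put(self, key: bytes, value: bytes):
        self.table[key] = value
    def get(self, key: bytes):
        return self.table.get(key)

class EDDX:
    def __init__(self, dht: ToyDHT, master_key: bytes):
        self.dht = dht
        # Derive independent keys for the label PRF and value encryption.
        self.k_label = hmac.new(master_key, b"label", hashlib.sha256).digest()
        self.k_value = hmac.new(master_key, b"value", hashlib.sha256).digest()

    def _prf(self, label: bytes) -> bytes:
        # Pseudorandom label: the DHT sees only this opaque 32-byte token.
        return hmac.new(self.k_label, label, hashlib.sha256).digest()

    def _enc(self, pt: bytes) -> bytes:
        # Toy stream cipher for illustration only (pt must be <= 32 bytes).
        nonce = secrets.token_bytes(16)
        stream = hashlib.sha256(self.k_value + nonce).digest()
        return nonce + bytes(a ^ b for a, b in zip(pt, stream))

    def _dec(self, ct: bytes) -> bytes:
        nonce, body = ct[:16], ct[16:]
        stream = hashlib.sha256(self.k_value + nonce).digest()
        return bytes(a ^ b for a, b in zip(body, stream))

    def put(self, label: bytes, value: bytes):
        self.dht.put(self._prf(label), self._enc(value))

    def get(self, label: bytes):
        ct = self.dht.get(self._prf(label))
        return None if ct is None else self._dec(ct)

eddx = EDDX(ToyDHT(), master_key=os.urandom(32))
eddx.put(b"user:42", b"balance=100")
print(eddx.get(b"user:42"))  # b'balance=100'
```

    Note how the leakage the paper analyzes arises naturally here: the DHT still observes which (pseudorandom) keys are accessed and how they distribute across nodes, which is why the load-balancing quality of the underlying DHT matters for security.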

    Survey of Models and Architectures to Ensure Linked Data Access

    Mobile access to the Web of Data is currently a real challenge in developing countries, which are characterized by limited Internet connectivity and a high penetration of mobile devices with limited resources (such as cache and memory). In this paper, we survey and compare proposed solutions (models and architectures) that could contribute to solving this problem of mobile access to the Web of Data under intermittent Internet access. These solutions are discussed in relation to the underlying network architectures and data models considered. We present a conceptual study of peer-to-peer solutions based on gossip protocols designed to build connected overlay networks. In addition, we provide a detailed analysis of client-server and data replication systems, generally designed to ensure the local availability of data on the system. We conclude with some recommendations for achieving a connected architecture that provides mobile contributors with local access to the Web of Data.
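    As a concrete reminder of what the gossip-based overlays surveyed here do, the Python sketch below implements a minimal view-shuffling round in the style of peer-sampling protocols: each node keeps a small partial view of the network and periodically swaps entries with a random neighbour, which keeps the overlay connected without any central coordinator. The view size and shuffle policy are assumptions, not taken from any particular surveyed system.

```python
import random

VIEW_SIZE = 5  # partial-view capacity (assumed parameter)

class Node:
    def __init__(self, node_id, seeds):
        self.id = node_id
        self.view = set(seeds) - {node_id}  # ids of known neighbours

    def gossip_round(self, network):
        """Merge views with one random neighbour, then trim both views."""
        if not self.view:
            return
        peer = network[random.choice(list(self.view))]
        merged_self = (self.view | peer.view | {peer.id}) - {self.id}
        merged_peer = (peer.view | self.view | {self.id}) - {peer.id}
        self.view = set(random.sample(list(merged_self),
                                      min(VIEW_SIZE, len(merged_self))))
        peer.view = set(random.sample(list(merged_peer),
                                      min(VIEW_SIZE, len(merged_peer))))

# Tiny simulation: 20 nodes bootstrapped from a single seed node stay
# connected as views diversify over repeated gossip rounds.
network = {i: Node(i, seeds={0}) for i in range(20)}
for _ in range(30):
    for node in network.values():
        node.gossip_round(network)
print(sorted(network[7].view))
```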

    Distributed Spatial Data Sharing: a new era in sharing spatial data

    Advancements in information and communications technology, including the widespread adoption of GPS-based sensors, improvements in computational data processing, and satellite imagery, have resulted in new data sources, stakeholders, and methods of producing, using, and sharing spatial data. Daily, vast amounts of data are produced by individuals interacting with digital content and by automated and semi-automated sensors deployed across the environment. A growing portion of this information contains geographic information directly or indirectly embedded within it. The widespread use of automated smart sensors and an increased variety of georeferenced media have turned individuals into data collectors, raising a new set of social concerns around individual geoprivacy and data ownership. These changes require new approaches to managing, sharing, and processing geographic data. Distributed data-sharing technologies may address some of these challenges by moving from centralized control and ownership of data to a more distributed system, in which individuals are responsible for gathering data, storing it, and controlling access to it. Stepping into this new era of distributed spatial data sharing requires preparation, including developing tools and algorithms for working with spatial data efficiently in this environment.

    Peer-to-peer (P2P) networks have become very popular for storing and sharing information in a decentralized fashion, but they lack methods for processing spatio-temporal queries. In the first chapter of this research, we propose a new spatio-temporal multi-level tree structure, the Distributed Spatio-Temporal Tree (DSTree), which aims to address this problem. The DSTree can answer a range of spatio-temporal queries. We also propose a framework that uses a blockchain to share a DSTree on the distributed network, so that each user can replicate, query, or update it. Next, we propose a dynamic k-anonymity algorithm to address geoprivacy concerns on distributed platforms; individual, dynamic control of geoprivacy is one of the primary purposes of the framework introduced in this research. Sharing data within and between organizations can be enhanced by the greater trust and transparency offered by distributed or decentralized technologies. Rather than depending on a central authority to manage geographic data, a decentralized framework provides fine-grained and transparent sharing. Users also control the precision of the spatial data they share with others: they are not limited to third-party algorithms deciding their privacy level, nor to binary levels of location sharing.

    As mentioned earlier, individuals and communities can benefit from distributed spatial data sharing. In the last chapter of this work, we develop an image-sharing platform, a harvester safety application, for the Kakisa indigenous community in northern Canada. In this project, we investigate the potential of a Distributed Spatial Data Sharing (DSDS) infrastructure for small-scale data-sharing needs in indigenous communities: we explore the potential use cases and challenges and propose a DSDS architecture that allows users in small communities to share and query their data using DSDS. Given the current availability of distributed tools, the sustainable development of such applications needs accessible technology: easy-to-use tools for applying distributed technologies to community-scale spatial data sharing. In conclusion, distributed technology is in its early stages and requires easy-to-use tools, methods, and algorithms to handle, share, and query geographic information. Once these are developed, it will be possible to contrast DSDS with other data systems and thereby evaluate the practical benefit of such systems. A distributed data-sharing platform needs a standard framework for sharing data between different entities; much like the first decades of the Web, these tools need regulations and standards. Such standards can benefit individuals and small communities in the current chaotic spatial data-sharing environment controlled by central bodies.
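    As one concrete reading of the dynamic k-anonymity idea above, the Python sketch below generalizes a reported location to ever-coarser grid cells until the cell covers at least k other users, so the shared precision adapts to local user density. The quadtree-style grid and the value of k are illustrative assumptions, not the thesis's actual algorithm.

```python
def cell_at(lat, lon, level):
    """Quadtree-style cell id: each lower level halves the resolution."""
    size = 180.0 / (2 ** level)  # cell edge length in degrees
    return (level, int(lat // size), int(lon // size))

def k_anonymous_cell(lat, lon, others, k=5, max_level=12):
    """Return the finest cell containing the point and >= k other users."""
    for level in range(max_level, -1, -1):
        cell = cell_at(lat, lon, level)
        inside = sum(1 for (olat, olon) in others
                     if cell_at(olat, olon, level) == cell)
        if inside >= k:
            return cell  # share this cell instead of the raw coordinates
    return cell_at(lat, lon, 0)  # fall back to the coarsest cell

# Ten nearby users; the query point is generalized until >= 5 share a cell.
others = [(61.0 + i * 0.01, -117.0 + i * 0.01) for i in range(10)]
print(k_anonymous_cell(61.02, -117.02, others, k=5))
```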

    Improving Data Availability in Decentralized Storage Systems

    PhD thesis in Information Technology.
    Preserving knowledge for future generations has been a primary concern for humanity since the dawn of civilization. State-of-the-art methods have included stone carvings, papyrus scrolls, and paper books. With each advance in technology, it has become easier to record knowledge. In the current digital age, humanity may preserve enormous amounts of knowledge on hard drives with the click of a button. The aggregation of several hard drives into a computer forms the basis for a storage system. Traditionally, large storage systems have comprised many distinct computers operated by a single administrative entity. With the rise in popularity of blockchain and cryptocurrencies, a new type of storage system has emerged. This new type of storage system is fully decentralized and comprises a network of untrusted peers cooperating to act as a single storage system. During upload, files are split into chunks and distributed across a network of peers. These storage systems encode files using Merkle trees, a hierarchical data structure that provides integrity verification and lookup services. While decentralized storage systems are popular and have a user base in the millions, many technical aspects are still in their infancy. As such, they have yet to prove themselves viable alternatives to traditional centralized storage systems. In this thesis, we contribute to the technical aspects of decentralized storage systems by proposing novel techniques and protocols. We make significant contributions with the design of three practical protocols that each improve data availability in different ways. Our first contribution is Snarl and entangled Merkle trees. Entangled Merkle trees are resilient data structures that decrease the impact hierarchical dependencies have on data availability. Whenever a chunk loss is detected, Snarl uses the entangled Merkle trees to find parity chunks to repair the lost chunk. Our results show that by encoding data as an entangled Merkle tree and using Snarl's repair algorithm, the storage utilization in current systems could be improved by over five times, with improved data availability. Second, we propose SNIPS, a protocol that efficiently synchronizes the data stored on peers to ensure that all peers have the same data. We designed a Proof-of-Storage-like (PoS-like) construction using a Minimal Perfect Hash Function. Each peer uses the PoS-like construction to create a storage proof for the chunks it wants to synchronize. Peers exchange storage proofs and use them to efficiently determine which chunks they are missing. The evaluation shows that by using SNIPS, the amount of synchronization data can be reduced by three orders of magnitude in current systems. Lastly, in our third contribution, we propose SUP, a protocol that uses cryptographic proofs to check whether a chunk is already stored in the network before doing wasteful uploads. We show that SUP may reduce the amount of data transferred by up to 94% in current systems. The protocols may be deployed independently or in combination to create a decentralized storage system that is more robust to major outages. Each of the protocols has been implemented and evaluated on a large cluster of 1,000 peers.
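    To make the hierarchical-dependency problem that Snarl targets concrete, the Python sketch below shows the plain Merkle-tree encoding such systems build over file chunks. The chunk size and hash function are assumed for illustration, and the entanglement and repair layer itself is not shown.

```python
import hashlib

CHUNK = 4096  # bytes per leaf chunk (assumed)

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(chunks):
    """Fold leaf hashes pairwise up to a single root hash."""
    level = [h(c) for c in chunks]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

data = b"x" * 10_000
chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
root = merkle_root(chunks)
# A peer that stores only `root` can verify any re-downloaded chunk by
# recomputing the path of hashes from that chunk up to the root. Losing an
# interior node breaks every lookup beneath it -- the hierarchical
# dependency that entangled Merkle trees are designed to soften.
print(root.hex())
```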

    Data replication and update propagation in XML P2P data management systems

    XML P2P data management systems are P2P systems that use XML as the underlying data format shared between peers in the network. These systems aim to bring the benefits of XML and P2P systems to the distributed data management field. However, P2P systems are known for their lack of central control and high degree of autonomy: peers may leave the network at will at any time, increasing the risk of data loss. Despite this, most research on XML P2P systems focuses on novel and efficient XML indexing and retrieval techniques; mechanisms for ensuring data availability in XML P2P systems have received comparatively little attention. This project attempts to address this issue. We design an XML P2P data management framework to improve data availability. This framework includes mechanisms for widespread data replication, replica location, and update propagation. It allows XML documents to be broken down into fragments; by doing so, we aim to reduce the cost of replicating data by distributing smaller XML fragments throughout the network rather than entire documents. To tackle the data replication problem, we propose a suite of selection and placement algorithms that may be interchanged to form a particular replication strategy. To support the placement of replicas anywhere in the network, we use a Fragment Location Catalogue, a global index that maintains the locations of replicas. We also propose a lazy update propagation algorithm to propagate updates to replicas. Experiments show that the data replication algorithms improve data availability in our experimental network environment. We also find that breaking XML documents into smaller pieces and replicating those instead of whole documents considerably reduces the replication cost, at the price of some loss in data availability. For the update propagation tests, we find that the probability that queries return up-to-date results increases, but improvements to the algorithm are necessary to handle environments with high update rates.
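    A minimal Python sketch of the lazy update propagation idea, under assumed names and data layout (the framework's actual algorithms are more involved): an update is applied immediately at the originating peer, while the replicas recorded in a fragment-location catalogue are refreshed later in small batches, so stale reads are possible in between.

```python
from collections import deque

# Replica locations per fragment, and each peer's (xml, version) copies.
catalogue = {"frag:invoice/1": ["peerA", "peerB", "peerC"]}
replicas = {p: {"frag:invoice/1": ("<total>10</total>", 1)}
            for p in catalogue["frag:invoice/1"]}
pending = deque()  # queued (peer, fragment, xml, version) pushes

def update_fragment(origin, frag_id, new_xml):
    """Apply the update locally at once; queue replica refreshes for later."""
    _, version = replicas[origin][frag_id]
    replicas[origin][frag_id] = (new_xml, version + 1)
    for peer in catalogue[frag_id]:
        if peer != origin:
            pending.append((peer, frag_id, new_xml, version + 1))

def propagate(batch=1):
    """Lazily push a few queued updates per pass."""
    for _ in range(min(batch, len(pending))):
        peer, frag_id, xml, version = pending.popleft()
        if replicas[peer][frag_id][1] < version:  # ignore out-of-date pushes
            replicas[peer][frag_id] = (xml, version)

update_fragment("peerA", "frag:invoice/1", "<total>12</total>")
propagate(batch=1)
print(replicas["peerB"]["frag:invoice/1"])  # refreshed to version 2
print(replicas["peerC"]["frag:invoice/1"])  # still stale until the next pass
```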

    A Framework for resource efficient profiling of spatial model performance

    Summer 2022. Includes bibliographical references.
    We design models to understand phenomena, make predictions, and/or inform decision-making. This study targets models that encapsulate spatially evolving phenomena. Given a model M, our objective is to identify how well the model predicts across all geospatial extents. A modeler may expect these validations to occur at varying spatial resolutions (e.g., states, counties, towns, census tracts). Assessing a model with all available ground-truth data is infeasible due to the data volumes involved. We propose a framework to assess the performance of models at scale over diverse spatial data collections. Our methodology ensures orchestration of validation workloads while reducing memory strain, alleviating contention, enabling concurrency, and ensuring high throughput. We introduce the notion of a validation budget, which represents an upper bound on the total number of observations used to assess the performance of models across spatial extents. The validation budget attempts to capture the distribution characteristics of observations and is informed by multiple sampling strategies. Our design decouples the validation from the underlying model-fitting libraries in order to interoperate with models designed using different libraries and analytical engines; our advanced research prototype currently supports Scikit-learn, PyTorch, and TensorFlow. We have conducted extensive benchmarks that demonstrate the suitability of our methodology.
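    One plausible way to realize the validation budget described above, sketched in Python below: cap the total number of ground-truth observations, split the cap across spatial extents in proportion to how many observations each extent holds, and then sample within each extent. The proportional rule and the names are assumptions; the paper considers multiple sampling strategies.

```python
import random

def allocate_budget(counts, budget):
    """Proportionally allocate `budget` observations across extents."""
    total = sum(counts.values())
    alloc = {k: min(c, budget * c // total) for k, c in counts.items()}
    leftovers = budget - sum(alloc.values())
    # Hand any rounding remainder to the most data-rich extents first.
    for k in sorted(counts, key=counts.get, reverse=True):
        if leftovers <= 0:
            break
        extra = min(counts[k] - alloc[k], leftovers)
        alloc[k] += extra
        leftovers -= extra
    return alloc

# Hypothetical observation counts per county-level extent.
counts = {"county:larimer": 120_000, "county:weld": 80_000,
          "county:gilpin": 1_500}
alloc = allocate_budget(counts, budget=10_000)
print(alloc)  # {'county:larimer': 5956, 'county:weld': 3970, 'county:gilpin': 74}

# Each extent's validation set is then a random sample of its allocation.
sample = random.sample(range(counts["county:gilpin"]), alloc["county:gilpin"])
```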