2,181 research outputs found

    SDSF : social-networking trust based distributed data storage and co-operative information fusion.

    Get PDF
    As of 2014, about 2.5 quintillion bytes of data are created each day, and 90% of the data in the world was created in the last two years alone. The storage of this data can be on external hard drives, on unused space in peer-to-peer (P2P) networks or using the more currently popular approach of storing in the Cloud. When the users store their data in the Cloud, the entire data is exposed to the administrators of the services who can view and possibly misuse the data. With the growing popularity and usage of Cloud storage services like Google Drive, Dropbox etc., the concerns of privacy and security are increasing. Searching for content or documents, from this distributed stored data, given the rate of data generation, is a big challenge. Information fusion is used to extract information based on the query of the user, and combine the data and learn useful information. This problem is challenging if the data sources are distributed and heterogeneous in nature where the trustworthiness of the documents may be varied. This thesis proposes two innovative solutions to resolve both of these problems. Firstly, to remedy the situation of security and privacy of stored data, we propose an innovative Social-based Distributed Data Storage and Trust based co-operative Information Fusion Framework (SDSF). The main objective is to create a framework that assists in providing a secure storage system while not overloading a single system using a P2P like approach. This framework allows the users to share storage resources among friends and acquaintances without compromising the security or privacy and enjoying all the benefits that the Cloud storage offers. The system fragments the data and encodes it to securely store it on the unused storage capacity of the data owner\u27s friends\u27 resources. The system thus gives a centralized control to the user over the selection of peers to store the data. Secondly, to retrieve the stored distributed data, the proposed system performs the fusion also from distributed sources. The technique uses several algorithms to ensure the correctness of the query that is used to retrieve and combine the data to improve the information fusion accuracy and efficiency for combining the heterogeneous, distributed and massive data on the Cloud for time critical operations. We demonstrate that the retrieved documents are genuine when the trust scores are also used while retrieving the data sources. The thesis makes several research contributions. First, we implement Social Storage using erasure coding. Erasure coding fragments the data, encodes it, and through introduction of redundancy resolves issues resulting from devices failures. Second, we exploit the inherent concept of trust that is embedded in social networks to determine the nodes and build a secure net-work where the fragmented data should be stored since the social network consists of a network of friends, family and acquaintances. The trust between the friends, and availability of the devices allows the user to make an informed choice about where the information should be stored using `k\u27 optimal paths. Thirdly, for the purpose of retrieval of this distributed stored data, we propose information fusion on distributed data using a combination of Enhanced N-grams (to ensure correctness of the query), Semantic Machine Learning (to extract the documents based on the context and not just bag of words and also considering the trust score) and Map Reduce (NSM) Algorithms. Lastly we evaluate the performance of distributed storage of SDSF using era- sure coding and identify the social storage providers based on trust and evaluate their trustworthiness. We also evaluate the performance of our information fusion algorithms in distributed storage systems. Thus, the system using SDSF framework, implements the beneficial features of P2P networks and Cloud storage while avoiding the pitfalls of these systems. The multi-layered encrypting ensures that all other users, including the system administrators cannot decode the stored data. The application of NSM algorithm improves the effectiveness of fusion since large number of genuine documents are retrieved for fusion

    GPUs as Storage System Accelerators

    Full text link
    Massively multicore processors, such as Graphics Processing Units (GPUs), provide, at a comparable price, a one order of magnitude higher peak performance than traditional CPUs. This drop in the cost of computation, as any order-of-magnitude drop in the cost per unit of performance for a class of system components, triggers the opportunity to redesign systems and to explore new ways to engineer them to recalibrate the cost-to-performance relation. This project explores the feasibility of harnessing GPUs' computational power to improve the performance, reliability, or security of distributed storage systems. In this context, we present the design of a storage system prototype that uses GPU offloading to accelerate a number of computationally intensive primitives based on hashing, and introduce techniques to efficiently leverage the processing power of GPUs. We evaluate the performance of this prototype under two configurations: as a content addressable storage system that facilitates online similarity detection between successive versions of the same file and as a traditional system that uses hashing to preserve data integrity. Further, we evaluate the impact of offloading to the GPU on competing applications' performance. Our results show that this technique can bring tangible performance gains without negatively impacting the performance of concurrently running applications.Comment: IEEE Transactions on Parallel and Distributed Systems, 201

    Secure and Reliable Data Outsourcing in Cloud Computing

    Get PDF
    The many advantages of cloud computing are increasingly attracting individuals and organizations to outsource their data from local to remote cloud servers. In addition to cloud infrastructure and platform providers, such as Amazon, Google, and Microsoft, more and more cloud application providers are emerging which are dedicated to offering more accessible and user friendly data storage services to cloud customers. It is a clear trend that cloud data outsourcing is becoming a pervasive service. Along with the widespread enthusiasm on cloud computing, however, concerns on data security with cloud data storage are arising in terms of reliability and privacy which raise as the primary obstacles to the adoption of the cloud. To address these challenging issues, this dissertation explores the problem of secure and reliable data outsourcing in cloud computing. We focus on deploying the most fundamental data services, e.g., data management and data utilization, while considering reliability and privacy assurance. The first part of this dissertation discusses secure and reliable cloud data management to guarantee the data correctness and availability, given the difficulty that data are no longer locally possessed by data owners. We design a secure cloud storage service which addresses the reliability issue with near-optimal overall performance. By allowing a third party to perform the public integrity verification, data owners are significantly released from the onerous work of periodically checking data integrity. To completely free the data owner from the burden of being online after data outsourcing, we propose an exact repair solution so that no metadata needs to be generated on the fly for the repaired data. The second part presents our privacy-preserving data utilization solutions supporting two categories of semantics - keyword search and graph query. For protecting data privacy, sensitive data has to be encrypted before outsourcing, which obsoletes traditional data utilization based on plaintext keyword search. We define and solve the challenging problem of privacy-preserving multi- keyword ranked search over encrypted data in cloud computing. We establish a set of strict privacy requirements for such a secure cloud data utilization system to become a reality. We first propose a basic idea for keyword search based on secure inner product computation, and then give two improved schemes to achieve various stringent privacy requirements in two different threat models. We also investigate some further enhancements of our ranked search mechanism, including supporting more search semantics, i.e., TF × IDF, and dynamic data operations. As a general data structure to describe the relation between entities, the graph has been increasingly used to model complicated structures and schemaless data, such as the personal social network, the relational database, XML documents and chemical compounds. In the case that these data contains sensitive information and need to be encrypted before outsourcing to the cloud, it is a very challenging task to effectively utilize such graph-structured data after encryption. We define and solve the problem of privacy-preserving query over encrypted graph-structured data in cloud computing. By utilizing the principle of filtering-and-verification, we pre-build a feature-based index to provide feature-related information about each encrypted data graph, and then choose the efficient inner product as the pruning tool to carry out the filtering procedure

    A systematic review on cloud storage mechanisms concerning e-healthcare systems

    Get PDF
    As the expenses of medical care administrations rise and medical services experts are becoming rare, it is up to medical services organizations and institutes to consider the implementation of medical Health Information Technology (HIT) innovation frameworks. HIT permits health associations to smooth out their considerable cycles and offer types of assistance in a more productive and financially savvy way. With the rise of Cloud Storage Computing (CSC), an enormous number of associations and undertakings have moved their healthcare data sources to distributed storage. As the information can be mentioned whenever universally, the accessibility of information becomes an urgent need. Nonetheless, outages in cloud storage essentially influence the accessibility level. Like the other basic variables of cloud storage (e.g., reliability quality, performance, security, and protection), availability also directly impacts the data in cloud storage for e-Healthcare systems. In this paper, we systematically review cloud storage mechanisms concerning the healthcare environment. Additionally, in this paper, the state-of-the-art cloud storage mechanisms are critically reviewed for e-Healthcare systems based on their characteristics. In short, this paper summarizes existing literature based on cloud storage and its impact on healthcare, and it likewise helps researchers, medical specialists, and organizations with a solid foundation for future studies in the healthcare environment.Qatar University [IRCC-2020-009]

    Reliable and Efficient In-Memory Fault Tolerance of Large Language Model Pretraining

    Full text link
    Extensive system scales (i.e. thousands of GPU/TPUs) and prolonged training periods (i.e. months of pretraining) significantly escalate the probability of failures when training large language models (LLMs). Thus, efficient and reliable fault-tolerance methods are in urgent need. Checkpointing is the primary fault-tolerance method to periodically save parameter snapshots from GPU memory to disks via CPU memory. In this paper, we identify the frequency of existing checkpoint-based fault-tolerance being significantly limited by the storage I/O overheads, which results in hefty re-training costs on restarting from the nearest checkpoint. In response to this gap, we introduce an in-memory fault-tolerance framework for large-scale LLM pretraining. The framework boosts the efficiency and reliability of fault tolerance from three aspects: (1) Reduced Data Transfer and I/O: By asynchronously caching parameters, i.e., sharded model parameters, optimizer states, and RNG states, to CPU volatile memory, Our framework significantly reduces communication costs and bypasses checkpoint I/O. (2) Enhanced System Reliability: Our framework enhances parameter protection with a two-layer hierarchy: snapshot management processes (SMPs) safeguard against software failures, together with Erasure Coding (EC) protecting against node failures. This double-layered protection greatly improves the survival probability of the parameters compared to existing checkpointing methods. (3) Improved Snapshotting Frequency: Our framework achieves more frequent snapshotting compared with asynchronous checkpointing optimizations under the same saving time budget, which improves the fault tolerance efficiency. Empirical results demonstrate that Our framework minimizes the overhead of fault tolerance of LLM pretraining by effectively leveraging redundant CPU resources.Comment: Fault Tolerance, Checkpoint Optimization, Large Language Model, 3D parallelis

    Geographical forwarding algorithm based video content delivery scheme for internet of vehicles (IoV)

    Get PDF
    This is an accepted manuscript of an article published by IEEE Multimedia Communications Technical Committee in MMTC Communications – Frontiers on 31/07/2020, available online: https://mmc.committees.comsoc.org/files/2020/07/MMTC_Communication_Frontier_July_2020.pdf The accepted version of the publication may differ from the final published version.An evolved form of Vehicular Ad hoc Networks (VANET) has recently emerged as the Internet of Vehicles (IoV). Though, there are still some challenges that need to be addressed in support IoV applications. The objective of this research is to achieve an efficient video content transmission over vehicular networks. We propose a balanced video-forwarding algorithm for delivering video-based content delivery scheme. The available neighboring vehicles will be ranked to the vehicle in forwarding progress before transmitting the video frames using proposed multi-score function. Considering the current beacon reception rate, forwarding progress and direction to destination, in addition to residual buffer length; the proposed algorithm can elect the best candidate to forward the video frames to the next highest ranked vehicles in a balanced way taking in account their residual buffer lengths. To facilitate the proposed video content delivery scheme, an approach of H.264/SVC was improvised to divide video packets into various segments, to be delivered into three defined groups. These created segments can be encoded and decoded independently and integrated back to produce the original packet sent by source vehicle. Simulation results demonstrate the efficiency of our proposed algorithm in improving the perceived video quality compared with other approache

    On the combination of multi-cloud and network coding for cost-efficient storage in industrial applications

    Get PDF
    The adoption of both Cyber–Physical Systems (CPSs) and the Internet-of-Things (IoT) has enabled the evolution towards the so-called Industry 4.0. These technologies, together with cloud computing and artificial intelligence, foster new business opportunities. Besides, several industrial applications need immediate decision making and fog computing is emerging as a promising solution to address such requirement. In order to achieve a cost-efficient system, we propose taking advantage from spot instances, a new service offered by cloud providers, which provide resources at lower prices. The main downside of these instances is that they do not ensure service continuity and they might suffer from interruptions. An architecture that combines fog and multi-cloud deployments along with Network Coding (NC) techniques, guarantees the needed fault-tolerance for the cloud environment, and also reduces the required amount of redundant data to provide reliable services. In this paper we analyze how NC can actually help to reduce the storage cost and improve the resource efficiency for industrial applications, based on a multi-cloud infrastructure. The cost analysis has been carried out using both real AWS EC2 spot instance prices and, to complement them, prices obtained from a model based on a finite Markov chain, derived from real measurements. We have analyzed the overall system cost, depending on different parameters, showing that configurations that seek to minimize the storage yield a higher cost reduction, due to the strong impact of storage cost

    On the combination of multi-cloud and network coding for cost-efficient storage in industrial applications

    Get PDF
    The adoption of both Cyber-Physical Systems (CPSs) and the Internet-of-Things (IoT) has enabled the evolution towards the so-called Industry 4.0. These technologies, together with cloud computing and artificial intelligence, foster new business opportunities. Besides, several industrial applications need immediate decision making and fog computing is emerging as a promising solution to address such requirement. In order to achieve a cost-efficient system, we propose taking advantage from spot instances, a new service offered by cloud providers, which provide resources at lower prices. The main downside of these instances is that they do not ensure service continuity and they might suffer from interruptions. An architecture that combines fog and multi-cloud deployments along with Network Coding (NC) techniques, guarantees the needed fault-tolerance for the cloud environment, and also reduces the required amount of redundant data to provide reliable services. In this paper we analyze how NC can actually help to reduce the storage cost and improve the resource efficiency for industrial applications, based on a multi-cloud infrastructure. The cost analysis has been carried out using both real AWS EC2 spot instance prices and, to complement them, prices obtained from a model based on a finite Markov chain, derived from real measurements. We have analyzed the overall system cost, depending on different parameters, showing that configurations that seek to minimize the storage yield a higher cost reduction, due to the strong impact of storage cost.This work has been partially supported by the Basque Government through the Elkartek program (Grant agreement no. KK-2018/00115), the H2020 research framework of the European Commission under the ELASTIC project (Grant agreement no. 825473), and the Spanish Ministry of Economy and Competitiveness through the CARMEN project (TEC2016-75067-C4-3-R), the ADVICE project (TEC2015-71329-C2-1-R), and the COMONSENS network (TEC2015-69648-REDC)
    • …
    corecore