1,161 research outputs found

    Efficient data reliability management of cloud storage systems for big data applications

    Cloud service providers consistently strive to provide efficient and reliable service for their clients' Big Data storage needs. Replication is a simple and flexible method to ensure reliability and availability of data. However, it is not an efficient solution for Big Data, whose volume typically scales to terabytes and petabytes. Hence erasure coding is gaining traction despite its shortcomings. Deploying erasure coding in cloud storage confronts several challenges, such as encoding/decoding complexity, load balancing, exponential resource consumption due to data repair, and read latency. This thesis addresses several of these challenges. Even though data durability and availability should not be compromised for any reason, clients' requirements on read performance (access latency) may vary with the nature of the data and its access pattern. Access latency is an important metric, and the acceptable latency range can be recorded in the client's SLA. Several proactive recovery methods for erasure codes are proposed in this research to reduce the resource consumption due to recovery. A novel cache-based solution is also proposed to mitigate the access latency issue of erasure coding.
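    The trade-off between replication and erasure coding described in this abstract can be illustrated with a small calculation. This is a minimal sketch under stated assumptions, not a result from the thesis: the function names are hypothetical, and the parameters (3 replicas, a 10+4 code) are illustrative choices. It compares the raw storage consumed per byte of user data, and shows why repair is costly for a classic MDS code such as Reed-Solomon, which reads k surviving fragments to rebuild a single lost one.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of user data when keeping `copies` full replicas."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return float(copies)


def erasure_overhead(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data for k data + m parity fragments."""
    if k < 1 or m < 0:
        raise ValueError("need k >= 1 and m >= 0")
    return (k + m) / k


def repair_reads(k: int) -> int:
    """Fragments read to rebuild ONE lost fragment with a classic MDS code.

    Replication would read a single surviving copy instead.
    """
    return k


if __name__ == "__main__":
    # Illustrative parameters only (assumptions, not taken from the thesis).
    print("3-way replication overhead:", replication_overhead(3))
    print("(10+4) erasure code overhead:", erasure_overhead(10, 4))
    print("fragments read per repair, k=10:", repair_reads(10))
```

    Under these assumed parameters the erasure code stores far less raw data than triple replication while tolerating more simultaneous losses, but each repair reads ten fragments rather than one, which is the kind of recovery cost the abstract refers to.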

    Network Coding for Distributed Cloud, Fog and Data Center Storage

    SDSF : social-networking trust based distributed data storage and co-operative information fusion.

    As of 2014, about 2.5 quintillion bytes of data are created each day, and 90% of the data in the world was created in the last two years alone. This data can be stored on external hard drives, on unused space in peer-to-peer (P2P) networks, or, in the currently more popular approach, in the Cloud. When users store their data in the Cloud, the entire data is exposed to the administrators of the services, who can view and possibly misuse it. With the growing popularity and usage of Cloud storage services such as Google Drive and Dropbox, concerns about privacy and security are increasing. Searching for content or documents in this distributed stored data, given the rate of data generation, is a big challenge. Information fusion is used to extract information based on the user's query, combine the data, and learn useful information. This problem is challenging if the data sources are distributed and heterogeneous in nature and the trustworthiness of the documents varies. This thesis proposes two innovative solutions to resolve both of these problems. Firstly, to remedy the situation of security and privacy of stored data, we propose an innovative Social-based Distributed Data Storage and Trust based co-operative Information Fusion Framework (SDSF). The main objective is to create a framework that assists in providing a secure storage system while not overloading a single system, using a P2P-like approach. This framework allows users to share storage resources among friends and acquaintances without compromising security or privacy, while enjoying all the benefits that Cloud storage offers. The system fragments the data and encodes it to securely store it on the unused storage capacity of the data owner's friends' resources. The system thus gives the user centralized control over the selection of peers to store the data.
Secondly, to retrieve the stored distributed data, the proposed system also performs the fusion from distributed sources. The technique uses several algorithms to ensure the correctness of the query that is used to retrieve and combine the data, improving the accuracy and efficiency of information fusion over heterogeneous, distributed and massive data on the Cloud for time-critical operations. We demonstrate that the retrieved documents are genuine when the trust scores are also used while retrieving the data sources. The thesis makes several research contributions. First, we implement Social Storage using erasure coding. Erasure coding fragments the data, encodes it, and through the introduction of redundancy resolves issues resulting from device failures. Second, we exploit the inherent concept of trust that is embedded in social networks to determine the nodes and build a secure network where the fragmented data should be stored, since the social network consists of a network of friends, family and acquaintances. The trust between friends and the availability of the devices allow the user to make an informed choice about where the information should be stored using 'k' optimal paths. Third, for the retrieval of this distributed stored data, we propose information fusion on distributed data using a combination of Enhanced N-grams (to ensure correctness of the query), Semantic Machine Learning (to extract the documents based on the context and not just a bag of words, while also considering the trust score) and Map Reduce (NSM) Algorithms. Lastly, we evaluate the performance of the distributed storage of SDSF using erasure coding, identify the social storage providers based on trust, and evaluate their trustworthiness. We also evaluate the performance of our information fusion algorithms in distributed storage systems.
Thus, a system using the SDSF framework implements the beneficial features of P2P networks and Cloud storage while avoiding the pitfalls of these systems. The multi-layered encryption ensures that all other users, including the system administrators, cannot decode the stored data. The application of the NSM algorithm improves the effectiveness of fusion, since a large number of genuine documents are retrieved for fusion.
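The fragment-encode-recover cycle that this abstract attributes to erasure coding can be sketched with the simplest possible code, a single XOR parity fragment. This is an illustration under stated assumptions, not the scheme used in SDSF (the abstract does not name a specific code): the functions `encode`, `recover` and `decode` are hypothetical, and a single parity fragment tolerates the loss of only one fragment, for example one friend's device going offline.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal fragments (zero-padded) plus one XOR parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def recover(fragments: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing fragment (marked None) from the survivors."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost fragment")
    repaired = list(fragments)
    if missing:
        survivors = [f for f in fragments if f is not None]
        # XOR of all surviving fragments (data and parity) equals the lost one.
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


def decode(fragments: List[Optional[bytes]], original_length: int) -> bytes:
    """Repair if needed, drop the parity fragment, and strip the padding."""
    data_fragments = recover(fragments)[:-1]
    return b"".join(data_fragments)[:original_length]


if __name__ == "__main__":
    message = b"hello world"
    stored = encode(message, k=3)   # 3 data fragments + 1 parity fragment
    stored[1] = None                # simulate one unavailable storage peer
    print(decode(stored, len(message)))
```

Production systems typically use codes such as Reed-Solomon that survive several simultaneous losses; the XOR variant is shown only because it fits in a few lines.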

    Evaluating Erasure Codes in Dicoogle PACS

    DICOM (Digital Imaging and Communication in Medicine) is a standard for image and data transmission in medical-purpose hardware and is commonly used for viewing, storing, printing and transmitting images. As a part of the way that DICOM transmits files, the PACS (Picture Archiving and Communication System) platform Dicoogle has become one of the most in-demand image processing and viewing platforms. However, the Dicoogle PACS architecture does not guarantee image information recovery in the case of information loss. Therefore, this paper proposes a file recovery solution in the Dicoogle architecture. The proposal consists of maximizing the encoding and decoding performance of medical images through computational parallelism. To validate the proposal, a Java implementation based on the Reed-Solomon algorithm is evaluated in different performance tests. The experimental results show that the proposal is optimal in terms of image processing time for the Dicoogle PACS storage system.
    Funding: Ministry of Science, Innovation and Universities (MICINN) of Spain, PGC2018 098883-B-C44; European Commission; Programa para el Desarrollo Profesional Docente para el Tipo Superior (PRODEP) of Mexico; Corporacion Ecuatoriana para el Desarrollo de la Investigacion y la Academia (CEDIA) of Ecuador, CEPRA XII-2018-13; Universidad de Las Americas (UDLA), Quito, Ecuador, IEA.WHP.21.0

    Collaborative Communication And Storage In Energy-Synchronized Sensor Networks

    In a battery-less sensor network, all operations of the sensor nodes are strictly constrained by and synchronized with the fluctuations of harvested energy, causing nodes to be disrupted from the network and hence unstable network connectivity. Such a wireless sensor network is called an energy-synchronized sensor network. The unpredictable network disruptions and challenging communication environments make traditional communication protocols inefficient and require a paradigm shift in design. In this thesis, I propose a set of algorithms for collaborative data communication and storage in energy-synchronized sensor networks. The solutions are based on erasure codes and probabilistic network coding. The proposed algorithms significantly improve data communication throughput and persistency, and they are inherently amenable to the probabilistic nature of transmission in wireless networks. The technical contributions explore collaborative communication both without coding and with network coding. First, I propose a collaborative data delivery protocol that exploits the optimal performance of multiple energy-synchronized paths without network coding, i.e. a new max-flow min-variance algorithm. In concert with this data delivery protocol, a localized TDMA MAC protocol is designed to synchronize nodes' duty cycles and mitigate media access contention. However, the energy supply can change dynamically over time, making deterministic duty-cycle synchronization difficult in practice. A probabilistic approach is therefore investigated: I present the Opportunistic Network Erasure Coding protocol (ONEC) to collect data collaboratively. ONEC derives the probability distribution of the coding degree in each node, enables opportunistic in-network recoding, and guarantees that the original sensor data can be recovered with high probability upon receiving a sufficient number of encoded packets.
Next, OnCode, an opportunistic in-network data coding and delivery protocol, is proposed to further improve data communication under the constraints of energy synchronization. It is resilient to packet loss and network disruptions, and does not require explicit end-to-end feedback messages. Moreover, I present a network Erasure Coding with randomized Power Control (ECPC) mechanism for collaborative data storage in disruptive sensor networks. ECPC only requires each node to perform a single broadcast at each of its several randomly selected power levels, and thus incurs very low communication overhead. Finally, I propose an integrated algorithm and middleware (Ravine Stream) to improve data delivery throughput as well as data persistency in energy-synchronized sensor networks.
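The idea of recovering the original data from any sufficiently large set of coded packets, without end-to-end feedback, can be sketched as follows. This is a generic illustration of XOR-based erasure coding with Gaussian elimination over GF(2), not the ONEC or OnCode protocol: the degree list `DEGREES`, the loss rate, and all function and class names are hypothetical assumptions, whereas ONEC derives its own coding-degree distribution per node.

```python
import random
from typing import Dict, List, Optional, Tuple

Packet = Tuple[int, int]  # (bitmask of combined source indices, XOR of their values)
DEGREES = [1, 2, 2, 3]    # hypothetical degree choices, NOT the ONEC distribution


def encode_packet(source: List[int], degree: int, rng: random.Random) -> Packet:
    """XOR `degree` randomly chosen source symbols into one coded packet."""
    mask, value = 0, 0
    for i in rng.sample(range(len(source)), min(degree, len(source))):
        mask |= 1 << i
        value ^= source[i]
    return mask, value


class Decoder:
    """Incremental Gaussian elimination over GF(2)."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.rows: Dict[int, Packet] = {}  # pivot (highest set bit) -> reduced packet

    def add(self, packet: Packet) -> None:
        mask, value = packet
        # Eliminate known pivots from high to low so cleared bits stay cleared.
        for pivot in sorted(self.rows, reverse=True):
            if (mask >> pivot) & 1:
                pmask, pvalue = self.rows[pivot]
                mask ^= pmask
                value ^= pvalue
        if mask:  # packet carried new information
            self.rows[mask.bit_length() - 1] = (mask, value)

    def solve(self) -> Optional[List[int]]:
        """Return the source symbols once k independent packets have arrived."""
        if len(self.rows) < self.k:
            return None
        out = [0] * self.k
        for pivot in range(self.k):  # back-substitute from the lowest pivot up
            mask, value = self.rows[pivot]
            for j in range(pivot):
                if (mask >> j) & 1:
                    value ^= out[j]
            out[pivot] = value
        return out


def simulate(source: List[int], loss_rate: float, rng: random.Random,
             max_packets: int = 10_000) -> Tuple[List[int], int]:
    """Send coded packets over a lossy link until the receiver can decode."""
    decoder = Decoder(len(source))
    for sent in range(1, max_packets + 1):
        packet = encode_packet(source, rng.choice(DEGREES), rng)
        if rng.random() < loss_rate:
            continue  # packet lost; the sender needs no feedback about which one
        decoder.add(packet)
        result = decoder.solve()
        if result is not None:
            return result, sent
    raise RuntimeError("did not decode within max_packets")
```

Calling `simulate([7, 1, 4, 9], 0.3, random.Random(0))` returns the recovered symbols together with the number of packets sent; because every packet is an independent random combination, it does not matter which particular packets were lost.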