SDSF : social-networking trust based distributed data storage and co-operative information fusion.
As of 2014, about 2.5 quintillion bytes of data are created each day, and 90% of the data in the world was created in the last two years alone. This data can be stored on external hard drives, on unused space in peer-to-peer (P2P) networks, or, in the currently more popular approach, in the Cloud. When users store their data in the Cloud, all of it is exposed to the administrators of the services, who can view and possibly misuse it. With the growing popularity and usage of Cloud storage services such as Google Drive and Dropbox, concerns about privacy and security are increasing. Searching for content or documents in this distributed stored data, given the rate of data generation, is a big challenge. Information fusion is used to extract information based on the user's query, combine the data, and learn useful information. This problem is challenging if the data sources are distributed and heterogeneous in nature and the trustworthiness of the documents varies. This thesis proposes two innovative solutions to resolve both of these problems. Firstly, to remedy the situation of security and privacy of stored data, we propose an innovative Social-based Distributed Data Storage and Trust based co-operative Information Fusion Framework (SDSF). The main objective is to create a framework that assists in providing a secure storage system while not overloading a single system, using a P2P-like approach. This framework allows users to share storage resources among friends and acquaintances without compromising security or privacy, while enjoying all the benefits that Cloud storage offers. The system fragments the data and encodes it to securely store it on the unused storage capacity of the data owner's friends' resources. The system thus gives the user centralized control over the selection of peers to store the data.
Secondly, to retrieve the stored distributed data, the proposed system also performs the fusion from distributed sources. The technique uses several algorithms to ensure the correctness of the query that is used to retrieve and combine the data, improving the accuracy and efficiency of information fusion over heterogeneous, distributed and massive data on the Cloud for time-critical operations. We demonstrate that the retrieved documents are genuine when the trust scores are also used while retrieving the data sources. The thesis makes several research contributions. First, we implement Social Storage using erasure coding. Erasure coding fragments the data, encodes it, and, through the introduction of redundancy, resolves issues resulting from device failures. Second, we exploit the inherent concept of trust that is embedded in social networks to determine the nodes where the fragmented data should be stored and to build a secure network, since the social network consists of a network of friends, family and acquaintances. The trust between the friends and the availability of the devices allow the user to make an informed choice about where the information should be stored, using 'k' optimal paths. Thirdly, for retrieval of this distributed stored data, we propose information fusion on distributed data using a combination of Enhanced N-grams (to ensure correctness of the query), Semantic Machine Learning (to extract the documents based on the context and not just a bag of words, also considering the trust score) and Map Reduce (NSM) algorithms. Lastly, we evaluate the performance of distributed storage of SDSF using erasure coding, identify the social storage providers based on trust, and evaluate their trustworthiness. We also evaluate the performance of our information fusion algorithms in distributed storage systems.
Thus, the system using the SDSF framework implements the beneficial features of P2P networks and Cloud storage while avoiding the pitfalls of these systems. The multi-layered encryption ensures that all other users, including the system administrators, cannot decode the stored data. The application of the NSM algorithm improves the effectiveness of fusion, since a large number of genuine documents are retrieved for fusion.
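The fragment-encode-recover cycle that the abstract attributes to erasure coding can be illustrated with a minimal sketch. It assumes the simplest possible code, a single XOR parity fragment over k data fragments, which tolerates the loss of any one fragment. This is a stand-in chosen for brevity and is not the thesis's actual code or parameters; the function names `encode` and `recover` are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size fragments (zero-padded) plus one XOR parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def recover(fragments: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the data when at most one fragment (marked None) is missing."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost fragment")
    repaired = list(fragments)
    if missing:
        present = [f for f in fragments if f is not None]
        # XOR of all surviving fragments equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity fragment and the zero padding.
    return b"".join(repaired[:-1])[:original_len]
```

In the setting described above, each of the k + 1 fragments would go to a different friend's device; if one device is offline, `recover` rebuilds its fragment from the rest. Tolerating more than one failure needs a stronger code (for example Reed-Solomon), which this sketch does not implement.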
GPUs as Storage System Accelerators
Massively multicore processors, such as Graphics Processing Units (GPUs),
provide, at a comparable price, peak performance one order of magnitude higher
than that of traditional CPUs. This drop in the cost of computation, as any
order-of-magnitude drop in the cost per unit of performance for a class of
system components, triggers the opportunity to redesign systems and to explore
new ways to engineer them to recalibrate the cost-to-performance relation. This
project explores the feasibility of harnessing GPUs' computational power to
improve the performance, reliability, or security of distributed storage
systems. In this context, we present the design of a storage system prototype
that uses GPU offloading to accelerate a number of computationally intensive
primitives based on hashing, and introduce techniques to efficiently leverage
the processing power of GPUs. We evaluate the performance of this prototype
under two configurations: as a content addressable storage system that
facilitates online similarity detection between successive versions of the same
file and as a traditional system that uses hashing to preserve data integrity.
Further, we evaluate the impact of offloading to the GPU on competing
applications' performance. Our results show that this technique can bring
tangible performance gains without negatively impacting the performance of
concurrently running applications.
Comment: IEEE Transactions on Parallel and Distributed Systems, 201
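The two configurations named in the abstract, content-addressable storage and hash-based integrity checking, both rest on hashing blocks of data. The sketch below shows the idea on the CPU only, using SHA-256 from Python's standard library and fixed-size chunks. The chunking scheme, the hash function, and the `ContentStore` class are assumptions for illustration; the paper's GPU offloading and its actual primitives are not reproduced here.

```python
import hashlib
from typing import Dict, List


class ContentStore:
    """Toy content-addressable store: chunks are keyed by their SHA-256 digest."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: Dict[str, bytes] = {}  # digest -> chunk bytes

    def put(self, data: bytes) -> List[str]:
        """Store data and return its recipe (ordered list of chunk digests)."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Identical chunks from any file version are stored only once.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        return recipe

    def get(self, recipe: List[str]) -> bytes:
        """Reassemble data, re-hashing every chunk to verify its integrity."""
        out = []
        for digest in recipe:
            chunk = self.chunks[digest]
            if hashlib.sha256(chunk).hexdigest() != digest:
                raise ValueError("integrity check failed for chunk " + digest)
            out.append(chunk)
        return b"".join(out)


def shared_chunks(recipe_a: List[str], recipe_b: List[str]) -> int:
    """Number of distinct chunks two versions have in common."""
    return len(set(recipe_a) & set(recipe_b))
```

Storing two successive versions of a file and comparing their recipes with `shared_chunks` gives a rough similarity measure. Every `put` and `get` hashes each chunk, which is the kind of computationally intensive step the abstract proposes to offload.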
Secure and Reliable Data Outsourcing in Cloud Computing
The many advantages of cloud computing are increasingly attracting individuals and organizations to outsource their data from local to remote cloud servers. In addition to cloud infrastructure and platform providers, such as Amazon, Google, and Microsoft, more and more cloud application providers are emerging that are dedicated to offering more accessible and user-friendly data storage services to cloud customers. It is a clear trend that cloud data outsourcing is becoming a pervasive service. Along with the widespread enthusiasm for cloud computing, however, concerns about the reliability and privacy of cloud data storage are arising as the primary obstacles to the adoption of the cloud. To address these challenging issues, this dissertation explores the problem of secure and reliable data outsourcing in cloud computing. We focus on deploying the most fundamental data services, e.g., data management and data utilization, while considering reliability and privacy assurance. The first part of this dissertation discusses secure and reliable cloud data management to guarantee data correctness and availability, given the difficulty that data are no longer locally possessed by data owners. We design a secure cloud storage service which addresses the reliability issue with near-optimal overall performance. By allowing a third party to perform the public integrity verification, data owners are largely relieved of the onerous work of periodically checking data integrity. To completely free the data owner from the burden of being online after data outsourcing, we propose an exact repair solution so that no metadata needs to be generated on the fly for the repaired data. The second part presents our privacy-preserving data utilization solutions supporting two categories of semantics: keyword search and graph query.
To protect data privacy, sensitive data has to be encrypted before outsourcing, which makes traditional data utilization based on plaintext keyword search obsolete. We define and solve the challenging problem of privacy-preserving multi-keyword ranked search over encrypted data in cloud computing. We establish a set of strict privacy requirements for such a secure cloud data utilization system to become a reality. We first propose a basic idea for keyword search based on secure inner product computation, and then give two improved schemes to achieve various stringent privacy requirements in two different threat models. We also investigate some further enhancements of our ranked search mechanism, including supporting more search semantics, i.e., TF × IDF, and dynamic data operations. As a general data structure to describe the relation between entities, the graph has been increasingly used to model complicated structures and schemaless data, such as the personal social network, the relational database, XML documents and chemical compounds. When these data contain sensitive information and need to be encrypted before outsourcing to the cloud, it is a very challenging task to effectively utilize such graph-structured data after encryption. We define and solve the problem of privacy-preserving query over encrypted graph-structured data in cloud computing. By utilizing the principle of filtering-and-verification, we pre-build a feature-based index to provide feature-related information about each encrypted data graph, and then choose the efficient inner product as the pruning tool to carry out the filtering procedure.
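The ranked keyword search described earlier in this abstract scores documents with an inner product. The sketch below shows only the plaintext scoring step, under the assumption that each document and each query is a binary vector over a shared keyword dictionary, so the inner product counts matching keywords. The encryption that makes the inner product secure is deliberately left out, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def keyword_vector(keywords: Set[str], dictionary: List[str]) -> List[int]:
    """Binary vector: 1 where the dictionary keyword is present, else 0."""
    return [1 if word in keywords else 0 for word in dictionary]


def inner_product(u: List[int], v: List[int]) -> int:
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def ranked_search(documents: Dict[str, Set[str]],
                  query_keywords: Iterable[str],
                  dictionary: List[str],
                  top_k: int) -> List[str]:
    """Return up to top_k document ids, ordered by number of matched query keywords."""
    query = keyword_vector(set(query_keywords), dictionary)
    scored = [
        (inner_product(keyword_vector(words, dictionary), query), doc_id)
        for doc_id, words in documents.items()
    ]
    # Highest score first; ties broken by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]
```

In a privacy-preserving scheme, the server would compute the same score over transformed vectors without learning the keywords. Replacing the binary entries with TF × IDF weights would give the richer semantics the abstract mentions.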
A systematic review on cloud storage mechanisms concerning e-healthcare systems
As the cost of medical care rises and healthcare professionals become scarce, it falls to healthcare organizations and institutes to consider implementing Health Information Technology (HIT) systems. HIT allows health organizations to streamline their many processes and to provide services in a more efficient and cost-effective way. With the rise of Cloud Storage Computing (CSC), a large number of organizations and enterprises have moved their healthcare data sources to distributed storage. As the data can be requested at any time from anywhere, the availability of data becomes an urgent need. Nonetheless, outages in cloud storage significantly affect the availability level. Like the other critical factors of cloud storage (e.g., reliability, performance, security, and data protection), availability also directly impacts the data in cloud storage for e-Healthcare systems. In this paper, we systematically review cloud storage mechanisms concerning the healthcare environment. Additionally, the state-of-the-art cloud storage mechanisms are critically reviewed for e-Healthcare systems based on their characteristics. In short, this paper summarizes existing literature on cloud storage and its impact on healthcare, and it provides researchers, medical specialists, and organizations with a solid foundation for future studies in the healthcare environment.
Qatar University [IRCC-2020-009]
Reliable and Efficient In-Memory Fault Tolerance of Large Language Model Pretraining
Extensive system scales (i.e. thousands of GPU/TPUs) and prolonged training
periods (i.e. months of pretraining) significantly escalate the probability of
failures when training large language models (LLMs). Thus, efficient and
reliable fault-tolerance methods are urgently needed. Checkpointing is the
primary fault-tolerance method: it periodically saves parameter snapshots from
GPU memory to disks via CPU memory. In this paper, we find that the frequency of
existing checkpoint-based fault tolerance is significantly limited by
storage I/O overheads, which results in hefty re-training costs on restarting
from the nearest checkpoint. In response to this gap, we introduce an in-memory
fault-tolerance framework for large-scale LLM pretraining. The framework boosts
the efficiency and reliability of fault tolerance from three aspects: (1)
Reduced Data Transfer and I/O: By asynchronously caching parameters, i.e.,
sharded model parameters, optimizer states, and RNG states, to CPU volatile
memory, our framework significantly reduces communication costs and bypasses
checkpoint I/O. (2) Enhanced System Reliability: Our framework enhances
parameter protection with a two-layer hierarchy: snapshot management processes
(SMPs) safeguard against software failures, together with Erasure Coding (EC)
protecting against node failures. This double-layered protection greatly
improves the survival probability of the parameters compared to existing
checkpointing methods. (3) Improved Snapshotting Frequency: Our framework
achieves more frequent snapshotting compared with asynchronous checkpointing
optimizations under the same saving time budget, which improves the fault
tolerance efficiency. Empirical results demonstrate that our framework
minimizes the overhead of fault tolerance of LLM pretraining by effectively
leveraging redundant CPU resources.
Comment: Fault Tolerance, Checkpoint Optimization, Large Language Model, 3D
parallelism
Geographical forwarding algorithm based video content delivery scheme for internet of vehicles (IoV)
This is an accepted manuscript of an article published by IEEE Multimedia Communications Technical Committee in MMTC Communications – Frontiers on 31/07/2020, available online: https://mmc.committees.comsoc.org/files/2020/07/MMTC_Communication_Frontier_July_2020.pdf
The accepted version of the publication may differ from the final published version.
An evolved form of Vehicular Ad hoc Networks (VANET) has recently emerged as the Internet of Vehicles (IoV). However, there
are still some challenges that need to be addressed to support IoV applications. The objective of this research is to achieve
efficient video content transmission over vehicular networks. We propose a balanced video-forwarding algorithm for a
video-based content delivery scheme. Before the video frames are transmitted, the available neighboring vehicles are ranked
relative to the vehicle currently forwarding, using the proposed multi-score function. Considering the current beacon reception rate,
forwarding progress and direction to destination, in addition to residual buffer length, the proposed algorithm can elect the best
candidates and forward the video frames to the highest-ranked vehicles in a balanced way, taking into account their residual buffer
lengths. To facilitate the proposed video content delivery scheme, an H.264/SVC approach was adapted to divide video
packets into various segments, to be delivered in three defined groups. These segments can be encoded and decoded
independently and integrated back to produce the original packet sent by the source vehicle. Simulation results demonstrate the
efficiency of our proposed algorithm in improving the perceived video quality compared with other approaches.
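The abstract names the four inputs of the multi-score function but not its form. As a rough illustration only, the sketch below assumes a weighted sum of the four inputs, each normalized to the range 0 to 1, with equal weights. The weighted-sum form, the weights, and all field names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Neighbor:
    vehicle_id: str
    beacon_reception_rate: float  # fraction of expected beacons received, 0..1
    forwarding_progress: float    # normalized progress toward the destination, 0..1
    direction_alignment: float    # 1.0 = heading straight toward the destination
    residual_buffer: float        # fraction of buffer still free, 0..1


# Hypothetical equal weights; the paper's actual weighting is not given in the abstract.
WEIGHTS: Tuple[float, float, float, float] = (0.25, 0.25, 0.25, 0.25)


def multi_score(n: Neighbor, weights: Tuple[float, float, float, float] = WEIGHTS) -> float:
    """Weighted sum of the four normalized metrics (assumed form)."""
    metrics = (n.beacon_reception_rate, n.forwarding_progress,
               n.direction_alignment, n.residual_buffer)
    return sum(w * m for w, m in zip(weights, metrics))


def rank_neighbors(neighbors: List[Neighbor]) -> List[str]:
    """Vehicle ids ordered from best to worst forwarding candidate."""
    ordered = sorted(neighbors, key=lambda n: (-multi_score(n), n.vehicle_id))
    return [n.vehicle_id for n in ordered]
```

A forwarder could then assign the most important video segments to the first ids returned by `rank_neighbors`. Including the residual buffer in the score steers traffic away from vehicles that are already close to full.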
On the combination of multi-cloud and network coding for cost-efficient storage in industrial applications
The adoption of both Cyber-Physical Systems (CPSs) and the Internet-of-Things (IoT) has enabled the evolution towards the so-called Industry 4.0. These technologies, together with cloud computing and artificial intelligence, foster new business opportunities. Besides, several industrial applications need immediate decision making, and fog computing is emerging as a promising solution to address this requirement. In order to achieve a cost-efficient system, we propose taking advantage of spot instances, a new service offered by cloud providers, which provide resources at lower prices. The main downside of these instances is that they do not ensure service continuity and might suffer from interruptions. An architecture that combines fog and multi-cloud deployments with Network Coding (NC) techniques guarantees the needed fault tolerance for the cloud environment, and also reduces the amount of redundant data required to provide reliable services. In this paper we analyze how NC can actually help to reduce the storage cost and improve the resource efficiency for industrial applications, based on a multi-cloud infrastructure. The cost analysis has been carried out using both real AWS EC2 spot instance prices and, to complement them, prices obtained from a model based on a finite Markov chain, derived from real measurements. We have analyzed the overall system cost, depending on different parameters, showing that configurations that seek to minimize the storage yield a higher cost reduction, due to the strong impact of storage cost.
This work has been partially supported by the Basque Government through the Elkartek program (Grant agreement no. KK-2018/00115), the H2020 research framework of the European Commission under the ELASTIC project (Grant agreement no. 825473), and the Spanish Ministry of Economy and Competitiveness through the CARMEN project (TEC2016-75067-C4-3-R), the ADVICE project (TEC2015-71329-C2-1-R), and the COMONSENS network (TEC2015-69648-REDC).
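The abstract mentions spot prices obtained from a finite Markov chain model. One common way to use such a model is to compute its stationary distribution and, from it, a long-run average price. The sketch below does this with plain power iteration. The transition matrix and price levels in the usage note are invented for illustration and are not the paper's measured values, and the method assumes the chain is irreducible and aperiodic so that the iteration converges.

```python
from typing import List


def stationary_distribution(P: List[List[float]], iterations: int = 1000) -> List[float]:
    """Approximate the stationary distribution of row-stochastic matrix P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        # One step of pi <- pi * P (row vector times matrix).
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def expected_price(P: List[List[float]], prices: List[float]) -> float:
    """Long-run average price when state i of the chain corresponds to prices[i]."""
    pi = stationary_distribution(P)
    return sum(weight * price for weight, price in zip(pi, prices))
```

For example, with two hypothetical price states of 0.03 and 0.09 per hour and transition matrix `[[0.9, 0.1], [0.5, 0.5]]`, the chain spends five sixths of the time in the cheap state, so `expected_price` returns approximately 0.04.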