86 research outputs found
Secure and efficient storage of multimedia: content in public cloud environments using joint compression and encryption
The Cloud Computing is a paradigm still with many unexplored areas ranging from the
technological component to the de nition of new business models, but that is revolutionizing the way we design, implement and manage the entire infrastructure of information technology.
The Infrastructure as a Service is the delivery of computing infrastructure, typically a virtual data center, along with a set of APIs that allow applications, in an automatic way, can control the resources they wish to use. The choice of the service provider and how it applies to their business model may lead to higher or lower cost in the operation and maintenance of applications near the suppliers.
In this sense, this work proposed to carry out a literature review on the topic of Cloud
Computing, secure storage and transmission of multimedia content, using lossless compression, in public cloud environments, and implement this system by building an application that manages data in public cloud environments (dropbox and meocloud).
An application was built during this dissertation that meets the objectives set. This system provides the user a wide range of functions of data management in public cloud environments, for that the user only have to login to the system with his/her credentials, after performing the login, through the Oauth 1.0 protocol (authorization protocol) is generated an access token, this token is generated only with the consent of the user and allows the application to get access to data/user les without having to use credentials. With this token the framework can now operate and unlock the full potential of its functions. With this application
is also available to the user functions of compression and encryption so that user can make the most of his/her cloud storage system securely. The compression function works using the compression algorithm LZMA being only necessary for the user to choose the les to be compressed.
Relatively to encryption it will be used the encryption algorithm AES (Advanced Encryption Standard) that works with a 128 bit symmetric key de ned by user.
We build the research into two distinct and complementary parts: The rst part consists
of the theoretical foundation and the second part is the development of computer application where the data is managed, compressed, stored, transmitted in various environments of cloud computing. The theoretical framework is organized into two chapters, chapter 2 - Background
on Cloud Storage and chapter 3 - Data compression.
Sought through theoretical foundation demonstrate the relevance of the research, convey some of the pertinent theories and input whenever possible, research in the area. The second part of the work was devoted to the development of the application in cloud environment.
We showed how we generated the application, presented the features, advantages, and
safety standards for the data. Finally, we re ect on the results, according to the theoretical
framework made in the rst part and platform development.
We think that the work obtained is positive and that ts the goals we set ourselves
to achieve. This research has some limitations, we believe that the time for completion was scarce and the implementation of the platform could bene t from the implementation of other features.In future research it would be appropriate to continue the project expanding the capabilities
of the application, test the operation with other users and make comparative tests.A Computação em nuvem é um paradigma ainda com muitas áreas por explorar que
vão desde a componente tecnológica à definição de novos modelos de negócio, mas que está
a revolucionar a forma como projetamos, implementamos e gerimos toda a infraestrutura da
tecnologia da informação.
A Infraestrutura como Serviço representa a disponibilização da infraestrutura computacional,
tipicamente um datacenter virtual, juntamente com um conjunto de APls que permitirá
que aplicações, de forma automática, possam controlar os recursos que pretendem utilizar_ A
escolha do fornecedor de serviços e a forma como este aplica o seu modelo de negócio poderão
determinar um maior ou menor custo na operacionalização e manutenção das aplicações junto
dos fornecedores.
Neste sentido, esta dissertação propôs· se efetuar uma revisão bibliográfica sobre a
temática da Computação em nuvem, a transmissão e o armazenamento seguro de conteúdos
multimédia, utilizando a compressão sem perdas, em ambientes em nuvem públicos, e implementar
um sistema deste tipo através da construção de uma aplicação que faz a gestão dos
dados em ambientes de nuvem pública (dropbox e meocloud).
Foi construída uma aplicação no decorrer desta dissertação que vai de encontro aos objectivos
definidos. Este sistema fornece ao utilizador uma variada gama de funções de gestão
de dados em ambientes de nuvem pública, para isso o utilizador tem apenas que realizar o login
no sistema com as suas credenciais, após a realização de login, através do protocolo Oauth 1.0
(protocolo de autorização) é gerado um token de acesso, este token só é gerado com o consentimento
do utilizador e permite que a aplicação tenha acesso aos dados / ficheiros do utilizador
~em que seja necessário utilizar as credenciais. Com este token a aplicação pode agora operar e
disponibilizar todo o potencial das suas funções. Com esta aplicação é também disponibilizado
ao utilizador funções de compressão e encriptação de modo a que possa usufruir ao máximo
do seu sistema de armazenamento cloud com segurança. A função de compressão funciona
utilizando o algoritmo de compressão LZMA sendo apenas necessário que o utilizador escolha os
ficheiros a comprimir. Relativamente à cifragem utilizamos o algoritmo AES (Advanced Encryption
Standard) que funciona com uma chave simétrica de 128bits definida pelo utilizador.
Alicerçámos a investigação em duas partes distintas e complementares: a primeira parte
é composta pela fundamentação teórica e a segunda parte consiste no desenvolvimento da aplicação
informática em que os dados são geridos, comprimidos, armazenados, transmitidos em
vários ambientes de computação em nuvem. A fundamentação teórica encontra-se organizada
em dois capítulos, o capítulo 2 - "Background on Cloud Storage" e o capítulo 3 "Data Compression",
Procurámos, através da fundamentação teórica, demonstrar a pertinência da investigação. transmitir algumas das teorias pertinentes e introduzir, sempre que possível, investigações
existentes na área. A segunda parte do trabalho foi dedicada ao desenvolvimento da
aplicação em ambiente "cloud". Evidenciámos o modo como gerámos a aplicação, apresentámos
as funcionalidades, as vantagens. Por fim, refletimos sobre os resultados , de acordo com o
enquadramento teórico efetuado na primeira parte e o desenvolvimento da plataforma.
Pensamos que o trabalho obtido é positivo e que se enquadra nos objetivos que nos propusemos
atingir. Este trabalho de investigação apresenta algumas limitações, consideramos que
o tempo para a sua execução foi escasso e a implementação da plataforma poderia beneficiar
com a implementação de outras funcionalidades. Em investigações futuras seria pertinente dar continuidade ao projeto ampliando as potencialidades da aplicação, testar o funcionamento
com outros utilizadores e efetuar testes comparativos.Fundação para a Ciência e a Tecnologia (FCT
A Deterministic Eviction Model for Removing Redundancies in Video Corpus
The traditional storage approaches are being challenged by huge data volumes. In multimedia content, every file does not necessarily get tagged as an exact duplicate; rather they are prone to editing and resulting in similar copies of the same file. This paper proposes the similarity-based deduplication approach to evict similar duplicates from the archive storage, which compares the samples of binary hashes to identify the duplicates. This eviction is done by initially dividing the query video into dynamic key frames based on the video length. Binary hash codes of these frames are then compared with existing key frames to identify the differences. The similarity score is determined based on these differences, which decides the eradication strategy of duplicate copy. Duplicate elimination goes through two levels, namely removal of exact duplicates and similar duplicates. The proposed approach has shortened the comparison window by comparing only the candidate hash codes based on the dynamic key frames and aims the accurate lossless duplicate removals. The presented work is executed and tested on the produced synthetic video dataset. Results show the reduction in redundant data and increase in the storage space. Binary hashes and similarity scores contributed to achieving good deduplication ratio and overall performance
Weight Based Deduplication for Minimizing Data Replication in Public Cloud Storage
260-269The approach to optimize the data replication in public cloud storage when targeting the multiple instances is one of the challenging issues to process the text data. The amount of digital data has been increasing exponentially. There is a need to reduce the amount of storage space by storing the data efficiently. In cloud storage environment, the data replication provides high availability with fault tolerance system. An effective approach of deduplication system using weight based method is proposed at the target level in order to reduce the unwanted storage spaces in cloud. Storage space can be efficiently utilized by removing the unpopular files from the secondary servers. Target level consumes less processing power than source level deduplication. Multiple input text documents are stored into dropbox cloud. The top text features are detected using the Term Frequency (TF) and Named Entity Recognition (NER) and they are stored in text database. After storing the top features in database, fresh text documents are collected to find the popular and unpopular files in order to optimize the existing text corpus of cloud storage. Top Text features of the freshly collected text documents are detected using TF and NER and these unique features after the removing the duplicate features cleaning are compared with the existing features stored in the database. On the comparison, relevant text documents are listed. After listing the text documents, document frequency, document weight and threshold factor are detected. Depending on average threshold value, the popular and unpopular files are detected. The popular files are retained in all the storage nodes to achieve the full availability of data and unpopular files are removed from all the secondary servers except primary server. Before deduplication, the storage space occupied in the dropbox cloud is 8.09 MB. After deduplication, the unpopular files are removed from secondary storage nodes and the storage space in the dropbox cloud is optimized to 4.82MB. Finally, data replications are minimized and 45.60% of the cloud storage space is efficiently saved by applying the weight based deduplication system
WEIGHT BASED DEDUPLICATION FOR MINIMIZING DATA REPLICATION IN PUBLIC CLOUD STORAGE
The approach to minimize data replication in cloud storage is one of the challenging issues to process text data. The amount of digital data has been increasing exponentially. There is a need to reduce the amount of storage space by storing the data efficiently. In cloud storage environment, the data replication provides high availability with fault tolerance system. An effective approach of deduplication system using a weight based method is proposed at the target level in order to reduce the storage space in cloud. Storage space can be efficiently utilized by removing the unpopular files from the secondary servers. Target level consumes less processing power than Source level deduplication. Input text documents are stored into Dropbox cloud. The Term Frequency (TF) and Named Entity Recognition (NER) of the documents are found. The text features found are stored in database using MySQL. After storing features in database, fresh text documents are collected to find popular and unpopular files. TF and NER are found for the freshly collected text documents and duplicate features are removed to compare with the features stored in the database. On comparison, relevant text documents are listed. After listing text documents, document frequency, document weight and threshold factor are found. Depending on threshold factor, the popular and unpopular files are detected. The popular files are replicated in all the storage nodes to achieve availability. Before deduplication, the storage space occupied in the Dropbox cloud is 8.09MB. After deduplication, the unpopular files are removed from secondary storage nodes and the storage space occupied in the Dropbox cloud is 4.82MB. Finally, data replications are minimized and 45.6% of the cloud storage space is efficiently saved by applying weight based deduplication
Doctor of Philosophy
dissertationIn the past few years, we have seen a tremendous increase in digital data being generated. By 2011, storage vendors had shipped 905 PB of purpose-built backup appliances. By 2013, the number of objects stored in Amazon S3 had reached 2 trillion. Facebook had stored 20 PB of photos by 2010. All of these require an efficient storage solution. To improve space efficiency, compression and deduplication are being widely used. Compression works by identifying repeated strings and replacing them with more compact encodings while deduplication partitions data into fixed-size or variable-size chunks and removes duplicate blocks. While we have seen great improvements in space efficiency from these two approaches, there are still some limitations. First, traditional compressors are limited in their ability to detect redundancy across a large range since they search for redundant data in a fine-grain level (string level). For deduplication, metadata embedded in an input file changes more frequently, and this introduces more unnecessary unique chunks, leading to poor deduplication. Cloud storage systems suffer from unpredictable and inefficient performance because of interference among different types of workloads. This dissertation proposes techniques to improve the effectiveness of traditional compressors and deduplication in improving space efficiency, and a new IO scheduling algorithm to improve performance predictability and efficiency for cloud storage systems. The common idea is to utilize similarity. To improve the effectiveness of compression and deduplication, similarity in content is used to transform an input file into a compression- or deduplication-friendly format. We propose Migratory Compression, a generic data transformation that identifies similar data in a coarse-grain level (block level) and then groups similar blocks together. It can be used as a preprocessing stage for any traditional compressor. We find metadata have a huge impact in reducing the benefit of deduplication. To isolate the impact from metadata, we propose to separate metadata from data. Three approaches are presented for use cases with different constrains. For the commonly used tar format, we propose Migratory Tar: a data transformation and also a new tar format that deduplicates better. We also present a case study where we use deduplication to reduce storage consumption for storing disk images, while at the same time achieving high performance in image deployment. Finally, we apply the same principle of utilizing similarity in IO scheduling to prevent interference between random and sequential workloads, leading to efficient, consistent, and predictable performance for sequential workloads and a high disk utilization
Trustworthy Edge Machine Learning: A Survey
The convergence of Edge Computing (EC) and Machine Learning (ML), known as
Edge Machine Learning (EML), has become a highly regarded research area by
utilizing distributed network resources to perform joint training and inference
in a cooperative manner. However, EML faces various challenges due to resource
constraints, heterogeneous network environments, and diverse service
requirements of different applications, which together affect the
trustworthiness of EML in the eyes of its stakeholders. This survey provides a
comprehensive summary of definitions, attributes, frameworks, techniques, and
solutions for trustworthy EML. Specifically, we first emphasize the importance
of trustworthy EML within the context of Sixth-Generation (6G) networks. We
then discuss the necessity of trustworthiness from the perspective of
challenges encountered during deployment and real-world application scenarios.
Subsequently, we provide a preliminary definition of trustworthy EML and
explore its key attributes. Following this, we introduce fundamental frameworks
and enabling technologies for trustworthy EML systems, and provide an in-depth
literature review of the latest solutions to enhance trustworthiness of EML.
Finally, we discuss corresponding research challenges and open issues.Comment: 27 pages, 7 figures, 10 table
Service Abstractions for Scalable Deep Learning Inference at the Edge
Deep learning driven intelligent edge has already become a reality, where millions of mobile, wearable, and IoT devices analyze real-time data and transform those into actionable insights on-device. Typical approaches for optimizing deep learning inference mostly focus on accelerating the execution of individual inference tasks, without considering the contextual correlation unique to edge environments and the statistical nature of learning-based computation. Specifically, they treat inference workloads as individual black boxes and apply canonical system optimization techniques, developed over the last few decades, to handle them as yet another type of computation-intensive applications. As a result, deep learning inference on edge devices still face the ever increasing challenges of customization to edge device heterogeneity, fuzzy computation redundancy between inference tasks, and end-to-end deployment at scale. In this thesis, we propose the first framework that automates and scales the end-to-end process of deploying efficient deep learning inference from the cloud to heterogeneous edge devices. The framework consists of a series of service abstractions that handle DNN model tailoring, model indexing and query, and computation reuse for runtime inference respectively. Together, these services bridge the gap between deep learning training and inference, eliminate computation redundancy during inference execution, and further lower the barrier for deep learning algorithm and system co-optimization. To build efficient and scalable services, we take a unique algorithmic approach of harnessing the semantic correlation between the learning-based computation. Rather than viewing individual tasks as isolated black boxes, we optimize them collectively in a white box approach, proposing primitives to formulate the semantics of the deep learning workloads, algorithms to assess their hidden correlation (in terms of the input data, the neural network models, and the deployment trials) and merge common processing steps to minimize redundancy
- …