21 research outputs found

    Chameleon: A Secure Cloud-Enabled and Queryable System with Elastic Properties

    Get PDF
    There are two dominant themes that have become increasingly more important in our technological society. First, the recurrent use of cloud-based solutions which provide infrastructures, computation platforms and storage as services. Secondly, the use of applicational large logs for analytics and operational monitoring in critical systems. Moreover, auditing activities, debugging of applications and inspection of events generated by errors or potential unexpected operations - including those generated as alerts by intrusion detection systems - are common situations where extensive logs must be analyzed, and easy access is required. More often than not, a part of the generated logs can be deemed as sensitive, requiring a privacy-enhancing and queryable solution. In this dissertation, our main goal is to propose a novel approach of storing encrypted critical data in an elastic and scalable cloud-based storage, focusing on handling JSONbased ciphered documents. To this end, we make use of Searchable and Homomorphic Encryption methods to allow operations on the ciphered documents. Additionally, our solution allows for the user to be near oblivious to our system’s internals, providing transparency while in use. The achieved end goal is a unified middleware system capable of providing improved system usability, privacy, and rich querying over the data. This previously mentioned objective is addressed while maintaining server-side auditable logs, allowing for searchable capabilities by the log owner or authorized users, with integrity and authenticity proofs. Our proposed solution, named Chameleon, provides rich querying facilities on ciphered data - including conjunctive keyword, ordering correlation and boolean queries - while supporting field searching and nested aggregations. The aforementioned operations allow our solution to provide data analytics upon ciphered JSON documents, using Elasticsearch as our storage and search engine.O uso recorrente de soluções baseadas em nuvem tornaram-se cada vez mais importantes na nossa sociedade. Tais soluções fornecem infraestruturas, computação e armazenamento como serviços, para alem do uso de logs volumosos de sistemas e aplicações para análise e monitoramento operacional em sistemas críticos. Atividades de auditoria, debugging de aplicações ou inspeção de eventos gerados por erros ou possíveis operações inesperadas - incluindo alertas por sistemas de detecção de intrusão - são situações comuns onde logs extensos devem ser analisados com facilidade. Frequentemente, parte dos logs gerados podem ser considerados confidenciais, exigindo uma solução que permite manter a confidencialidades dos dados durante procuras. Nesta dissertação, o principal objetivo é propor uma nova abordagem de armazenar logs críticos num armazenamento elástico e escalável baseado na cloud. A solução proposta suporta documentos JSON encriptados, fazendo uso de Searchable Encryption e métodos de criptografia homomórfica com provas de integridade e autenticação. O objetivo alcançado é um sistema de middleware unificado capaz de fornecer privacidade, integridade e autenticidade, mantendo registos auditáveis do lado do servidor e permitindo pesquisas pelo proprietário dos logs ou usuários autorizados. A solução proposta, Chameleon, visa fornecer recursos de consulta atuando em cima de dados cifrados - incluindo queries conjuntivas, de ordenação e booleanas - suportando pesquisas de campo e agregações aninhadas. As operações suportadas permitem à nossa solução suportar data analytics sobre documentos JSON cifrados, utilizando o Elasticsearch como armazenamento e motor de busca

    Secure Remote Storage of Logs with Search Capabilities

    Get PDF
    Dissertação de Mestrado em Engenharia InformáticaAlong side with the use of cloud-based services, infrastructure and storage, the use of application logs in business critical applications is a standard practice nowadays. Such application logs must be stored in an accessible manner in order to used whenever needed. The debugging of these applications is a common situation where such access is required. Frequently, part of the information contained in logs records is sensitive. This work proposes a new approach of storing critical logs in a cloud-based storage recurring to searchable encryption, inverted indexing and hash chaining techniques to achieve, in a unified way, the needed privacy, integrity and authenticity while maintaining server side searching capabilities by the logs owner. The designed search algorithm enables conjunctive keywords queries plus a fine-grained search supported by field searching and nested queries, which are essential in the referred use case. To the best of our knowledge, the proposed solution is also the first to introduce a query language that enables complex conjunctive keywords and a fine-grained search backed by field searching and sub queries.A gerac¸ ˜ao de logs em aplicac¸ ˜oes e a sua posterior consulta s˜ao fulcrais para o funcionamento de qualquer neg´ocio ou empresa. Estes logs podem ser usados para eventuais ac¸ ˜oes de auditoria, uma vez que estabelecem uma baseline das operac¸ ˜oes realizadas. Servem igualmente o prop´ osito de identificar erros, facilitar ac¸ ˜oes de debugging e diagnosticar bottlennecks de performance. Tipicamente, a maioria da informac¸ ˜ao contida nesses logs ´e considerada sens´ıvel. Quando estes logs s˜ao armazenados in-house, as considerac¸ ˜oes relacionadas com anonimizac¸ ˜ao, confidencialidade e integridade s˜ao geralmente descartadas. Contudo, com o advento das plataformas cloud e a transic¸ ˜ao quer das aplicac¸ ˜oes quer dos seus logs para estes ecossistemas, processos de logging remotos, seguros e confidenciais surgem como um novo desafio. Adicionalmente, regulac¸ ˜ao como a RGPD, imp˜oe que as instituic¸ ˜oes e empresas garantam o armazenamento seguro dos dados. A forma mais comum de garantir a confidencialidade consiste na utilizac¸ ˜ao de t ´ecnicas criptogr ´aficas para cifrar a totalidade dos dados anteriormente `a sua transfer ˆencia para o servidor remoto. Caso sejam necess´ arias capacidades de pesquisa, a abordagem mais simples ´e a transfer ˆencia de todos os dados cifrados para o lado do cliente, que proceder´a `a sua decifra e pesquisa sobre os dados decifrados. Embora esta abordagem garanta a confidencialidade e privacidade dos dados, rapidamente se torna impratic ´avel com o crescimento normal dos registos de log. Adicionalmente, esta abordagem n˜ao faz uso do potencial total que a cloud tem para oferecer. Com base nesta tem´ atica, esta tese prop˜oe o desenvolvimento de uma soluc¸ ˜ao de armazenamento de logs operacionais de forma confidencial, integra e autˆ entica, fazendo uso das capacidades de armazenamento e computac¸ ˜ao das plataformas cloud. Adicionalmente, a possibilidade de pesquisa sobre os dados ´e mantida. Essa pesquisa ´e realizada server-side diretamente sobre os dados cifrados e sem acesso em momento algum a dados n˜ao cifrados por parte do servidor..

    Weight Based Deduplication for Minimizing Data Replication in Public Cloud Storage

    Get PDF
    260-269The approach to optimize the data replication in public cloud storage when targeting the multiple instances is one of the challenging issues to process the text data. The amount of digital data has been increasing exponentially. There is a need to reduce the amount of storage space by storing the data efficiently. In cloud storage environment, the data replication provides high availability with fault tolerance system. An effective approach of deduplication system using weight based method is proposed at the target level in order to reduce the unwanted storage spaces in cloud. Storage space can be efficiently utilized by removing the unpopular files from the secondary servers. Target level consumes less processing power than source level deduplication. Multiple input text documents are stored into dropbox cloud. The top text features are detected using the Term Frequency (TF) and Named Entity Recognition (NER) and they are stored in text database. After storing the top features in database, fresh text documents are collected to find the popular and unpopular files in order to optimize the existing text corpus of cloud storage. Top Text features of the freshly collected text documents are detected using TF and NER and these unique features after the removing the duplicate features cleaning are compared with the existing features stored in the database. On the comparison, relevant text documents are listed. After listing the text documents, document frequency, document weight and threshold factor are detected. Depending on average threshold value, the popular and unpopular files are detected. The popular files are retained in all the storage nodes to achieve the full availability of data and unpopular files are removed from all the secondary servers except primary server. Before deduplication, the storage space occupied in the dropbox cloud is 8.09 MB. After deduplication, the unpopular files are removed from secondary storage nodes and the storage space in the dropbox cloud is optimized to 4.82MB. Finally, data replications are minimized and 45.60% of the cloud storage space is efficiently saved by applying the weight based deduplication system

    WEIGHT BASED DEDUPLICATION FOR MINIMIZING DATA REPLICATION IN PUBLIC CLOUD STORAGE

    Get PDF
    The approach to minimize data replication in cloud storage is one of the challenging issues to process text data. The amount of digital data has been increasing exponentially. There is a need to reduce the amount of storage space by storing the data efficiently. In cloud storage environment, the data replication provides high availability with fault tolerance system. An effective approach of deduplication system using a weight based method is proposed at the target level in order to reduce the storage space in cloud. Storage space can be efficiently utilized by removing the unpopular files from the secondary servers. Target level consumes less processing power than Source level deduplication. Input text documents are stored into Dropbox cloud. The Term Frequency (TF) and Named Entity Recognition (NER) of the documents are found. The text features found are stored in database using MySQL. After storing features in database, fresh text documents are collected to find popular and unpopular files. TF and NER are found for the freshly collected text documents and duplicate features are removed to compare with the features stored in the database. On comparison, relevant text documents are listed. After listing text documents, document frequency, document weight and threshold factor are found. Depending on threshold factor, the popular and unpopular files are detected. The popular files are replicated in all the storage nodes to achieve availability. Before deduplication, the storage space occupied in the Dropbox cloud is 8.09MB. After deduplication, the unpopular files are removed from secondary storage nodes and the storage space occupied in the Dropbox cloud is 4.82MB. Finally, data replications are minimized and 45.6% of the cloud storage space is efficiently saved by applying weight based deduplication

    Securing clouds using cryptography and traffic classification

    Get PDF
    Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction. Over the last decade, cloud computing has gained popularity and wide acceptance, especially within the health sector where it offers several advantages such as low costs, flexible processes, and access from anywhere. Although cloud computing is widely used in the health sector, numerous issues remain unresolved. Several studies have attempted to review the state of the art in eHealth cloud privacy and security however, some of these studies are outdated or do not cover certain vital features of cloud security and privacy such as access control, revocation and data recovery plans. This study targets some of these problems and proposes protocols, algorithms and approaches to enhance the security and privacy of cloud computing with particular reference to eHealth clouds. Chapter 2 presents an overview and evaluation of the state of the art in eHealth security and privacy. Chapter 3 introduces different research methods and describes the research design methodology and processes used to carry out the research objectives. Of particular importance are authenticated key exchange and block cipher modes. In Chapter 4, a three-party password-based authenticated key exchange (TPAKE) protocol is presented and its security analysed. The proposed TPAKE protocol shares no plaintext data; all data shared between the parties are either hashed or encrypted. Using the random oracle model (ROM), the security of the proposed TPAKE protocol is formally proven based on the computational Diffie-Hellman (CDH) assumption. Furthermore, the analysis included in this chapter shows that the proposed protocol can ensure perfect forward secrecy and resist many kinds of common attacks such as man-in-the-middle attacks, online and offline dictionary attacks, replay attacks and known key attacks. Chapter 5 proposes a parallel block cipher (PBC) mode in which blocks of cipher are processed in parallel. The results of speed performance tests for this PBC mode in various settings are presented and compared with the standard CBC mode. Compared to the CBC mode, the PBC mode is shown to give execution time savings of 60%. Furthermore, in addition to encryption based on AES 128, the hash value of the data file can be utilised to provide an integrity check. As a result, the PBC mode has a better speed performance while retaining the confidentiality and security provided by the CBC mode. Chapter 6 applies TPAKE and PBC to eHealth clouds. Related work on security, privacy preservation and disaster recovery are reviewed. Next, two approaches focusing on security preservation and privacy preservation, and a disaster recovery plan are proposed. The security preservation approach is a robust means of ensuring the security and integrity of electronic health records and is based on the PBC mode, while the privacy preservation approach is an efficient authentication method which protects the privacy of personal health records and is based on the TPAKE protocol. A discussion about how these integrated approaches and the disaster recovery plan can ensure the reliability and security of cloud projects follows. Distributed denial of service (DDoS) attacks are the second most common cybercrime attacks after information theft. The timely detection and prevention of such attacks in cloud projects are therefore vital, especially for eHealth clouds. Chapter 7 presents a new classification system for detecting and preventing DDoS TCP flood attacks (CS_DDoS) for public clouds, particularly in an eHealth cloud environment. The proposed CS_DDoS system offers a solution for securing stored records by classifying incoming packets and making a decision based on these classification results. During the detection phase, CS_DDOS identifies and determines whether a packet is normal or from an attacker. During the prevention phase, packets classified as malicious are denied access to the cloud service, and the source IP is blacklisted. The performance of the CS_DDoS system is compared using four different classifiers: a least-squares support vector machine (LS-SVM), naïve Bayes, K-nearest-neighbour, and multilayer perceptron. The results show that CS_DDoS yields the best performance when the LS-SVM classifier is used. This combination can detect DDoS TCP flood attacks with an accuracy of approximately 97% and a Kappa coefficient of 0.89 when under attack from a single source, and 94% accuracy and a Kappa coefficient of 0.9 when under attack from multiple attackers. These results are then discussed in terms of the accuracy and time complexity, and are validated using a k-fold cross-validation model. Finally, a method to mitigate DoS attacks in the cloud and reduce excessive energy consumption through managing and limiting certain flows of packets is proposed. Instead of a system shutdown, the proposed method ensures the availability of service. The proposed method manages the incoming packets more effectively by dropping packets from the most frequent requesting sources. This method can process 98.4% of the accepted packets during an attack. Practicality and effectiveness are essential requirements of methods for preserving the privacy and security of data in clouds. The proposed methods successfully secure cloud projects and ensure the availability of services in an efficient way

    Aggregating privatized medical data for secure querying applications

    Full text link
     This thesis analyses and examines the challenges of aggregation of sensitive data and data querying on aggregated data at cloud server. This thesis also delineates applications of aggregation of sensitive medical data in several application scenarios, and tests privatization techniques to assist in improving the strength of privacy and utility

    Semantic Systems. The Power of AI and Knowledge Graphs

    Get PDF
    This open access book constitutes the refereed proceedings of the 15th International Conference on Semantic Systems, SEMANTiCS 2019, held in Karlsruhe, Germany, in September 2019. The 20 full papers and 8 short papers presented in this volume were carefully reviewed and selected from 88 submissions. They cover topics such as: web semantics and linked (open) data; machine learning and deep learning techniques; semantic information management and knowledge integration; terminology, thesaurus and ontology management; data mining and knowledge discovery; semantics in blockchain and distributed ledger technologies

    An evaluation of the challenges of Multilingualism in Data Warehouse development

    Get PDF
    In this paper we discuss Business Intelligence and define what is meant by support for Multilingualism in a Business Intelligence reporting context. We identify support for Multilingualism as a challenging issue which has implications for data warehouse design and reporting performance. Data warehouses are a core component of most Business Intelligence systems and the star schema is the approach most widely used to develop data warehouses and dimensional Data Marts. We discuss the way in which Multilingualism can be supported in the Star Schema and identify that current approaches have serious limitations which include data redundancy and data manipulation, performance and maintenance issues. We propose a new approach to enable the optimal application of multilingualism in Business Intelligence. The proposed approach was found to produce satisfactory results when used in a proof-of-concept environment. Future work will include testing the approach in an enterprise environmen

    Digital Forensics AI: on Practicality, Optimality, and Interpretability of Digital Evidence Mining Techniques

    Get PDF
    Digital forensics as a field has progressed alongside technological advancements over the years, just as digital devices have gotten more robust and sophisticated. However, criminals and attackers have devised means for exploiting the vulnerabilities or sophistication of these devices to carry out malicious activities in unprecedented ways. Their belief is that electronic crimes can be committed without identities being revealed or trails being established. Several applications of artificial intelligence (AI) have demonstrated interesting and promising solutions to seemingly intractable societal challenges. This thesis aims to advance the concept of applying AI techniques in digital forensic investigation. Our approach involves experimenting with a complex case scenario in which suspects corresponded by e-mail and deleted, suspiciously, certain communications, presumably to conceal evidence. The purpose is to demonstrate the efficacy of Artificial Neural Networks (ANN) in learning and detecting communication patterns over time, and then predicting the possibility of missing communication(s) along with potential topics of discussion. To do this, we developed a novel approach and included other existing models. The accuracy of our results is evaluated, and their performance on previously unseen data is measured. Second, we proposed conceptualizing the term “Digital Forensics AI” (DFAI) to formalize the application of AI in digital forensics. The objective is to highlight the instruments that facilitate the best evidential outcomes and presentation mechanisms that are adaptable to the probabilistic output of AI models. Finally, we enhanced our notion in support of the application of AI in digital forensics by recommending methodologies and approaches for bridging trust gaps through the development of interpretable models that facilitate the admissibility of digital evidence in legal proceedings
    corecore