
    SafeSpark: a secure data analytics platform using cryptographic techniques and trusted hardware

    Master's dissertation in Informatics Engineering. Nowadays, most companies resort to data analytics frameworks to extract value from the increasing amounts of digital information. These systems give substantial competitive advantages to companies, since they support decisions such as marketing choices and predictions of user behavior. Organizations therefore tend to leverage the cloud to store and perform analytics over their data. Database services in the cloud present significant advantages, such as a high level of efficiency and flexibility and a reduction of the costs inherent to maintaining and managing private infrastructures. The problem is that these services are often a target for malicious attacks, which means that sensitive and private personal information can be compromised. Current secure analytical processing solutions use a limited set of cryptographic techniques or technologies, which makes it impossible to explore different trade-offs between performance, security, and functionality requirements for different applications. Moreover, these systems do not explore the combination of multiple cryptographic techniques and trusted hardware to protect sensitive data. The work presented here addresses this challenge by using cryptographic schemes and Intel SGX technology to protect confidential information, ensuring a practical solution that can be adapted to applications with different requirements. In detail, this dissertation begins with a baseline study of cryptographic schemes and Intel SGX technology, followed by a review of the state of the art in secure data analytics frameworks. A new solution based on the Apache Spark framework, called SafeSpark, is proposed. It provides a modular and extensible architecture and prototype, which allows protecting information and processing analytical queries over encrypted data using three cryptographic schemes and SGX technology.
We validated the prototype with an experimental evaluation, in which we analyzed the performance costs of the solution as well as its resource usage. For this purpose, we used the TPC-DS benchmark, and the results show that it is possible to perform analytical processing on protected data with a performance impact between 1.13x and 4.1x. This work was partially funded by FCT - Fundação para a Ciência e a Tecnologia, I.P. (Portuguese Foundation for Science and Technology), within project UID/EEA/50014/2019
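As a minimal illustration of one building block such systems rely on (not SafeSpark's actual implementation), deterministic encryption maps equal plaintexts to equal ciphertexts, which lets an untrusted server evaluate equality predicates and GROUP BY over protected columns without seeing the data. The sketch below uses HMAC-SHA256 purely as a stand-in for a deterministic scheme; the key and values are hypothetical.

```python
import hashlib
import hmac

def det_encrypt(key: bytes, plaintext: str) -> str:
    """Deterministic 'encryption' stand-in: equal inputs yield equal tokens.

    A real system would use a proper deterministic cipher (e.g. AES-SIV);
    HMAC is used here only to illustrate the equality-preserving property.
    """
    return hmac.new(key, plaintext.encode(), hashlib.sha256).hexdigest()

key = b"hypothetical-column-key"  # assumption: one key per protected column
rows = ["alice", "bob", "alice"]
tokens = [det_encrypt(key, r) for r in rows]

# The server can match and group equal values without the plaintext.
assert tokens[0] == tokens[2]
assert tokens[0] != tokens[1]
```

The trade-off is that deterministic schemes leak equality patterns, which is precisely why a system in this design space combines several schemes and trusted hardware depending on each column's security requirements.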

    Cloud BI: A Multi-party Authentication Framework for Securing Business Intelligence on the Cloud

    Business intelligence (BI) has emerged as a key technology to be hosted on Cloud computing. BI offers a method to analyse data, thereby enabling informed decision making to improve business performance and profitability. However, within the shared domains of Cloud computing, BI is exposed to increased security and privacy threats, because an unauthorised user may be able to gain access to highly sensitive, consolidated business information. The business process contains collaborating services and users from multiple Cloud systems in different security realms, which need to be engaged dynamically at runtime. If heterogeneous Cloud systems located in different security realms do not have direct authentication relationships, then it is technically difficult to enable secure collaboration. In order to address these security challenges, a new authentication framework is required to establish trust relationships among these BI service instances and users by distributing a common session secret to all participants of a session. The author addresses this challenge by designing and implementing a multi-party authentication framework for dynamic secure interactions when members of different security realms want to access services. The framework takes advantage of the trust relationship between session members in different security realms to enable a user to obtain security credentials to access Cloud resources in a remote realm. This mechanism can help Cloud session users authenticate their session membership and improve the authentication processes within multi-party sessions. The correctness of the proposed framework has been verified using BAN logic. The performance and overhead have been evaluated via simulation in a dynamic environment. A prototype authentication system has been designed, implemented and tested based on the proposed framework.
    The research concludes that the proposed framework and its supporting protocols are an effective functional basis for practical implementation testing, as the framework achieves good scalability and imposes only a minimal performance overhead, comparable with other state-of-the-art methods.
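The core idea of distributing a common session secret to all participants can be sketched as follows. This is a simplified illustration under the assumption that each member already shares a long-term key with a session coordinator; it is not the dissertation's actual protocol, which additionally handles cross-realm trust and was verified with BAN logic.

```python
import hashlib
import hmac
import os

def wrap(member_key: bytes, session_id: bytes, secret: bytes) -> bytes:
    # One-time pad derived from the member's long-term key and the session id;
    # the session id must be unique per session so the pad is never reused.
    pad = hmac.new(member_key, session_id, hashlib.sha256).digest()
    return bytes(a ^ b for a, b in zip(secret, pad))

def unwrap(member_key: bytes, session_id: bytes, wrapped: bytes) -> bytes:
    return wrap(member_key, session_id, wrapped)  # XOR is its own inverse

# Hypothetical long-term keys shared between coordinator and each member.
members = {"realm-a/user1": os.urandom(32), "realm-b/svc2": os.urandom(32)}
session_id = b"session-42"
session_secret = os.urandom(32)

# The coordinator distributes one wrapped copy of the secret per participant.
envelopes = {m: wrap(k, session_id, session_secret) for m, k in members.items()}

# Each member recovers the same session secret using only its own key.
recovered = {m: unwrap(members[m], session_id, envelopes[m]) for m in members}
assert all(s == session_secret for s in recovered.values())
```

Once every participant holds the same session secret, it can serve as the basis for proving session membership across realms that lack direct authentication relationships.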

    Design and implementation of serverless architecture for i2b2 on AWS cloud and Snowflake data warehouse

    Informatics for Integrating Biology and the Bedside (i2b2) is an open-source medical tool for cohort discovery that allows researchers to explore and query clinical data. The i2b2 platform is designed to adopt any patient-centric data model and is used at over 400 healthcare institutions worldwide for querying patient data. The platform consists of a web client, core servers and a database. Despite the availability of installation guidelines, the complex architecture of the system, with numerous dependencies and configuration parameters, makes it difficult to install a functional i2b2 platform. Maintaining the scalability, security and availability of the application is also challenging and requires a lot of resources. Our aim was to deploy i2b2 for the University of Missouri (UM) System in the cloud and to reduce the complexity and effort of the installation and maintenance process. Our solution encapsulates the complete installation process of each component using Docker and deploys the containers in an AWS Virtual Private Cloud (VPC) using several AWS PaaS (Platform as a Service) and IaaS (Infrastructure as a Service) services. We deployed the application as a service on AWS Fargate, an on-demand, serverless, auto-scalable compute engine. We also enhanced the functionality of the i2b2 services and developed Snowflake JDBC driver support for the i2b2 backend services, enabling them to query the Snowflake analytical database directly. In addition, we created an i2b2-data-installer package to load PCORnet CDM and ACT ontology data into the i2b2 database. The i2b2 platform at the University of Missouri holds 1.26B facts on 2.2M patients from UM Cerner Millennium data.
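To make the deployment shape concrete, a Fargate task definition for a containerized i2b2 service can be built programmatically. The sketch below only constructs the request payload; the image name, ports and resource sizes are hypothetical placeholders, not the values of the actual University of Missouri deployment.

```python
def i2b2_task_definition(image: str, cpu: str = "1024", memory: str = "2048") -> dict:
    """Build an ECS/Fargate task definition payload for an i2b2 service container.

    The image URI and sizes are illustrative assumptions only.
    """
    return {
        "family": "i2b2-core-server",
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",  # required network mode for Fargate tasks
        "cpu": cpu,               # CPU units, passed as a string per the ECS API
        "memory": memory,         # memory in MiB, also a string
        "containerDefinitions": [
            {
                "name": "i2b2-core",
                "image": image,   # e.g. an ECR or Docker Hub image URI
                "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
                "essential": True,
            }
        ],
    }

task_def = i2b2_task_definition("example.registry/i2b2-core:latest")
assert task_def["requiresCompatibilities"] == ["FARGATE"]
```

In practice this dictionary would be passed to boto3's ECS client via `register_task_definition(**task_def)`; the VPC, load balancer and IAM wiring that a real deployment needs are omitted here.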

    Fraud and Performance Monitoring of Credit Card Tokenization Using Business Intelligence

    This project's major objective is to gather all the necessary data to analyze and to deliver the best possible analytical reporting platform. The product, developed for analysts, is expected to be used extensively for insights into token provisioning and its varied utilization by banks and merchants, and also to monitor emerging fraud patterns and initiate the steps necessary to avoid adversities in the future. The reports are generated using the principles of descriptive analytics. Using many different KPIs, metrics, scorecards, etc. to support the analysis has yielded better results, and the analytical dashboards give analysts deep-dive insight. The project has been used by many analysts to reach agreement on the different patterns each individual noticed, and by senior executives to gain a profound understanding of how the widely different tokenizations are used and how they are segregated by attribute.

    Empowering SMEs to make better decisions with Business Intelligence: A Case Study

    With the advance of Business Information Systems (BIS), companies irrespective of size have adopted an approach of electronic data collection and management for two decades. The advancement in technology means they have in their possession large volumes of historical data. Large organizations have capitalized on this and use a range of tools and techniques to leverage this information to make more informed business decisions. For most small and medium-sized enterprises (SMEs), however, such data typically sits in an archive without being utilized. While SMEs appreciate the need to utilize historical data to make more informed business decisions, they often lack the technical know-how and funding to embrace an effective BI solution. In this paper, drawing from our experience in implementing a BI solution for a UK SME, we discuss some potential tools and strategies that could help SMEs overcome these challenges and reap the benefits of adopting an effective BI solution.

    Data Migration to Cloud in ERP Implementations

    The concept of Cloud Computing has evolved constantly in terms of service models, based on the creation and sharing of several technological resources. Increasingly, it uses virtualization technology to optimize resources, which are shared by all accounts in a self-service format. All these features result in flexible and progressive behavior of resources. The management of the provided service is based on the service level agreement established between the client and the cloud provider, and constant technological developments can change it quickly depending on the requirements. Given the current state of Wipro with respect to combining the concepts of data migration and cloud, it is very challenging to design and build a process to help the company make this transition, especially when there is already a tool that has been used for several years and is intended to be part of the integration with the new solution described in this document. The study, of a qualitative nature, is guided by different case studies concerning the processes used to migrate data into the Cloud. The main objectives are to find new solutions that increase the productivity of the company and free up human resources that can be reallocated to other tasks: innovative solutions with rapid implementations and, most importantly, low cost. The overall objective of this dissertation is thus to examine the feasibility of adopting Cloud Computing at Wipro Portugal through two main points:
    • the migration of data into the Cloud;
    • integration with the Data Conversion Tool (DCT).
    We believe this approach is meaningful for encouraging greater productivity and new achievements. Concerning the empirical study, there is a large number of tools that can be investigated later as possible solutions for implementations other than Oracle Retail.
    For now, this dissertation focuses on the current Oracle Retail (OR) business approaches and points to Oracle Cloud as the main Cloud Computing service due to its partnership with Wipro. Both solutions that were implemented, SQL*Loader and Oracle GoldenGate, seem viable and versatile, as they can be integrated with the current tool (DCT) and are capable of loading large amounts of data without issues. In terms of performance, GoldenGate seems to be a few steps above SQL*Loader, but deeper analysis is required when using multithreading as an option in both methods, and the containerization of the databases can be very relevant to loading times. In general, good solutions are available and should be taken into consideration by the company, as they can help it leverage its resources more efficiently. The main objective of having data in the Cloud was reached, and knowledge was gathered about the behaviour of Oracle Cloud and some of its services.
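As a hedged illustration of the SQL*Loader path mentioned above (the table and column names are hypothetical, not Wipro's actual schema), the bulk-load route boils down to generating a control file that tells the `sqlldr` client how to map a CSV onto a staging table:

```python
def sqlldr_control_file(table: str, columns: list[str], datafile: str) -> str:
    """Render a minimal SQL*Loader control file for a comma-separated data file."""
    cols = ",\n  ".join(columns)
    return (
        f"LOAD DATA\n"
        f"INFILE '{datafile}'\n"
        f"APPEND\n"
        f"INTO TABLE {table}\n"
        f"FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'\n"
        f"(\n  {cols}\n)\n"
    )

# Hypothetical staging table for illustration only.
ctl = sqlldr_control_file("stg_items", ["item_id", "item_desc", "unit_cost"], "items.csv")
print(ctl)
```

The rendered file would then be handed to the client, e.g. `sqlldr userid=... control=items.ctl`. GoldenGate instead replicates changes continuously from the database logs, which is one reason it compares favourably for ongoing loads.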

    REVIEW OF CLOUD DATABASE BENEFITS AND CHALLENGES

    The volume of data is increasing rapidly, which is why using cloud computing to store and process data may be inevitable. Providers offer many database services in the public cloud, including many types of traditional relational and non-relational databases, as well as special-purpose databases. Organizations can then migrate their data to cloud databases; however, decision makers need to be aware of cloud benefits and challenges. Data in the cloud is globally distributed, computing resources can be scaled up or down according to demand, cloud providers guarantee a high level of service availability, and many manual database administration tasks are automated. Data partitioning, replication and scaling ensure high performance. Certain database applications are cheaper than in private environments; however, sometimes using a cloud database may be more expensive. Security is considered one of the main cloud database concerns, due to data being stored in external infrastructure. Because of regulations, organizations also have to consider data privacy issues. The shared infrastructure offered in the cloud is beneficial; however, sometimes isolated environments are better for cloud databases.

    Securing cloud-hosted applications using active defense with rule-based adaptations

    Securing cloud-based applications is a dynamic problem, since modern attacks are always evolving in their sophistication and disruption impact. Active defense is a state-of-the-art paradigm in which proactive or reactive cybersecurity strategies are used to augment passive defense policies (e.g., firewalls). It involves using knowledge of the adversary to create dynamic policy measures that secure resources and outsmart adversaries, making cyber-attacks difficult to execute. Using intelligent threat detection systems based on machine learning and active defense solutions implemented via cloud resource adaptations, we can slow down attacks and derail attackers at an early stage so that they cannot proceed with their plots, while also increasing the probability that they will expose their presence or reveal their attack vectors. In this MS thesis, we demonstrate the concept and benefits of active defense in securing cloud-based applications through rule-based adaptations on distributed resources. Specifically, we propose two novel active defense strategies to mitigate the impact of security anomaly events within: (a) a social virtual reality learning environment (VRLE), and (b) a healthcare data sharing environment (HDSE). Our first strategy involves a "rule-based 3QS-adaptation framework" that performs risk- and cost-aware trade-off analysis to control cybersickness due to performance/security anomaly events during a VRLE session. VRLEs provide an immersive experience to users with increased accessibility to remote learning, so a breach of security in critical VRLE application domains (e.g., healthcare, military training, manufacturing) can disrupt functionality and induce cybersickness. Our framework implementation in a real-world social VRLE, viz. vSocial, monitors performance/security anomaly events in network data. In the event of an anomaly, the framework features rule-based adaptations that are triggered using various decision metrics.
    Based on our experimental results, we demonstrate the effectiveness of our rule-based 3QS-adaptation framework in reducing cybersickness levels while maintaining application functionality. Our second strategy involves a "defense by pretense" methodology that uses real-time attack detection and creates cyber deception for HDSE applications. Healthcare data consumers (e.g., clinicians and researchers) require access to massive, protected datasets, so a loss of assurance/auditability of critical data such as Electronic Health Records (EHR) can severely impact the privacy of patients' data and the reputation of healthcare organizations. Our cyber deception utilizes elastic capacity provisioning via rule-based adaptation to provision Quarantine Virtual Machines (QVMs) that handle redirected attacker traffic and increase threat intelligence collection. We evaluate our defense-by-pretense design by creating an experimental Amazon Web Services (AWS) testbed hosting a real-world OHDSI setup for protected health data analytics/sharing, with electronic health record data (SynPUF) and publications data (CORD-19) related to COVID-19. Our experimental results show how we can successfully detect targeted attacks such as DDoS and redirect attack sources to QVMs.
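The rule-based adaptation loop common to both strategies can be sketched as a small policy engine: detected anomaly events are matched against rules, and matching rules trigger adaptation actions such as redirecting a suspect source to a quarantine VM. The rule names, thresholds, and actions below are hypothetical illustrations, not the thesis's actual decision metrics.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # predicate over an anomaly event
    action: str                        # adaptation to trigger when it matches

# Hypothetical rules: the thresholds are placeholders, not tuned values.
RULES = [
    Rule("ddos-suspect", lambda e: e["pkt_rate"] > 10_000, "redirect_to_qvm"),
    Rule("latency-spike", lambda e: e["latency_ms"] > 250, "scale_up_instance"),
]

def adapt(event: dict) -> list[str]:
    """Return the adaptation actions triggered by an anomaly event."""
    return [r.action for r in RULES if r.condition(event)]

# A high packet rate from one source triggers quarantine redirection.
actions = adapt({"pkt_rate": 25_000, "latency_ms": 40})
assert actions == ["redirect_to_qvm"]
```

In a deployed system, each action string would map to a concrete cloud operation (e.g., rewriting routes so the attacker's traffic lands on a QVM), which is where the elastic provisioning described above comes in.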

    Pervasive brain monitoring and data sharing based on multi-tier distributed computing and linked data technology

    EEG-based brain-computer interfaces (BCI) are facing grand challenges in their real-world applications. The technical difficulties in developing truly wearable multi-modal BCI systems that are capable of making reliable real-time predictions of users' cognitive states under dynamic real-life situations may at times appear almost insurmountable. Fortunately, recent advances in miniature sensors, wireless communication and distributed computing technologies offer promising ways to bridge these chasms. In this paper, we report our attempt to develop a pervasive on-line BCI system by employing state-of-the-art technologies such as multi-tier fog and cloud computing, semantic Linked Data search, and adaptive prediction/classification models. To verify our approach, we implemented a pilot system using wireless dry-electrode EEG headsets and MEMS motion sensors as the front-end devices, Android mobile phones as the personal user interfaces, compact personal computers as the near-end fog servers, and the computer clusters hosted by the Taiwan National Center for High-performance Computing (NCHC) as the far-end cloud servers. We succeeded in conducting synchronous multi-modal global data streaming in March and then ran a multi-player on-line BCI game in September 2013. We are currently working with the ARL Translational Neuroscience Branch and the UCSD Movement Disorder Center to use our system in real-life personal stress and in-home Parkinson's disease patient monitoring experiments. We shall proceed to develop a necessary BCI ontology and add automatic semantic annotation and progressive model refinement capability to our system.
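A toy version of the tiered processing this describes can make the division of labour concrete: the near-end fog tier reduces the raw sensor stream before it crosses the wide-area link, and the far-end cloud tier applies a prediction model to the reduced features. All window sizes, thresholds, and sample values below are hypothetical, and the "classifier" is a deliberate stand-in for the adaptive models the paper refers to.

```python
from statistics import mean

def fog_tier(samples: list[float], window: int = 4) -> list[float]:
    """Near-end processing: downsample by averaging fixed windows,
    shrinking the stream before it is forwarded to the cloud."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]

def cloud_tier(features: list[float], threshold: float = 0.5) -> str:
    """Far-end processing: a stand-in 'classifier' over the fog features
    (a real system would run an adaptive prediction model here)."""
    return "alert" if mean(features) > threshold else "normal"

raw_eeg = [0.1, 0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.2]  # hypothetical samples
features = fog_tier(raw_eeg)   # two averaged windows instead of eight samples
state = cloud_tier(features)
assert state == "normal"
```

The design point is bandwidth and latency: the fog server turns a high-rate EEG stream into a compact feature stream, so only the cloud-scale model (and the Linked Data services around it) needs the data centre's resources.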