2,941 research outputs found

    Deep Q Learning for Self Adaptive Distributed Microservices Architecture (in press)

    Get PDF
    One desired aspect of a self-adaptive microservices architecture is the ability to continuously monitor the operational environment, detect and observe anomalous behaviour, and provide a reasonable policy for self-scaling, self-healing, and self-tuning the computational resources in order to respond dynamically to sudden changes in the operational environment. The behaviour of a microservices architecture changes continuously over time, which makes it challenging to use a statistical model to identify both the normal and abnormal behaviour of the running services. The performance of the microservices cluster may fluctuate with demand to accommodate scalability, orchestration, and load-balancing requirements. To achieve the desired high level of self-adaptability, this research implements a microservices architecture model following the MAPE-K model. Our proposed architecture employs a Markov decision process (MDP) to identify the transition from one cluster state to another, and a deep Q-learning network (DQN) to dynamically select the adaptation action that yields the highest reward. This paper evaluates the effectiveness of using a DQN and an MDP agent to achieve a high level of self-adaptability in a microservices architecture. We argue that such integration of DQN and MDP in the MAPE-K model gives the microservices architecture self-adaptability against contextual changes in the operational environment. The self-adaptation property is achieved by allowing the MDP agent to explore the observation space and letting the DQN select the adaptation policy with the highest reward; the MDP agent then executes the adaptation action and observes the changes. We believe integrating the DQN into the adaptation action selection process improves the effectiveness of adaptation and reduces adaptation risks, including resource over-provisioning and thrashing.
The proposed model preserves the cluster state and prevents multiple actions from taking place at the same time. Our model also guarantees that the executed adaptation action fits the current execution context and achieves the adaptation goals.
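    The action-selection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the state features, the action set, and the untrained network weights are all hypothetical stand-ins.

```python
import numpy as np

# Sketch of the Analyze/Plan steps in a MAPE-K loop: a tiny Q-network
# maps a monitored cluster state to Q-values over candidate adaptation
# actions, and the action with the highest Q-value is planned.
ACTIONS = ["scale_out", "scale_in", "restart_service", "no_op"]

rng = np.random.default_rng(42)
W1 = rng.normal(size=(3, 16)) * 0.1   # state: [cpu_util, mem_util, p95_latency]
W2 = rng.normal(size=(16, len(ACTIONS))) * 0.1

def q_values(state):
    h = np.maximum(0, state @ W1)     # ReLU hidden layer
    return h @ W2                     # one Q-value per adaptation action

def plan_action(state, epsilon=0.1):
    # Epsilon-greedy: occasionally explore, otherwise exploit the best action.
    if rng.random() < epsilon:
        return str(rng.choice(ACTIONS))
    return ACTIONS[int(np.argmax(q_values(state)))]

state = np.array([0.92, 0.75, 480.0])  # an overloaded-cluster observation
print(plan_action(state, epsilon=0.0))
```

In a full DQN the weights would be trained from observed rewards (e.g. penalizing over-provisioning and thrashing); here they are random and only the selection mechanics are shown.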

    Performance of Machine Learning and Big Data Analytics paradigms in Cybersecurity and Cloud Computing Platforms

    Get PDF
    The purpose of the research is to evaluate Machine Learning and Big Data Analytics paradigms for use in Cybersecurity. Cybersecurity refers to a combination of technologies, processes, and operations that are framed to protect information systems, computers, devices, programs, data, and networks from internal or external threats, harm, damage, attacks, or unauthorized access. The main characteristic of Machine Learning (ML) is the automatic analysis of large data sets and the production of models for the general relationships found among the data. ML algorithms, as part of Artificial Intelligence, can be grouped into supervised, unsupervised, semi-supervised, and reinforcement learning algorithms.

    Learning Fast and Slow: PROPEDEUTICA for Real-time Malware Detection

    Full text link
    In this paper, we introduce and evaluate PROPEDEUTICA, a novel methodology and framework for efficient and effective real-time malware detection, leveraging the best of conventional machine learning (ML) and deep learning (DL) algorithms. In PROPEDEUTICA, all software processes in the system start execution subjected to a conventional ML detector for fast classification. If a piece of software receives a borderline classification, it is subjected to further analysis via more computationally expensive and more accurate DL methods, via our newly proposed DL algorithm DEEPMALWARE. Further, we introduce delays to the execution of software subjected to deep learning analysis as a way to "buy time" for DL analysis and to rate-limit the impact of possible malware on the system. We evaluated PROPEDEUTICA with a set of 9,115 malware samples and 877 commonly used benign software samples from various categories for the Windows OS. Our results show that the false positive rate for conventional ML methods can reach 20%, while for modern DL methods it is usually below 6%. However, the classification time for DL can be 100X longer than for conventional ML methods. PROPEDEUTICA improved the detection F1-score from 77.54% (conventional ML method) to 90.25%, and reduced the detection time by 54.86%. Further, the percentage of software subjected to DL analysis was approximately 40% on average, and the application of delays in software subjected to ML reduced the detection time by approximately 10%. Finally, we found and discuss a discrepancy between the detection accuracy offline (analysis after all traces are collected) and on-the-fly (analysis in tandem with trace collection). Our insights show that conventional ML and modern DL-based malware detectors in isolation cannot meet the needs of efficient and effective malware detection: high accuracy, low false positive rate, and short classification time.

    Comment: 17 pages, 7 figures
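    The fast/slow triage idea can be sketched as follows. The thresholds and scoring functions are hypothetical stand-ins, not the paper's conventional-ML detector or the DEEPMALWARE model:

```python
# Sketch of two-stage triage: every process gets a cheap ML score first;
# only borderline scores are escalated to a slower, more accurate DL
# detector ("learning fast and slow").
def fast_ml_score(features):
    # Stand-in for a conventional ML classifier's malware probability.
    return sum(features) / len(features)

def slow_dl_score(features):
    # Stand-in for an expensive DL classifier.
    return max(features)

def classify(features, low=0.3, high=0.7):
    p = fast_ml_score(features)
    if p < low:
        return "benign"        # confident fast-path decision
    if p > high:
        return "malware"       # confident fast-path decision
    # Borderline classification: escalate to the slow detector.
    return "malware" if slow_dl_score(features) > high else "benign"

print(classify([0.1, 0.2, 0.1]))  # clearly benign: fast path only
print(classify([0.5, 0.6, 0.4]))  # borderline: escalated to the DL stand-in
```

The point of the structure is that the expensive scorer only runs on the borderline band between `low` and `high`, which is what keeps the average classification time close to that of the fast detector.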

    Big data analytics: a predictive analysis applied to cybersecurity in a financial organization

    Get PDF
    Project Work presented as partial requirement for obtaining the Master's degree in Information Management, with a specialization in Knowledge Management and Business Intelligence.

    With the generalization of internet access, cyber attacks have registered an alarming growth in frequency and severity of damages, along with a rise in awareness among organizations with heavy investments in cybersecurity, such as those in the financial sector. This work focuses on an organization's financial service that operates on the international markets in the payment-systems industry. The objective was to develop a predictive framework solution responsible for threat detection, to support the security team in opening investigations on intrusive server requests over the exponentially growing log events collected by the SIEM from the Apache Web Servers for the financial service. A Big Data framework, using Hadoop and Spark, was developed to perform classification tasks over the financial service requests, using Neural Networks, Logistic Regression, SVM, and Random Forests algorithms, while handling the training on the imbalanced dataset through BEV. The main conclusion of the analysis conducted was that the best scoring performance was registered for the Random Forests classifier using all the preprocessed features available. Using all the available worker nodes with a balanced configuration of the Spark executors, the best elapsed times for loading and preprocessing the data were achieved using the column-oriented ORC native format, while the row-oriented CSV format performed best for the training of the classifiers.
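    The imbalanced-training step can be sketched as follows, assuming the common reading of BEV as bagging over balanced subsets of the majority class; the synthetic features, sizes, and hyperparameters are hypothetical, not the thesis's dataset or configuration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# BEV-style handling of class imbalance: split the majority (benign)
# class into chunks the size of the minority (intrusive) class, train
# one classifier per balanced chunk, and combine by majority vote.
rng = np.random.default_rng(0)
X_minor = rng.normal(2.0, 1.0, size=(50, 4))   # intrusive requests (label 1)
X_major = rng.normal(0.0, 1.0, size=(500, 4))  # benign requests   (label 0)

models = []
for chunk in np.array_split(rng.permutation(X_major), len(X_major) // len(X_minor)):
    X = np.vstack([chunk, X_minor])
    y = np.r_[np.zeros(len(chunk)), np.ones(len(X_minor))]
    models.append(RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y))

def predict(x):
    votes = np.array([m.predict(x) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)  # majority vote

print(predict(np.array([[2.0, 2.0, 2.0, 2.0]])))  # near the intrusive cluster
```

Each base classifier sees every minority sample but only a slice of the majority class, so no minority information is discarded and no synthetic samples are fabricated.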

    CONSERVE: A framework for the selection of techniques for monitoring containers security

    Get PDF
    Context: Container-based virtualization is gaining popularity in different domains, as it supports continuous development and improves the efficiency and reliability of run-time environments. Problem: Different techniques have been proposed for monitoring the security of containers. However, there are no guidelines supporting the selection of suitable techniques for the task at hand. Objective: We aim to support the selection and design of techniques for monitoring container-based virtualization environments. Approach: First, we review the literature and identify techniques for monitoring containerized environments. Second, we classify these techniques according to a set of categories, such as technical characteristics, applicability, effectiveness, and evaluation. We further detail the pros and cons associated with each of the identified techniques. Result: We present CONSERVE, a multi-dimensional decision support framework for an informed and optimal selection of a suitable set of container monitoring techniques to be implemented in different application domains. Evaluation: A mix of eighteen researchers and practitioners evaluated the ease of use, understandability, usefulness, efficiency, applicability, and completeness of the framework. The evaluation shows a high level of interest and points to potential benefits.

    The European Security Industry: A Research Agenda

    Get PDF
    The security industry can be defined, in the first instance, as the industry that produces the goods and services required to protect citizens from insecurity. Yet this industry, as opposed to defence, has not been an area of intense research. Its boundaries are unclear and the industry is not well characterised. This paper analyses this knowledge gap and presents some ideas for a research agenda for this industry that could assist in unveiling its main features, its potential weaknesses and strengths, and its capability to solve the security needs of society in an efficient and effective way. The paper discusses a definition of this economic sector useful in setting its boundaries, and it briefly describes the main types of industries operating within the sector. It analyses methods for gathering information regarding the industry, customers, and other market agents. Finally, it outlines ways for assessing market performance in terms of the structure-conduct-performance paradigm.

    Keywords: security industry, security market, terrorism and organised crime countermeasures, competition, market performance

    ENNigma: A Framework for Private Neural Networks

    Get PDF
    The increasing concerns about data privacy and the stringent enforcement of data protection laws are placing growing pressure on organizations to secure large datasets. The challenge of ensuring data privacy becomes even more complex in the domains of Artificial Intelligence and Machine Learning due to their requirement for large amounts of data. While approaches like differential privacy and secure multi-party computation allow data to be used with some privacy guarantees, they often compromise data integrity or accessibility as a trade-off. In contrast, when using encryption-based strategies, this is not the case. While basic encryption only protects data during transmission and storage, Homomorphic Encryption (HE) is able to preserve data privacy during its processing on a centralized server. Despite its advantages, the computational overhead HE introduces is notably challenging when integrated into Neural Networks (NNs), which are already computationally expensive. In this work, we present a framework called ENNigma, which is a Private Neural Network (PNN) that uses HE for data privacy preservation. Unlike some state-of-the-art approaches, ENNigma guarantees data security throughout every operation, maintaining this guarantee even if the server is compromised. The impact of this privacy-preservation layer on NN performance is minimal, the only major drawback being its computational cost. Several optimizations were implemented to maximize the efficiency of ENNigma, leading to occasional computational time reductions above 50%. In the context of the Network Intrusion Detection System application domain, particularly within the sub-domain of Distributed Denial of Service attack detection, several models were developed and employed to assess ENNigma's performance in a real-world scenario. These models demonstrated performance comparable to non-private NNs while also achieving the two-and-a-half-minute inference latency mark. This suggests that our framework is approaching a state where it can be effectively utilized in real-time applications. The key takeaway is that ENNigma represents a significant advancement in the field of PNNs, as it ensures data privacy with minimal impact on NN performance. While it is not yet ready for real-world deployment due to its computational complexity, this framework serves as a milestone toward realizing fully private and efficient NNs.
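    The homomorphic property that makes processing on a centralized server possible can be demonstrated with a toy Paillier cryptosystem (tiny primes, illustration only). This is not the scheme ENNigma uses; NN inference over encrypted data typically relies on schemes such as CKKS, but the sketch shows the core idea that the server computes on ciphertexts it cannot read:

```python
from math import gcd
import random

# Toy Paillier cryptosystem: E(a) * E(b) mod n^2 decrypts to a + b,
# so a sum is computed without ever seeing the plaintexts.
p, q = 61, 53
n = p * q                                       # public modulus
n2 = n * n
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)    # lcm(p-1, q-1)
mu = pow(lam, -1, n)                            # with g = n + 1, mu = lam^-1 mod n

def encrypt(m):
    r = random.randrange(2, n)
    while gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    x = pow(c, lam, n2)
    return ((x - 1) // n * mu) % n

a, b = 42, 17
c = (encrypt(a) * encrypt(b)) % n2  # addition performed in ciphertext space
print(decrypt(c))                   # recovers a + b = 59
```

Real HE schemes for NNs additionally support ciphertext multiplication (needed for weighted sums and polynomial activations), which is exactly where the computational overhead discussed in the abstract comes from.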