63 research outputs found

    Gaining deep knowledge of Android malware families through dimensionality reduction techniques

    This research proposes the analysis and subsequent characterisation of Android malware families by means of low-dimensional visualisations produced with dimensionality reduction techniques. The well-known Malgenome data set, from the Android Malware Genome Project, has been thoroughly analysed using the following six dimensionality reduction techniques: Principal Component Analysis, Maximum Likelihood Hebbian Learning, Cooperative Maximum Likelihood Hebbian Learning, Curvilinear Component Analysis, Isomap and the Self-Organizing Map. The results obtained enable a clear visual analysis of the structure of this high-dimensional data set, letting us gain deep knowledge about the nature of these Android malware families. Interesting conclusions are drawn from the real-life data set under analysis.
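The six techniques named above all project the high-dimensional feature space into two or three dimensions for visual inspection. A minimal sketch of the idea, using two of those techniques (PCA and Isomap, as implemented in scikit-learn) on synthetic stand-in data rather than the actual Malgenome features, which are not reproduced here:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)
# Hypothetical stand-in for Malgenome feature vectors: three synthetic
# "families" of 50 samples each in a 30-dimensional binary feature space.
families = [rng.random((50, 30)) < p for p in (0.2, 0.5, 0.8)]
X = np.vstack(families).astype(float)

# Linear projection: the first two principal components.
X_pca = PCA(n_components=2).fit_transform(X)

# Non-linear projection: Isomap approximates geodesic distances
# on a neighbourhood graph before embedding.
X_iso = Isomap(n_components=2, n_neighbors=10).fit_transform(X)

print(X_pca.shape, X_iso.shape)  # two 2-D point clouds, ready to plot
```

Scatter-plotting either projection, coloured by family, is the kind of low-dimensional visualisation the abstract describes.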

    Análisis y detección de ataques informáticos mediante sistemas inteligentes de reducción dimensional

    Programa Oficial de Doutoramento en Enerxía e Propulsión Mariña. 5014P01. This research work addresses the study and development of a methodology for the detection of computer attacks using intelligent dimensionality reduction systems and techniques in the field of cybersecurity. The proposal divides the problem into two phases. The first consists of a dimensionality reduction of the original input space, projecting the data onto a lower-dimensional output space using linear and/or non-linear transformations that allow a better visualization of the internal structure of the data set. In the second phase, a human expert contributes knowledge by labelling the samples, based on the projections obtained and on their experience with the problem. This novel proposal makes a simple tool available to the end user and provides intuitive, easily interpretable results, making it possible to face new threats to which the user has not previously been exposed; highly satisfactory results have been obtained in all the real cases to which it has been applied. The developed system has been validated on three different real case studies, each advancing the state of knowledge and showing a clear thread of positive progress for the proposal. In the first case, a well-known Android malware data set is analysed and the various malware families are characterised using classical dimensionality reduction techniques. The second proposal works on the same data set, but applies more advanced, emerging dimensionality reduction and visualization techniques, significantly improving the results. The last work builds on the knowledge of the two previous works and applies it to intrusion detection in computer systems on network data, in which attacks of various kinds occur during normal network operation.
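The two-phase methodology this abstract describes (project the data to a low-dimensional space, then let an expert label the projection and use the labels against new samples) can be sketched as follows. The data, the 2-D PCA projection, and the nearest-neighbour classifier below are illustrative assumptions, not the thesis's exact pipeline:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
# Hypothetical traffic records: "normal" samples around 0,
# "attack" samples shifted in every feature.
normal = rng.normal(0.0, 1.0, size=(100, 20))
attack = rng.normal(3.0, 1.0, size=(100, 20))
X = np.vstack([normal, attack])

# Phase 1: project onto a low-dimensional space for visual inspection.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# Phase 2: labels an expert might assign after inspecting the projection
# (here we simply use the known ground truth of the synthetic data).
y = np.array([0] * 100 + [1] * 100)

# The labelled projection can then be used to classify new samples.
clf = KNeighborsClassifier(n_neighbors=5).fit(X_2d, y)
new = pca.transform(rng.normal(3.0, 1.0, size=(5, 20)))
print(clf.predict(new))  # new attack-like samples land in the attack cluster
```

The key design point is that the expert labels in the low-dimensional space, where cluster structure is visible, rather than in the original 20-dimensional space.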

    ECHO Information sharing models

    As part of the ECHO project, the Early Warning System (EWS) is one of four technologies under development. The E-EWS will provide the capability to share up-to-date information with all constituents involved in the E-EWS. The development of the E-EWS will be rooted in a comprehensive review of information sharing and trust models from within the cyber domain, as well as models from other domains.

    Fast Detection of Zero-Day Phishing Websites Using Machine Learning

    The recent global growth in the number of internet users and online applications has led to a massive volume of personal data transactions taking place over the internet. To gain access to the valuable data and services involved and use them for various malicious activities, attackers lure users to phishing websites that steal user credentials and other personal data required to impersonate their victims. Sophisticated phishing toolkits and flux networks are increasingly being used by attackers to create and host phishing websites, respectively, in order to increase the number of phishing attacks and evade detection. This has resulted in an increase in the number of new (zero-day) phishing websites. Anti-malware software and web browsers’ anti-phishing filters are widely used to detect phishing websites and thus prevent users from falling victim to phishing. However, these solutions mostly rely on blacklists of known phishing websites. In these techniques, the time lag between the creation of a new phishing website and its being reported as malicious leaves a window during which users are exposed to the zero-day phishing websites. This has contributed to a global increase in the number of successful phishing attacks in recent years. To address this shortcoming, this research proposes three Machine Learning (ML)-based approaches for fast and highly accurate prediction of zero-day phishing websites using novel sets of prediction features. The first approach uses a novel set of 26 features, based on URL structure and on webpage structure and contents, to predict zero-day phishing webpages that collect users’ personal data. The other two approaches detect, through their hostnames, zero-day phishing webpages hosted in Fast Flux Service Networks (FFSNs) and Name Server IP Flux Networks (NSIFNs). These networks consist of frequently changing machines hosting malicious websites and their authoritative name servers, respectively.
The machines provide a layer of protection to the actual service hosts against blacklisting, in order to prolong the active life span of the services. Consequently, the websites in these networks become more harmful than those hosted in normal networks. To address these networks, our second proposed approach predicts zero-day phishing hostnames hosted in FFSNs using a novel set of 56 features based on the DNS, network and host characteristics of the hosting networks. Our last approach predicts zero-day phishing hostnames hosted in NSIFNs using a novel set of 11 features based on the DNS and host characteristics of the hosting networks. The feature set in each approach is evaluated using 11 ML algorithms, achieving high prediction performance with most of the algorithms. This indicates the relevance and robustness of the feature sets for their respective detection tasks. The feature sets also perform well against data collected over a later time period without retraining the models, indicating their long-term effectiveness in detecting the websites. The approaches use highly diversified feature sets, which is expected to enhance resistance to various detection evasion tactics. The measured prediction times of the first and third approaches are sufficiently low for potential use in real-time protection of users. This thesis also introduces a multi-class classification technique for evaluating the feature sets in the second and third approaches. The technique predicts each of the hostname types as an independent outcome, thus enabling experts to use type-specific measures in taking down the phishing websites. Lastly, highly accurate methods are proposed for labelling hostnames based on the number of changes of the IP addresses of authoritative name servers, monitored over a specific period of time.
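The first approach's URL-structure features can be illustrated with a small sketch. The thesis uses 26 features; the handful below are common URL-based phishing indicators chosen here for illustration and are not claimed to match the thesis's actual feature set:

```python
from urllib.parse import urlparse

def url_features(url: str) -> dict:
    """A hypothetical subset of URL-structure features of the kind the
    first approach describes; the names and choices here are illustrative."""
    parsed = urlparse(url)
    host = parsed.netloc
    return {
        "url_length": len(url),                       # long URLs often hide the real domain
        "num_dots_in_host": host.count("."),          # many subdomains suggest impersonation
        "has_at_symbol": "@" in url,                  # '@' can mask the true destination
        "has_ip_host": host.replace(".", "").isdigit(),  # raw-IP hosts are suspicious
        "num_hyphens_in_host": host.count("-"),
        "uses_https": parsed.scheme == "https",
    }

# A brand name embedded as a subdomain of an unrelated registered domain:
print(url_features("http://paypal.com.secure-login.example.net/verify"))
```

A feature vector like this, computed per URL, is what the ML algorithms would be trained on.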

    On the subspace learning for network attack detection

    Doctoral thesis, Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Elétrica, 2019. The cost of all types of cyberattacks is increasing for global organizations. The White House of the U.S. government estimates that malicious cyber activity cost the U.S. economy between US$57 billion and US$109 billion in 2016. Recently, it is possible to observe an increase in the number of Denial of Service (DoS), botnet, malicious insider and ransomware attacks. Accenture consulting argues that 89% of survey respondents believe breakthrough technologies, like artificial intelligence, machine learning and user behavior analytics, are essential for securing their organizations. To face adversarial models, novel network attacks and the countermeasures attackers use to avoid detection, it is possible to adopt unsupervised or semi-supervised approaches for network anomaly detection, by means of behavioral analysis, where known anomalies are not necessary for training models. Signal processing schemes have been applied to detect malicious traffic in computer networks through unsupervised approaches, showing advances in network traffic analysis, network attack detection, and network intrusion detection systems. Anomalies can be hard to identify and separate from normal data due to the rare occurrence of anomalies in comparison to normal events. Imbalanced data can compromise the performance of most standard learning algorithms, creating a bias or unfair weight toward the majority class and reducing the capacity to detect anomalies, which are characterized by the minority class. Therefore, anomaly detection algorithms have to be highly discriminating, robust to corruption, and able to deal with the imbalanced data problem. Some widely adopted algorithms for anomaly detection assume Gaussian-distributed data for legitimate observations; however, this assumption may not hold for network traffic, which is usually characterized by skewed and heavy-tailed distributions. As a first important contribution, we propose Eigensimilarity, an approach based on signal processing concepts applied to the detection of malicious traffic in computer networks. We evaluate the accuracy and performance of the proposed framework applied to a simulated scenario and to the DARPA 1998 data set. The performed experiments show that synflood, fraggle and port scan attacks can be detected accurately by Eigensimilarity, and in great detail, in an automatic and blind fashion, i.e. in an unsupervised approach. Considering that exploiting skewness can improve anomaly detection in imbalanced and skewed data, such as network traffic, we propose the Moment-based Robust Principal Component Analysis (m-RPCA) for network attack detection. The m-RPCA is a framework based on distances between contaminated observations and moments computed from a robust subspace learned by Robust Principal Component Analysis (RPCA), in order to detect anomalies in skewed data and network traffic. We evaluate the accuracy of the m-RPCA for anomaly detection on simulated data sets, with skewed and heavy-tailed distributions, and on the CTU-13 data set. The experimental evaluation compares our proposal to widely adopted algorithms for anomaly detection and shows that the distance between robust estimates and contaminated observations can improve anomaly detection on skewed data and network attack detection. Moreover, we propose an architecture and approach to evaluate a proof of concept of Eigensimilarity for malicious behavior detection in mobile applications, in order to detect possible threats in offline corporate mobile clients. We propose scenarios, features and approaches for threat analysis by means of Eigensimilarity, and evaluate the processing time required to execute Eigensimilarity on mobile devices.
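The core subspace-learning idea behind approaches such as m-RPCA, that legitimate observations lie near a low-dimensional subspace while anomalies have a large distance from it, can be sketched with ordinary PCA standing in for the robust subspace estimation (the thesis uses RPCA and moment-based distances; this simplification and the synthetic data are assumptions for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
# Legitimate traffic: samples near a 3-dimensional subspace of a
# 15-dimensional feature space, plus small noise.
basis = rng.normal(size=(3, 15))
normal = rng.normal(size=(200, 3)) @ basis + 0.05 * rng.normal(size=(200, 15))
# Anomalies: points with no relation to that subspace.
anomalies = rng.normal(size=(5, 15)) * 4.0

# Learn the subspace from legitimate data only (unsupervised).
pca = PCA(n_components=3).fit(normal)

def residual(X):
    """Distance from each observation to the learned subspace:
    norm of the reconstruction error after project-and-back."""
    recon = pca.inverse_transform(pca.transform(X))
    return np.linalg.norm(X - recon, axis=1)

# Flag anything farther from the subspace than any training sample.
threshold = residual(normal).max()
print(residual(anomalies) > threshold)
```

A robust estimator matters in practice because, unlike in this sketch, real training data may itself be contaminated, which biases the plain-PCA subspace toward the anomalies.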

    Computational Methods for Medical and Cyber Security

    Over the past decade, computational methods, including machine learning (ML) and deep learning (DL), have grown exponentially in their development of solutions in various domains, especially medicine, cybersecurity, finance, and education. While these applications of machine learning algorithms have proven beneficial in various fields, many shortcomings have also been highlighted, such as the lack of benchmark datasets, the inability to learn from small datasets, the cost of architectures, adversarial attacks, and imbalanced datasets. On the other hand, new and emerging algorithms, such as deep learning, one-shot learning, continuous learning, and generative adversarial networks, have successfully solved various tasks in these fields. Therefore, applying these new methods to life-critical missions is crucial, as is measuring the success of these less-traditional algorithms when used in these fields.

    Computer-Mediated Communication

    This book is an anthology of present research trends in Computer-Mediated Communication (CMC) from the point of view of different application scenarios. Four scenarios are considered: telecommunication networks, smart health, education, and human-computer interaction. The possibilities of interaction introduced by CMC provide a powerful environment for collaborative, computer-mediated human-to-human interaction across the globe.

    Cybersecurity

    The Internet has given rise to new opportunities for the public sector to improve efficiency and better serve constituents in the form of e-government. But with a rapidly growing user base globally and an increasing reliance on the Internet, digital tools are also exposing the public sector to new risks. An accessible primer, Cybersecurity: Public Sector Threats and Responses focuses on the convergence of globalization, connectivity, and the migration of public sector functions online. It identifies the challenges you need to be aware of and examines emerging trends and strategies from around the world. Offering practical guidance for addressing contemporary risks, the book is organized into three sections. Global Trends considers international e-government trends, includes case studies of common cyber threats, and presents the efforts of the premier global institution in the field. National and Local Policy Approaches examines the current policy environment in the United States and Europe and illustrates challenges at all levels of government. Practical Considerations explains how to prepare for cyber attacks, including an overview of relevant U.S. federal cyber incident response policies, an organizational framework for assessing risk, and emerging trends. Also suitable for classroom use, this book will help you understand the threats facing your organization and the issues to consider when thinking about cybersecurity from a policy perspective.

    Data ethics : building trust : how digital technologies can serve humanity

    Data is the magic word of the 21st century, as oil was in the 20th century and electricity in the 19th. For citizens, data means support in daily life in almost all activities, from watch to laptop, from kitchen to car, from mobile phone to politics. For business and politics, data means power, dominance, winning the race. Data can be used for good and bad, for services and hacking, for medicine and the arms race. How can we build trust in this complex and ambiguous data world? How can digital technologies serve humanity? The 45 articles in this book represent a broad range of ethical reflections and recommendations in eight sections: a) Values, Trust and Law; b) AI, Robots and Humans; c) Health and Neuroscience; d) Religions for Digital Justice; e) Farming, Business, Finance; f) Security, War, Peace; g) Data Governance, Geopolitics; h) Media, Education, Communication. The authors and institutions come from all continents. The book serves as reading material for teachers, students, policy makers, politicians, businesses, hospitals, NGOs and religious organisations alike. It is an invitation to dialogue, debate and building trust! The book is a continuation of the volume “Cyber Ethics 4.0”, published in 2018 by the same editors.