640 research outputs found

    Malware Classification Based on Multilayer Perception and Word2Vec for IoT Security

    Get PDF
    With the construction of smart cities, the number of Internet of Things (IoT) devices is growing rapidly, leading to an explosive growth of malware designed for IoT devices. These malware pose a serious threat to the security of IoT devices. The traditional malware classification methods mainly rely on feature engineering. To improve accuracy, a large number of different types of features will be extracted from malware files in these methods. That brings a high complexity to the classification. To solve these issues, a malware classification method based on Word2Vec and Multilayer Perception (MLP) is proposed in this article. First, for one malware sample, Word2Vec is used to calculate a word vector for all bytes of the binary file and all instructions in the assembly file. Second, we combine these vectors into a 256x256x2-dimensional matrix. Finally, we designed a deep learning network structure based on MLP to train the model. Then the model is used to classify the testing samples. The experimental results prove that the method has a high accuracy of 99.54%.This work was supported in part by the Key-Area Research and Development Program of Guangdong Province (2019B010136001), the Basic and Applied Basic Research Major Program for Guangdong Province (2019B030302002), and the Science and Technology Planning Project of Guangdong Province (LZC0023 and LZC0024). Authors’ addresses: Y. Qiao, Cyberspace Security Research Center, Peng Cheng Laboratory, No. 2 Xingke 1st Street, Shen-zhen, China, 518000; email: [email protected]; W. Zhang, School of Computer Science and Technology, Harbin Institute of Technology, No. 92, Xidazhi Street, Nangang District, Harbin, China, 150001, Cyberspace Security Research Center, Peng Cheng Laboratory, No. 2 Xingke 1st Street, Nanshan District, Shenzhen, China, 518000; email: [email protected]; X. Du, Department of Computer and Information Sciences, Temple University, 1801 N. Broad Street, Philadelphia, USA, PA 19122; email: [email protected]; M. Guizani, Department of Compute Science and Engineering, Qatar University, University Street, Doha, Qatar; email: [email protected]. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. © 2021 Association for Computing Machinery. 1533-5399/2021/09-ART10 $15.00 https://doi.org/10.1145/343675

    Um método supervisionado para encontrar variáveis discriminantes na análise de problemas complexos : estudos de caso em segurança do Android e em atribuição de impressora fonte

    Get PDF
    Orientadores: Ricardo Dahab, Anderson de Rezende RochaDissertação (mestrado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: A solução de problemas onde muitos componentes atuam e interagem simultaneamente requer modelos de representação nem sempre tratáveis pelos métodos analíticos tradicionais. Embora em muitos caso se possa prever o resultado com excelente precisão através de algoritmos de aprendizagem de máquina, a interpretação do fenómeno requer o entendimento de quais são e em que proporção atuam as variáveis mais importantes do processo. Esta dissertação apresenta a aplicação de um método onde as variáveis discriminantes são identificadas através de um processo iterativo de ranqueamento ("ranking") por eliminação das que menos contribuem para o resultado, avaliando-se em cada etapa o impacto da redução de características nas métricas de acerto. O algoritmo de florestas de decisão ("Random Forest") é utilizado para a classificação e sua propriedade de importância das características ("Feature Importance") para o ranqueamento. Para a validação do método, dois trabalhos abordando sistemas complexos de natureza diferente foram realizados dando origem aos artigos aqui apresentados. O primeiro versa sobre a análise das relações entre programas maliciosos ("malware") e os recursos requisitados pelos mesmos dentro de um ecossistema de aplicações no sistema operacional Android. Para realizar esse estudo, foram capturados dados, estruturados segundo uma ontologia definida no próprio artigo (OntoPermEco), de 4.570 aplicações (2.150 malware, 2.420 benignas). O modelo complexo produziu um grafo com cerca de 55.000 nós e 120.000 arestas, o qual foi transformado usando-se a técnica de bolsa de grafos ("Bag Of Graphs") em vetores de características de cada aplicação com 8.950 elementos. Utilizando-se apenas os dados do manifesto atingiu-se com esse modelo 88% de acurácia e 91% de precisão na previsão do comportamento malicioso ou não de uma aplicação, e o método proposto foi capaz de identificar 24 características relevantes na classificação e identificação de famílias de malwares, correspondendo a 70 nós no grafo do ecosistema. O segundo artigo versa sobre a identificação de regiões em um documento impresso que contém informações relevantes na atribuição da impressora laser que o imprimiu. O método de identificação de variáveis discriminantes foi aplicado sobre vetores obtidos a partir do uso do descritor de texturas (CTGF-"Convolutional Texture Gradient Filter") sobre a imagem scaneada em 600 DPI de 1.200 documentos impressos em 10 impressoras. A acurácia e precisão médias obtidas no processo de atribuição foram de 95,6% e 93,9% respectivamente. Após a atribuição da impressora origem a cada documento, 8 das 10 impressoras permitiram a identificação de variáveis discriminantes associadas univocamente a cada uma delas, podendo-se então visualizar na imagem do documento as regiões de interesse para uma análise pericial. Os objetivos propostos foram atingidos mostrando-se a eficácia do método proposto na análise de dois problemas em áreas diferentes (segurança de aplicações e forense digital) com modelos complexos e estruturas de representação bastante diferentes, obtendo-se um modelo reduzido interpretável para ambas as situaçõesAbstract: Solving a problem where many components interact and affect results simultaneously requires models which sometimes are not treatable by traditional analytic methods. Although in many cases the result is predicted with excellent accuracy through machine learning algorithms, the interpretation of the phenomenon requires the understanding of how the most relevant variables contribute to the results. This dissertation presents an applied method where the discriminant variables are identified through an iterative ranking process. In each iteration, a classifier is trained and validated discarding variables that least contribute to the result and evaluating in each stage the impact of this reduction in the classification metrics. Classification uses the Random Forest algorithm, and the discarding decision applies using its feature importance property. The method handled two works approaching complex systems of different nature giving rise to the articles presented here. The first article deals with the analysis of the relations between \textit{malware} and the operating system resources requested by them within an ecosystem of Android applications. Data structured according to an ontology defined in the article (OntoPermEco) were captured to carry out this study from 4,570 applications (2,150 malware, 2,420 benign). The complex model produced a graph of about 55,000 nodes and 120,000 edges, which was transformed using the Bag of Graphs technique into feature vectors of each application with 8,950 elements. The work accomplished 88% of accuracy and 91% of precision in predicting malicious behavior (or not) for an application using only the data available in the application¿s manifest, and the proposed method was able to identify 24 relevant features corresponding to only 70 nodes of the entire ecosystem graph. The second article is about to identify regions in a printed document that contains information relevant to the attribution of the laser printer that printed it. The discriminant variable determination method achieved average accuracy and precision of 95.6% and 93.9% respectively in the source printer attribution using a dataset of 1,200 documents printed on ten printers. Feature vectors were obtained from the scanned image at 600 DPI applying the texture descriptor Convolutional Texture Gradient Filter (CTGF). After the assignment of the source printer to each document, eight of the ten printers allowed the identification of discriminant variables univocally associated to each one of them, and it was possible to visualize in document's image the regions of interest for expert analysis. The work in both articles accomplished the objective of reducing a complex system into an interpretable streamlined model demonstrating the effectiveness of the proposed method in the analysis of two problems in different areas (application security and digital forensics) with complex models and entirely different representation structuresMestradoCiência da ComputaçãoMestre em Ciência da Computaçã

    Dissection of Modern Malicious Software

    Get PDF
    The exponential growth of the number of malicious software samples, known by malware in the specialized literature, constitutes nowadays one of the major concerns of cyber-security professionals. The objectives of the creators of this type of malware are varied, and the means used to achieve them are getting increasingly sophisticated. The increase of the computation and storage resources, as well as the globalization have been contributing to this growth, and fueling an entire industry dedicated to developing, selling and improving systems or solutions for securing, recovering, mitigating and preventing malware related incidents. The success of these systems typically depends of detailed analysis, often performed by humans, of malware samples captured in the wild. This analysis includes the search for patterns or anomalous behaviors that may be used as signatures to identify or counter-attack these threats. This Master of Science (Ms.C.) dissertation addresses problems related with dissecting and analyzing malware. The main objectives of the underlying work were to study and understand the techniques used by this type of software nowadays, as well as the methods that are used by specialists on that analysis, so as to conduct a detailed investigation and produce structured documentation for at least one modern malware sample. The work was mostly focused in malware developed for the Operating Systems (OSs) of the Microsoft Windows family for desktops. After a brief study of the state of the art, the dissertation presents the classifications applied to malware, which can be found in the technical literature on the area, elaborated mainly by an industry community or seller of a security product. The structuring of the categories is nonetheless the result of an effort to unify or complete different classifications. The families of some of the most popular or detected malware samples are also presented herein, initially in a tabular form and, subsequently, via a genealogical tree, with some of the variants of each previously described family. This tree provides an interesting perspective over malware and is one of the contributions of this programme. Within the context of the description of functionalities and behavior of malware, some advanced techniques, with which modern specimens of this type of software are equipped to ease their propagation and execution, while hindering their detection, are then discussed with more detail. The discussion evolves to the presentation of the concepts related to the detection and defense against modern malware, along with a small introduction to the main subject of this work. The analysis and dissection of two samples of malware is then the subject of the final chapters of the dissertation. A basic static analysis is performed to the malware known as Stuxnet, while the Trojan Banker known as Tinba/zuzy is subdued to both basic and advanced dynamic analysis. The results of this part of the work emphasize difficulties associated with these tasks and the sophistication and dangerous level of samples under investigation.O crescimento exponencial do número de amostras de software malicioso, conhecido na gíria informática como malware, constitui atualmente uma das maiores preocupações dos profissionais de cibersegurança. São vários os objetivos dos criadores deste tipo de software e a forma cada vez mais sofisticada como os mesmos são alcançados. O aumento da computação e capacidade de armazenamento, bem como a globalização, têm contribuído para este crescimento, e têm alimentado toda uma indústria dedicada ao desenvolvimento, venda e melhoramento de sistemas ou soluções de segurança, recuperação, mitigação e prevenção de incidentes relacionados com malware. O sucesso destes sistemas depende normalmente da análise detalhada, feita muitas vezes por humanos, de peças de malware capturadas no seu ambiente de atuação. Esta análise compreende a procura de padrões ou de comportamentos anómalos que possam servir de assinatura para identificar ou contra-atacar essas ameaças. Esta dissertação aborda a problemática da análise e dissecação de malware. O trabalho que lhe está subjacente tinha como objetivos estudar e compreender as técnicas utilizadas por este tipo de software hoje em dia, bem como as que são utilizadas por especialistas nessa análise, de forma a conduzir uma investigação detalhada e a produzir documentação estruturada sobre pelo menos uma amostra de malware moderna. O trabalho focou-se, sobretudo, em malware desenvolvido para os sistemas operativos da família Microsoft Windows para computadores de secretária. Após um breve estudo ao estado da arte, a dissertação apresenta as classificações de malware encontradas na literatura técnica da especialidade, principalmente usada pela indústria, resultante de um esforço de unificação das mesmas. São também apresentadas algumas das famílias de malware mais detetadas da atualidade, inicialmente através de uma tabela e, posteriormente, através de uma árvore geneológica, com algumas das variantes de cada uma das famílias descritas previamente. Esta árvore fornece uma perspetiva interessante sobre malware e constitui uma das contribuições deste programa de mestrado. Ainda no âmbito da descrição de funcionalidades e comportamentos do malware, são expostas, com algum detalhe, algumas técnicas avançadas com as quais os programas maliciosos mais modernos são por vezes munidos com o intuito a facilitar a sua propagação e execução, dificultando a sua deteção. A descrição evolui para a apresentação dos conceitos adjacentes à deteção e combate ao malware moderno, assim como para uma pequena introdução ao tema principal deste trabalho. A análise e dissecação de duas amostras de malware moderno surgem nos capítulos finais da dissertação. Ao malware conhecido por Stuxnet é feita a análise básica estática, enquanto que ao Trojan Banker Tinba/zusy é feita e demonstrada a análise dinâmica básica e avançada. Os resultados desta parte são demonstrativos do grau de sofisticação e perigosidade destas amostras e das dificuldades associadas a estas tarefas

    Trustworthiness as a Limitation on Network Neutrality

    Get PDF
    The policy debate over how to govern access to broadband networks has largely ignored the objective of network trustworthiness-a set of properties (including security, survivability, and safety) that guarantee expected behavior. Instead, the terms of the network access debate have focused on whether imposing a nondiscrimination or network neutrality obligation on network providers is justified by the condition of competition among last-mile providers. Rules proposed by scholars and policymakers would allow network providers to deviate from network neutrality to protect network trustworthiness, but none of these proposals has explored the implications of such exceptions for either neutrality or trustworthiness. This Article examines the relationship between network trustworthiness and network neutrality and finds that providing a trustworthiness exception is a viable way to accommodate trustworthiness within a network neutrality rule. Network providers need leeway to block or degrade traffic within their own subnets, and trustworthiness exceptions can provide them with sufficient flexibility to do so. But, the Article argues, defining the scope of a trustworthiness exception is critically important to the network neutrality rule as a whole: an unduly narrow exception could thwart innovative network defenses, while a broad exception could allow trustworthiness to become a pretext that protects a wide range of discrimination that network neutrality advocates seek to prevent. Furthermore, monitoring network providers\u27 use of a trustworthiness exception is necessary to ensure that it remains an exception, rather than becoming a rule. The Article therefore proposes that network providers be required to disclose data regarding their use of a trustworthiness exception . It also offers a general structure for managing these disclosure

    Genomic stuff: Governing the (im)matter of life

    Get PDF
    Emphasizing the context of what has often been referred to as “scarce natural resources”, in particular forests, meadows, and fishing stocks, Elinor Ostrom’s important work Governing the commons (1990) presents an institutional framework for discussing the development and use of collective action with respect to environmental problems. In this article we discuss extensions of Ostrom’s approach to genes and genomes and explore its limits and usefulness. With the new genetics, we suggest, the biological gaze has not only been turned inward to the management and mining of the human body, also the very notion of the “biological” has been destabilized. This shift and destabilization, we argue, which is the result of human refashioning and appropriation of “life itself”, raises important questions about the relevance and applicability of Ostrom’s institutional framework in the context of what we call “genomic stuff”, genomic material, data, and information

    Reporting Identity: Social and Political Implications of Adding a MENA Category to the U.S. Census

    Get PDF
    The Census Bureau has been testing a new category called MENA for the 2020 census that would better describe the Middle Eastern and North African population in the United States, but in January of 2018, the agency announced that the category requires further research. In this work, I connect the development of a MENA identity category to historical events, sociological theory, current politics and public concerns related to the following questions: What are the social and political implications of including a MENA category on the U.S. census? What does the movement to add a MENA identifier to the census tell us about the conceptualization of Middle Eastern and North African identity in America? To respond to these questions, I conducted in-depth interviews to get the data that cannot be captured by quantitative research methods: how people of Middle Eastern and North African descent experience the task of navigating U.S. standardized race and ethnic categories. As a result of this study, I find that the MENA category is not only a case of correcting outdated racial classification, but also a reflection of social and political issues in our country today, including those of privacy, discrimination, and freedom of expression

    Unsupervised Intrusion Detection with Cross-Domain Artificial Intelligence Methods

    Get PDF
    Cybercrime is a major concern for corporations, business owners, governments and citizens, and it continues to grow in spite of increasing investments in security and fraud prevention. The main challenges in this research field are: being able to detect unknown attacks, and reducing the false positive ratio. The aim of this research work was to target both problems by leveraging four artificial intelligence techniques. The first technique is a novel unsupervised learning method based on skip-gram modeling. It was designed, developed and tested against a public dataset with popular intrusion patterns. A high accuracy and a low false positive rate were achieved without prior knowledge of attack patterns. The second technique is a novel unsupervised learning method based on topic modeling. It was applied to three related domains (network attacks, payments fraud, IoT malware traffic). A high accuracy was achieved in the three scenarios, even though the malicious activity significantly differs from one domain to the other. The third technique is a novel unsupervised learning method based on deep autoencoders, with feature selection performed by a supervised method, random forest. Obtained results showed that this technique can outperform other similar techniques. The fourth technique is based on an MLP neural network, and is applied to alert reduction in fraud prevention. This method automates manual reviews previously done by human experts, without significantly impacting accuracy

    Cyber Law and Espionage Law as Communicating Vessels

    Get PDF
    Professor Lubin\u27s contribution is Cyber Law and Espionage Law as Communicating Vessels, pp. 203-225. Existing legal literature would have us assume that espionage operations and “below-the-threshold” cyber operations are doctrinally distinct. Whereas one is subject to the scant, amorphous, and under-developed legal framework of espionage law, the other is subject to an emerging, ever-evolving body of legal rules, known cumulatively as cyber law. This dichotomy, however, is erroneous and misleading. In practice, espionage and cyber law function as communicating vessels, and so are better conceived as two elements of a complex system, Information Warfare (IW). This paper therefore first draws attention to the similarities between the practices – the fact that the actors, technologies, and targets are interchangeable, as are the knee-jerk legal reactions of the international community. In light of the convergence between peacetime Low-Intensity Cyber Operations (LICOs) and peacetime Espionage Operations (EOs) the two should be subjected to a single regulatory framework, one which recognizes the role intelligence plays in our public world order and which adopts a contextual and consequential method of inquiry. The paper proceeds in the following order: Part 2 provides a descriptive account of the unique symbiotic relationship between espionage and cyber law, and further explains the reasons for this dynamic. Part 3 places the discussion surrounding this relationship within the broader discourse on IW, making the claim that the convergence between EOs and LICOs, as described in Part 2, could further be explained by an even larger convergence across all the various elements of the informational environment. Parts 2 and 3 then serve as the backdrop for Part 4, which details the attempt of the drafters of the Tallinn Manual 2.0 to compartmentalize espionage law and cyber law, and the deficits of their approach. The paper concludes by proposing an alternative holistic understanding of espionage law, grounded in general principles of law, which is more practically transferable to the cyber realmhttps://www.repository.law.indiana.edu/facbooks/1220/thumbnail.jp
    corecore