167 research outputs found
Image malware detection using deep learning
We are currently living in an era in which artificial intelligence is making day-to-day life much easier to manage. Researchers are continuously developing artificial intelligence techniques for the benefit of people. Data mining is used in many domains, including finance, engineering, biomedicine, and cyber security. The applications of data mining and of artificial intelligence algorithms such as deep learning are too numerous to list. This technology has touched almost every industry, and cyber security is among those that benefit most. Enhancing cyber security with deep learning methods has moved beyond theory, and many organizations now use these methods rather than traditional software to defend against online threats, especially for recognizing and classifying malicious code or malware. This is essential because the advent of cloud computing and the Internet of Things has expanded potential malware infection sites from PCs to any electronic device, which makes our day-to-day life much less safe. In this post, we first briefly describe how deep learning can be one of the most useful and promising techniques for detecting malware. We then walk through a deep neural network, ResNet, for malware dynamic behavior classification tasks.
Machine Learning and other Computational-Intelligence Techniques for Security Applications
The abstract is in the attachment.
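Improving Android Security through Malware Detection and Automatic Policy Generation (original title: Aprimorando a segurança do Android através de detecção de malware e geração automática de políticas)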
Advisors: Paulo Lício de Geus, André Ricardo Abed Grégio
Thesis (doctorate) - Universidade Estadual de Campinas, Instituto de Computação
Abstract: Mobile devices have been constantly evolving, receiving new functionalities and becoming increasingly ubiquitous. Thus, they have become lucrative targets for miscreants. Since Android is the leading platform for mobile devices, it has become the main target of malware developers. Moreover, the number of malicious apps targeting this platform that security companies find has increased rapidly in the last few years. This thesis approaches the security problem of such devices in two ways: (i) by analyzing and identifying malicious apps, and (ii) by developing a sandboxing policy that can restrict the attack surface available to native code. A system was developed to dynamically analyze apps, monitoring API calls and system calls. From these behavior traces, attributes were extracted, which are used by a machine learning algorithm to classify apps as malicious or benign. One of the main problems of dynamic analysis systems is that they differ in many ways from real devices, and malware can leverage these differences to identify whether it is being analyzed, thus preventing its malicious actions from being observed. To identify Android malware that evades analysis, a technique was developed to compare the behavior of an app on a real device and on an emulator. Actions that were executed only on the bare-metal system were identified, along with whether the divergence was caused by different code paths being explored or by some unrelated error.
Finally, a large-scale analysis of apps that use native code was performed, in order to identify how native code is used by benign apps and also to generate a sandboxing policy to restrict malware that uses such code.
Doctorate. Computer Science. Doctor in Computer Science. 23038.007604/2014-69, 12269/13-1. CAPE
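The device-versus-emulator comparison described in this abstract can be illustrated with a minimal sketch. It assumes each run is already reduced to a list of action strings; the trace format, the sample actions, and the function name `divergent_actions` are hypothetical and are not taken from the thesis.

```python
def divergent_actions(device_trace, emulator_trace):
    """Return actions seen on the real device but never in the emulator.

    Both traces are lists of action strings (a hypothetical format).
    The result keeps first-seen order and drops repeats.
    """
    seen_in_emulator = set(emulator_trace)
    result = []
    for action in device_trace:
        if action not in seen_in_emulator and action not in result:
            result.append(action)
    return result


# Hypothetical traces for one app, recorded in the two environments.
device = [
    "open:/data/app.db",
    "connect:10.0.0.5:443",
    "sendSMS:premium",
    "read:contacts",
]
emulator = ["open:/data/app.db", "read:contacts"]

# Actions that only appear on real hardware are candidates for evasive behavior.
evasive_candidates = divergent_actions(device, emulator)
print(evasive_candidates)
```

A set difference like this only flags candidates. Deciding whether a divergence comes from a different code path or from an unrelated error, as the abstract describes, would need further analysis that this sketch does not attempt.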
Cyber Security and Critical Infrastructures
This book contains the manuscripts that were accepted for publication in the MDPI Special Topic "Cyber Security and Critical Infrastructure" after a rigorous peer-review process. Authors from academia, government and industry contributed their innovative solutions, consistent with the interdisciplinary nature of cybersecurity. The book contains 16 articles: an editorial explaining current challenges, innovative solutions, real-world experiences including critical infrastructure, 15 original papers that present state-of-the-art innovative solutions to attacks on critical systems, and a review of cloud, edge computing, and fog's security and privacy issues
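Efficient and Explainable Detection of Mobile Malware with Machine Learning Methods (original title: Effiziente und erklärbare Erkennung von mobiler Schadsoftware mittels maschineller Lernmethoden)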
In recent years, mobile devices shipped with Google’s Android operating system
have become ubiquitous. Due to their popularity and the high concentration of
sensitive user data on these devices, however, they have also become a
profitable target of malware authors. As a result, thousands of new malware
instances targeting Android are found almost every day. Unfortunately, common
signature-based methods often fail to detect these applications, as these
methods cannot keep pace with the rapid development of new malware.
Consequently, there is an urgent need for new malware detection methods to
tackle this growing threat.
In this thesis, we address the problem by combining concepts of static analysis
and machine learning, such that mobile malware can be detected directly on the
mobile device with low run-time overhead. To this end, we first discuss our
analysis results of a sophisticated malware that uses an ultrasonic side
channel to spy on unwitting smartphone users. Based on the insights we gain
throughout this thesis, we gradually develop a method that allows detecting
Android malware in general. The resulting method performs a broad static
analysis, gathering a large number of features associated with an application.
These features are embedded in a joint vector space, where typical patterns
indicative of malware can be automatically identified and used for explaining
the decisions of our method. In addition to an evaluation of its overall
detection and run-time performance, we also examine the interpretability of the
underlying detection model and strengthen the classifier against realistic
evasion attacks.
In a large set of experiments, we show that the method clearly outperforms
several related approaches, including popular anti-virus scanners. In most
experiments, our approach detects more than 90% of all malicious samples in the
dataset at a low false positive rate of only 1%. Furthermore, even on older
devices, it offers a good run-time performance, and can output a decision along
with a proper explanation within a few seconds, despite the use of machine
learning techniques directly on the mobile device.
Overall, we find that the application of machine learning techniques is a
promising research direction to improve the security of mobile devices. While
these techniques alone cannot defeat the threat of mobile malware, they at
least raise the bar for malicious actors significantly, especially if combined
with existing techniques.
German abstract (translated): The adoption of smartphones, particularly those
running the Android operating system, has grown strongly in recent years.
Because of their popularity, however, these devices have also become a
lucrative target for malware developers, so that new malicious programs for
Android are now found every day.
Although various solutions exist that are meant to identify malware on mobile
devices as well, in practice they often do not provide sufficient protection.
This is mainly because these approaches are mostly signature-based and can
therefore reliably identify malicious programs only once the corresponding
detection signatures are available. However, it is becoming increasingly
difficult for anti-virus vendors to provide the necessary signatures in time.
New methods are therefore needed to better counter the growing threat of
mobile malware.
This dissertation presents and thoroughly examines a method that combines
static code analysis techniques with machine learning methods to enable
reliable detection of mobile malware directly on the mobile device. To this
end, the method first analyzes mobile applications statically and extracts
specific features that allow an application to be mapped into a
high-dimensional vector space. In this vector space, machine learning methods
can then automatically find patterns for detecting malware. The patterns found
can serve not only for detection but also for explaining a decision that has
been made.
In an extensive evaluation, not only are the detection performance and run
time of the presented method examined, but the learned detection model is also
analyzed in detail. The robustness of the model against targeted attacks is
also examined and improved.
A series of experiments shows that the proposed method achieves better results
than comparable methods, including some popular anti-virus programs. In most
experiments, the method reliably detects malware, achieving detection rates of
over 90% at a low false-positive rate of 1%
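The idea of embedding statically extracted features in a vector space and using the learned patterns to explain a decision can be sketched as follows. This is a minimal illustration that assumes a linear model over binary features; the vocabulary, the weights, and the function names `embed` and `score_and_explain` are hypothetical and are not taken from the thesis.

```python
def embed(app_features, vocabulary):
    """Map a set of feature strings to a binary vector over a fixed vocabulary."""
    return [1 if feature in app_features else 0 for feature in vocabulary]


def score_and_explain(app_features, vocabulary, weights, top_k=2):
    """Score an app with a linear model and list its top contributing features.

    Each present feature contributes its weight to the score, so the
    largest contributions double as a simple explanation of the decision.
    """
    x = embed(app_features, vocabulary)
    contributions = [
        (w * xi, feature)
        for feature, w, xi in zip(vocabulary, weights, x)
        if xi
    ]
    score = sum(c for c, _ in contributions)
    top = sorted(contributions, reverse=True)[:top_k]
    return score, [feature for _, feature in top]


# Hypothetical vocabulary and weights; a real model would learn the weights.
vocabulary = [
    "permission::SEND_SMS",
    "api_call::getDeviceId",
    "permission::INTERNET",
    "activity::MainActivity",
]
weights = [1.5, 0.9, 0.1, -0.4]
app = {"permission::SEND_SMS", "permission::INTERNET", "activity::MainActivity"}

score, explanation = score_and_explain(app, vocabulary, weights)
print(score, explanation)
```

With a linear model, the explanation is read directly from the weights of the features present in the app, which is one way a decision and its explanation could be produced cheaply on a device.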
Ensemble deep learning: A review
Ensemble learning combines several individual models to obtain better
generalization performance. Currently, deep learning models with multilayer
processing architectures are showing better performance than shallow or
traditional classification models. Deep ensemble learning models
combine the advantages of both the deep learning models as well as the ensemble
learning such that the final model has better generalization performance. This
paper reviews the state-of-the-art deep ensemble models and hence serves as an
extensive summary for the researchers. The ensemble models are broadly
categorised into ensemble models like bagging, boosting and stacking, negative
correlation based deep ensemble models, explicit/implicit ensembles,
homogeneous/heterogeneous ensembles, decision fusion strategies, unsupervised,
semi-supervised, reinforcement learning and online/incremental, multilabel
based deep ensemble models. Application of deep ensemble models in different
domains is also briefly discussed. Finally, we conclude this paper with some
future recommendations and research directions
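Two of the simplest decision fusion strategies for an ensemble, majority voting over hard labels and averaging of predicted probabilities, can be sketched as below. The function names and the sample predictions are illustrative assumptions and are not drawn from the review.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse hard labels from several models by picking the most common one."""
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]


def average_probabilities(prob_vectors):
    """Fuse soft outputs by averaging each class probability across models."""
    n = len(prob_vectors)
    return [sum(column) / n for column in zip(*prob_vectors)]


# Hypothetical outputs of three base models for a single input.
labels = ["malware", "benign", "malware"]
probabilities = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]

print(majority_vote(labels))
print(average_probabilities(probabilities))
```

Voting needs only the predicted labels, while averaging uses each model's confidence, so the two can disagree when a minority of models is very confident.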
Efficient, Scalable, and Accurate Program Fingerprinting in Binary Code
Why was this binary written? Which compiler was used? Which free software
packages did the developer use? Which sections of the code were borrowed? Who wrote
the binary? These questions are of paramount importance to security analysts and reverse
engineers, and binary fingerprinting approaches may provide valuable insights that can
help answer them. This thesis advances the state of the art by addressing some of the
most fundamental problems in program fingerprinting for binary code, notably, reusable
binary code discovery, fingerprinting free open source software packages, and authorship
attribution.
First, to tackle the problem of discovering reusable binary code, we employ a technique
for identifying reused functions by matching traces of a novel representation of binary
code known as the semantic integrated graph. This graph enhances the control flow
graph, the register flow graph, and the function call graph, key concepts from classical program analysis, and merges them with other structural information to create a joint data
structure. Second, we approach the problem of fingerprinting free open source software
(FOSS) packages by proposing a novel resilient and efficient system that incorporates
three components. The first extracts the syntactical features of functions by considering
opcode frequencies and performing a hidden Markov model statistical test. The second
applies a neighborhood hash graph kernel to random walks derived from control flow
graphs, with the goal of extracting the semantics of the functions. The third applies the
z-score to normalized instructions to extract the behavior of the instructions in a function.
Then, the components are integrated using a Bayesian network model which synthesizes
the results to determine the FOSS function, making it possible to detect user-related functions.
Third, with these elements now in place, we present a framework capable of decoupling
binary program functionality from the coding habits of authors. To capture coding habits,
the framework leverages a set of features that are based on collections of functionality-independent
choices made by authors during coding. Finally, it is well known that techniques
such as refactoring and code transformations can significantly alter the structure
of code, even for simple programs. Applying such techniques or changing the compiler
and compilation settings can significantly affect the accuracy of available binary analysis
tools, which severely limits their practicability, especially when applied to malware. To
address these issues, we design a technique that extracts the semantics of binary code in terms of both data and control flow. The proposed technique allows more robust binary
analysis because the extracted semantics of the binary code is generally immune
from code transformation, refactoring, and varying the compilers or compilation settings.
Specifically, it employs data-flow analysis to extract the semantic flow of the registers as
well as the semantic components of the control flow graph, which are then synthesized
into a novel representation called the semantic flow graph (SFG).
We evaluate the framework on large-scale datasets extracted from selected open source
C++ projects on GitHub, Google Code Jam events, Planet Source Code contests, and students’
programming projects and found that it outperforms existing methods in several
respects. First, it is able to detect the reused functions. Second, it can identify FOSS
packages in real-world projects and reused binary functions with high precision. Third, it
decouples authorship from functionality so that it can be applied to real malware binaries
to automatically generate evidence of similar coding habits. Fourth, compared to existing
research contributions, it successfully attributes a larger number of authors with a significantly
higher accuracy. Finally, the new framework is more robust than previous methods
in the sense that there is no significant drop in accuracy when the code is subjected to
refactoring techniques, code transformation methods, and different compilers
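As a rough illustration of the syntactic features mentioned in the abstract, the sketch below computes opcode frequencies per function and standardizes one opcode's frequency across functions with a z-score. The abstract does not give the exact procedure, so the function names, the sample opcodes, and the use of the population standard deviation are all assumptions.

```python
from collections import Counter
from statistics import mean, pstdev


def opcode_frequencies(opcodes):
    """Return the relative frequency of each opcode in one function."""
    counts = Counter(opcodes)
    total = len(opcodes)
    return {op: count / total for op, count in counts.items()}


def z_scores(value_by_function):
    """Standardize one opcode's frequency across a set of functions."""
    values = list(value_by_function.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All functions use the opcode equally often; nothing stands out.
        return {name: 0.0 for name in value_by_function}
    return {name: (v - mu) / sigma for name, v in value_by_function.items()}


# Hypothetical disassembly of two functions, reduced to opcode mnemonics.
functions = {
    "f": ["mov", "mov", "add", "ret"],
    "g": ["mov", "call", "call", "ret"],
}
mov_share = {
    name: opcode_frequencies(ops).get("mov", 0.0)
    for name, ops in functions.items()
}
print(z_scores(mov_share))
```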