916 research outputs found
A Multi-view Context-aware Approach to Android Malware Detection and Malicious Code Localization
Existing Android malware detection approaches use a variety of features such
as security sensitive APIs, system calls, control-flow structures and
information flows in conjunction with Machine Learning classifiers to achieve
accurate detection. Each of these feature sets provides a unique semantic
perspective (or view) of apps' behaviours with inherent strengths and
limitations. Meaning, some views are more amenable to detect certain attacks
but may not be suitable to characterise several other attacks. Most of the
existing malware detection approaches use only one (or a selected few) of the
aforementioned feature sets which prevent them from detecting a vast majority
of attacks. Addressing this limitation, we propose MKLDroid, a unified
framework that systematically integrates multiple views of apps for performing
comprehensive malware detection and malicious code localisation. The rationale
is that, while a malware app can disguise itself in some views, disguising in
every view while maintaining malicious intent will be much harder.
MKLDroid uses a graph kernel to capture structural and contextual information
from apps' dependency graphs and identify malice code patterns in each view.
Subsequently, it employs Multiple Kernel Learning (MKL) to find a weighted
combination of the views which yields the best detection accuracy. Besides
multi-view learning, MKLDroid's unique and salient trait is its ability to
locate fine-grained malice code portions in dependency graphs (e.g.,
methods/classes). Through our large-scale experiments on several datasets
(incl. wild apps), we demonstrate that MKLDroid outperforms three
state-of-the-art techniques consistently, in terms of accuracy while
maintaining comparable efficiency. In our malicious code localisation
experiments on a dataset of repackaged malware, MKLDroid was able to identify
all the malice classes with 94% average recall
Malware Classification based on Call Graph Clustering
Each day, anti-virus companies receive tens of thousands samples of
potentially harmful executables. Many of the malicious samples are variations
of previously encountered malware, created by their authors to evade
pattern-based detection. Dealing with these large amounts of data requires
robust, automatic detection approaches. This paper studies malware
classification based on call graph clustering. By representing malware samples
as call graphs, it is possible to abstract certain variations away, and enable
the detection of structural similarities between samples. The ability to
cluster similar samples together will make more generic detection techniques
possible, thereby targeting the commonalities of the samples within a cluster.
To compare call graphs mutually, we compute pairwise graph similarity scores
via graph matchings which approximately minimize the graph edit distance. Next,
to facilitate the discovery of similar malware samples, we employ several
clustering algorithms, including k-medoids and DBSCAN. Clustering experiments
are conducted on a collection of real malware samples, and the results are
evaluated against manual classifications provided by human malware analysts.
Experiments show that it is indeed possible to accurately detect malware
families via call graph clustering. We anticipate that in the future, call
graphs can be used to analyse the emergence of new malware families, and
ultimately to automate implementation of generic detection schemes.Comment: This research has been supported by TEKES - the Finnish Funding
Agency for Technology and Innovation as part of its ICT SHOK Future Internet
research programme, grant 40212/0
A Survey on Algorithmic Techniques for Malware Detection
Malware is a specific type of software intended to breed damages ranging from computer systems fallout to deprivation of data integrity and confidentiality. Recently, along with the high usage of distributed systems and the increasing speed in telecommunications, the early detection of malware constitutes one of the major concerns in information society. A strong advantage that malware employs in order to elude detection is the ability of polymorphism (metamorphic or polymorphic engines). In this work we present efficient algorithmic techniques that, leveraging higher level abstractions of malware structure, perform an isomorphism check in malware's produced graph structures, such as function call-graphs and control flow-graphs, in order to detect every possible polymorphic version of a malware. Moreover, we propose an algorithmic approach for malware detection which focuses on the use of behavioural graphs as a more flexible representation of malware's functionality with respect to its interaction with the operating system. The main idea of our approach is mainly based on behavioural graph similarity issues
apk2vec: Semi-supervised multi-view representation learning for profiling Android applications
Building behavior profiles of Android applications (apps) with holistic, rich
and multi-view information (e.g., incorporating several semantic views of an
app such as API sequences, system calls, etc.) would help catering downstream
analytics tasks such as app categorization, recommendation and malware analysis
significantly better. Towards this goal, we design a semi-supervised
Representation Learning (RL) framework named apk2vec to automatically generate
a compact representation (aka profile/embedding) for a given app. More
specifically, apk2vec has the three following unique characteristics which make
it an excellent choice for largescale app profiling: (1) it encompasses
information from multiple semantic views such as API sequences, permissions,
etc., (2) being a semi-supervised embedding technique, it can make use of
labels associated with apps (e.g., malware family or app category labels) to
build high quality app profiles, and (3) it combines RL and feature hashing
which allows it to efficiently build profiles of apps that stream over time
(i.e., online learning). The resulting semi-supervised multi-view hash
embeddings of apps could then be used for a wide variety of downstream tasks
such as the ones mentioned above. Our extensive evaluations with more than
42,000 apps demonstrate that apk2vec's app profiles could significantly
outperform state-of-the-art techniques in four app analytics tasks namely,
malware detection, familial clustering, app clone detection and app
recommendation.Comment: International Conference on Data Mining, 201
A fast and scalable binary similarity method for open source libraries
Abstract. Usage of third party open source software has become more and more popular in the past years, due to the need for faster development cycles and the availability of good quality libraries. Those libraries are integrated as dependencies and often in the form of binary artifacts. This is especially common in embedded software applications. Dependencies, however, can proliferate and also add new attack surfaces to an application due to vulnerabilities in the library code. Hence, the need for binary similarity analysis methods to detect libraries compiled into applications.
Binary similarity detection methods are related to text similarity methods and build upon the research in that area. In this research we focus on fuzzy matching methods, that have been used widely and successfully in text similarity analysis. In particular, we propose using locality sensitive hashing schemes in combination with normalised binary code features. The normalization allows us to apply the similarity comparison across binaries produced by different compilers using different optimization flags and being build for various machine architectures.
To improve the matching precision, we use weighted code features. Machine learning is used to optimize the feature weights to create clusters of semantically similar code blocks extracted from different binaries. The machine learning is performed in an offline process to increase scalability and performance of the matching system.
Using above methods we build a database of binary similarity code signatures for open source libraries. The database is utilized to match by similarity any code blocks from an application to known libraries in the database. One of the goals of our system is to facilitate a fast and scalable similarity matching process. This allows integrating the system into continuous software development, testing and integration pipelines.
The evaluation shows that our results are comparable to other systems proposed in related research in terms of precision while maintaining the performance required in continuous integration systems.Nopea ja skaalautuva käännettyjen ohjelmistojen samankaltaisuuden tunnistusmenetelmä avoimen lähdekoodin kirjastoille. Tiivistelmä. Kolmansien osapuolten kehittämien ohjelmistojen käyttö on yleistynyt valtavasti viime vuosien aikana nopeutuvan ohjelmistokehityksen ja laadukkaiden ohjelmistokirjastojen tarjonnan kasvun myötä. Nämä kirjastot ovat yleensä lisätty kehitettävään ohjelmistoon riippuvuuksina ja usein jopa käännettyinä binääreinä. Tämä on yleistä varsinkin sulatetuissa ohjelmistoissa. Riippuvuudet saattavat kuitenkin luoda uusia hyökkäysvektoreita kirjastoista löytyvien haavoittuvuuksien johdosta. Nämä kolmansien osapuolten kirjastoista löytyvät haavoittuvuudet synnyttävät tarpeen tunnistaa käännetyistä binääriohjelmistoista löytyvät avoimen lähdekoodin ohjelmistokirjastot.
Binäärien samankaltaisuuden tunnistusmenetelmät usein pohjautuvat tekstin samankaltaisuuden tunnistusmenetelmiin ja hyödyntävät tämän tieteellisiä saavutuksia. Tässä tutkimuksessa keskitytään sumeisiin tunnistusmenetelmiin, joita on käytetty laajasti tekstin samankaltaisuuden tunnistamisessa. Tutkimuksessa hyödynnetään sijainnille sensitiivisiä tiivistemenetelmiä ja normalisoituja binäärien ominaisuuksia. Ominaisuuksien normalisoinnin avulla binäärien samankaltaisuutta voidaan vertailla ohjelmiston kääntämisessä käytetystä kääntäjästä, optimisaatiotasoista ja prosessoriarkkitehtuurista huolimatta.
Menetelmän tarkkuutta parannetaan painotettujen binääriominaisuuksien avulla. Koneoppimista hyödyntämällä binääriomisaisuuksien painotus optimoidaan siten, että samankaltaisista binääreistä puretut ohjelmistoblokit luovat samankaltaisien ohjelmistojen joukkoja. Koneoppiminen suoritetaan erillisessä prosessissa, mikä parantaa järjestelmän suorituskykyä.
Näiden menetelmien avulla luodaan tietokanta avoimen lähdekoodin kirjastojen tunnisteista. Tietokannan avulla minkä tahansa ohjelmiston samankaltaiset binääriblokit voidaan yhdistää tunnettuihin avoimen lähdekoodin kirjastoihin. Menetelmän tavoitteena on tarjota nopea ja skaalautuva samankaltaisuuden tunnistus. Näiden ominaisuuksien johdosta järjestelmä voidaan liittää osaksi ohjelmistokehitys-, integraatioprosesseja ja ohjelmistotestausta.
Vertailu muihin kirjallisuudessa esiteltyihin menetelmiin osoittaa, että esitellyn menetlmän tulokset on vertailtavissa muihin kirjallisuudessa esiteltyihin menetelmiin tarkkuuden osalta. Menetelmä myös ylläpitää suorituskyvyn, jota vaaditaan jatkuvan integraation järjestelmissä
- …