Malware Analysis and Detection with Explainable Machine Learning
Malware detection is one of the areas where machine learning is successfully employed, owing to its high discriminating power and its ability to identify novel variants of malware samples. Typically, the problem formulation relies on a wide variety of features covering several characteristics of the entities to classify. This practice appears to yield considerable detection performance. However, it hardly permits us to gain insights into the knowledge extracted by the learning algorithm, which causes two main issues. First, detectors might learn spurious patterns, undermining their effectiveness in real environments. Second, they might be particularly vulnerable to adversarial attacks, weakening their security. These concerns give rise to the need to develop systems that are tailored to the specific peculiarities of the attacks to detect.
Within malware detection, Android ransomware represents a challenging yet illustrative domain for assessing the relevance of this issue. Ransomware is a serious threat that acts by locking the compromised device or encrypting its data, then forcing the device owner to pay a ransom in order to restore the device functionality. Attackers typically develop such dangerous apps so that normally-legitimate components and functionalities perform malicious behaviour, making them harder to distinguish from genuine applications. In this sense, adopting a well-defined variety of features and relying on some kind of explanation of the logic behind such detectors could improve their design process, since it could reveal truly characterising features and guide the human expert towards an understanding of the most relevant attack patterns.
Given this context, the goal of the thesis is to explore strategies that may improve the design process of malware detectors. In particular, the thesis proposes to evaluate and integrate approaches based on emerging research in Explainable Machine Learning. To this end, the work follows two pathways. The first and main one focuses on identifying the main traits that prove to be characterising and effective for Android ransomware detection. Explainability techniques are then used to propose methods to assess the validity of the considered features. The second pathway broadens the view by exploring the relationship between explainable machine learning and adversarial attacks. In this regard, the contribution consists of pointing out metrics extracted from explainability techniques that can reveal models' robustness to adversarial attacks, together with an assessment of how feasible it is in practice for attackers to alter the features that affect models' output the most.
Ultimately, this work highlights the need to adopt a design process that is aware of the weaknesses and attacks against machine learning-based detectors, and proposes explainability techniques as one of the tools to counteract them.
On the Effectiveness of System API-Related Information for Android Ransomware Detection
Ransomware constitutes a significant threat to the Android operating system.
It can either lock or encrypt the target devices, and victims are forced to pay
ransoms to restore their data. Hence, the prompt detection of such attacks takes
priority over that of other malicious threats. Previous works on Android
malware detection mainly focused on Machine Learning-oriented approaches that
were tailored to identifying malware families, without a clear focus on
ransomware. More specifically, such approaches resorted to complex information
types such as permissions, user-implemented API calls, and native calls.
However, this led to significant drawbacks concerning complexity, resilience
against obfuscation, and explainability. To overcome these issues, in this
paper, we propose and discuss learning-based detection strategies that rely on
System API information. These techniques leverage the fact that ransomware
attacks heavily resort to System API to perform their actions, and allow
distinguishing between generic malware, ransomware and goodware.
We tested three different ways of employing System API information, i.e.,
through packages, classes, and methods, and we compared their performance to
other, more complex state-of-the-art approaches. The attained results showed
that systems based on System API could detect ransomware and generic malware
with very good accuracy, comparable to systems that employed more complex
information. Moreover, the proposed systems could accurately detect novel
samples in the wild and showed resilience against static obfuscation attempts.
Finally, to guarantee early on-device detection, we developed and released on
the Android platform a complete ransomware and malware detector (R-PackDroid)
that employed one of the methodologies proposed in this paper.
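
The three granularities mentioned above (packages, classes, methods) can be illustrated with a minimal sketch. It is not the R-PackDroid implementation: the input format (a list of fully qualified method references), the `SYSTEM_PREFIXES` tuple, and the function name `api_features` are assumptions made only for illustration, and nested classes are not handled.

```python
from collections import Counter

# Assumed prefixes used here to decide what counts as "System API";
# the paper's actual criterion may differ.
SYSTEM_PREFIXES = ("android.", "java.", "javax.")


def api_features(calls, level="package"):
    """Count System API usage at package, class, or method granularity.

    `calls` is a list of strings such as "java.io.File.delete"
    (hypothetical input format: package.Class.method).
    """
    counts = Counter()
    for call in calls:
        if not call.startswith(SYSTEM_PREFIXES):
            continue  # skip user-implemented code
        parts = call.split(".")
        if level == "method":
            key = call
        elif level == "class":
            key = ".".join(parts[:-1])
        elif level == "package":
            key = ".".join(parts[:-2])
        else:
            raise ValueError("level must be 'package', 'class' or 'method'")
        counts[key] += 1
    return counts


calls = [
    "javax.crypto.Cipher.init",
    "javax.crypto.Cipher.doFinal",
    "java.io.File.delete",
    "android.app.admin.DevicePolicyManager.lockNow",
    "com.example.app.Helper.run",  # user code, ignored
]

by_package = api_features(calls, "package")
by_class = api_features(calls, "class")
by_method = api_features(calls, "method")
```

Coarser levels give fewer, denser features (one count for `javax.crypto`), while the method level keeps every call distinct. This is the trade-off the abstract alludes to when comparing the three representations.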
A random telegraph signal of Mittag-Leffler type
A general method is presented to explicitly compute autocovariance functions
for non-Poisson dichotomous noise based on renewal theory. The method is
specialized to a random telegraph signal of Mittag-Leffler type. Analytical
predictions are compared to Monte Carlo simulations. Non-Poisson dichotomous
noise is non-stationary and standard spectral methods fail to describe it
properly as they assume stationarity.
Comment: 13 pages, 3 figures, submitted to PR
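
A Monte Carlo simulation of such a signal can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: waiting times are drawn with the standard two-uniform inversion formula for Mittag-Leffler variates, the signal is assumed to start in state +1 at time 0, and the names `mittag_leffler_wait` and `telegraph_signal` are hypothetical. The sketch assumes 0 < beta <= 1; for beta = 1 the waiting times reduce to exponential ones (the Poisson case).

```python
import math
import random


def mittag_leffler_wait(beta, gamma, rng):
    """Draw one waiting time whose survival function is E_beta(-(t/gamma)^beta).

    Uses tau = -gamma * ln(u) * [sin(beta*pi*(1-v)) / sin(beta*pi*v)]**(1/beta),
    with u, v independent uniforms; valid for 0 < beta <= 1.
    """
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
    v = rng.random()
    while v == 0.0:  # keep v strictly inside (0, 1)
        v = rng.random()
    ratio = math.sin(beta * math.pi * (1.0 - v)) / math.sin(beta * math.pi * v)
    return -gamma * math.log(u) * ratio ** (1.0 / beta)


def telegraph_signal(beta, gamma, t_max, rng):
    """Return switching times and +/-1 states of a dichotomous signal on [0, t_max]."""
    t, state = 0.0, 1  # assumed initial condition
    times, states = [0.0], [1]
    while True:
        t += mittag_leffler_wait(beta, gamma, rng)
        if t > t_max:
            break
        state = -state  # flip at every renewal event
        times.append(t)
        states.append(state)
    return times, states


times, states = telegraph_signal(beta=0.8, gamma=1.0, t_max=50.0,
                                 rng=random.Random(1))
```

Averaging the product of the signal at two times over many such realizations gives a numerical estimate of the autocovariance that can be set against an analytical prediction.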
Dominating Clasp of the Financial Sector Revealed by Partial Correlation Analysis of the Stock Market
What are the dominant stocks which drive the correlations present among stocks traded in a stock market? Can a correlation analysis provide an answer to this question? In the past, correlation-based networks have been proposed as a tool to uncover the underlying backbone of the market. Correlation-based networks represent the stocks and their relationships, which are then investigated using different network theory methodologies. Here we introduce a new concept to tackle the above question: the partial correlation network. Partial correlation is a measure of how the correlation between two variables, e.g., stock returns, is affected by a third variable. By using it we define a proxy of stock influence, which is then used to construct partial correlation networks. The empirical part of this study is performed on a specific financial system, namely the set of 300 highly capitalized stocks traded at the New York Stock Exchange, in the time period 2001–2003. By constructing the partial correlation network, unlike the case of standard correlation-based networks, we find that stocks belonging to the financial sector and, in particular, to the investment services sub-sector, are the most influential stocks affecting the correlation profile of the system. Using a moving window analysis, we find that the strong influence of the financial stocks is conserved across time for the investigated trading period. Our findings shed new light on the underlying mechanisms and driving forces controlling the correlation profile observed in a financial market.
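
The first-order partial correlation referred to above has a standard closed form in terms of pairwise Pearson correlations. The sketch below computes it on synthetic data, not on the NYSE data set. The `influence` quantity (the drop in correlation once the third variable is controlled for) is only one plausible proxy and is an assumption here, since the abstract does not state the exact definition used.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z:

    rho(x,y|z) = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic "returns": x and y are correlated only because both follow z.
rng = random.Random(42)
z = [rng.gauss(0, 1) for _ in range(5000)]
x = [zi + 0.5 * rng.gauss(0, 1) for zi in z]
y = [zi + 0.5 * rng.gauss(0, 1) for zi in z]

rho_xy = pearson(x, y)
rho_xy_given_z = partial_corr(x, y, z)
influence = rho_xy - rho_xy_given_z  # assumed proxy, see note above
```

In this toy setting `rho_xy` is large while `rho_xy_given_z` is close to zero, so almost all of the observed correlation is attributed to `z`. Repeating the calculation for every triple of stocks is what would allow a network of influences to be assembled.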
Improving malware detection with explainable machine learning
Machine learning is used to address several detection and classification tasks in cybersecurity. Typically, detectors are modeled through complex learning algorithms that employ a wide variety of features, ranging from low-level machine code to statistical measures. Although these models achieve considerable performance, gaining insights into the learned knowledge turns out to be a hard task. These insights would help to capture the essential malicious components of a modern attack, which are usually hidden and obfuscated under potentially-legitimate sequences of instructions. These challenges can be addressed by employing explainable machine learning. In particular, explanations can help human experts to develop novel approaches for the static and dynamic analysis of applications by focusing on the distinctive features that characterize malware. In this perspective, we focus on such challenges and the potential uses of explainability techniques in the context of Android ransomware, which represents a serious threat to mobile platforms. We present an approach that enables the identification of the most influential features and the analysis of ransomware. We point out how explanations can be used to answer different questions depending on specific aspects, such as the considered explanation baselines. Our results suggest that our proposal can help cyber threat intelligence teams in the early detection of new ransomware families and could be extended to other types of malware.
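
The role of the explanation baseline can be illustrated with a toy linear scorer. This is not the approach of the paper: the weights, feature names, sample values, and both baselines below are invented for illustration. For a linear model, attributing `w_i * (x_i - b_i)` to feature `i` splits the score difference between the sample `x` and the baseline `b` exactly across the features.

```python
def score(weights, x):
    """Toy linear detector score: weighted sum of feature values."""
    return sum(weights[name] * x[name] for name in weights)


def attributions(weights, x, baseline):
    """Per-feature contribution to score(x) - score(baseline)."""
    return {name: weights[name] * (x[name] - baseline[name]) for name in weights}


# Hypothetical weights and System API package counts for one sample.
weights = {"javax.crypto": 2.0, "android.app.admin": 1.5, "java.io": 0.2}
sample = {"javax.crypto": 3, "android.app.admin": 1, "java.io": 10}

# Two assumed baselines: an "empty app" and an "average benign app".
zero_baseline = {name: 0.0 for name in weights}
benign_baseline = {"javax.crypto": 0.5, "android.app.admin": 0.0, "java.io": 9.0}

attr_zero = attributions(weights, sample, zero_baseline)
attr_benign = attributions(weights, sample, benign_baseline)

# Features ranked from most to least influential under each baseline.
rank_zero = sorted(attr_zero, key=attr_zero.get, reverse=True)
rank_benign = sorted(attr_benign, key=attr_benign.get, reverse=True)
```

Against the zero baseline the explanation answers "what makes this score high at all?", and heavy but ordinary file I/O ranks second. Against the benign baseline it answers "what distinguishes this sample from a typical benign app?", and the device administration call overtakes file I/O.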