6 research outputs found

    A Survey and Evaluation of Android-Based Malware Evasion Techniques and Detection Frameworks

    Android platform security is an active area of research in which malware detection techniques continuously evolve to identify novel malware and to improve the timely and accurate detection of existing malware. Adversaries, in turn, constantly devise innovative techniques to evade or delay detection. Past studies have shown that malware detection systems are susceptible to evasion attacks in which adversaries bypass existing security defenses and deliver malware to the target system undetected. The development of evasion-resistant systems remains an open research problem. This paper presents a detailed taxonomy and evaluation of Android-based malware evasion techniques deployed to circumvent malware detection. The study characterizes these evasion techniques into two broad categories, polymorphism and metamorphism, and analyses techniques used to detect stealthy malware based on the malware’s unique characteristics. Furthermore, the article presents a qualitative and systematic comparison of evasion detection frameworks and their detection methodologies for Android-based malware. Finally, the survey discusses open questions and potential future directions for continued research in mobile malware detection.
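    The polymorphism category mentioned above can be illustrated with a short, generic sketch that is not taken from the paper: a signature-based scanner that matches payload hashes is defeated as soon as the payload is re-encoded, even though its behaviour is unchanged. The payload bytes, XOR key, and helper names below are hypothetical.

import hashlib

def xor_encode(payload: bytes, key: int) -> bytes:
    """Re-encode a payload with a single-byte XOR key (toy polymorphic packer)."""
    return bytes(b ^ key for b in payload)

# Hypothetical payload and a known-bad signature database keyed on SHA-256.
payload = b"example-malicious-payload"
signature_db = {hashlib.sha256(payload).hexdigest()}

# A polymorphic variant: identical behaviour once decoded, different bytes on disk.
variant = xor_encode(payload, key=0x5A)

def signature_match(sample: bytes) -> bool:
    return hashlib.sha256(sample).hexdigest() in signature_db

print(signature_match(payload))  # True  - the original sample is flagged
print(signature_match(variant))  # False - the re-encoded variant evades the hash signature

    Metamorphic malware goes further and rewrites its own code (for example via instruction substitution, reordering, or dead-code insertion) rather than merely re-encoding it, which makes even smarter syntactic signatures insufficient on their own.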

    Rapid Permissions-Based Detection and Analysis of Mobile Malware Using Random Decision Forests

    No full text
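    No abstract is available for this entry, so the following is only a generic sketch of the approach named in the title: representing each app as a binary vector of requested permissions and training a random forest (scikit-learn's RandomForestClassifier here) to separate malware from benign apps. The permission names, toy data, and labels are hypothetical.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical permission vocabulary; real systems extract this from AndroidManifest.xml.
PERMISSIONS = ["SEND_SMS", "READ_CONTACTS", "INTERNET", "RECEIVE_BOOT_COMPLETED"]

def to_vector(requested):
    """Binary feature vector: 1 if the app requests the permission, else 0."""
    return [1 if p in requested else 0 for p in PERMISSIONS]

# Toy dataset: (requested permissions, label) with 1 = malware, 0 = benign.
apps = [
    ({"SEND_SMS", "RECEIVE_BOOT_COMPLETED", "INTERNET"}, 1),
    ({"INTERNET"}, 0),
    ({"READ_CONTACTS", "SEND_SMS"}, 1),
    ({"INTERNET", "READ_CONTACTS"}, 0),
] * 25  # repeated only to give the toy model something to fit

X = [to_vector(req) for req, _ in apps]
y = [label for _, label in apps]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))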

    Feature Selection on Permissions, Intents and APIs for Android Malware Detection

    Malicious applications pose an enormous security threat to mobile computing devices. Currently, 85% of all smartphones run Android, Google’s open-source operating system, making that platform the primary threat vector for malware attacks. Android hosts roughly 99% of known malware to date and, due to its open-source nature, is the focus of most research efforts in mobile malware detection. One of the main tools used in this effort is supervised machine learning. While a decade of work has produced substantial progress in detection accuracy, there is an obstacle that every stream of research must overcome: feature selection, i.e., determining which attributes of an Android app are most effective as inputs to machine learning models. This dissertation aims to address that problem by providing the community with an exhaustive analysis of the three primary types of Android features used by researchers: Permissions, Intents and API Calls. The intent of the report is not to describe a best-performing feature set or a best-performing machine learning model, nor to explain why certain Permissions, Intents or API Calls are selected over others, but rather to provide a holistic methodology to help guide feature selection for Android malware detection. The experiments used eleven different feature selection techniques covering filter methods, wrapper methods and embedded methods. Each feature selection technique was applied to seven different datasets corresponding to the seven available combinations of Permissions, Intents and API Calls. Each of those seven datasets is drawn from a base set of 119k Android apps. All of the result sets were then validated against three different machine learning models, Random Forest, SVM and a Neural Net, to test applicability across algorithm types. The experiments show that using a combination of Permissions, Intents and API Calls produces higher accuracy than using any of them alone or in any other combination, and that feature selection should be performed on the combined dataset, not by feature type and then combined. The data also show that, in general, a feature set of 200 or more attributes is required for optimal results. Finally, the feature selection methods Relief, Correlation-based Feature Selection (CFS) and Recursive Feature Elimination (RFE) using a Neural Net are not satisfactory approaches for Android malware detection work. Based on the proposed methodology and experiments, this research provides insights into feature selection, a significant but often overlooked issue in Android malware detection. We believe the results reported herein are an important step toward effective feature evaluation and selection in support of malware detection, especially for datasets with a large number of features. The methodology also has the potential to be applied to similar malware detection tasks or even to broader domains such as pattern recognition.
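    The evaluation protocol described above, selecting features on the combined Permissions + Intents + API Calls matrix and then validating with several classifier families, can be sketched as follows. This is not the dissertation's code: the selector and model choices (a chi-squared filter, an RFE wrapper, and tree-based embedded selection; Random Forest, SVM, and neural-net validators) and the 200-feature budget are illustrative, and a real experiment would start from binary feature matrices extracted from the 119k-app corpus rather than random placeholder data.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC

# Placeholder for the real binary matrix (apps x {permissions, intents, API calls}).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 500))   # 1000 toy apps, 500 toy binary features
y = rng.integers(0, 2, size=1000)          # 1 = malware, 0 = benign

K = 200  # the abstract reports that roughly 200+ features are needed for optimal results

selectors = {
    "filter (chi2)": SelectKBest(chi2, k=K),
    "wrapper (RFE)": RFE(LogisticRegression(max_iter=1000), n_features_to_select=K, step=25),
    "embedded (RF importance)": SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=0),
        max_features=K, threshold=-np.inf,
    ),
}
validators = {
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": LinearSVC(),
    "NeuralNet": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500),
}

for s_name, selector in selectors.items():
    X_sel = selector.fit_transform(X, y)   # select on the combined matrix, not per feature type
    for v_name, model in validators.items():
        acc = cross_val_score(model, X_sel, y, cv=3).mean()
        print(f"{s_name:26s} -> {v_name:12s}: {acc:.3f}")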

    Data Science for Software Maintenance

    Maintaining and evolving modern software systems is a difficult task: their scope and complexity mean that seemingly inconsequential changes can have far-reaching consequences. Most software development companies attempt to reduce the number of faults introduced by adopting maintenance processes. These processes can be developed in various ways. In this thesis, we argue that data science techniques can be used to support process development. Specifically, we claim that robust development processes are necessary to minimize the number of faults introduced when evolving complex software systems, and that these processes should be based on empirical research findings. Data science techniques allow software engineering researchers to develop research insights that may be difficult or impossible to obtain with other research methodologies, and these insights support the creation of development processes. Thus, data science techniques support the creation of empirically based development processes. We support this argument with three examples.

    First, we present insights into automated detection of malicious Android applications (apps). Many prior studies on this topic used small corpora that may provide insufficient variety to create a robust app classifier. Currently, no empirically established guidelines for corpus size exist, meaning that previous studies have used anywhere from tens of apps to hundreds of thousands of apps to draw their conclusions. This variability makes it difficult to judge whether the findings of any one study generalize. We attempted to establish such guidelines and found that 1,000 apps may be sufficient for studies concerned with what the majority of apps do, while more than a million apps may be required for studies that aim to identify outliers. Moreover, many prior studies of malicious app detection used outdated malware corpora in their experiments, which, combined with the rapid evolution of the Android API, may have influenced the accuracy of those studies. We investigated this problem by studying 1.3 million apps and showed that the evolution of the API does affect classifier accuracy, but not in the way we originally predicted. We also used our API usage data to identify the most infrequently used API methods. The use of data science techniques allowed us to study an order of magnitude more apps than previous work in the area; additionally, our insights into infrequently used methods illustrate how data science can be used to guide API deprecation.

    Second, we present insights into the costs and benefits of regression testing. Regression test suites grow over time, and while a comprehensive suite can detect faults that are introduced into the system, such a suite can be expensive to write, maintain, and execute. These costs may or may not be justified, depending on the number and severity of faults the suite can detect. By studying 61 projects that use Travis CI, a continuous integration system, we were able to characterize the cost/benefit tradeoff of their test suites. For example, we found that only 74% of non-flaky test failures are caused by defects in the system under test; the other 26% are caused by incorrect or obsolete tests and thus represent a maintenance cost rather than a benefit of the suite. Data about the costs and benefits of testing can help system maintainers understand whether their test suite is a good investment, shaping their subsequent maintenance decisions. The use of data science techniques allowed us to study a large number of projects, increasing the external generalizability of the study and making the insights gained more useful.

    Third, we present insights into the use of mutants to replace real faulty programs in testing research. Mutants are programs that contain deliberately injected faults, where the faults are generated by applying mutation operators. Applying an operator means making a small change to the program source code, such as replacing a constant with another constant. The use of mutants is appealing because large numbers of mutants can be automatically generated and used when known faults are unavailable or insufficient in number. However, prior to this work, there was little experimental evidence to support the use of mutants as a replacement for real faults. We studied this problem and found that, in general, mutants are an adequate substitute for real faults when conducting testing research. That is, a test suite’s ability to detect mutants is correlated with its ability to detect real faults that developers have fixed, for both developer-written and automatically generated test suites. However, we also found that additional mutation operators should be developed and that some classes of faults cannot be generated via mutation. The use of data science techniques was an essential part of generating the set of real faults used in the study.

    Taken together, the results of these three studies provide evidence that data science techniques allow software engineering researchers to develop insights that are difficult or impossible to obtain using other research methodologies.
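    The mutation-operator idea in the third example, making one small syntactic change such as replacing a constant with another constant, can be illustrated with a short generic sketch. This is not the thesis's tooling: the example program, the chosen operator, and the replacement value are hypothetical, and real mutation testing tools apply many such operators automatically across a codebase.

import ast

class ConstantReplacer(ast.NodeTransformer):
    """A single mutation operator: replace one integer constant with another."""

    def __init__(self, old: int, new: int):
        self.old, self.new = old, new

    def visit_Constant(self, node: ast.Constant) -> ast.Constant:
        if node.value == self.old:
            return ast.copy_location(ast.Constant(value=self.new), node)
        return node

# Hypothetical program under test.
source = "def price_with_tax(p):\n    return p + 10\n"

tree = ast.parse(source)
mutant_tree = ConstantReplacer(old=10, new=0).visit(tree)
ast.fix_missing_locations(mutant_tree)
mutant_source = ast.unparse(mutant_tree)  # requires Python 3.9+
print(mutant_source)

# A test suite "kills" the mutant if at least one test that passes on the
# original program fails on the mutant; the thesis correlates this kill
# ability with the suite's ability to detect real, developer-fixed faults.
ns_orig, ns_mut = {}, {}
exec(source, ns_orig)
exec(mutant_source, ns_mut)
assert ns_orig["price_with_tax"](5) == 15   # the test passes on the original program
assert ns_mut["price_with_tax"](5) != 15    # ...and fails on (kills) the mutant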