6 research outputs found

    Effective methods to detect metamorphic malware: A systematic review

    Get PDF
    The succeeding code for metamorphic Malware is routinely rewritten to remain stealthy and undetected within infected environments. This characteristic is maintained by means of encryption and decryption methods, obfuscation through garbage code insertion, code transformation and registry modification which makes detection very challenging. The main objective of this study is to contribute an evidence-based narrative demonstrating the effectiveness of recent proposals. Sixteen primary studies were included in this analysis based on a pre-defined protocol. The majority of the reviewed detection methods used Opcode, Control Flow Graph (CFG) and API Call Graph. Key challenges facing the detection of metamorphic malware include code obfuscation, lack of dynamic capabilities to analyse code and application difficulty. Methods were further analysed on the basis of their approach, limitation, empirical evidence and key parameters such as dataset, Detection Rate (DR) and False Positive Rate (FPR)

    부분 그래프 동형 사상 문제를 위한 점진적 유전 알고리즘의 조사

    Get PDF
    학위논문 (박사)-- 서울대학교 대학원 : 공과대학 전기·컴퓨터공학부, 2019. 2. 문병로.그래프는 객체들의 관계를 표현하는 가장 대표적인 자료구조이고, 데이터가 그래프 형태로 표현되는 많은 연구분야에서 발생하는 핵심 문제들 중 하나가 바로 그래프 패턴 매칭이다. 그래프 패턴 매칭은 정점이나 간선들의 정보를 이용한 시멘틱 기반의 방법으로 정의할 수도 있지만, 일반적으로는 정점과 간선간의 관계만을 이용해 구조적으로 정의하고 이러한 패턴매칭은 부분그래프 동형사상으로 표현된다. 그동안 부분그래프 동형사상 문제를 풀기 위해 제안된 알고리즘들은 크게 두 가지 형태로 분류된다. 첫번째는 재귀적 퇴각검색 알고리즘을 기반으로 존재하는 모든 해를 정확하게 찾아내는 방법이다. 다만, 부분그래프 동형사상 문제는 대표적인 NP-완비군의 문제 중 하나이기 때문에 모든 순열을 하나씩 탐색하는 경우 수행시간이 문제의 크기에 따라 기하급수적으로 늘어나게 된다. 두번째는 유전 알고리즘을 비롯한 메타휴리스틱 알고리즘 기반의 근사적인 방법이다. 이들은 합리적인 시간 내에 좋은 품질의 해들을 찾아내지만 대부분의 알고리즘이 그 크고 복잡한 문제공간 전체를 다룰 수 있을 만큼의 탐색능력을 갖추지는 못하였다. 연산자나 지역 휴리스틱을 개선하여 알고리즘의 공간탐색능력을 직접적으로 향상시킬수도 있겠지만, 적합도 함수를 변경하거나 탐색전략 변경을 통해서도 크게 성능을 개선할 수 있다. 만약 원래 문제를 메타 휴리스틱의 탐색능력에 적합한 크기의 부분문제로 분할하고, 이 부분문제를 단계적으로 풀어간다면 보다 효율적으로 문제를 해결할 수 있다. 또한, 적합도 함수를 변경하여 공간을 보다 단순한 형태로 변환시킨다면 그 효과가 훨씬 더 커질것이다. 본 논문에서는 부분그래프 동형사상 문제가 이루는 문제공간의 특성을 분석하고 이에 어울리는 적합도 함수와 탐색전략을 바탕으로, 이 문제를 효율적으로 풀기 위한 유전알고리즘을 제안한다. 첫번째로, 부분그래프 동형사상 문제에 어울리는 새로운 적합도 함수를 소개하고, 연산자와 함께 생성되는 적합도 공간이 어떠한 형태로 변형되는지를 살펴본다. 우선, 기존 연구들에서 사용한 적합도 함수가 가지고 있던 문제점들을 검토하고 이를 해결하기 위해 부분그래프 동형사상의 정점의 차수 조건을 반영한 새로운 함수를 설계해서 기존의 함수와 결합한 다목적 적합도 함수를 제안한다. 이후, 실험을 통해서 지역 최적화 알고리즘과 결합했을 때 적합도 값들의 변화과정을 통해 새로운 적합도 함수의 특징들을 분석하 고 지역최적점들을 모아 적합도 함수와 해들의 평균거리를 이용한 상관관계를 통해 제안한 적합도 함수가 그리는 문제공간이 기존의 문제공간을 어떤 식으로 변형시키는지를 설명한다. 제안한 다목적 적합도 함수를 혼 합형 유전알고리즘에 적용한 결과를 기존 연구들의 결과들과 비교하여 제안한 다목적 적합도 함수가 유전알고리즘의 문제공간탐색과 최적화에 얼마나 도움을 주는지를 확인한다. 두번째로, 새롭게 설계된 문제공간을 효율적으로 탐색하기 위한 전략으로 점진적 유전 알고리즘을 소개하고 각 설계요소들이 알고리즘의 수행과정과 성능에 어떻게 반영되는지를 알아본다. 우선, 점진적 유전알고리즘에서 원 문제를 최적 부분구조를 갖는 일련의 연속적인 부분문제들로 분할한 후 각 부분문제를 혼합형 유전알고리즘을 통해 풀고 얻어진 해들을 확장하여 다음 부분문제의 초기해로 사용하는 방법을 설명하고, 이러한 과정을 순차적으로 적용하여 작은 부분문제의 해를 원래 문제의 해로 발전시켜 원 문제의 답을 얻는 과정을 보인다. 이후, 점진적 유전알고리즘을 진행하는 과정에서 원래의 문제를 분할하는 방법과 부분문제들의 연속성을 설정하는 부분이 알고리즘 전체 성능에 어느 정도 영향을 미치는지를실험을통해 분석한다.최종적으로 랜덤그래프에 대해서 제안한 점진적 혼합 유전 알고리즘의 성능과 기존의 혼합형 유전알고리즘의 성능을 비교 분석하고,기존의 알고리즘들로는 불가능했던 사이즈가 큰 실제 데이터들에 대해서도 좋은 성능을 보임으로써 확장성까지 갖춘 것을 보여준다.Graph is the most representative data structure for modeling the relationships of objects and graph pattern matching is one of the key problems that arise in many applications where data is expressed in the form of graph. Although graph pattern matching can be defined by a semantic-based method using information such as the labels of vertices or edges, it is generally defined by a structure-based method using only the relationships between vertices and edges, and such pattern matching is represented by the subgraph isomorphism. The algorithms proposed so far to solve the subgraph isomorphism problem are classified into two types. The first is an exact method to find out all existing solutions based on the recursive backtracking algorithm. However, since the subgraph isomorphism problem is NP-complete, if all the permutations are searched one by one, the running time increases exponentially according to the size of the problem. The second is an approximation method based on metaheuristic such as genetic algorithm. They are able to find good quality solutions within a reasonable amount of time, but most algorithms do not have enough search capability to cover the large and complex problem space of this problem. The search capability of a metaheuristic algorithm can be improved by designing better operators or local heuristics, but it is possible to improve the performance greatly by changing the fitness function and by reforming the search strategy. If the original problem is divided into subproblems with the suitable size for the search capability of a metaheuristic algorithm, and each subproblem is solved step by step, the problem can be solved more efficiently. Also, if we change the fitness function to transform the fitness landscape more convex, the effect of an incremental algorithm will be much greater. In this thesis, we propose an efficient incremental hybrid genetic algorithm to solve the subgraph isomorphism problem. First, we introduce a new fitness function which is suitable for the problem of the subgraph isomorphism problem and examine how the fitness landscape generated with the operator is transformed. We introduce a multi-objective fitness function by designing a new function reflecting the degree constraint of the subgraph isomorphism. Through the experiments, we analyze the characteristics of the new fitness function combining with the local optimization algorithm, investigate the correlation between the fitness value and the average distance of the local optima to explain how the new fitness function transforms the fitness landscape of the subgraph isomorphism problem. We compare the results of the hybrid genetic algorithm applying the proposed multi-objective fitness function with that of the conventional genetic algorithm and show how the proposed fitness function facilitates the search capability of a genetic algorithm. Second, we introduce the new efficient search strategy, the incremental genetic algorithm, and how the design issues are reflected in the process and performance of the algorithm. We divide the original problem into a sequence of successive subproblems with the optimal substructure, solve each subproblem through the hybrid genetic algorithm, and then extend the solutions obtained for the initial solutions of the next subproblem. This process is applied sequentially to develop the solutions of the small problem to those of the original problem. Through the experiments, we discuss how to divide the original problem into successivee subproblems and analyze how components of the sequence affect the performance of the incremental genetic algorithm. We also compare the performance of the incremental hybrid genetic algorithm with that of the previous hybrid genetic algorithm through the random graph instances, and show a good scalability the proposed algorithm through the experimental results obtained for real data with a large size that was impossible with existing algorithms.I. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.3 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . 4 II. Preliminary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.1 Graph Pattern Matching and Isomorphism . . . . . . . . . . 5 2.2 Subgraph Isomorphism and Related Problems . . . . . . . . 7 2.3 Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . 9 2.3.1 Structure . . . . . . . . . . . . . . . . . . . . . . . 11 2.3.2 Representation . . . . . . . . . . . . . . . . . . . . 12 2.3.3 Fitness Function . . . . . . . . . . . . . . . . . . . 13 2.3.4 Crossover . . . . . . . . . . . . . . . . . . . . . . . 13 2.3.5 Mutation . . . . . . . . . . . . . . . . . . . . . . . 14 2.3.6 Hybrid Genetic Algorithm . . . . . . . . . . . . . . 15 III. Inspecting Fitness Function of Subgraph Isomorphism Problem . . . . . . . . . . . . . . . . . . . . 16 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 17 3.2 Conventional Fitness Function . . . . . . . . . . . . . . . . 18 3.3 Multi-objective Fitness Function . . . . . . . . . . . . . . . 19 3.4 Local Heuristics . . . . . . . . . . . . . . . . . . . . . . . . 23 3.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.5.1 Experimental Setting . . . . . . . . . . . . . . . . . 24 3.5.2 Comparison of Single and Multi-objective Function . 25 3.5.3 Global Convexity of the Multi-objective Fitness Landscape . . . . . . . . . . 27 3.5.4 Hybrid Genetic Algorithm . . . . . . . . . . . . . . 30 IV. Incremental Hybrid Genetic Algorithm . . . . . . . . . . . . 34 4.1 Incremental Process . . . . . . . . . . . . . . . . . . . . . . 34 4.2 Proposed Algorithm . . . . . . . . . . . . . . . . . . . . . . 37 4.3 Design Schemes . . . . . . . . . . . . . . . . . . . . . . . . 40 4.3.1 Vertex Reordering . . . . . . . . . . . . . . . . . . 40 4.3.2 Stopping Criterion . . . . . . . . . . . . . . . . . . 41 4.3.3 Expansion Size . . . . . . . . . . . . . . . . . . . . 42 4.4 Genetic Frameworks . . . . . . . . . . . . . . . . . . . . . 42 4.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . 45 4.5.1 Synthetic Data . . . . . . . . . . . . . . . . . . . . 45 4.5.2 Real World Data . . . . . . . . . . . . . . . . . . . 57 V. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 국문초록 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72Docto

    Adaptive rule-based malware detection employing learning classifier systems

    Get PDF
    Efficient and accurate malware detection is increasingly becoming a necessity for society to operate. Existing malware detection systems have excellent performance in identifying known malware for which signatures are available, but poor performance in anomaly detection for zero day exploits for which signatures have not yet been made available or targeted attacks against a specific entity. The primary goal of this thesis is to provide evidence for the potential of learning classier systems to improve the accuracy of malware detection. A customized system based on a state-of-the-art learning classier system is presented for adaptive rule-based malware detection, which combines a rule-based expert system with evolutionary algorithm based reinforcement learning, thus creating a self-training adaptive malware detection system which dynamically evolves detection rules. This system is analyzed on a benchmark of malicious and non-malicious files. Experimental results show that the system can outperform C4.5, a well-known non-adaptive machine learning algorithm, under certain conditions. The results demonstrate the system\u27s ability to learn effective rules from repeated presentations of a tagged training set and show the degree of generalization achieved on an independent test set. This thesis is an extension and expansion of the work published in the Security, Trust, and Privacy for Software Applications workshop in COMPSAC 2011 - the 35th Annual IEEE Signature Conference on Computer Software and Applications --Abstract, page iii

    Моделювання поширення шкідливого програмного забезпечення на основі стохастичних і детермінованих підходів

    Get PDF
    Метою роботи є розроблення програмного забезпечення моделювання поширення шкідливого програмного забезпечення в інформаційно-комунікаційних системах з використанням стохастичних і детермінованих моделей. Об’єктом дослідження є шкідливе програмне забезпечення та сучасні технічні засоби дослідження їх поведінки в інформаційно-комунікаційних системах. Предметом дослідження є сучасні ефективні математичні методи і технології прогнозування поширення шкідливого програмного забезпечення в сучасних операційних системах. Методами дослідження є сучасні методики теорії керування, теорії оптимізації, актуальні середовища розроблення детермінованих і стохастичних математичних моделей, імітаційних моделей, порівняльний аналіз. Результати досліджень можуть бути ефективно адаптовані в суміжні сфери досліджень, які передбачають мінімізацію ризиків, що можуть бути нанесені різними видами ШПЗ у мовах сьогодення.The aim of the work is to develop software for modeling the spread of malicious software in information and communication systems using stochastic and deterministic models. The object of research is malicious software and modern technical means of studying their behavior in information and communication systems. The subject of research is modern effective mathematical methods and technologies for predicting the spread of malicious software in modern operating systems. Research methods are modern methods of control theory, optimization theory, current environments for the development of deterministic and stochastic mathematical models, simulation models, comparative analysis. The results of research can be effectively adapted to related areas of research, which involve minimizing the risks that may be posed by different types of malware in today's. Two copyright applications have been submitted for the malware simulation software developed

    Survey on representation techniques for malware detection system

    Get PDF
    Malicious programs are malignant software’s designed by hackers or cyber offenders with a harmful intent to disrupt computer operation. In various researches, we found that the balance between designing an accurate architecture that can detect the malware and track several advanced techniques that malware creators apply to get variants of malware are always a difficult line. Hence the study of malware detection techniques has become more important and challenging within the security field. This review paper provides a detailed discussion and full reviews for various types of malware, malware detection techniques, various researches on them, malware analysis methods and different dynamic programmingbased tools that could be used to represent the malware sampled. We have provided a comprehensive bibliography in malware detection, its techniques and analysis methods for malware researchers
    corecore