706 research outputs found

    Incremental genetic algorithms for graph optimization problems

    Doctoral thesis, Seoul National University Graduate School, Department of Electrical and Computer Engineering, August 2016. Moon Byung-Ro.
    A combinatorial optimization problem is an optimization problem with a discrete solution space. Many graph problems belong to this category because graphs are discrete objects. Graphs are widely used in various fields, and many real-world combinatorial optimization problems take graphs as their input. For some of these problems, the size of the solution space is exponential in the size of the problem, so efficient search algorithms are required to deal with them. Genetic algorithms are widely used to solve combinatorial optimization problems, and incremental genetic algorithms can be used to solve graph optimization problems efficiently. We define subproblems and solve them step by step instead of tackling the problems directly. A subproblem solved by an incremental genetic algorithm deals with a restriction of the original graph structure. The subproblems are solved in the intermediate steps, and the size of the subproblem is gradually increased. We apply the same genetic algorithm to each subproblem, and it is initialized with the evolved population of the previous step. We propose incremental genetic algorithms for two different combinatorial optimization problems: the subgraph isomorphism problem and the graph cut optimization problem. We devise an optimal substructure on the subproblem sequence and explain how it is related to the optimality of the process, along with other related factors. We present graph expansion methodologies and vertex reordering schemes to define an appropriate sequence of subproblems. We combine the proposed incremental approach with a hybrid genetic algorithm for the subgraph isomorphism problem, and the algorithm was further developed for nearly perfect results. Based on our analysis, we also propose an incremental genetic algorithm to solve graph cut optimization problems. We tested the implementation of the algorithm on benchmark graph instances for the graph partitioning problem and the maximum cut problem. Through experiments, we investigate and analyze how the sequence of subproblems affects the search space landscape. The performance of a genetic algorithm improves when the incremental approach is applied with an appropriate sequence of subproblems.
    Contents:
    Chapter I. Introduction
    Chapter II. Incremental Genetic Algorithm
        2.1 Overview and Traditional Applications
        2.2 Application on Graph Optimization Problems
            2.2.1 Formalization of the Incremental Process
            2.2.2 Theoretical Background
            2.2.3 Sequence of Subproblems
    Chapter III. Subgraph Isomorphism Problem
        3.1 Introduction
        3.2 The Proposed Algorithm
            3.2.1 The Structure of the Incremental Genetic Algorithm
            3.2.2 Design Issues
            3.2.3 Genetic Framework
        3.3 Experimental Results
            3.3.1 Dataset and Evaluation
            3.3.2 Results and Discussions
            3.3.3 Overall Results
        3.4 Further Improvement
            3.4.1 New Operators
            3.4.2 Improvements by New Operators
            3.4.3 Overall Result
    Chapter IV. Graph Cut Optimization Problems
        4.1 Introduction
        4.2 The Proposed Algorithm
            4.2.1 Subproblem Structure
            4.2.2 Reordering Schemes
            4.2.3 Genetic Framework
        4.3 Experimental Results
            4.3.1 Dataset and Evaluation
            4.3.2 Results on Graph Partitioning Problem
            4.3.3 Results on Maximum Cut Problem
            4.3.4 Results on Problem Variants
    Chapter V. Related Applications
        5.1 Measuring Source Code Similarity with an Incremental Genetic Algorithm
            5.1.1 Introduction
            5.1.2 The Proposed System
            5.1.3 Experimental Results
            5.1.4 Discussion
        5.2 Linear Ordering Problem and an Approximate Fitness Evaluation
            5.2.1 Introduction
            5.2.2 The Proposed Method
            5.2.3 Experimental Results
    Chapter VI. Conclusions
    Bibliography
    Korean Abstract
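    The incremental process described in this abstract (solve a restricted graph, grow it, and seed each step with the previously evolved population) can be illustrated with a minimal sketch for the maximum cut problem. This is not the thesis implementation: the steady-state genetic algorithm, the fixed vertex order, and the names `cut_size`, `evolve`, and `incremental_ga` are assumptions made only for illustration.

```python
import random


def cut_size(edges, side, n_active):
    """Count edges inside the first n_active vertices whose endpoints are on different sides."""
    return sum(
        1 for u, v in edges
        if u < n_active and v < n_active and side[u] != side[v]
    )


def evolve(edges, population, n_active, generations, rng):
    """Assumed steady-state GA: uniform crossover, one-bit mutation, replace the worst."""
    for _ in range(generations):
        p1, p2 = rng.sample(population, 2)
        child = [rng.choice(pair) for pair in zip(p1, p2)]
        i = rng.randrange(n_active)
        child[i] = 1 - child[i]
        worst = min(
            range(len(population)),
            key=lambda k: cut_size(edges, population[k], n_active),
        )
        if cut_size(edges, child, n_active) >= cut_size(edges, population[worst], n_active):
            population[worst] = child
    return population


def incremental_ga(n, edges, step=2, pop_size=20, generations=200, seed=0):
    """Solve growing induced subgraphs; each step starts from the previous population."""
    rng = random.Random(seed)
    n_active = min(step, n)
    population = [
        [rng.randint(0, 1) for _ in range(n_active)] for _ in range(pop_size)
    ]
    while True:
        population = evolve(edges, population, n_active, generations, rng)
        if n_active == n:
            break
        grow = min(step, n - n_active)
        # Expansion (assumed scheme): keep evolved sides, assign new vertices at random.
        population = [
            ind + [rng.randint(0, 1) for _ in range(grow)] for ind in population
        ]
        n_active += grow
    return max(population, key=lambda ind: cut_size(edges, ind, n))


if __name__ == "__main__":
    cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
    best = incremental_ga(6, cycle)
    print(best, cut_size(cycle, best, 6))
```

    The sketch adds vertices in index order; the abstract notes that the choice of vertex ordering and expansion scheme determines the subproblem sequence, so a real implementation would make that order a design parameter.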

    A lightweight, graph-theoretic model of class-based similarity to support object-oriented code reuse.

    The work presented in this thesis is principally concerned with the development of a method and set of tools designed to support the identification of class-based similarity in collections of object-oriented code. Attention is focused on enhancing the potential for software reuse in situations where a reuse process is either absent or informal, and the characteristics of the organisation are unsuitable, or resources unavailable, to promote and sustain a systematic approach to reuse. The approach builds on the definition of a formal, attributed, relational model that captures the inherent structure of class-based, object-oriented code. Based on code-level analysis, it relies solely on the structural characteristics of the code and the peculiarly object-oriented features of the class as an organising principle: classes, those entities comprising a class, and the intra and inter-class relationships existing between them, are significant factors in defining a two-phase similarity measure as a basis for the comparison process. Established graph-theoretic techniques are adapted and applied via this model to the problem of determining similarity between classes. This thesis illustrates a successful transfer of techniques from the domains of molecular chemistry and computer vision. Both domains provide an existing template for the analysis and comparison of structures as graphs. The inspiration for representing classes as attributed relational graphs, and the application of graph-theoretic techniques and algorithms to their comparison, arose out of a well-founded intuition that a common basis in graph-theory was sufficient to enable a reasonable transfer of these techniques to the problem of determining similarity in object-oriented code. The practical application of this work relates to the identification and indexing of instances of recurring, class-based, common structure present in established and evolving collections of object-oriented code. 
A classification so generated additionally provides a framework for class-based matching over an existing code-base, both from the perspective of newly introduced classes, and search "templates" provided by those incomplete, iteratively constructed and refined classes associated with current and on-going development. The tools and techniques developed here provide support for enabling and improving shared awareness of reuse opportunity, based on analysing structural similarity in past and ongoing development, tools and techniques that can in turn be seen as part of a process of domain analysis, capable of stimulating the evolution of a systematic reuse ethic

    An investigation of incremental genetic algorithms for the subgraph isomorphism problem

    Doctoral thesis, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, February 2019. Moon Byung-Ro.
    The graph is the most representative data structure for modeling relationships among objects, and graph pattern matching is one of the key problems that arise in many applications where data is expressed as a graph. Although graph pattern matching can be defined semantically, using information such as the labels of vertices or edges, it is generally defined structurally, using only the relationships between vertices and edges, and such pattern matching is expressed as subgraph isomorphism. The algorithms proposed so far for the subgraph isomorphism problem fall into two types. The first is an exact method that finds all existing solutions using recursive backtracking. However, since the subgraph isomorphism problem is NP-complete, the running time increases exponentially with the size of the problem if all permutations are searched one by one. The second is an approximate method based on a metaheuristic such as a genetic algorithm. These methods find good-quality solutions within a reasonable amount of time, but most of them do not have enough search capability to cover the large and complex problem space. The search capability of a metaheuristic algorithm can be improved directly by designing better operators or local heuristics, but performance can also be improved greatly by changing the fitness function and the search strategy. If the original problem is divided into subproblems of a size suited to the search capability of the metaheuristic, and the subproblems are solved step by step, the problem can be solved more efficiently. Moreover, if the fitness function is changed so that the fitness landscape becomes more convex, the effect of an incremental algorithm is much greater. In this thesis, we propose an efficient incremental hybrid genetic algorithm for the subgraph isomorphism problem. First, we introduce a new fitness function suited to the subgraph isomorphism problem and examine how the fitness landscape generated with the operator is transformed. We review the problems of the fitness function used in previous studies and propose a multi-objective fitness function that combines it with a new function reflecting the degree constraint of subgraph isomorphism. Through experiments, we analyze the characteristics of the new fitness function when combined with a local optimization algorithm, and we investigate the correlation between the fitness value and the average distance of the local optima to explain how the new fitness function transforms the fitness landscape of the subgraph isomorphism problem. We compare the results of a hybrid genetic algorithm using the proposed multi-objective fitness function with those of the conventional genetic algorithm and show how the proposed fitness function improves the search capability of a genetic algorithm. Second, we introduce a new efficient search strategy, the incremental genetic algorithm, and examine how its design issues are reflected in the process and performance of the algorithm. We divide the original problem into a sequence of successive subproblems with optimal substructure, solve each subproblem with the hybrid genetic algorithm, and then extend the solutions obtained into the initial solutions of the next subproblem. This process is applied sequentially to develop the solutions of the small problems into those of the original problem. Through experiments, we discuss how to divide the original problem into successive subproblems and analyze how the components of the sequence affect the performance of the incremental genetic algorithm. We also compare the performance of the incremental hybrid genetic algorithm with that of the previous hybrid genetic algorithm on random graph instances, and we show the scalability of the proposed algorithm through experimental results on large real-world data that existing algorithms could not handle.
    Contents:
    I. Introduction
        1.1 Motivation
        1.2 Contribution
        1.3 Organization
    II. Preliminary
        2.1 Graph Pattern Matching and Isomorphism
        2.2 Subgraph Isomorphism and Related Problems
        2.3 Genetic Algorithm
            2.3.1 Structure
            2.3.2 Representation
            2.3.3 Fitness Function
            2.3.4 Crossover
            2.3.5 Mutation
            2.3.6 Hybrid Genetic Algorithm
    III. Inspecting Fitness Function of Subgraph Isomorphism Problem
        3.1 Introduction
        3.2 Conventional Fitness Function
        3.3 Multi-objective Fitness Function
        3.4 Local Heuristics
        3.5 Experiments
            3.5.1 Experimental Setting
            3.5.2 Comparison of Single and Multi-objective Function
            3.5.3 Global Convexity of the Multi-objective Fitness Landscape
            3.5.4 Hybrid Genetic Algorithm
    IV. Incremental Hybrid Genetic Algorithm
        4.1 Incremental Process
        4.2 Proposed Algorithm
        4.3 Design Schemes
            4.3.1 Vertex Reordering
            4.3.2 Stopping Criterion
            4.3.3 Expansion Size
        4.4 Genetic Frameworks
        4.5 Experimental Results
            4.5.1 Synthetic Data
            4.5.2 Real World Data
    V. Conclusion
    Bibliography
    Korean Abstract
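    The abstract does not give the formula of its multi-objective fitness function, so the following is only a hedged sketch of what combining an edge-based objective with a degree-based objective could look like. The two objectives chosen here (number of pattern edges not preserved, and total degree deficit of the mapped vertices) and the names `degrees` and `fitness` are assumptions for illustration, not the function defined in the thesis.

```python
def degrees(n, edges):
    """Return the degree of each vertex 0..n-1 in an undirected graph."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def fitness(pattern_n, pattern_edges, target_n, target_edges, mapping):
    """Return (missing_edges, degree_deficit); (0, 0) means the mapping preserves every pattern edge.

    mapping[u] is the target vertex assigned to pattern vertex u.
    """
    target_set = {frozenset(e) for e in target_edges}
    # Objective 1 (assumed): pattern edges whose images are not target edges.
    missing = sum(
        1 for u, v in pattern_edges
        if frozenset((mapping[u], mapping[v])) not in target_set
    )
    # Objective 2 (assumed): a pattern vertex cannot map to a target vertex of lower degree.
    dp = degrees(pattern_n, pattern_edges)
    dt = degrees(target_n, target_edges)
    deficit = sum(max(0, dp[u] - dt[mapping[u]]) for u in range(pattern_n))
    return missing, deficit


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    target = [(0, 1), (1, 2), (0, 2), (2, 3)]  # triangle plus a pendant vertex
    print(fitness(3, triangle, 4, target, [0, 1, 2]))
    print(fitness(3, triangle, 4, target, [0, 1, 3]))
```

    Returning the two objectives as a tuple leaves the combination rule (weighted sum, lexicographic order, or Pareto ranking) open, since the abstract does not state which one is used.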

    A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges

    Measuring and evaluating source code similarity is a fundamental software engineering activity that embraces a broad range of applications, including but not limited to code recommendation, duplicate code, plagiarism, malware, and smell detection. This paper presents a systematic literature review and meta-analysis of code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics in different applications. We initially found over 10,000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deep investigation reveals 80 software tools, working with eight different techniques on five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while there is no support for many programming languages. A noteworthy point was the existence of 12 datasets related to source code similarity measurement and duplicate codes, of which only eight were publicly accessible. The lack of reliable datasets, empirical evaluations, hybrid methods, and focus on multi-paradigm languages are the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to maintenance. Comment: 49 pages, 10 figures, 6 table

    Entropy-scaling search of massive biological data

    Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres), if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains: high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve "compressive omics," and the general theory can be readily applied to data science problems outside of biology. Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo
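    The idea of search time scaling with the number of covering hyperspheres can be sketched as a two-stage search: first compare the query against cluster centers, then scan only the clusters that could contain a match. The greedy covering, the Euclidean metric, and the names `build_cover` and `search` below are assumptions for illustration and are not taken from the tools named in the abstract.

```python
import math


def build_cover(points, cluster_radius, dist):
    """Greedily cover the data with balls of radius cluster_radius around chosen centers."""
    clusters = []  # list of (center, members)
    for p in points:
        for center, members in clusters:
            if dist(p, center) <= cluster_radius:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters


def search(clusters, query, radius, cluster_radius, dist):
    """Return all stored points within radius of query.

    By the triangle inequality, a point within `radius` of the query lies in a
    cluster whose center is within `radius + cluster_radius` of the query, so
    every other cluster can be skipped without missing a match.
    """
    hits = []
    for center, members in clusters:
        if dist(query, center) <= radius + cluster_radius:  # coarse stage
            hits.extend(m for m in members if dist(query, m) <= radius)  # fine stage
    return hits


if __name__ == "__main__":
    data = [(0, 0), (0.5, 0), (1, 0), (5, 5), (5.2, 5.1), (10, 0)]
    cover = build_cover(data, 1.0, math.dist)
    print(len(cover), search(cover, (0.9, 0), 0.5, 1.0, math.dist))
```

    The coarse stage costs one distance computation per cluster, which is why the number of clusters needed to cover the data governs the running time in this sketch.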

    Malware similarity and a new fuzzy hash: Compound Code Block Hash (CCBHash)

    In the last few years, malware analysis has become increasingly important due to the rise of sophisticated cyberattacks. One of the objectives of this cybersecurity branch is to find similarities between different files or functions used by malware programmers, thus allowing malware detection, classification and even attribution in a timely manner. In this article we survey the state of the art in this area, reviewing the different techniques that can be applied to the field, with the objective of studying similarity, and therefore detecting, classifying and attributing malware samples. We have developed a fuzzy hash capable of characterizing malware by generating an easily comparable and storable signature of its functions. Since our goal is to detect these similarities in huge amounts of data within a reasonable time-frame, the size of the hash must be limited while retaining as much information as possible. Funding for open access charge: Universidad de Málaga / CBU

    A novel graph-based method for targeted ligand-protein fitting

    A thesis submitted to the Faculty of Creative Arts, Technologies & Science, University of Bedfordshire, in partial fulfilment of the requirements for the degree of Master of Philosophy. The determination of protein binding sites and ligand-protein fitting are key to understanding the functionality of proteins, from revealing which ligand classes can bind to identifying the optimal ligand for a given protein, as in protein/drug interactions. There is a need for novel generic computational approaches for the representation of protein-ligand interactions and the subsequent prediction of hitherto unknown interactions in proteins where the ligand binding sites are experimentally uncharacterised. The TMSite algorithms read in existing PDB structural data, isolate binding site regions, and identify conserved features in functionally related proteins (proteins that bind the same ligand). The Boundary Cubes method for surface representation was applied to the modified PDB file, allowing the creation of graphs for proteins and ligands that could be compared without loss of geometric data. A method for describing the binding site features of individual ligands that are conserved in terms of spatial relationships allowed the identification of 3D motifs, named fingerprints, which could be searched for in other protein structures. This method, combined with a modification of the pocket algorithm, allows reduced search areas for graph matching. The methods allow isolation of the binding site from a complexed protein PDB file, identification of conserved features among the binding sites of individual ligand types, and search for these features in sequence data. Spatially conserved features create a fingerprint of the binding site that can be sought in other proteins of known structure, identifying putative binding sites. The approach offers a novel and generic method for the identification of putative ligand binding sites for proteins for which there is no prior detailed structural characterisation of protein/ligand interactions. It is unique in being able to convert PDB data into graphs, ready for comparison and thus fitting of ligand to protein with consideration of chemical charge and, in the future, other chemical properties