87 research outputs found

    Data distribution and task scheduling for distributed computing of all-to-all comparison problems

    Get PDF
    This research studied distributed computing of all-to-all comparison problems with big data sets. The thesis formalised the problem and developed a high-performance, scalable computing framework comprising a programming model, data distribution strategies and task scheduling policies. The study considered storage usage, data locality and load balancing for performance improvement. The research outcomes can be applied in bioinformatics, biometrics, data mining and other domains in which all-to-all comparison is a typical computing pattern.
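The all-to-all comparison pattern the abstract describes can be sketched in a few lines; this is a minimal single-machine illustration of the pattern itself (the function and item names are ours, not the thesis's), whereas the thesis is concerned with distributing the quadratic set of tasks and their data across nodes.

```python
from itertools import combinations

def all_to_all(items, compare):
    """Compare every unordered pair of items exactly once.

    For n items this yields n*(n-1)/2 comparison tasks; a distributed
    framework must spread these tasks, and the data each task needs,
    across worker nodes while balancing load and storage.
    """
    return {(i, j): compare(items[i], items[j])
            for i, j in combinations(range(len(items)), 2)}

# Example: pairwise absolute differences over a small data set.
scores = [3, 7, 1, 9]
result = all_to_all(scores, lambda a, b: abs(a - b))
```

Note that because every item participates in comparisons with every other item, naive replication of the full data set to every node is what the thesis's data distribution strategies aim to avoid.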

    Memory management and parallelization of data intensive all-to-all comparison in shared-memory systems

    Get PDF
    This thesis presents a novel program parallelization technique that combines dynamic and static scheduling. It utilizes a problem-specific pattern developed from prior knowledge of the targeted problem abstraction. Suited to complex parallelization problems such as memory-constrained, data-intensive all-to-all comparison, the technique delivers more robust and faster task scheduling than state-of-the-art techniques. The technique achieves good performance in data-intensive bioinformatics applications.
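The combination of static and dynamic scheduling that the abstract mentions can be illustrated with a toy thread-based sketch (this is our own hypothetical illustration, not the thesis's algorithm): part of the task list is assigned statically up front, avoiding contention, and the remainder is pulled dynamically from a shared queue for load balancing.

```python
import queue
import threading

def run_tasks(tasks, workers=4, static_fraction=0.5):
    """Run `tasks` (callables) on threads: the first portion is assigned
    statically round-robin (no contention); the rest is pulled
    dynamically from a shared queue (load balancing)."""
    split = int(len(tasks) * static_fraction)
    static_part = [tasks[i:split:workers] for i in range(workers)]
    dynamic_part = queue.Queue()
    for t in tasks[split:]:
        dynamic_part.put(t)
    results, lock = [], threading.Lock()

    def worker(my_static):
        local = [t() for t in my_static]          # static phase
        while True:                               # dynamic phase
            try:
                t = dynamic_part.get_nowait()
            except queue.Empty:
                break
            local.append(t())
        with lock:
            results.extend(local)

    threads = [threading.Thread(target=worker, args=(static_part[w],))
               for w in range(workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

out = run_tasks([lambda k=k: k * k for k in range(10)])
```

The `static_fraction` knob mirrors the general trade-off: more static work means less scheduling overhead, more dynamic work means better tolerance of uneven task costs.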

    ํด๋ผ์šฐ๋“œ ํ™˜๊ฒฝ์—์„œ ๋น ๋ฅด๊ณ  ํšจ์œจ์ ์ธ IoT ์ŠคํŠธ๋ฆผ ์ฒ˜๋ฆฌ๋ฅผ ์œ„ํ•œ ์—”๋“œ-ํˆฌ-์—”๋“œ ์ตœ์ ํ™”

    Get PDF
    Thesis (Ph.D.) -- Graduate School of Seoul National University: College of Engineering, Department of Computer Science and Engineering, 2021.8. ์—„ํƒœ๊ฑด. As large amounts of data streams are generated from Internet of Things (IoT) devices, two types of IoT stream queries are deployed in the cloud. One is the small IoT-stream query, which continuously processes a few IoT data streams from end-users' IoT devices with low input rates (e.g., one event per second). The other is the big IoT-stream query, which data scientists deploy to continuously process a large number of aggregated data streams whose load can fluctuate suddenly over short periods (bursty loads). However, existing work and stream systems fall short of handling such workloads efficiently because their query submission, compilation, execution, and resource acquisition layers are not optimized for them. This dissertation proposes two end-to-end optimization techniques that optimize not only the stream query execution layer (runtime) but also the query submission, compiler, and resource acquisition layers. First, to minimize the number of cloud machines and the server maintenance cost of processing many small IoT queries, we build Pluto, a new stream processing system that optimizes both the query submission and execution layers for efficiently handling many small IoT stream queries. By decoupling IoT query submission from code registration and offering new APIs, Pluto mitigates the bottleneck in query submission and enables efficient resource sharing across small IoT stream queries during execution. Second, to quickly handle sudden bursty loads and scale out big IoT stream queries, we build Sponge, a new stream system that optimizes the query compilation, execution, and resource acquisition layers altogether. For fast acquisition of new resources, Sponge uses a cloud computing service, called Lambda, because it offers fast-to-start lightweight containers. Sponge then converts the streaming dataflow of big stream queries to overcome Lambda's resource constraints and to minimize scaling overheads at runtime. Our evaluations show that the end-to-end optimization techniques significantly improve system throughput and latency compared to existing stream systems, both in handling a large number of small IoT stream queries and in handling bursty loads of big IoT stream queries.
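The decoupling of code registration from query submission described for Pluto can be sketched abstractly; everything below (class and method names, the engine API) is a hypothetical illustration of the idea, not Pluto's actual interface: the expensive step of uploading user code happens once, and each of the many small query submissions merely references the registered code by ID.

```python
# Hypothetical sketch: register code once, submit many cheap queries.
class StreamEngine:
    def __init__(self):
        self._code = {}      # code_id -> callable (registered once)
        self._queries = {}   # query_id -> (code_id, source)

    def register_code(self, code_id, fn):
        """Expensive step: upload/compile user code a single time."""
        self._code[code_id] = fn

    def submit_query(self, query_id, code_id, source):
        """Cheap step: a small query just points at registered code."""
        if code_id not in self._code:
            raise KeyError(f"unregistered code: {code_id}")
        self._queries[query_id] = (code_id, source)

    def process(self, query_id, event):
        code_id, _ = self._queries[query_id]
        return self._code[code_id](event)

engine = StreamEngine()
engine.register_code("thresh", lambda x: x > 25)     # registered once
for i in range(3):                                   # many cheap submissions
    engine.submit_query(f"sensor-{i}", "thresh", f"iot://sensor/{i}")
```

Because submission no longer carries the code itself, thousands of small queries that share the same logic can also share one registered artifact, which is the resource-sharing opportunity the abstract points to.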

    Parallelised and vectorised ant colony optimization

    Get PDF
    Ant Colony Optimisation (ACO) is a versatile population-based optimisation metaheuristic based on the foraging behaviour of certain species of ant, and is part of the Evolutionary Computation family of algorithms. While ACO generally provides good-quality solutions to the problems it is applied to, two key limitations prevent it from being truly viable on large-scale problems: a high memory requirement that grows quadratically with instance size, and a high execution time. This thesis presents a parallelised and vectorised implementation of ACO using OpenMP and AVX SIMD instructions; while this alone is enough to improve the execution time of the algorithm, the implementation also features an alternative memory structure and a novel candidate-set approach, the use of which significantly reduces the memory requirement of ACO. This parallelism is enabled through the use of Max-Min Ant System, an ACO variant that uses only local memory during the solution process and therefore risks no synchronisation issues, and through an adaptation of vRoulette, a vector-compatible variant of the common roulette-wheel selection method. Through these techniques, ACO is also able to find good-quality solutions to the very large Art TSPs, a problem set that has traditionally been infeasible to solve with ACO owing to its high memory requirements and execution time. These techniques can also benefit ACO in solving other problems. Here the Virtual Machine Placement problem, in which virtual machines must be efficiently allocated to physical machines in a cloud environment, is used as a benchmark, with significant improvements to execution time.
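For readers unfamiliar with the selection step that vRoulette vectorises, here is the standard sequential roulette-wheel selection it is a variant of (a minimal sketch in plain Python, not the thesis's SIMD implementation): an index is chosen with probability proportional to its weight, which in ACO is the pheromone-times-heuristic value of each candidate move.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def roulette_select(weights, rnd=random):
    """Pick index i with probability weights[i] / sum(weights).

    The cumulative-sum-plus-search form shown here is the sequential
    baseline; vector variants such as vRoulette restructure this step
    so it maps onto SIMD lanes.
    """
    cum = list(accumulate(weights))        # cumulative "wheel"
    r = rnd.random() * cum[-1]             # spin: uniform in [0, total)
    return bisect_right(cum, r)            # first slot the spin lands in

random.seed(0)
weights = [0.1, 0.7, 0.2]
picks = [roulette_select(weights) for _ in range(10_000)]
# index 1 should be chosen roughly 70% of the time
```

The data dependency in the cumulative sum and the branchy search are exactly what makes this step awkward to vectorise naively, which motivates a vector-compatible reformulation.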

    Supercomputing Frontiers

    Get PDF
    This open access book constitutes the refereed proceedings of the 7th Asian Supercomputing Conference, SCFA 2022, which took place in Singapore in March 2022. The 8 full papers presented in this book were carefully reviewed and selected from 21 submissions. They cover a range of topics including file systems, memory hierarchy, HPC cloud platforms, container image configuration workflows, large-scale applications, and scheduling.

    Parallel and distributed computing techniques in biomedical engineering

    Get PDF
    Master's thesis (Master of Engineering)

    Generating and auto-tuning parallel stencil codes

    Get PDF
    In this thesis, we present a software framework, Patus, which generates high-performance stencil codes for different types of hardware platforms, including current multicore CPU and graphics processing unit architectures. The ultimate goals of the framework are productivity, portability (of both the code and its performance), and high performance on the target platform. A stencil computation updates every grid point in a structured grid based on the values of its neighboring points. This class of computations occurs frequently in scientific and general-purpose computing (e.g., in partial differential equation solvers or in image processing), justifying the focus on this kind of computation. The proposed key ingredients to achieve the goals of productivity, portability, and performance are domain-specific languages (DSLs) and the auto-tuning methodology. The Patus stencil specification DSL allows the programmer to express a stencil computation concisely, independently of hardware architecture-specific details. Thus, it increases programmer productivity by relieving the programmer of low-level programming model issues and of manually applying hardware platform-specific code optimization techniques. The use of domain-specific languages also implies code reusability: once implemented, the same stencil specification can be reused on different hardware platforms, i.e., the specification code is portable across hardware architectures. Constructing the language for a special purpose makes it amenable to more aggressive optimizations and therefore to potentially higher performance. Auto-tuning provides performance and performance portability by automatically adapting implementation-specific parameters to the characteristics of the hardware on which the code will run.
    By automating the process of parameter tuning (which essentially amounts to solving an integer programming problem whose objective function is the code's performance as a function of the parameter configuration), the system can also be used more productively than if the programmer had to fine-tune the code manually. We show performance results for a variety of stencils for which Patus was used to generate the corresponding implementations. The selection includes stencils taken from two real-world applications: a simulation of the temperature within the human body during hyperthermia cancer treatment, and a seismic application. These examples demonstrate the framework's flexibility and its ability to produce high-performance code.
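The kind of neighbour-based update that a stencil specification describes can be shown concretely; this is a plain-Python sketch of a classic 5-point stencil (a Jacobi sweep for the 2-D Laplace equation), not code generated by Patus, whose generated versions would add tiling, vectorisation, and other tuned optimisations.

```python
def jacobi_step(grid):
    """One 5-point stencil sweep: each interior point becomes the
    average of its four neighbours. Boundary points are left fixed."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j] +
                                grid[i][j - 1] + grid[i][j + 1])
    return out

# A 3x3 grid with a hot boundary and a cold centre point.
g = [[1.0, 1.0, 1.0],
     [1.0, 0.0, 1.0],
     [1.0, 1.0, 1.0]]
g = jacobi_step(g)   # centre becomes the average of its neighbours
```

The loop nest, neighbour offsets, and per-point arithmetic are exactly the details a stencil DSL lets the programmer state once, leaving the framework to map them efficiently onto each target architecture.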

    Designing, Building, and Modeling Maneuverable Applications within Shared Computing Resources

    Get PDF
    Extending the military principle of maneuver into the war-fighting domain of cyberspace, academic and military researchers have produced many theoretical and strategic works, though few have focused on actual applications and systems that apply this principle. We present our research in designing, building, and modeling maneuverable applications in order to gain the system advantages of resource provisioning, application optimization, and cybersecurity improvement. We have coined the phrase "Maneuverable Applications," defined as distributed and parallel applications that take advantage of the modification, relocation, addition, or removal of computing resources, giving the perception of movement. Our work with maneuverable applications has been within shared computing resources, such as the Clemson University Palmetto cluster, where multiple users share access and time to a collection of inter-networked computers and servers. In this dissertation, we describe our implementation and analytic modeling of environments and systems that maneuver computational nodes, network capabilities, and security enhancements to overcome challenges to a cyberspace platform. Specifically, we describe our work to create a system that provisions a big-data computational resource within academic environments. We also present a computing testbed built to allow researchers to study network optimizations of data centers. We discuss our Petri net model of an adaptable system, which increases its cybersecurity posture in the face of varying levels of threat from malicious actors. Lastly, we present our work on integrating these technologies into a prototype resource manager for maneuverable applications and on validating our model using this implementation.

    Parallelism and evolutionary algorithms

    Full text link

    Performance-aware component composition for GPU-based systems

    Full text link