
    Conditionally Optimal Parallelization of Real-Time Tasks for Global EDF Scheduling

    Thesis (Ph.D.) -- Graduate School of Seoul National University: College of Engineering, Department of Computer Science and Engineering, February 2022. Advisor: ์ด์ฐฝ๊ฑด (Chang-Gun Lee).

    Real-time applications are rapidly growing in size and complexity. This trend is apparent even in extreme deadline-critical sectors such as autonomous driving and artificial intelligence. Such complex programs are described by a directed acyclic graph (DAG), where each node of the graph represents a task and the edges portray the precedence relations between the tasks. With this intricate structure and intensive computation requirements, extensive system-wide optimization is required to ensure stable real-time execution. On the other hand, recent advances in parallel computing frameworks, such as OpenCL and OpenMP, allow us to parallelize a real-time task into many different versions, which is called "parallelization freedom." Depending on the degree of parallelization, the thread execution times can vary significantly: more parallelization tends to reduce each thread's execution time but increases the total execution time due to parallelization overhead. By carefully selecting a "parallelization option" for each task, i.e., the number of threads the task is parallelized into, we can maximize system schedulability while satisfying real-time constraints. Because of this benefit, parallelization freedom has drawn recent attention. However, for global EDF scheduling (G-EDF for short), parallelization freedom has not yet received much attention. To this end, this dissertation proposes a way of optimally assigning parallelization options to real-time tasks for G-EDF on a multi-core system. Moreover, we aim for a polynomial-time algorithm that can be used in online situations where tasks dynamically join and leave. To achieve this, we formalize a monotonically increasing property of both tolerance and interference with respect to the parallelization option. Using these properties, we develop a uni-directional search algorithm that assigns parallelization options in polynomial time, and we formally prove its optimality. With the optimal parallelization, we observe a significant improvement in schedulability through simulation experiments, and in a subsequent implementation experiment we demonstrate that the algorithm is practically applicable to real-world use cases. The dissertation first focuses on the traditional multi-thread task model, then extends to the multi-segment (MS) task model, and finally discusses the more general directed-acyclic-graph (DAG) task model to accommodate a wide range of real-world computing models.

    Korean abstract (translated): This dissertation describes an optimal parallelization technique for real-time tasks under the Global EDF scheduler. To this end, it proves mathematically that both tolerance and interference increase monotonically with the parallelization option. Using these properties, it proposes a parallelization method for real-time tasks that runs in polynomial time and proves that the method is optimal. Simulation experiments show a dramatic improvement in the scheduling performance of real-time tasks, and a subsequent implementation experiment with real-world workloads examines the practical applicability of the approach. The dissertation first discusses the optimal parallelization method for the traditional multi-thread task model and then describes how to extend it to the multi-segment and DAG task models.
    Table of contents:
    1. Introduction
        1.1 Motivation and Objective
        1.2 Approach
        1.3 Organization
    2. Related Work
        2.1 Real-Time Multi-Core Scheduling
        2.2 Real-Time Multi-Core Task Model
        2.3 Real-Time Multi-Core Schedulability Analysis
    3. Optimal Parallelization of Multi-Thread Tasks
        3.1 Introduction
        3.2 Problem Description
        3.3 Extension of BCL Schedulability Analysis
            3.3.1 Overview of BCL Schedulability Analysis
            3.3.2 Properties of Parallelization Freedom
        3.4 Optimal Assignment of Parallelization Options
            3.4.1 Optimal Parallelization Assignment Algorithm
            3.4.2 Optimality of Algorithm 1
            3.4.3 Time Complexity of Algorithm 1
        3.5 Experiment Results
            3.5.1 Simulation Results
            3.5.2 Simulated Schedule Results
            3.5.3 Survey on the Boundary Condition of the Parallelization Freedom
            3.5.4 Autonomous Driving Task Implementation Results
    4. Conditionally Optimal Parallelization of Multi-Segment and DAG Tasks
        4.1 Introduction
        4.2 Multi-Segment Task Model
        4.3 Extension of Chwa-MS Schedulability Analysis
            4.3.1 Chwa-MS Schedulability Analysis
            4.3.2 Tolerance and Interference of Multi-Segment Tasks
        4.4 Assigning Parallelization Options to Multi-Segments
            4.4.1 Parallelization Route
            4.4.2 Assigning Parallelization Options to Multi-Segment Tasks
            4.4.3 Time Complexity of Algorithm 2
        4.5 DAG (Directed Acyclic Graph) Task Model
        4.6 Extension of Chwa-DAG Schedulability Analysis
            4.6.1 Chwa-DAG Schedulability Analysis
            4.6.2 Tolerance and Interference of DAG Tasks
        4.7 Assigning Parallelization Options to DAG Tasks
            4.7.1 Parallelization Route for DAG Task Model
            4.7.2 Assigning Parallelization Options to DAG Tasks
            4.7.3 Time Complexity of Algorithm 3
        4.8 Experiment Results: Multi-Segment Task Model
        4.9 Experiment Results: DAG Task Model
            4.9.1 Simulation Results
            4.9.2 Implementation Results
    5. Conclusion
        5.1 Summary
        5.2 Future Work
    6. References
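
    The abstract describes the uni-directional search only in prose, so the sketch below illustrates the general shape of such a search under assumed interfaces: is_schedulable stands in for a BCL-style G-EDF schedulability test and failing_task for a routine naming a task that violates it. All names are hypothetical, and the sketch is not the dissertation's actual algorithm; it only reflects the stated idea that options need to move in a single direction.

```python
# Hypothetical sketch of a uni-directional (monotone) search for parallelization
# options under a G-EDF schedulability test; interfaces are assumptions, not the
# dissertation's definitions.  The search direction (here: increase the option)
# is chosen arbitrarily for illustration.
from typing import Callable, Dict, List, Optional


def assign_parallelization_options(
    tasks: List[str],
    max_option: Dict[str, int],
    is_schedulable: Callable[[Dict[str, int]], bool],
    failing_task: Callable[[Dict[str, int]], str],
) -> Optional[Dict[str, int]]:
    """Start every task at its minimum option and only ever move options in one
    direction; a task's option is never rolled back, which keeps the number of
    schedulability checks polynomial."""
    options = {t: 1 for t in tasks}          # minimal parallelization for every task
    while not is_schedulable(options):
        t = failing_task(options)            # some task that currently violates the test
        if options[t] >= max_option[t]:
            return None                      # no feasible assignment along this direction
        options[t] += 1                      # take one step in the single search direction
    return options
```

    Because tolerance and interference are assumed monotone in the parallelization option, an option that has been advanced never needs to be revisited, so the loop performs at most one schedulability check per option step across all tasks.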

    Massively-Parallel Feature Selection for Big Data

    We present the Parallel, Forward-Backward with Pruning (PFBP) algorithm for feature selection (FS) in Big Data settings (high dimensionality and/or sample size). To tackle the challenges of Big Data FS, PFBP partitions the data matrix both in terms of rows (samples, training examples) and columns (features). By employing the concepts of p-values of conditional independence tests and meta-analysis techniques, PFBP manages to rely only on computations local to a partition while minimizing communication costs. It then employs powerful and safe (asymptotically sound) heuristics to make early, approximate decisions, such as Early Dropping of features from consideration in subsequent iterations, Early Stopping of consideration of features within the same iteration, or Early Return of the winner in each iteration. PFBP provides asymptotic guarantees of optimality for data distributions faithfully representable by a causal network (Bayesian network or maximal ancestral graph). Our empirical analysis confirms a super-linear speedup of the algorithm with increasing sample size and linear scalability with respect to the number of features and processing cores, while dominating other competitive algorithms in its class.
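
    As a rough illustration of two of the ideas named above, the sketch below combines partition-local p-values with Fisher's method (used here as one concrete meta-analysis rule; PFBP's actual combination rule may differ) and applies an Early-Dropping-style filter that removes features whose combined evidence is no longer significant. The code is illustrative and not the authors' implementation.

```python
# Illustrative sketch only: Fisher's method as a stand-in meta-analysis rule
# plus an Early-Dropping-style filter; not the authors' PFBP implementation.
import math
from typing import Dict, List

from scipy.stats import chi2


def combine_pvalues_fisher(pvalues: List[float]) -> float:
    """Fisher's method: under H0, -2 * sum(log p_i) follows chi^2 with 2k dof."""
    stat = -2.0 * sum(math.log(max(p, 1e-300)) for p in pvalues)
    return float(chi2.sf(stat, df=2 * len(pvalues)))


def early_dropping(local_pvalues: Dict[str, List[float]],
                   alpha: float = 0.05) -> List[str]:
    """Keep only features whose combined (conditional) dependence test is still
    significant; the rest are dropped from consideration in later iterations."""
    return [feature for feature, ps in local_pvalues.items()
            if combine_pvalues_fisher(ps) <= alpha]
```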

    A path-level exact parallelization strategy for sequential simulation

    Sequential simulation is a well-known method in geostatistical modelling. Following the Bayesian approach for simulation of conditionally dependent random events, the Sequential Indicator Simulation (SIS) method draws simulated values for K categories (categorical case) or classes defined by K different thresholds (continuous case). Similarly, the Sequential Gaussian Simulation (SGS) method draws simulated values from a multivariate Gaussian field. In this work, a path-level approach to parallelizing SIS and SGS is presented. A first stage re-arranges the simulation path, followed by a second stage of parallel simulation of non-conflicting nodes. A key advantage of the proposed parallelization method is that it generates realizations identical to those of the original non-parallelized methods. Case studies are presented using two sequential simulation codes from GSLIB: SISIM and SGSIM. Execution time and speedup results are shown for large-scale domains, with many categories and maximum kriging neighbours in each case, achieving high speedups in the best scenarios using 16 threads of execution on a single machine.
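
    A simplified sketch (assumptions, not the authors' code) of the path-level idea: nodes of the random path are grouped into batches that can be simulated concurrently without changing the realization, because no node in a batch falls inside the search neighbourhood of a later node of the same batch. The neighbours callable is a stand-in for the SISIM/SGSIM neighbourhood search.

```python
# Hypothetical helper: split a simulation path into batches of non-conflicting
# nodes.  A conflict means a node would condition on a not-yet-simulated node
# of the current batch, so the batch is closed before adding it.
from typing import Callable, Hashable, List, Set


def batch_path(path: List[Hashable],
               neighbours: Callable[[Hashable], Set[Hashable]]) -> List[List[Hashable]]:
    batches: List[List[Hashable]] = []
    current: List[Hashable] = []
    members: Set[Hashable] = set()
    for node in path:
        if members & neighbours(node):   # node depends on an earlier node of this batch
            batches.append(current)
            current, members = [], set()
        current.append(node)
        members.add(node)
    if current:
        batches.append(current)
    return batches
```

    Each batch can then be kriged and drawn in parallel while reproducing exactly the values of the sequential pass, since every node still conditions only on nodes simulated in earlier batches.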

    Frequent itemset mining on multiprocessor systems

    Frequent itemset mining is an important building block in many data mining applications like market basket analysis, recommendation, web mining, fraud detection, and gene expression analysis. In many of them, the datasets being mined can easily grow to hundreds of gigabytes or even terabytes of data. Hence, efficient algorithms are required to process such large amounts of data. In recent years, many frequent-itemset mining algorithms have been proposed, which however (1) often have high memory requirements and (2) do not exploit the large degrees of parallelism provided by modern multiprocessor systems. The high memory requirements arise mainly from inefficient data structures that have only been shown to be sufficient for small datasets. For large datasets, however, the use of these data structures forces the algorithms to go out-of-core, i.e., they have to access secondary memory, which leads to serious performance degradation. Exploiting available parallelism is further required to mine large datasets because the serial performance of processors has almost stopped increasing. Algorithms should therefore exploit the large number of available threads as well as other kinds of parallelism (e.g., vector instruction sets) besides thread-level parallelism. In this work, we tackle the high memory requirements of frequent itemset mining in two ways: we (1) compress the datasets being mined, because they must be kept in main memory during several mining invocations, and (2) improve existing mining algorithms with memory-efficient data structures. For compressing the datasets, we employ efficient encodings that show good compression performance on a wide variety of realistic datasets, i.e., the size of the datasets is reduced by up to 6.4x. The encodings can further be applied directly while loading the dataset from disk or network. Since encoding and decoding are repeatedly required for loading and mining the datasets, we reduce their costs by providing parallel encodings that achieve high throughputs for both tasks. For a memory-efficient representation of the mining algorithms' intermediate data, we propose compact data structures and even employ explicit compression. Both methods together reduce the intermediate data's size by up to 25x. The smaller memory requirements avoid or delay expensive out-of-core computation when large datasets are mined. For coping with the high parallelism provided by current multiprocessor systems, we identify the performance hot spots and scalability issues of existing frequent-itemset mining algorithms. The hot spots, which form basic building blocks of these algorithms, cover (1) counting the frequency of fixed-length strings, (2) building prefix trees, (3) compressing integer values, and (4) intersecting lists of sorted integer values or bitmaps. For all of them, we discuss how to exploit available parallelism and provide scalable solutions. Furthermore, almost all components of the mining algorithms must be parallelized to keep the sequential fraction of the algorithms as small as possible. We integrate the parallelized building blocks and components into three well-known mining algorithms and further analyze the impact of certain existing optimizations. Even single-threaded, our algorithms are often up to an order of magnitude faster than existing highly optimized algorithms, and they scale almost linearly on a large 32-core multiprocessor system. Although our optimizations are intended for frequent-itemset mining algorithms, they can be applied with only minor changes to algorithms used for mining other types of itemsets.
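
    As one deliberately simplified example of the parallel building blocks listed above, the sketch below parallelizes frequency counting with per-worker partial counters that are merged at the end. It is an assumed illustration of the count-locally-then-merge pattern, not the thesis's implementation; worker processes are used here because pure-Python threads would be serialized by the GIL.

```python
# Assumed illustration of parallel frequency counting (count locally, merge).
from collections import Counter
from concurrent.futures import ProcessPoolExecutor
from typing import List


def count_chunk(transactions: List[List[str]]) -> Counter:
    partial: Counter = Counter()
    for transaction in transactions:
        partial.update(transaction)      # count every item occurrence in this chunk
    return partial


def parallel_item_counts(transactions: List[List[str]], workers: int = 4) -> Counter:
    chunk = max(1, len(transactions) // workers)
    chunks = [transactions[i:i + chunk] for i in range(0, len(transactions), chunk)]
    total: Counter = Counter()
    # Note: call this from under `if __name__ == "__main__":` on platforms that
    # spawn worker processes.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_chunk, chunks):
            total.update(partial)        # merge the per-worker partial counts
    return total
```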

    Learning Gaussian graphical models with fractional marginal pseudo-likelihood

    We propose a Bayesian approximate inference method for learning the dependence structure of a Gaussian graphical model. Using pseudo-likelihood, we derive an analytical expression to approximate the marginal likelihood for an arbitrary graph structure without invoking any assumptions about decomposability. The majority of the existing methods for learning Gaussian graphical models are either restricted to decomposable graphs or require specification of a tuning parameter that may have a substantial impact on learned structures. By combining a simple sparsity-inducing prior for the graph structures with a default reference prior for the model parameters, we obtain a fast and easily applicable scoring function that works well for even high-dimensional data. We demonstrate the favourable performance of our approach by large-scale comparisons against the leading methods for learning non-decomposable Gaussian graphical models. A theoretical justification for our method is provided by showing that it yields a consistent estimator of the graph structure.
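
    In generic form (the paper's fractional reference prior and exact derivation are not reproduced here, so this is only an assumed sketch of the decomposition), pseudo-likelihood replaces the joint likelihood by a product of node-wise conditionals given each node's neighbours in the graph G, so the approximate marginal likelihood splits into independent local terms that can be scored even for non-decomposable graphs:

    \[ \hat{p}(\mathbf{X} \mid G) \;\approx\; \prod_{j=1}^{p} \int p\big(\mathbf{x}_j \mid \mathbf{x}_{\mathrm{ne}(j)}, \theta_j\big)\, \pi(\theta_j)\, d\theta_j \]

    where ne(j) denotes the neighbours of node j in G and \pi(\theta_j) is the default (fractional) reference prior over the local parameters.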

    Novel Parallelization Techniques for Computer Graphics Applications

    Increasingly complex and data-intensive algorithms in computer graphics applications require software engineers to find ways of improving performance and scalability to satisfy the requirements of customers and users. Parallelizing and tailoring each algorithm of each specific application is a time-consuming task, and the resulting implementation is domain-specific because it cannot be reused outside the specific problem for which the algorithm was defined. Identifying reusable parallelization patterns that can be extrapolated and applied to other algorithms is essential in order to provide consistent parallelization improvements and reduce the development time of evolving a sequential algorithm into a parallel one. This thesis focuses on defining general and efficient parallelization techniques and approaches that can be followed to parallelize complex 3D graphics algorithms. These parallelization patterns can be applied to convert most kinds of complex, data-intensive sequential algorithms into parallel ones, obtaining consistent optimization results. The main idea in the thesis is to use multi-threading techniques to improve the parallelization and core utilization of 3D algorithms. Most 3D algorithms apply similar, repetitive, independent operations on a vast amount of 3D data, which makes such applications good candidates for multi-thread parallelization. The efficiency of the proposed idea is tested on two common computer graphics algorithms: hidden-line removal and collision detection. Both are data-intensive algorithms whose conversion from a sequential to a multithreaded implementation is challenging due to their complexity and the fact that elements in their data have different sizes and complexities, producing workload imbalances and asymmetries between processing elements. The results show that the proposed principles and patterns can be easily applied to both algorithms, transforming their sequential implementations into multithreaded ones and obtaining consistent optimization results proportional to the number of processing elements. From the work done in this thesis, it is concluded that the suggested parallelization warrants further study and development to extend its use to heterogeneous platforms such as graphics processing units (GPUs). OpenCL is the most feasible framework to explore in the future due to its interoperability among different platforms.
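
    The multi-threading pattern alluded to above, in which workers pull heterogeneous 3D elements from a shared queue so that expensive elements do not leave other cores idle, can be sketched as follows. This is a generic, assumed illustration rather than the thesis's code (and in CPython the GIL limits pure-Python speedup; the scheduling pattern, not raw performance, is the point).

```python
# Assumed sketch of dynamic load balancing: worker threads repeatedly take the
# next pending 3D element from a shared queue and process it.
import queue
import threading
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def parallel_process(items: Iterable[T], work: Callable[[T], R],
                     num_threads: int = 8) -> List[R]:
    pending: "queue.Queue[T]" = queue.Queue()
    for item in items:
        pending.put(item)
    results: List[R] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                item = pending.get_nowait()   # dynamic scheduling: grab the next element
            except queue.Empty:
                return
            result = work(item)               # e.g. hidden-line removal or a collision test
            with lock:
                results.append(result)        # result order is not preserved

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```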

    QCBA: Postoptimization of Quantitative Attributes in Classifiers based on Association Rules

    The need to pre-discretize numeric attributes before they can be used in association rule learning is a source of inefficiencies in the resulting classifier. This paper describes several new rule tuning steps aiming to recover information lost in the discretization of numeric (quantitative) attributes, and a new rule pruning strategy, which further reduces the size of the classification models. We demonstrate the effectiveness of the proposed methods on the post-optimization of models generated by three state-of-the-art association rule classification algorithms: Classification Based on Associations (Liu, 1998), Interpretable Decision Sets (Lakkaraju et al., 2016), and Scalable Bayesian Rule Lists (Yang, 2017). Benchmarks on 22 datasets from the UCI repository show that the post-optimized models are consistently smaller, typically by about 50%, and have better classification performance on most datasets.
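
    One plausible instance of such a rule tuning step, trimming a discretized interval literal to the range of attribute values actually covered in the training data, is sketched below. The function is an illustrative assumption, not a faithful reproduction of QCBA's exact procedure.

```python
# Illustrative only: shrink a rule's interval literal to the observed range of
# the covered training values, recovering precision lost to pre-discretization.
from typing import List, Tuple


def trim_interval(values: List[float],
                  interval: Tuple[float, float]) -> Tuple[float, float]:
    lo, hi = interval
    covered = [v for v in values if lo <= v <= hi]
    if not covered:
        return interval                  # the literal covers nothing: leave it unchanged
    return (min(covered), max(covered))
```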

    Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies

    The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load balancing among processors, and an integrated parallel analysis system.
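
    As a hedged illustration of what automatic partitioning for load balancing can look like in its simplest form, the sketch below assigns substructure workloads to processors with a longest-processing-time-first greedy heuristic. It is a generic textbook heuristic, not the dissertation's partitioning method.

```python
# Generic LPT (longest processing time first) load-balancing sketch: assign the
# most expensive remaining substructure to the currently least-loaded processor.
import heapq
from typing import Dict, List


def lpt_partition(work: Dict[str, float], processors: int) -> List[List[str]]:
    heap = [(0.0, p) for p in range(processors)]        # (current load, processor id)
    heapq.heapify(heap)
    bins: List[List[str]] = [[] for _ in range(processors)]
    for name, cost in sorted(work.items(), key=lambda kv: kv[1], reverse=True):
        load, p = heapq.heappop(heap)
        bins[p].append(name)
        heapq.heappush(heap, (load + cost, p))
    return bins
```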
    • โ€ฆ
    corecore