
    Conditionally Optimal Parallelization of Real-Time Tasks for Global EDF Scheduling

    Thesis (Ph.D.) -- Graduate School of Seoul National University: College of Engineering, Department of Computer Science and Engineering, February 2022. Advisor: ์ด์ฐฝ๊ฑด (Chang-Gun Lee).

    Real-time applications are rapidly growing in size and complexity. This trend is apparent even in extreme deadline-critical sectors such as autonomous driving and artificial intelligence. Such complex programs are described by a directed acyclic graph (DAG), where each node of the graph represents a task and the edges portray the precedence relations between the tasks. With this intricate structure and intensive computation requirements, extensive system-wide optimization is required to ensure stable real-time execution. On the other hand, recent advances in parallel computing frameworks, such as OpenCL and OpenMP, allow us to parallelize a real-time task into many different versions, which is called "parallelization freedom." Depending on the degree of parallelization, the thread execution times can vary significantly: more parallelization tends to reduce each thread's execution time but increases the total execution time due to parallelization overhead. By carefully selecting a "parallelization option" for each task, i.e., the number of threads the task is parallelized into, we can maximize system schedulability while satisfying real-time constraints. Because of this benefit, parallelization freedom has drawn recent attention. However, for global EDF scheduling (G-EDF for short), parallelization freedom has not yet received much attention. To this end, this dissertation proposes a way of optimally assigning parallelization options to real-time tasks for G-EDF on a multi-core system. Moreover, we aim for a polynomial-time algorithm that can be used in online situations where tasks dynamically join and leave. To achieve this, we formalize a monotonically increasing property of both tolerance and interference with respect to the parallelization option. Using these properties, we develop a uni-directional search algorithm that assigns parallelization options in polynomial time, and we formally prove its optimality. With the optimal parallelization, we observe a significant improvement in schedulability through simulation experiments, and in a subsequent implementation experiment we demonstrate that the algorithm is practically applicable to real-world use cases. The dissertation first focuses on the traditional multi-thread task model, then extends to the multi-segment (MS) task model, and finally discusses the more general directed-acyclic-graph (DAG) task model to accommodate a wide range of real-world computing models.

    Korean abstract (translated): This dissertation describes an optimal parallelization technique for real-time tasks under the Global EDF scheduler. To this end, it proves mathematically that both tolerance and interference increase monotonically with the parallelization option. Using these properties, it proposes a parallelization method for real-time tasks that runs in polynomial time and proves that the method is optimal. Simulation experiments show a dramatic improvement in the scheduling performance of real-time tasks, and a subsequent implementation experiment with real-world workloads examines the practical applicability of the approach. The dissertation first discusses the optimal parallelization method for the traditional multi-thread task model and then describes how to extend it to the multi-segment and DAG task models.
    Table of contents:
    1. Introduction
        1.1 Motivation and Objective
        1.2 Approach
        1.3 Organization
    2. Related Work
        2.1 Real-Time Multi-Core Scheduling
        2.2 Real-Time Multi-Core Task Model
        2.3 Real-Time Multi-Core Schedulability Analysis
    3. Optimal Parallelization of Multi-Thread Tasks
        3.1 Introduction
        3.2 Problem Description
        3.3 Extension of BCL Schedulability Analysis
            3.3.1 Overview of BCL Schedulability Analysis
            3.3.2 Properties of Parallelization Freedom
        3.4 Optimal Assignment of Parallelization Options
            3.4.1 Optimal Parallelization Assignment Algorithm
            3.4.2 Optimality of Algorithm 1
            3.4.3 Time Complexity of Algorithm 1
        3.5 Experiment Results
            3.5.1 Simulation Results
            3.5.2 Simulated Schedule Results
            3.5.3 Survey on the Boundary Condition of the Parallelization Freedom
            3.5.4 Autonomous Driving Task Implementation Results
    4. Conditionally Optimal Parallelization of Multi-Segment and DAG Tasks
        4.1 Introduction
        4.2 Multi-Segment Task Model
        4.3 Extension of Chwa-MS Schedulability Analysis
            4.3.1 Chwa-MS Schedulability Analysis
            4.3.2 Tolerance and Interference of Multi-Segment Tasks
        4.4 Assigning Parallelization Options to Multi-Segments
            4.4.1 Parallelization Route
            4.4.2 Assigning Parallelization Options to Multi-Segment Tasks
            4.4.3 Time Complexity of Algorithm 2
        4.5 DAG (Directed Acyclic Graph) Task Model
        4.6 Extension of Chwa-DAG Schedulability Analysis
            4.6.1 Chwa-DAG Schedulability Analysis
            4.6.2 Tolerance and Interference of DAG Tasks
        4.7 Assigning Parallelization Options to DAG Tasks
            4.7.1 Parallelization Route for DAG Task Model
            4.7.2 Assigning Parallelization Options to DAG Tasks
            4.7.3 Time Complexity of Algorithm 3
        4.8 Experiment Results: Multi-Segment Task Model
        4.9 Experiment Results: DAG Task Model
            4.9.1 Simulation Results
            4.9.2 Implementation Results
    5. Conclusion
        5.1 Summary
        5.2 Future Work
    6. References
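
    The abstract describes the uni-directional search only in prose, so the sketch below illustrates the general shape of such a search under assumed interfaces: is_schedulable stands in for a BCL-style G-EDF schedulability test and failing_task for a routine naming a task that violates it. All names are hypothetical, and the sketch is not the dissertation's actual algorithm; it only reflects the stated idea that options need to move in a single direction.

```python
# Hypothetical sketch of a uni-directional (monotone) search for parallelization
# options under a G-EDF schedulability test; interfaces are assumptions, not the
# dissertation's definitions.  The search direction (here: increase the option)
# is chosen arbitrarily for illustration.
from typing import Callable, Dict, List, Optional


def assign_parallelization_options(
    tasks: List[str],
    max_option: Dict[str, int],
    is_schedulable: Callable[[Dict[str, int]], bool],
    failing_task: Callable[[Dict[str, int]], str],
) -> Optional[Dict[str, int]]:
    """Start every task at its minimum option and only ever move options in one
    direction; a task's option is never rolled back, which keeps the number of
    schedulability checks polynomial."""
    options = {t: 1 for t in tasks}          # minimal parallelization for every task
    while not is_schedulable(options):
        t = failing_task(options)            # some task that currently violates the test
        if options[t] >= max_option[t]:
            return None                      # no feasible assignment along this direction
        options[t] += 1                      # take one step in the single search direction
    return options
```

    Because tolerance and interference are assumed monotone in the parallelization option, an option that has been advanced never needs to be revisited, so the loop performs at most one schedulability check per option step across all tasks.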

    Massively-Parallel Feature Selection for Big Data

    We present the Parallel, Forward-Backward with Pruning (PFBP) algorithm for feature selection (FS) in Big Data settings (high dimensionality and/or sample size). To tackle the challenges of Big Data FS, PFBP partitions the data matrix both in terms of rows (samples, training examples) and columns (features). By employing the concepts of p-values of conditional independence tests and meta-analysis techniques, PFBP manages to rely only on computations local to a partition while minimizing communication costs. It then employs powerful and safe (asymptotically sound) heuristics to make early, approximate decisions, such as Early Dropping of features from consideration in subsequent iterations, Early Stopping of consideration of features within the same iteration, or Early Return of the winner in each iteration. PFBP provides asymptotic guarantees of optimality for data distributions faithfully representable by a causal network (Bayesian network or maximal ancestral graph). Our empirical analysis confirms a super-linear speedup of the algorithm with increasing sample size and linear scalability with respect to the number of features and processing cores, while dominating other competitive algorithms in its class.
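
    As a rough illustration of two of the ideas named above, the sketch below combines partition-local p-values with Fisher's method (used here as one concrete meta-analysis rule; PFBP's actual combination rule may differ) and applies an Early-Dropping-style filter that removes features whose combined evidence is no longer significant. The code is illustrative and not the authors' implementation.

```python
# Illustrative sketch only: Fisher's method as a stand-in meta-analysis rule
# plus an Early-Dropping-style filter; not the authors' PFBP implementation.
import math
from typing import Dict, List

from scipy.stats import chi2


def combine_pvalues_fisher(pvalues: List[float]) -> float:
    """Fisher's method: under H0, -2 * sum(log p_i) follows chi^2 with 2k dof."""
    stat = -2.0 * sum(math.log(max(p, 1e-300)) for p in pvalues)
    return float(chi2.sf(stat, df=2 * len(pvalues)))


def early_dropping(local_pvalues: Dict[str, List[float]],
                   alpha: float = 0.05) -> List[str]:
    """Keep only features whose combined (conditional) dependence test is still
    significant; the rest are dropped from consideration in later iterations."""
    return [feature for feature, ps in local_pvalues.items()
            if combine_pvalues_fisher(ps) <= alpha]
```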

    A path-level exact parallelization strategy for sequential simulation

    Sequential simulation is a well-known method in geostatistical modelling. Following the Bayesian approach for simulation of conditionally dependent random events, the Sequential Indicator Simulation (SIS) method draws simulated values for K categories (categorical case) or classes defined by K different thresholds (continuous case). Similarly, the Sequential Gaussian Simulation (SGS) method draws simulated values from a multivariate Gaussian field. In this work, a path-level approach to parallelizing SIS and SGS is presented. A first stage re-arranges the simulation path, followed by a second stage of parallel simulation of non-conflicting nodes. A key advantage of the proposed parallelization method is that it generates realizations identical to those of the original non-parallelized methods. Case studies are presented using two sequential simulation codes from GSLIB: SISIM and SGSIM. Execution time and speedup results are shown for large-scale domains, with many categories and maximum kriging neighbours in each case, achieving high speedups in the best scenarios using 16 threads of execution on a single machine.
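
    A simplified sketch (assumptions, not the authors' code) of the path-level idea: nodes of the random path are grouped into batches that can be simulated concurrently without changing the realization, because no node in a batch falls inside the search neighbourhood of a later node of the same batch. The neighbours callable is a stand-in for the SISIM/SGSIM neighbourhood search.

```python
# Hypothetical helper: split a simulation path into batches of non-conflicting
# nodes.  A conflict means a node would condition on a not-yet-simulated node
# of the current batch, so the batch is closed before adding it.
from typing import Callable, Hashable, List, Set


def batch_path(path: List[Hashable],
               neighbours: Callable[[Hashable], Set[Hashable]]) -> List[List[Hashable]]:
    batches: List[List[Hashable]] = []
    current: List[Hashable] = []
    members: Set[Hashable] = set()
    for node in path:
        if members & neighbours(node):   # node depends on an earlier node of this batch
            batches.append(current)
            current, members = [], set()
        current.append(node)
        members.add(node)
    if current:
        batches.append(current)
    return batches
```

    Each batch can then be kriged and drawn in parallel while reproducing exactly the values of the sequential pass, since every node still conditions only on nodes simulated in earlier batches.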

    Frequent itemset mining on multiprocessor systems

    Frequent itemset mining is an important building block in many data mining applications like market basket analysis, recommendation, web mining, fraud detection, and gene expression analysis. In many of them, the datasets being mined can easily grow to hundreds of gigabytes or even terabytes of data. Hence, efficient algorithms are required to process such large amounts of data. In recent years, many frequent-itemset mining algorithms have been proposed, which however (1) often have high memory requirements and (2) do not exploit the large degrees of parallelism provided by modern multiprocessor systems. The high memory requirements arise mainly from inefficient data structures that have only been shown to be sufficient for small datasets. For large datasets, however, the use of these data structures forces the algorithms to go out-of-core, i.e., they have to access secondary memory, which leads to serious performance degradation. Exploiting available parallelism is further required to mine large datasets because the serial performance of processors has almost stopped increasing. Algorithms should therefore exploit the large number of available threads as well as other kinds of parallelism (e.g., vector instruction sets) besides thread-level parallelism. In this work, we tackle the high memory requirements of frequent itemset mining in two ways: we (1) compress the datasets being mined, because they must be kept in main memory during several mining invocations, and (2) improve existing mining algorithms with memory-efficient data structures. For compressing the datasets, we employ efficient encodings that show good compression performance on a wide variety of realistic datasets, i.e., the size of the datasets is reduced by up to 6.4x. The encodings can further be applied directly while loading the dataset from disk or network. Since encoding and decoding are repeatedly required for loading and mining the datasets, we reduce their costs by providing parallel encodings that achieve high throughputs for both tasks. For a memory-efficient representation of the mining algorithms' intermediate data, we propose compact data structures and even employ explicit compression. Both methods together reduce the intermediate data's size by up to 25x. The smaller memory requirements avoid or delay expensive out-of-core computation when large datasets are mined. For coping with the high parallelism provided by current multiprocessor systems, we identify the performance hot spots and scalability issues of existing frequent-itemset mining algorithms. The hot spots, which form basic building blocks of these algorithms, cover (1) counting the frequency of fixed-length strings, (2) building prefix trees, (3) compressing integer values, and (4) intersecting lists of sorted integer values or bitmaps. For all of them, we discuss how to exploit available parallelism and provide scalable solutions. Furthermore, almost all components of the mining algorithms must be parallelized to keep the sequential fraction of the algorithms as small as possible. We integrate the parallelized building blocks and components into three well-known mining algorithms and further analyze the impact of certain existing optimizations. Even single-threaded, our algorithms are often up to an order of magnitude faster than existing highly optimized algorithms, and they scale almost linearly on a large 32-core multiprocessor system. Although our optimizations are intended for frequent-itemset mining algorithms, they can be applied with only minor changes to algorithms used for mining other types of itemsets.
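
    As one deliberately simplified example of the parallel building blocks listed above, the sketch below parallelizes frequency counting with per-worker partial counters that are merged at the end. It is an assumed illustration of the count-locally-then-merge pattern, not the thesis's implementation; worker processes are used here because pure-Python threads would be serialized by the GIL.

```python
# Assumed illustration of parallel frequency counting (count locally, merge).
from collections import Counter
from concurrent.futures import ProcessPoolExecutor
from typing import List


def count_chunk(transactions: List[List[str]]) -> Counter:
    partial: Counter = Counter()
    for transaction in transactions:
        partial.update(transaction)      # count every item occurrence in this chunk
    return partial


def parallel_item_counts(transactions: List[List[str]], workers: int = 4) -> Counter:
    chunk = max(1, len(transactions) // workers)
    chunks = [transactions[i:i + chunk] for i in range(0, len(transactions), chunk)]
    total: Counter = Counter()
    # Note: call this from under `if __name__ == "__main__":` on platforms that
    # spawn worker processes.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_chunk, chunks):
            total.update(partial)        # merge the per-worker partial counts
    return total
```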

    Learning Gaussian graphical models with fractional marginal pseudo-likelihood

    We propose a Bayesian approximate inference method for learning the dependence structure of a Gaussian graphical model. Using pseudo-likelihood, we derive an analytical expression to approximate the marginal likelihood for an arbitrary graph structure without invoking any assumptions about decomposability. The majority of the existing methods for learning Gaussian graphical models are either restricted to decomposable graphs or require specification of a tuning parameter that may have a substantial impact on learned structures. By combining a simple sparsity-inducing prior for the graph structures with a default reference prior for the model parameters, we obtain a fast and easily applicable scoring function that works well for even high-dimensional data. We demonstrate the favourable performance of our approach by large-scale comparisons against the leading methods for learning non-decomposable Gaussian graphical models. A theoretical justification for our method is provided by showing that it yields a consistent estimator of the graph structure.
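
    In generic form (the paper's fractional reference prior and exact derivation are not reproduced here, so this is only an assumed sketch of the decomposition), pseudo-likelihood replaces the joint likelihood by a product of node-wise conditionals given each node's neighbours in the graph G, so the approximate marginal likelihood splits into independent local terms that can be scored even for non-decomposable graphs:

    \[ \hat{p}(\mathbf{X} \mid G) \;\approx\; \prod_{j=1}^{p} \int p\big(\mathbf{x}_j \mid \mathbf{x}_{\mathrm{ne}(j)}, \theta_j\big)\, \pi(\theta_j)\, d\theta_j \]

    where ne(j) denotes the neighbours of node j in G and \pi(\theta_j) is the default (fractional) reference prior over the local parameters.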

    Novel Parallelization Techniques for Computer Graphics Applications

    Increasingly complex and data-intensive algorithms in computer graphics applications require software engineers to find ways of improving performance and scalability to satisfy the requirements of customers and users. Parallelizing and tailoring each algorithm of each specific application is a time-consuming task, and the resulting implementation is domain-specific because it cannot be reused outside the specific problem for which the algorithm was defined. Identifying reusable parallelization patterns that can be extrapolated and applied to other algorithms is essential in order to provide consistent parallelization improvements and reduce the development time of evolving a sequential algorithm into a parallel one. This thesis focuses on defining general and efficient parallelization techniques and approaches that can be followed to parallelize complex 3D graphics algorithms. These parallelization patterns can be applied to convert most kinds of complex, data-intensive sequential algorithms into parallel ones, obtaining consistent optimization results. The main idea in the thesis is to use multi-threading techniques to improve the parallelization and core utilization of 3D algorithms. Most 3D algorithms apply similar, repetitive, independent operations on a vast amount of 3D data, which makes such applications good candidates for multi-thread parallelization. The efficiency of the proposed idea is tested on two common computer graphics algorithms: hidden-line removal and collision detection. Both are data-intensive algorithms whose conversion from a sequential to a multithreaded implementation is challenging due to their complexity and the fact that elements in their data have different sizes and complexities, producing workload imbalances and asymmetries between processing elements. The results show that the proposed principles and patterns can be easily applied to both algorithms, transforming their sequential implementations into multithreaded ones and obtaining consistent optimization results proportional to the number of processing elements. From the work done in this thesis, it is concluded that the suggested parallelization warrants further study and development to extend its use to heterogeneous platforms such as graphics processing units (GPUs). OpenCL is the most feasible framework to explore in the future due to its interoperability among different platforms.
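
    The multi-threading pattern alluded to above, in which workers pull heterogeneous 3D elements from a shared queue so that expensive elements do not leave other cores idle, can be sketched as follows. This is a generic, assumed illustration rather than the thesis's code (and in CPython the GIL limits pure-Python speedup; the scheduling pattern, not raw performance, is the point).

```python
# Assumed sketch of dynamic load balancing: worker threads repeatedly take the
# next pending 3D element from a shared queue and process it.
import queue
import threading
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def parallel_process(items: Iterable[T], work: Callable[[T], R],
                     num_threads: int = 8) -> List[R]:
    pending: "queue.Queue[T]" = queue.Queue()
    for item in items:
        pending.put(item)
    results: List[R] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                item = pending.get_nowait()   # dynamic scheduling: grab the next element
            except queue.Empty:
                return
            result = work(item)               # e.g. hidden-line removal or a collision test
            with lock:
                results.append(result)        # result order is not preserved

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```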

    QCBA: Postoptimization of Quantitative Attributes in Classifiers based on Association Rules

    The need to pre-discretize numeric attributes before they can be used in association rule learning is a source of inefficiencies in the resulting classifier. This paper describes several new rule tuning steps aiming to recover information lost in the discretization of numeric (quantitative) attributes, and a new rule pruning strategy, which further reduces the size of the classification models. We demonstrate the effectiveness of the proposed methods on the post-optimization of models generated by three state-of-the-art association rule classification algorithms: Classification Based on Associations (Liu, 1998), Interpretable Decision Sets (Lakkaraju et al., 2016), and Scalable Bayesian Rule Lists (Yang, 2017). Benchmarks on 22 datasets from the UCI repository show that the post-optimized models are consistently smaller, typically by about 50%, and have better classification performance on most datasets.
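
    One plausible instance of such a rule tuning step, trimming a discretized interval literal to the range of attribute values actually covered in the training data, is sketched below. The function is an illustrative assumption, not a faithful reproduction of QCBA's exact procedure.

```python
# Illustrative only: shrink a rule's interval literal to the observed range of
# the covered training values, recovering precision lost to pre-discretization.
from typing import List, Tuple


def trim_interval(values: List[float],
                  interval: Tuple[float, float]) -> Tuple[float, float]:
    lo, hi = interval
    covered = [v for v in values if lo <= v <= hi]
    if not covered:
        return interval                  # the literal covers nothing: leave it unchanged
    return (min(covered), max(covered))
```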

    Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies

    The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load balancing among processors, and an integrated parallel analysis system.
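
    As a hedged illustration of what automatic partitioning for load balancing can look like in its simplest form, the sketch below assigns substructure workloads to processors with a longest-processing-time-first greedy heuristic. It is a generic textbook heuristic, not the dissertation's partitioning method.

```python
# Generic LPT (longest processing time first) load-balancing sketch: assign the
# most expensive remaining substructure to the currently least-loaded processor.
import heapq
from typing import Dict, List


def lpt_partition(work: Dict[str, float], processors: int) -> List[List[str]]:
    heap = [(0.0, p) for p in range(processors)]        # (current load, processor id)
    heapq.heapify(heap)
    bins: List[List[str]] = [[] for _ in range(processors)]
    for name, cost in sorted(work.items(), key=lambda kv: kv[1], reverse=True):
        load, p = heapq.heappop(heap)
        bins[p].append(name)
        heapq.heappush(heap, (load + cost, p))
    return bins
```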
    • โ€ฆ
    corecore