106 research outputs found

    Per Aspera ad Astra: On the Way to Parallel Processing

    Computational Science and Engineering is being established as a third category of scientific methodology; this innovative discipline supports and supplements the traditional categories, theory and experiment, in order to solve the problems arising from complex systems that challenge science and technology. While the successes of the past two decades in scientific computing have been achieved essentially through the technical breakthrough of vector supercomputers, today the discussion about the future of supercomputing is focused on massively parallel computers. The discrepancy, however, between peak performance and the sustained performance achievable with algorithmic kernels, software packages, and real applications is still disappointingly high. Programming models are an important issue. While Message Passing on parallel computers with distributed memory is the only efficient programming paradigm available today, from a user's point of view it is hard to imagine that this programming model, rather than Shared Virtual Memory, will be capable of serving as the central basis for bringing computing on massively parallel systems from a sheer computer science trend to the technological breakthrough needed to deal with the large applications of the future; this is especially true for commercial applications, where explicitly programming the data communication via Message Passing may turn out to be a huge software-technological barrier which nobody might be willing to surmount.

    Parallelization of the AVL FIRE Benchmark with SVM-Fortran

    This article outlines the parallelization of an irregular grid application with SVM-Fortran. It describes the different optimizations and their effectiveness. The parallelization was much simplified by the performance analysis tool OPAL, a source-code-based tool for requesting and analyzing runtime performance data. Although shared memory parallelization is easier than distributed memory parallelization, understanding and eliminating the overhead from page faults is impossible without such a tool. OPAL relates the page faults to the arrays and to the location in the source code. One area that OPAL does not support, but where supporting tools are highly desirable, is the performance degradation due to low utilization of the on-chip cache.

    A Reduction of the Elastic Net to Support Vector Machines with an Application to GPU Computing

    The past years have witnessed many dedicated open-source projects that build and maintain implementations of Support Vector Machines (SVM), parallelized for GPUs, multi-core CPUs and distributed systems. Up to this point, no comparable effort has been made to parallelize the Elastic Net, despite its popularity in many high-impact applications, including genetics, neuroscience and systems biology. The first contribution of this paper is theoretical. We establish a tight link between two seemingly different algorithms and prove that Elastic Net regression can be reduced to SVM classification with squared hinge loss. Our second contribution is to derive a practical algorithm based on this reduction. The reduction enables us to utilize prior efforts in speeding up and parallelizing SVMs to obtain a highly optimized and parallel solver for the Elastic Net and Lasso. With a simple wrapper, consisting of only 11 lines of MATLAB code, we obtain an Elastic Net implementation that naturally utilizes GPUs and multi-core CPUs. We demonstrate on twelve real-world data sets that our algorithm yields the same results as the popular (and highly optimized) glmnet implementation but is one or several orders of magnitude faster.
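
As a reading aid for the abstract above, the two objectives it links can be written out in a few lines. This is only a sketch of the standard textbook definitions, not the paper's reduction and not its MATLAB wrapper. The constrained form of the Elastic Net (squared residual plus a ridge term, with a separate L1 budget), the bias-free SVM, and all function and variable names are assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def elastic_net_objective(X, y, beta, lambda2):
    """Squared residual plus ridge penalty lambda2 * ||beta||^2.

    Assumed constrained form: the L1 part is handled as a separate
    budget (see l1_norm), not as a penalty inside this function.
    """
    residual = sum((dot(row, beta) - yi) ** 2 for row, yi in zip(X, y))
    return residual + lambda2 * dot(beta, beta)


def l1_norm(beta):
    """L1 norm of the coefficients, to be compared against a budget t."""
    return sum(abs(b) for b in beta)


def squared_hinge_svm_objective(Z, labels, w, C):
    """Bias-free linear SVM objective with squared hinge loss."""
    loss = sum(max(0.0, 1.0 - label * dot(w, z)) ** 2
               for z, label in zip(Z, labels))
    return 0.5 * dot(w, w) + C * loss


# Tiny hypothetical data, chosen only so the arithmetic is easy to follow.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]
beta = [1.0, 1.0]
en_value = elastic_net_objective(X, y, beta, lambda2=0.5)
svm_value = squared_hinge_svm_objective(X, [1, -1], [0.5, 0.5], C=2.0)
```

Both objectives are sums of squared terms plus a quadratic regularizer, which is the structural similarity a reduction between them can exploit. How the paper constructs the SVM inputs from the regression data is not stated in the abstract and is not reproduced here.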

    RESEARCH ISSUES CONCERNING ALGORITHMS USED FOR OPTIMIZING THE DATA MINING PROCESS

    In this paper, we describe some of the most widely used data mining algorithms, which have had great utility and influence in the research community. A data mining algorithm can be regarded as a tool that creates a data mining model. After analyzing a set of data, an algorithm searches for specific trends and patterns, then defines the parameters of the mining model based on the results of this analysis. These parameters play a significant role in identifying and extracting actionable patterns and detailed statistics. The most important algorithms in this research cover topics such as clustering, classification, association analysis, statistical learning and link mining. After a brief description of each algorithm, we analyze its application potential and the research issues concerning the optimization of the data mining process. We then describe the most important data mining algorithms included in Microsoft and Oracle software products, give suggestions and criteria for choosing the most suitable algorithm for a given task, and outline the advantages offered by these software products.
    Keywords: data mining optimization, data mining algorithms, software solutions

    Improving Utility of GPU in Accelerating Industrial Applications with User-centred Automatic Code Translation

    SMEs (small and medium-sized enterprises), particularly those whose business is focused on developing innovative products, are limited in many applications by a major bottleneck in the speed of computation. Recent developments in GPUs (graphics processing units) have markedly increased their versatility in many computational areas. But due to the lack of specialist GPU programming skills, the explosion of GPU power has not been fully utilized in general SME applications by inexperienced users. Also, existing automatic CPU-to-GPU code translators are mainly designed for research purposes, have poor user interfaces and are hard to use. Little attention has been paid to the applicability, usability and learnability of these tools for normal users. In this paper, we present an online automated CPU-to-GPU source translation system (GPSME) that lets inexperienced users utilize GPU capability to accelerate general SME applications. The system designs and implements a directive programming model with a new kernel generation scheme and memory management hierarchy to optimize its performance. A web-service-based interface is designed so that inexperienced users can easily and flexibly invoke the automatic source translator. Our experiments with non-expert GPU users in 4 SMEs show that the GPSME system can efficiently accelerate real-world applications by at least 4x and has better applicability, usability and learnability than existing automatic CPU-to-GPU source translators.

    Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++

    Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model
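
To make the idea of user-guided data distribution concrete, the sketch below maps array indices to processors under two layouts, contiguous blocks and round-robin, of the kind HPF lets the user request. It is a minimal illustration in Python with 0-based indices, not HPF syntax; the function names and the example sizes are hypothetical.

```python
def block_owner(i, n, p):
    """Owner of 0-based index i when n elements are split into
    contiguous blocks of size ceil(n / p) over p processors."""
    block = -(-n // p)  # integer ceiling of n / p
    return i // block


def cyclic_owner(i, p):
    """Owner of 0-based index i when elements are dealt out
    round-robin over p processors."""
    return i % p


# Hypothetical example: 10 array elements on 4 processors.
n, p = 10, 4
block_layout = [block_owner(i, n, p) for i in range(n)]
cyclic_layout = [cyclic_owner(i, p) for i in range(n)]
```

A block layout keeps neighbouring elements on the same processor, which tends to suit stencil-like access patterns, while a cyclic layout spreads work more evenly when the cost per element varies. In a shared virtual memory system such as the Fortran-S approach mentioned above, this placement is left to the operating system rather than declared by the user.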

    Expert Programmer versus Parallelizing Compiler: A Comparative Study of Two Approaches for Distributed Shared Memory

    This article critically examines current parallel programming practice and optimizing compiler development. The general strategies employed by compiler and programmer to optimize a Fortran program are described, and then illustrated for a specific case by applying them to a well-known scientific program, TRED2, using the KSR-1 as the target architecture. Extensive measurement is applied to the resulting versions of the program, which are compared with a version produced by a commercial optimizing compiler, KAP. The compiler strategy significantly outperforms KAP and does not fall far short of the performance achieved by the programmer. Following the experimental section each approach is critiqued by the other. Perceived flaws, advantages, and common ground are outlined, with an eye to improving both schemes

    On the Super-computational Background of the Research Centre Jülich

    KFA Jülich is one of the largest big-science research centres in Europe; its scientific and engineering activities range from fundamental research to applied science and technology. KFA's Central Institute for Applied Mathematics (ZAM) runs the large-scale computing facilities and network systems at KFA and also provides communication services, general-purpose computing and supercomputer capacity for the HLRZ ("Höchstleistungsrechenzentrum"), established in 1987 in order to further enhance and promote computational science in Germany. Thus, at KFA - and in particular driven by ZAM - supercomputing has had high priority for more than ten years. What particle accelerators mean to experimental physics, supercomputers mean to Computational Science and Engineering: supercomputers are the accelerators of theory.