4,626 research outputs found

    The LOFAR Transients Pipeline

    Get PDF
    Current and future astronomical survey facilities provide a remarkably rich opportunity for transient astronomy, combining unprecedented fields of view with high sensitivity and the ability to access previously unexplored wavelength regimes. This is particularly true of LOFAR, a recently-commissioned, low-frequency radio interferometer, based in the Netherlands and with stations across Europe. The identification of and response to transients is one of LOFAR's key science goals. However, the large data volumes which LOFAR produces, combined with the scientific requirement for rapid response, make automation essential. To support this, we have developed the LOFAR Transients Pipeline, or TraP. The TraP ingests multi-frequency image data from LOFAR or other instruments and searches it for transients and variables, providing automatic alerts of significant detections and populating a lightcurve database for further analysis by astronomers. Here, we discuss the scientific goals of the TraP and how it has been designed to meet them. We describe its implementation, including both the algorithms adopted to maximize performance as well as the development methodology used to ensure it is robust and reliable, particularly in the presence of artefacts typical of radio astronomy imaging. Finally, we report on a series of tests of the pipeline carried out using simulated LOFAR observations with a known population of transients.Comment: 30 pages, 11 figures; Accepted for publication in Astronomy & Computing; Code at https://github.com/transientskp/tk

    Java Grande Forum Report: Making Java Work for High-End Computing

    Get PDF
    This document describes the Java Grande Forum and includes its initial deliverables.Theseare reports that convey a succinct set of recommendations from this forum to SunMicrosystems and other purveyors of Java™ technology that will enable GrandeApplications to be developed with the Java programming language

    Investigating Single Precision Floating General Matrix Multiply in Heterogeneous Hardware

    Get PDF
    The fundamental operation of matrix multiplication is ubiquitous across a myriad of disciplines. Yet, the identification of new optimizations for matrix multiplication remains relevant for emerging hardware architectures and heterogeneous systems. Frameworks such as OpenCL enable computation orchestration on existing systems, and its availability using the Intel High Level Synthesis compiler allows users to architect new designs for reconfigurable hardware using C/C++. Using the HARPv2 as a vehicle for exploration, we investigate the utility of several of the most notable matrix multiplication optimizations to better understand the performance portability of OpenCL and the implications for such optimizations on this and future heterogeneous architectures. Our results give targeted insights into the applicability of best practices that were for existing architectures when used on emerging heterogeneous systems

    An FPGA implementation of an investigative many-core processor, Fynbos : in support of a Fortran autoparallelising software pipeline

    Get PDF
    Includes bibliographical references.In light of the power, memory, ILP, and utilisation walls facing the computing industry, this work examines the hypothetical many-core approach to finding greater compute performance and efficiency. In order to achieve greater efficiency in an environment in which Moore’s law continues but TDP has been capped, a means of deriving performance from dark and dim silicon is needed. The many-core hypothesis is one approach to exploiting these available transistors efficiently. As understood in this work, it involves trading in hardware control complexity for hundreds to thousands of parallel simple processing elements, and operating at a clock speed sufficiently low as to allow the efficiency gains of near threshold voltage operation. Performance is there- fore dependant on exploiting a new degree of fine-grained parallelism such as is currently only found in GPGPUs, but in a manner that is not as restrictive in application domain range. While removing the complex control hardware of traditional CPUs provides space for more arithmetic hardware, a basic level of control is still required. For a number of reasons this work chooses to replace this control largely with static scheduling. This pushes the burden of control primarily to the software and specifically the compiler, rather not to the programmer or to an application specific means of control simplification. An existing legacy tool chain capable of autoparallelising sequential Fortran code to the degree of parallelism necessary for many-core exists. This work implements a many-core architecture to match it. Prototyping the design on an FPGA, it is possible to examine the real world performance of the compiler-architecture system to a greater degree than simulation only would allow. Comparing theoretical peak performance and real performance in a case study application, the system is found to be more efficient than any other reviewed, but to also significantly under perform relative to current competing architectures. This failing is apportioned to taking the need for simple hardware too far, and an inability to implement static scheduling mitigating tactics due to lack of support for such in the compiler

    Fortran refactoring for legacy systems

    Get PDF
    The motivation of this work comes from a Global Climate Model (GCM) Software which was in great need of being updated. This software was implemented by scientists in the ’80s as a result of meteorological research. Written in Fortran 77, this program has been used as an input to make climate predictions for the Southern Hemisphere. The execution to get a complete numerical data set takes several days. This software has been programmed using a sequential processing paradigm. In these days, where multicore processors are so widespread, the time that an execution takes to get a complete useful data set can be drastically reduced using this technology. As a first objective to reach this goal of reengineering we must be able to understand the source code. An essential Fortran code characteristic is that old source code versions became unreadable, not comprehensive and sometimes “ejects” the reader from the source code. In that way, we can not modify, update or improve unreadable source code. Then, as a first step to parallelize this code we must update it, turn it readable and easy to understand. The GCM has a very complex internal structure. The program is divided into about 300 .f (Fortran 77) files. These files generally implement only one Fortran subroutine. Less than 10% of the files are used for common blocks and constants. Approximately 25% of the lines in the source code are comments. The total number of Fortran source code lines is 58000. A detailed work within the source code brings to light that [74]: 1 About 230 routines are called/used at run time. Most of the runtime is spent in routines located at deep levels 5 to 7 in the dynamic call graph from the main routine. 2 The routine with most of the runtime (the top routine from now on) requires more than 9% of the total program runtime and is called about 315000 times. 3 The top 10 routines (the 10 routines at the top of the flat profile) require about 50% of total runtime. Two of them are related to intrinsic Fortran functions. Our first approach was using a scripting language and Find & Replace tools trying to upgrade the source code, this kind of code manipulation do not guarantee preservation of software behavior. Then, our goal was to develop an automated tool to transform legacy software in more understandable, comprehensible and readable applying refactoring as main technique. At the same time a catalog of transformation to be applied in Fortran code is needed as a guide to programmers through this process.Es revisado por: http://sedici.unlp.edu.ar/handle/10915/9703Facultad de Informátic

    Video Processing Acceleration using Reconfigurable Logic and Graphics Processors

    No full text
    A vexing question is `which architecture will prevail as the core feature of the next state of the art video processing system?' This thesis examines the substitutive and collaborative use of the two alternatives of the reconfigurable logic and graphics processor architectures. A structured approach to executing architecture comparison is presented - this includes a proposed `Three Axes of Algorithm Characterisation' scheme and a formulation of perfor- mance drivers. The approach is an appealing platform for clearly defining the problem, assumptions and results of a comparison. In this work it is used to resolve the advanta- geous factors of the graphics processor and reconfigurable logic for video processing, and the conditions determining which one is superior. The comparison results prompt the exploration of the customisable options for the graphics processor architecture. To clearly define the architectural design space, the graphics processor is first identifed as part of a wider scope of homogeneous multi-processing element (HoMPE) architectures. A novel exploration tool is described which is suited to the investigation of the customisable op- tions of HoMPE architectures. The tool adopts a systematic exploration approach and a high-level parameterisable system model, and is used to explore pre- and post-fabrication customisable options for the graphics processor. A positive result of the exploration is the proposal of a reconfigurable engine for data access (REDA) to optimise graphics processor performance for video processing-specific memory access patterns. REDA demonstrates the viability of the use of reconfigurable logic as collaborative `glue logic' in the graphics processor architecture

    An embedded adaptive optics real time controller

    Get PDF
    The design and realisation of a low cost, high speed control system for adaptive optics (AO) is presented. This control system is built around a field programmable gate array (FPGA). FPGA devices represent a fundamentally different approach to implementing control systems than conventional central processing units. The performance of the FPGA control system is demonstrated in a specifically constructed laboratory AO experiment where closed loop AO correction is shown. An alternative application of the control system is demonstrated in the field of optical tweezing, where it is used to study the motion dynamics of particles trapped within laser foci
    • …
    corecore