9,938 research outputs found

    Mira: A Framework for Static Performance Analysis

    Full text link
    The performance model of an application can pro- vide understanding about its runtime behavior on particular hardware. Such information can be analyzed by developers for performance tuning. However, model building and analyzing is frequently ignored during software development until perfor- mance problems arise because they require significant expertise and can involve many time-consuming application runs. In this paper, we propose a fast, accurate, flexible and user-friendly tool, Mira, for generating performance models by applying static program analysis, targeting scientific applications running on supercomputers. We parse both the source code and binary to estimate performance attributes with better accuracy than considering just source or just binary code. Because our analysis is static, the target program does not need to be executed on the target architecture, which enables users to perform analysis on available machines instead of conducting expensive exper- iments on potentially expensive resources. Moreover, statically generated models enable performance prediction on non-existent or unavailable architectures. In addition to flexibility, because model generation time is significantly reduced compared to dynamic analysis approaches, our method is suitable for rapid application performance analysis and improvement. We present several scientific application validation results to demonstrate the current capabilities of our approach on small benchmarks and a mini application

    Coarse-grained reconfigurable array architectures

    Get PDF
    Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code

    Phase Synchronization Operator for On-Chip Brain Functional Connectivity Computation

    Get PDF
    This paper presents an integer-based digital processor for the calculation of phase synchronization between two neural signals. It is based on the measurement of time periods between two consecutive minima. The simplicity of the approach allows for the use of elementary digital blocks, such as registers, counters, and adders. The processor, fabricated in a 0.18- μ m CMOS process, only occupies 0.05 mm 2 and consumes 15 nW from a 0.5 V supply voltage at a signal input rate of 1024 S/s. These low-area and low-power features make the proposed processor a valuable computing element in closed-loop neural prosthesis for the treatment of neural disorders, such as epilepsy, or for assessing the patterns of correlated activity in neural assemblies through the evaluation of functional connectivity maps.Ministerio de Economía y Competitividad TEC2016-80923-POffice of Naval Research (USA) N00014-19-1-215

    Solid State Television Camera (CID)

    Get PDF
    The design, development and test are described of a charge injection device (CID) camera using a 244x248 element array. A number of video signal processing functions are included which maximize the output video dynamic range while retaining the inherently good resolution response of the CID. Some of the unique features of the camera are: low light level performance, high S/N ratio, antiblooming, geometric distortion, sequential scanning and AGC

    An intelligent allocation algorithm for parallel processing

    Get PDF
    The problem of allocating nodes of a program graph to processors in a parallel processing architecture is considered. The algorithm is based on critical path analysis, some allocation heuristics, and the execution granularity of nodes in a program graph. These factors, and the structure of interprocessor communication network, influence the allocation. To achieve realistic estimations of the executive durations of allocations, the algorithm considers the fact that nodes in a program graph have to communicate through varying numbers of tokens. Coarse and fine granularities have been implemented, with interprocessor token-communication duration, varying from zero up to values comparable to the execution durations of individual nodes. The effect on allocation of communication network structures is demonstrated by performing allocations for crossbar (non-blocking) and star (blocking) networks. The algorithm assumes the availability of as many processors as it needs for the optimal allocation of any program graph. Hence, the focus of allocation has been on varying token-communication durations rather than varying the number of processors. The algorithm always utilizes as many processors as necessary for the optimal allocation of any program graph, depending upon granularity and characteristics of the interprocessor communication network

    On-Orbit Validation of a Framework for Spacecraft-Initiated Communication Service Requests with NASA's SCaN Testbed

    Get PDF
    We design, analyze, and experimentally validate a framework for demand-based allocation of high-performance space communication service in which the user spacecraft itself initiates a request for service. Leveraging machine-to-machine communications, the automated process has potential to improve the responsiveness and efficiency of space network operations. We propose an augmented ground station architecture in which a hemispherical-pattern antenna allows for reception of service requests sent from any user spacecraft within view. A suite of ground-based automation software acts upon these direct-to-Earth requests and allocates access to high-performance service through a ground station or relay satellite in response to immediate user demand. A software-defined radio transceiver, optimized for reception of weak signals from the helical antenna, is presented. Design and testing of signal processing equipment and a software framework to handle service requests is discussed. Preliminary results from on-orbit demonstrations with a testbed onboard the International Space Station are presented to verify feasibility of the concept
    corecore