53 research outputs found

    Generating and auto-tuning parallel stencil codes

    Get PDF
    In this thesis, we present a software framework, Patus, which generates high performance stencil codes for different types of hardware platforms, including current multicore CPU and graphics processing unit architectures. The ultimate goals of the framework are productivity, portability (of both the code and performance), and achieving a high performance on the target platform. A stencil computation updates every grid point in a structured grid based on the values of its neighboring points. This class of computations occurs frequently in scientific and general purpose computing (e.g., in partial differential equation solvers or in image processing), justifying the focus on this kind of computation. The proposed key ingredients to achieve the goals of productivity, portability, and performance are domain specific languages (DSLs) and the auto-tuning methodology. The Patus stencil specification DSL allows the programmer to express a stencil computation in a concise way independently of hardware architecture-specific details. Thus, it increases the programmer productivity by disburdening her or him of low level programming model issues and of manually applying hardware platform-specific code optimization techniques. The use of domain specific languages also implies code reusability: once implemented, the same stencil specification can be reused on different hardware platforms, i.e., the specification code is portable across hardware architectures. Constructing the language to be geared towards a special purpose makes it amenable to more aggressive optimizations and therefore to potentially higher performance. Auto-tuning provides performance and performance portability by automated adaptation of implementation-specific parameters to the characteristics of the hardware on which the code will run. By automating the process of parameter tuning — which essentially amounts to solving an integer programming problem in which the objective function is the number representing the code's performance as a function of the parameter configuration, — the system can also be used more productively than if the programmer had to fine-tune the code manually. We show performance results for a variety of stencils, for which Patus was used to generate the corresponding implementations. The selection includes stencils taken from two real-world applications: a simulation of the temperature within the human body during hyperthermia cancer treatment and a seismic application. These examples demonstrate the framework's flexibility and ability to produce high performance code

    [Research activities in applied mathematics, fluid mechanics, and computer science]

    Get PDF
    This report summarizes research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, fluid mechanics, and computer science during the period April 1, 1995 through September 30, 1995

    VLSI Design

    Get PDF
    This book provides some recent advances in design nanometer VLSI chips. The selected topics try to present some open problems and challenges with important topics ranging from design tools, new post-silicon devices, GPU-based parallel computing, emerging 3D integration, and antenna design. The book consists of two parts, with chapters such as: VLSI design for multi-sensor smart systems on a chip, Three-dimensional integrated circuits design for thousand-core processors, Parallel symbolic analysis of large analog circuits on GPU platforms, Algorithms for CAD tools VLSI design, A multilevel memetic algorithm for large SAT-encoded problems, etc

    Parallel and Distributed Computing

    Get PDF
    The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. Particularly, the topics that are addressed are programmable and reconfigurable devices and systems, dependability of GPUs (General Purpose Units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peertopeer networks, largescale network simulation, and parallel routines and algorithms. In this way, the articles included in this book constitute an excellent reference for engineers and researchers who have particular interests in each of these topics in parallel and distributed computing

    Technology 2003: The Fourth National Technology Transfer Conference and Exposition, volume 2

    Get PDF
    Proceedings from symposia of the Technology 2003 Conference and Exposition, Dec. 7-9, 1993, Anaheim, CA, are presented. Volume 2 features papers on artificial intelligence, CAD&E, computer hardware, computer software, information management, photonics, robotics, test and measurement, video and imaging, and virtual reality/simulation

    A Contribution Towards Intelligent Autonomous Sensors Based on Perovskite Solar Cells and Ta2O5/ZnO Thin Film Transistors

    Get PDF
    Many broad applications in the field of robotics, brain-machine interfaces, cognitive computing, image and speech processing and wearables require edge devices with very constrained power and hardware requirements that are challenging to realize. This is because these applications require sub-conscious awareness and require to be always “on”, especially when integrated with a sensor node that detects an event in the environment. Present day edge intelligent devices are typically based on hybrid CMOS-memristor arrays that have been so far designed for fast switching, typically in the range of nanoseconds, low energy consumption (typically in nano-Joules), high density and endurance (exceeding 1015 cycles). On the other hand, sensory-processing systems that have the same time constants and dynamics as their input signals, are best placed to learn or extract information from them. To meet this requirement, many applications are implemented using external “delay” in the memristor, in a process which enables each synapse to be modeled as a combination of a temporal delay and a spatial weight parameter. This thesis demonstrates a synaptic thin film transistor capable of inherent logic functions as well as compute-in-memory on similar time scales as biological events. Even beyond a conventional crossbar array architecture, we have relied on new concepts in reservoir computing to demonstrate a delay system reservoir with the highest learning efficiency of 95% reported to date, in comparison to equivalent two terminal memristors, using a single device for the task of image processing. The crux of our findings relied on enhancing our capability to model the unique physics of the device, in the scope of the current thesis, that is not amenable to conventional TCAD simulations. The model provides new insight into the redox characteristics of the gate current and paves way for assessment of device performance in compute-in-memory applications. The diffusion-based mechanism of the device, effectively enables time constants that have potential in applications such as gesture recognition and detection of cardiac arrythmia. The thesis also reports a new orientation of a solution processed perovskite solar cell with an efficiency of 14.9% that is easily integrable into an intelligent sensor node. We examine the influence of the growth orientation on film morphology and solar cell efficiency. Collectively, our work aids the development of more energy-efficient, powerful edge-computing sensor systems for upcoming applications of the IOT

    Understanding Quantum Technologies 2022

    Full text link
    Understanding Quantum Technologies 2022 is a creative-commons ebook that provides a unique 360 degrees overview of quantum technologies from science and technology to geopolitical and societal issues. It covers quantum physics history, quantum physics 101, gate-based quantum computing, quantum computing engineering (including quantum error corrections and quantum computing energetics), quantum computing hardware (all qubit types, including quantum annealing and quantum simulation paradigms, history, science, research, implementation and vendors), quantum enabling technologies (cryogenics, control electronics, photonics, components fabs, raw materials), quantum computing algorithms, software development tools and use cases, unconventional computing (potential alternatives to quantum and classical computing), quantum telecommunications and cryptography, quantum sensing, quantum technologies around the world, quantum technologies societal impact and even quantum fake sciences. The main audience are computer science engineers, developers and IT specialists as well as quantum scientists and students who want to acquire a global view of how quantum technologies work, and particularly quantum computing. This version is an extensive update to the 2021 edition published in October 2021.Comment: 1132 pages, 920 figures, Letter forma

    NASA Tech Briefs, November 1993

    Get PDF
    Topics covered: Advanced Manufacturing; Electronic Components and Circuits; Electronic Systems; Physical Sciences; Materials; Computer Programs; Mechanics; Machinery; Fabrication Technology; Mathematics and Information Sciences; Life Sciences
    corecore