5,236 research outputs found

    Towards hardware acceleration of neuroevolution for multimedia processing applications on mobile devices

    This paper addresses the problem of accelerating large artificial neural networks (ANNs) whose topology and weights can evolve via a genetic algorithm. The proposed digital hardware architecture is capable of processing any evolved network topology, whilst at the same time providing a good trade-off between throughput, area and power consumption. Low power consumption is vital for longer battery life on mobile devices. The architecture uses multiple parallel arithmetic units in each processing element (PE). Memory partitioning and data caching are used to minimise the effects of PE pipeline stalling. A first-order minimax polynomial approximation scheme, tuned via a genetic algorithm, is used for the activation function generator. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry-save adders, is adopted throughout the design.
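
    As an illustration of what a first-order minimax approximation of an activation function looks like, the sketch below fits a single line to the logistic sigmoid on one segment. It is only a sketch under stated assumptions: the abstract does not name the activation function, the segmentation, or the fixed-point format, and the paper tunes its coefficients with a genetic algorithm, whereas this example uses the classical closed-form construction for a concave function (slope of the chord, intercept midway between the chord and the parallel tangent). The names `minimax_line` and `max_error` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, assumed here as the activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def minimax_line(a, b):
    """Best first-order minimax fit m*x + k to sigmoid on [a, b], 0 <= a < b.

    On this range the sigmoid is concave, so the optimal line has the
    slope of the chord and lies midway between the chord and the
    parallel tangent (the error equioscillates at a, c and b).
    """
    fa, fb = sigmoid(a), sigmoid(b)
    m = (fb - fa) / (b - a)                      # chord slope
    # Sigmoid derivative is s*(1-s); solve s*(1-s) = m for s > 0.5.
    s = (1.0 + math.sqrt(1.0 - 4.0 * m)) / 2.0
    c = math.log(s / (1.0 - s))                  # tangent point in (a, b)
    chord_at_c = fa + m * (c - a)
    k = (fa - m * a) + (s - chord_at_c) / 2.0    # shift chord halfway up
    return m, k


def max_error(m, k, a, b, n=10001):
    """Largest absolute error of the line on a uniform grid over [a, b]."""
    xs = (a + (b - a) * i / (n - 1) for i in range(n))
    return max(abs(sigmoid(x) - (m * x + k)) for x in xs)


if __name__ == "__main__":
    m, k = minimax_line(0.0, 2.0)
    print(m, k, max_error(m, k, 0.0, 2.0))
```

    Negative inputs can reuse the same coefficients through the identity sigmoid(-x) = 1 - sigmoid(x); a hardware generator would typically split the input range into several such segments to reduce the error.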

    A methodology for exploiting parallelism in the finite element process

    A methodology is described for developing a parallel system using a top-down approach that takes into account the requirements of the user. Substructuring, a popular technique in structural analysis, is used to illustrate this approach.

    Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

    In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics, which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows. Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201

    Group implicit concurrent algorithms in nonlinear structural dynamics

    During the 1970s and 1980s, considerable effort was devoted to developing efficient and reliable time stepping procedures for transient structural analysis. Mathematically, the equations governing this type of problem are generally stiff, i.e., they exhibit a wide spectrum in the linear range. The algorithms best suited to this type of application are those which accurately integrate the low frequency content of the response without necessitating the resolution of the high frequency modes. This means that the algorithms must be unconditionally stable, which in turn rules out explicit integration. The most exciting possibility in the algorithm development area in recent years has been the advent of parallel computers with multiprocessing capabilities. This work is therefore mainly concerned with the development of parallel algorithms in the area of structural dynamics. A primary objective is to devise unconditionally stable and accurate time stepping procedures which lend themselves to an efficient implementation on concurrent machines. Some features of the new computer architecture are summarized. A brief survey of current efforts in the area is presented. A new class of concurrent procedures, Group Implicit (GI) algorithms, is introduced and analyzed. The numerical simulation shows that GI algorithms hold considerable promise for application in coarse-grain as well as medium-grain parallel computers.

    Resilience: Health in a New Key

    This is the story of resilience, the remarkable capacity of individuals and communities to bounce back from adversity and even thrive in a world of turmoil and change. How we can begin to build on our strengths -- instead of becoming prisoners of our weaknesses -- is the subject of this issue brief.

    Dynamic Systolization for Developing Multiprocessor Supercomputers

    A dynamic network approach is introduced for developing reconfigurable systolic arrays or wavefront processors. This allows one to design very powerful and flexible processors to be used in a general-purpose, reconfigurable, and fault-tolerant multiprocessor computer system. The concepts of macro-dataflow and multitasking can be integrated to handle variable-resolution granularities in computationally intensive algorithms. A multiprocessor architecture, Remps, is proposed based on these design methodologies. The Remps architecture is generalized from the Cedar, HEP, Cray X-MP, Trac, NYU Ultracomputer, S-1, Pumps, Chip, and SAM projects. Our goal is to provide a multiprocessor research model for developing design methodologies, multiprocessing and multitasking supports, dynamic systolic/wavefront array processors, interconnection networks, reconfiguration techniques, and performance analysis tools. These system design and operational techniques should be useful to those who are developing or evaluating multiprocessor supercomputers.

    Author response: Effects of orthostatic hypotension on cognition in Parkinson disease

    OBJECTIVE: To investigate the relation between orthostatic hypotension (OH) and posture-mediated cognitive impairment in persons with Parkinson's disease (PD) without dementia. METHODS: There were 55 participants: 37 non-demented individuals with idiopathic PD, including 18 with OH (PDOH) and 19 without (PDWOH), and 18 control participants (C). All participants completed neuropsychological tests in the supine and in the upright tilted position. Blood pressure was assessed in each posture using a standardized oscillometric cuff at the right brachial artery. RESULTS: The two PD groups performed similarly while supine, with a profile notable for executive dysfunction consisting of deficits in sustained attention, response inhibition, and semantic verbal fluency, as well as reduced verbal memory encoding and retention. When upright, these deficits were exacerbated and broadened to include additional cognitive functions in the PDOH group: deficits in phonemic verbal fluency, psychomotor speed, and both basic and complex aspects of auditory working memory. When group-specific supine scores were used as baseline anchors, both PD groups showed cognitive changes following tilt, though the PDOH group had a wider range of deficits in the executive functioning and memory domains and was the only group to show significant changes in visuospatial skills. CONCLUSIONS: Cognitive deficits in idiopathic PD have been widely reported, though assessments are typically performed in the supine position. While both PD groups had supine deficits that aligned with prior studies and clinical findings, we demonstrated that those with PD and orthostatic hypotension had transient, posture-mediated changes in excess of those found in PD without autonomic failure. These observed changes suggest an acute, reversible effect, and as orthostatic hypotension is a significant comorbid factor in PD, an independent target for clinical intervention.
Further understanding of the effects of autonomic failure on cognition in other disorders is desirable, particularly in the context of neuroimaging studies and clinical assessments where data are collected only in the supine or seated positions. Identification of a distinct neuropsychological profile in PD with autonomic failure also has implications for functional activities of daily living and overall quality of life. Accepted manuscript

    Systolic Array Implementations With Reduced Compute Time.

    The goal of the research is the establishment of a formal methodology to develop computational structures more suitable for the changing nature of real-time signal processing and control applications. A major effort is devoted to the following question: Given a systolic array designed to execute a particular algorithm, what other algorithms can be executed on the same array? One approach for answering this question is based on a general model of array operations using graph-theoretic techniques. As a result, a systematic procedure is introduced that models array operations as a function of the compute cycle. As a consequence of the analysis, the dissertation develops the concept of fast algorithm realizations. This concept characterizes specific realizations that can be evaluated in a reduced number of cycles. It restricts the operations to remain in the same class but with reduced execution time. The concept takes advantage of the data dependencies of the algorithm at hand. This feature allows the modification of existing structures by reordering the input data. Applications of the principle allow optimum-time band and triangular matrix products on arrays designed for dense matrices. A second approach for analyzing the families of algorithms implementable in an array is based on the concept of array time-constrained operation. The principle uses the number of compute cycles as an additional degree of freedom to expand the class of transformations generated by a single array. A mathematical approach, based on concepts from multilinear algebra, is introduced to model the recursive transformations implemented in linear arrays at each compute cycle. The proposed representation is general enough to encompass a large class of signal processing and control applications. A complete analytical model of the linear maps implementable by the array at each compute cycle is developed.
The proposed methodology results in arrays that are more adaptable to the changing nature of operations. Lessons learned from analyzing existing arrays are used to design smart arrays for special algorithm realizations. Applications of the methodology include the design of flexible time structures and the ability to decompose a full-size array into subarrays implementing smaller problems.
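
    To make the idea of modelling array operations as a function of the compute cycle concrete, the sketch below simulates a textbook output-stationary systolic array computing a dense matrix product. It is not the dissertation's model: the schedule (PE (i, j) consumes operand index k at cycle t = i + j + k) and the name `systolic_matmul` are assumptions chosen for illustration, and the reduced-cycle realizations for band and triangular matrices are not reproduced here.

```python
def systolic_matmul(A, B):
    """Cycle-by-cycle simulation of an n x n output-stationary systolic array.

    Assumed schedule: operands are skewed so that PE (i, j) sees the pair
    (A[i][k], B[k][j]) at cycle t = i + j + k and accumulates it locally.
    Returns the product and the number of compute cycles used.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    cycles = 3 * n - 2                  # last pair arrives at t = 3(n - 1)
    for t in range(cycles):
        for i in range(n):
            for j in range(n):
                k = t - i - j           # operand index reaching PE (i, j)
                if 0 <= k < n:
                    C[i][j] += A[i][k] * B[k][j]
    return C, cycles


if __name__ == "__main__":
    product, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(product, cycles)
```

    In this dense schedule every PE is idle for part of the run; when many operands are known to be zero, as in band or triangular matrices, fewer cycles carry useful work, which is the kind of slack a reordered input stream could exploit.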

    Design and development of a novel Invasive Blood Pressure simulator for patient's monitor testing

    This paper presents a newly designed and realized Invasive Blood Pressure (IBP) simulator for testing patient monitors. The device improves on, and extends the features of, a first prototype presented by the authors and similar state-of-the-art systems. A distinguishing feature of the device is that all implemented features can be customized, both by the developer and by the end user. The realized device has been tested, and its performance is described in terms of accuracy and of the back-loop measurement of the output used for blood pressure regulation. In particular, an accuracy of ±1 mmHg at 25 °C, over a range from −30 to 300 mmHg, was measured under different test conditions. The designed device is an ideal tool for testing IBP modules, for zero setting, and for calibrations. The implemented extended features, like the generation of custom waveforms and the Universal Serial Bus (USB) connectivity, allow use of this device in a wide range of applications, from research to equipment maintenance in clinical environments to educational purposes. Moreover, the presented device represents an innovation, both in terms of technology and methodologies: it allows quick and efficient tests to verify the proper functioning of the IBP module of patient monitors. With this innovative device, tests can be performed directly in the field and faster procedures can be implemented by the clinical maintenance personnel. This device is an open source project and all materials, hardware, and software are fully available for interested developers or researchers. Web of Science 201 art. no. 25