48 research outputs found

    New Logic-In-Memory Paradigms: An Architectural and Technological Perspective

    Get PDF
    Processing systems are in continuous evolution thanks to the constant technological advancement and architectural progress. Over the years, computing systems have become more and more powerful, providing support for applications, such as Machine Learning, that require high computational power. However, the growing complexity of modern computing units and applications has had a strong impact on power consumption. In addition, the memory plays a key role on the overall power consumption of the system, especially when considering data-intensive applications. These applications, in fact, require a lot of data movement between the memory and the computing unit. The consequence is twofold: Memory accesses are expensive in terms of energy and a lot of time is wasted in accessing the memory, rather than processing, because of the performance gap that exists between memories and processing units. This gap is known as the memory wall or the von Neumann bottleneck and is due to the different rate of progress between complementary metal-oxide semiconductor (CMOS) technology and memories. However, CMOS scaling is also reaching a limit where it would not be possible to make further progress. This work addresses all these problems from an architectural and technological point of view by: (1) Proposing a novel Configurable Logic-in-Memory Architecture that exploits the in-memory computing paradigm to reduce the memory wall problem while also providing high performance thanks to its flexibility and parallelism; (2) exploring a non-CMOS technology as possible candidate technology for the Logic-in-Memory paradigm

    Developing a support for FPGAs in the Controller parallel programming model

    Get PDF
    La computación heterogénea se presenta como la solución para conseguir supercomputadores cada vez más rápidos capaces de resolver problemas más grandes y complejos en diferentes áreas de conocimiento. Para ello, integra aceleradores con distintas arquitecturas capaces de explotar las características de los problemas desde distintos enfoques obteniendo, de este modo, un mayor rendimiento. Las FPGAs son hardware reconfigurable, i.e., es posible modificarlas después de su fabricación. Esto permite una gran flexibilidad y una máxima adaptación al problema en cuestión. Además, tienen un consumo energético muy bajo. Todas estas ventajas tienen el gran inconveniente de una más difícil programaci ón mediante los propensos a errores HDLs (Hardware Description Language), tales como Verilog o VHDL, y requisitos de conocimientos avanzados de electrónica digital. En los últimos años los principales fabricantes de FPGAs han enfocado sus esfuerzos en desarrollar herramientas HLS (High Level Synthesis) que permiten programarlas a través de lenguajes de programación de alto nivel estilo C. Esto ha favorecido su adopción por la comunidad HPC y su integración en los nuevos supercomputadores. Sin embargo, el programador aún tiene que ocuparse de aspectos como la gestión de colas de comandos, parámetros de lanzamiento o transferencias de datos. El modelo Controller es una librería que facilita la gestión de la coordinación, comunicación y los detalles de lanzamiento de los kernels en aceleradores hardware. Explota de forma transparente sus modelos de programación nativos, en concreto OpenCL y CUDA, y, por tanto, consigue un alto rendimiento independientemente del compilador. Permite al programador utilizar los distintos recursos hardware disponibles de forma combinada en entornos heterogéneos. Este trabajo extiende el modelo Controller mediante el desarrollo de un backend que permite la integración de FPGAs, manteniendo los cambios sobre la interfaz de usuario al mínimo. A través de los resultados experimentales se comprueba que se consigue una disminución del esfuerzo de programación significativa en comparación con la implementación nativa en OpenCL. Del mismo modo, se consigue un elevado solapamiento entre computación y comunicación y un sobrecoste por el uso de la librería despreciable.Heterogeneous computing appears to be the solution to achieve ever faster computers capable of solving bigger and more complex problems in difierent fields of knowledge. To that end, it integrates accelerators with difierent architectures capable of exploiting the features of problems from difierent perspectives thus achieving higher performance. FPGAs are reconfigurable hardware, i.e., it is possible to modify them after manufacture. This allows great flexibility and maximum adaptability to the given problem. In addition, they have low power consumption. All these advantages have the great objection of more dificult programming with the errorprone HDLs (Hardware Description Language), such as Verilog or VHDL, and the requirement of advanced knowledge of digital electronics. The main FPGA vendors have concentrated on developing HLS (High Level Synthesis) tools that allow to program them with C-like high level programming languages. This favoured their adoption by the HPC community and their integration in new supercomputers. However, the programmer still has to take care of aspects such as management of command queues, launching parameters or data transfers. The Controller model is a library to easily manage the coordination, communication and kernel launching details on hardware accelerators. It transparently exploits their native or vendor specific programming models, namely OpenCL and CUDA, thus enabling the potential performance obtained by using them in a compiler agnostic way. It is intended to enable the programmer to make use of the diferent available hardware resources in combination in heterogeneous environments. This work extends the Controller model through the development of a backend that allows the integration of FPGAs, keeping the changes over the user-facing interface to the minimum. The experimental results validate that a significant decrease in programming effort compared to the native OpenCL implementation is achieved. Similarly, high overlap of computation and communication and a negligible overhead due to the use of the library are attained.Grado en Ingeniería Informátic

    Exploring New Computing Paradigms for Data-Intensive Applications

    Get PDF
    L'abstract è presente nell'allegato / the abstract is in the attachmen

    Domain-aware Genetic Algorithms for Hardware and Mapping Optimization for Efficient DNN Acceleration

    Get PDF
    The proliferation of AI across a variety of domains (vision, language, speech, recommendations, games) has led to the rise of domain-specific accelerators for deep learning. At design-time, these accelerators carefully architect the on-chip dataflow to maximize data reuse (over space and time) and size the hardware resources (PEs and buffers) to maximize performance and energy-efficiency, while meeting the chip’s area and power targets. At compile-time, the target Deep Neural Network (DNN) model is mapped over the accelerator. The mapping refers to tiling the computation and data (i.e., tensors) and scheduling them over the PEs and scratchpad buffers respectively, while honoring the microarchitectural constraints (number of PEs, buffer sizes, and dataflow). The design-space of valid hardware resource assignments for a given dataflow and the valid mappings for a given hardware is extremely large (~O(10^24)) per layer for state-of-the-art DNN models today. This makes exhaustive searches infeasible. Unfortunately, there can be orders of magnitude performance and energy-efficiency differences between an optimal and sub-optimal choice, making these decisions a crucial part of the entire design process. Moreover, manual tuning by domain experts become unprecedentedly challenged due to increased irregularity (due to neural architecture search) and sparsity of DNN models. This necessitate the existence of Map Space Exploration (MSE). In this thesis, our goal is to deliver a deep analysis of the MSE for DNN accelerators, propose different techniques to improve MSE, and generalize the MSE framework to a wider landscape (from mapping to HW-mapping co-exploration, from single-accelerator to multi-accelerator scheduling). As part of it, we discuss the correlation between hardware flexibility and the formed map space, formalized the map space representation by four mapping axes: tile, order, parallelism, and shape. Next, we develop dedicated exploration operators for these axes and use genetic algorithm framework to converge the solution. Next, we develop "sparsity-aware" technique to enable sparsity consideration in MSE and a "warm-start" technique to solve the search speed challenge commonly seen across learning-based search algorithms. Finally, we extend out MSE to support hardware and map space co-exploration and multi-accelerator scheduling.Ph.D

    Optimization Tools for ConvNets on the Edge

    Get PDF
    L'abstract è presente nell'allegato / the abstract is in the attachmen

    Hardware / Software Architectural and Technological Exploration for Energy-Efficient and Reliable Biomedical Devices

    Get PDF
    Nowadays, the ubiquity of smart appliances in our everyday lives is increasingly strengthening the links between humans and machines. Beyond making our lives easier and more convenient, smart devices are now playing an important role in personalized healthcare delivery. This technological breakthrough is particularly relevant in a world where population aging and unhealthy habits have made non-communicable diseases the first leading cause of death worldwide according to international public health organizations. In this context, smart health monitoring systems termed Wireless Body Sensor Nodes (WBSNs), represent a paradigm shift in the healthcare landscape by greatly lowering the cost of long-term monitoring of chronic diseases, as well as improving patients' lifestyles. WBSNs are able to autonomously acquire biological signals and embed on-node Digital Signal Processing (DSP) capabilities to deliver clinically-accurate health diagnoses in real-time, even outside of a hospital environment. Energy efficiency and reliability are fundamental requirements for WBSNs, since they must operate for extended periods of time, while relying on compact batteries. These constraints, in turn, impose carefully designed hardware and software architectures for hosting the execution of complex biomedical applications. In this thesis, I develop and explore novel solutions at the architectural and technological level of the integrated circuit design domain, to enhance the energy efficiency and reliability of current WBSNs. Firstly, following a top-down approach driven by the characteristics of biomedical algorithms, I perform an architectural exploration of a heterogeneous and reconfigurable computing platform devoted to bio-signal analysis. By interfacing a shared Coarse-Grained Reconfigurable Array (CGRA) accelerator, this domain-specific platform can achieve higher performance and energy savings, beyond the capabilities offered by a baseline multi-processor system. More precisely, I propose three CGRA architectures, each contributing differently to the maximization of the application parallelization. The proposed Single, Multi and Interleaved-Datapath CGRA designs allow the developed platform to achieve substantial energy savings of up to 37%, when executing complex biomedical applications, with respect to a multi-core-only platform. Secondly, I investigate how the modeling of technology reliability issues in logic and memory components can be exploited to adequately adjust the frequency and supply voltage of a circuit, with the aim of optimizing its computing performance and energy efficiency. To this end, I propose a novel framework for workload-dependent Bias Temperature Instability (BTI) impact analysis on biomedical application results quality. Remarkably, the framework is able to determine the range of safe circuit operating frequencies without introducing worst-case guard bands. Experiments highlight the possibility to safely raise the frequency up to 101% above the maximum obtained with the classical static timing analysis. Finally, through the study of several well-known biomedical algorithms, I propose an approach allowing energy savings by dynamically and unequally protecting an under-powered data memory in a new way compared to regular error protection schemes. This solution relies on the Dynamic eRror compEnsation And Masking (DREAM) technique that reduces by approximately 21% the energy consumed by traditional error correction codes

    An investigation into computer and network curricula

    Get PDF
    This thesis consists of a series of internationally published, peer reviewed, journal and conference research papers that analyse the educational and training needs of undergraduate Information Technology (IT) students within the area of Computer and Network Technology (CNT) Education. Research by Maj et al has found that accredited computing science curricula can fail to meet the expectations of employers in the field of CNT: “It was found that none of these students could perform first line maintenance on a Personal Computer (PC) to a professional standard with due regard to safety, both to themselves and the equipment. Neither could they install communication cards, cables and network operating system or manage a population of networked PCs to an acceptable commercial standard without further extensive training. It is noteworthy that none of the students interviewed had ever opened a PC. It is significant that all those interviewed for this study had successfully completed all the units on computer architecture and communication engineering (Maj, Robbins, Shaw, & Duley, 1998). The students\u27 curricula at that time lacked units in which they gained hands-on experience in modern PC hardware or networking skills. This was despite the fact that their computing science course was level one accredited, the highest accreditation level offered by the Australian Computer Society (ACS). The results of the initial survey in Western Australia led to the introduction of two new units within the Computing Science Degree at Edith Cowan University (ECU), Computer Installation & Maintenance (CIM) and Network Installation & Maintenance (NIM) (Maj, Fetherston, Charlesworth, & Robbins, 1998). Uniquely within an Australian university context these new syllabi require students to work on real equipment. Such experience excludes digital circuit investigation, which is still a recommended approach by the Association for Computing Machinery (ACM) for computer architecture units (ACM, 2001, p.97). Instead, the CIM unit employs a top-down approach based initially upon students\u27 everyday experiences, which is more in accordance with constructivist educational theory and practice. These papers propose an alternate model of IT education that helps to accommodate the educational and vocational needs of IT students in the context of continual rapid changes and developments in technology. The ACM have recognised the need for variation noting that: There are many effective ways to organize a curriculum even for a particular set of goals and objectives (Tucker et al., 1991, p.70). A possible major contribution to new knowledge of these papers relates to how high level abstract bandwidth (B-Node) models may contribute to the understanding of why and how computer and networking technology systems have developed over time. Because these models are de-coupled from the underlying technology, which is subject to rapid change, these models may help to future-proof student knowledge and understanding of the ongoing and future development of computer and networking systems. The de-coupling is achieved through abstraction based upon bandwidth or throughput rather than the specific implementation of the underlying technologies. One of the underlying problems is that computing systems tend to change faster than the ability of most educational institutions to respond. Abstraction and the use of B-Node models could help educational models to more quickly respond to changes in the field, and can also help to introduce an element of future-proofing in the education of IT students. The importance of abstraction has been noted by the ACM who state that: Levels of Abstraction: the nature and use of abstraction in computing; the use of abstraction in managing complexity, structuring systems, hiding details, and capturing recurring patterns; the ability to represent an entity or system by abstractions having different levels of detail and specificity (ACM, 1991b). Bloom et al note the importance of abstraction, listing under a heading of: “Knowledge of the universals and abstractions in a field” the objective: Knowledge of the major schemes and patterns by which phenomena and ideas arc organized. These are large structures, theories, and generalizations which dominate a subject or field or problems. These are the highest levels of abstraction and complexity\u27\u27 (Bloom, Engelhart, Furst, Hill, & Krathwohl, 1956, p. 203). Abstractions can be applied to computer and networking technology to help provide students with common fundamental concepts regardless of the particular underlying technological implementation to help avoid the rapid redundancy of a detailed knowledge of modem computer and networking technology implementation and hands-on skills acquisition. Again the ACM note that: “Enduring computing concepts include ideas that transcend any specific vendor, package or skill set... While skills are fleeting, fundamental concepts are enduring and provide long lasting benefits to students, critically important in a rapidly changing discipline (ACM, 2001, p.70) These abstractions can also be reinforced by experiential learning to commercial practices. In this context, the other possibly major contribution of new knowledge provided by this thesis is an efficient, scalable and flexible model for assessing hands-on skills and understanding of IT students. This is a form of Competency-Based Assessment (CBA), which has been successfully tested as part of this research and subsequently implemented at ECU. This is the first time within this field that this specific type of research has been undertaken within the university sector within Australia. Hands-on experience and understanding can become outdated hence the need for future proofing provided via B-Nodes models. The three major research questions of this study are: •Is it possible to develop a new, high level abstraction model for use in CNT education? •Is it possible to have CNT curricula that are more directly relevant to both student and employer expectations without suffering from rapid obsolescence? •Can WI effective, efficient and meaningful assessment be undertaken to test students\u27 hands-on skills and understandings? The ACM Special Interest Group on Data Communication (SJGCOMM) workshop report on Computer Networking, Curriculum Designs and Educational Challenges, note a list of teaching approaches: ... the more \u27hands-on\u27 laboratory approach versus the more traditional in-class lecture-based approach; the bottom-up approach towards subject matter verus the top-down approach (Kurose, Leibeherr, Ostermann, & Ott-Boisseau, 2002, para 1). Bandwidth considerations are approached from the PC hardware level and at each of the seven layers of the International Standards Organisation (ISO) Open Systems Interconnection (OSI) reference model. It is believed that this research is of significance to computing education. However, further research is needed

    Energy: A continuing bibliography with indexes, issue 39

    Get PDF
    This bibliography lists 1377 reports, articles and other documents introduced into the NASA scientific and technical information system from July 1, 1983 through September 30, 1983

    Hardware/Software Co-Design of Ultra-Low Power Biomedical Monitors

    Get PDF
    Ongoing changes in world demographics and the prevalence of unhealthy lifestyles are imposing a paradigm shift in healthcare delivery. Nowadays, chronic ailments such as cardiovascular diseases, hypertension and diabetes, represent the most common causes of death according to the World Health Organization. It is estimated that 63% of deaths worldwide are directly or indirectly related to these non-communicable diseases (NCDs), and by 2030 it is predicted that the health delivery cost will reach an amount comparable to 75% of the current GDP. In this context, technologies based on Wireless Sensor Nodes (WSNs) effectively alleviate this burden enabling the conception of wearable biomedical monitors composed of one or several devices connected through a Wireless Body Sensor Network (WBSN). Energy efficiency is of paramount importance for these devices, which must operate for prolonged periods of time with a single battery charge. In this thesis I propose a set of hardware/software co-design techniques to drastically increase the energy efficiency of bio-medical monitors. To this end, I jointly explore different alternatives to reduce the required computational effort at the software level while optimizing the power consumption of the processing hardware by employing ultra-low power multi-core architectures that exploit DSP application characteristics. First, at the sensor level, I study the utilization of a heartbeat classifier to perform selective advanced DSP on state-of-the-art ECG bio-medical monitors. To this end, I developed a framework to design and train real-time, lightweight heartbeat neuro-fuzzy classifiers, detail- ing the required optimizations to efficiently execute them on a resource-constrained platform. Then, at the network level I propose a more complex transmission-aware WBSN for activity monitoring that provides different tradeoffs between classification accuracy and transmission volume. In this work, I study the combination of a minimal set of WSNs with a smartphone, and propose two classification schemes that trade accuracy for transmission volume. The proposed method can achieve accuracies ranging from 88% to 97% and can save up to 86% of wireless transmissions, outperforming the state-of-the-art alternatives. Second, I propose a synchronization-based low-power multi-core architecture for bio-signal processing. I introduce a hardware/software synchronization mechanism that allows to achieve high energy efficiency while parallelizing the execution of multi-channel DSP applications. Then, I generalize the methodology to support bio-signal processing applications with an arbitrarily high degree of parallelism. Due to the benefits of SIMD execution and software pipelining, the architecture can reduce its power consumption by up 38% when compared to an equivalent low-power single-core alternative. Finally, I focused on the optimization of the multi-core memory subsystem, which is the major contributor to the overall system power consumption. First I considered a hybrid memory subsystem featuring a small reliable partition that can operate at ultra-low voltage enabling low-power buffering of data and obtaining up to 50% energy savings. Second, I explore a two-level memory hierarchy based on non-volatile memories (NVM) that allows for aggressive fine-grained power gating enabled by emerging low-power NVM technologies and monolithic 3D integration. Experimental results show that, by adopting this memory hierarchy, power consumption can be reduced by 5.42x in the DSP stage
    corecore