
    Binary Neural Networks in FPGAs: Architectures, Tool Flows and Hardware Comparisons.

Binary neural networks (BNNs) are variants of artificial/deep neural network (ANN/DNN) architectures that constrain weight values to the binary set {-1, +1}. By using binary values, BNNs can replace matrix multiplications with bitwise operations, which accelerates both training and inference and reduces hardware complexity and model size. Compared to traditional deep learning architectures, this makes BNNs a good choice for resource-constrained devices such as FPGAs and ASICs. However, BNNs suffer from reduced accuracy because of the information loss introduced by binarization. Over the years, this performance gap has attracted the attention of the research community, and several architectures have been proposed to close it. In this paper, we provide a comprehensive review of BNNs for implementation in FPGA hardware. The survey covers different aspects, such as BNN architectures and variants, design and tool flows for FPGAs, and various applications of BNNs. The final part of the paper presents benchmark works and design tools for implementing BNNs on FPGAs, based on established datasets used by the research community.
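To make the bitwise trick concrete, the following is a minimal sketch (our own illustration, not code from the survey) of the XNOR/popcount formulation commonly used for BNN inference: encoding +1 as bit 1 and -1 as bit 0, an n-element dot product reduces to a single XOR and a popcount.

```python
# Minimal sketch of the bitwise dot product behind BNN inference.
# Encoding: bit 1 -> +1, bit 0 -> -1. For two packed n-bit vectors,
# dot(a, b) = n - 2 * popcount(a XOR b), since XOR marks sign mismatches.

def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two n-element {-1, +1} vectors packed into ints."""
    mismatches = bin((a_bits ^ b_bits) & ((1 << n) - 1)).count("1")
    return n - 2 * mismatches

# a = [+1, -1, +1, +1] -> 0b1011 (MSB first); b = [+1, +1, -1, +1] -> 0b1101
assert binary_dot(0b1011, 0b1101, 4) == 0   # (+1) + (-1) + (-1) + (+1) = 0
```

On an FPGA the same computation maps onto XNOR gates and a popcount adder tree, which is where the hardware savings discussed in the survey come from.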

    Background evaluation of the ANAIS dark matter experiment in different configurations: towards a final design

INTRODUCTION AND MOTIVATION OF THE THESIS: There is abundant evidence that most of the matter in the Universe is "dark", and a variety of experiments have pursued its direct detection with different targets and techniques. To date, only the DAMA/LIBRA experiment, at the Gran Sasso National Laboratory in Italy, has claimed a positive annual-modulation signal of the kind expected from the dark matter particles making up the halo of our galaxy, a signal compatible neither with conventional backgrounds nor with the various systematic effects that have been proposed. The ANAIS project, developed by the Universidad de Zaragoza, aims to confirm the DAMA/LIBRA positive signal using the same target and detection technique. THEORETICAL DEVELOPMENT: ANAIS-112 will use 112.5 kg of ultrapure NaI(Tl) distributed among nine detectors. Extensive work has been carried out to characterize and quantify the contributions to the radioactive background of the different prototypes considered. This thesis presents the results for the first four detectors manufactured by the company Alpha Spectra (D0-D3), based on measurements at the Canfranc Underground Laboratory and on Monte Carlo simulations. CONCLUSION: The background models built for detectors D0, D1, and D2, as well as the preliminary results for D3, satisfactorily describe the data measured under different analysis conditions. Including some additional hypotheses, such as the presence of cosmogenic isotopes that cannot be quantified directly or a partial 210Pb contamination on the crystal surface, significantly improves the agreement between model and data. Detector D2 shows the best background measured to date, and further improvement is expected in ANAIS-112 thanks to the increased rejection power of a multi-module setup, which allows coincidences to be removed, and to tighter control of the radiopurity of the remaining modules. In addition, the background prospects for the full experiment have been evaluated, considering the planned ANAIS-112 design as well as other hypothetical scenarios, such as a NaI(Tl) detector array totalling 250 kg or the use of a liquid scintillator veto. Finally, the sensitivity of ANAIS to the annual modulation of the dark matter signal has been evaluated under different experimental configurations and background conditions, confirming that ANAIS-112, with the background already achieved in D2, will be able to explore the DAMA/LIBRA signal.

    Development of New Approaches to ATLAS Detector Simulation and Dark Matter Searches with Trigger Level Analysis

Elementary particles and their interactions are successfully described by the Standard Model of particle physics (SM). However, it has been observed that extensions Beyond the Standard Model (BSM) are required to account for a large part of yet undiscovered particles and interactions, such as Dark Matter (DM). To advance the knowledge of the SM and to pursue DM discoveries, CERN has the ambitious plan of further increasing the Large Hadron Collider's (LHC) energy and luminosity, thus reaching unprecedented event rates in the field of collider physics. This thesis is divided into three parts, dealing with some of the most challenging aspects of the present and future activities of the ATLAS experiment at the LHC. After a thorough review of the SM, BSM physics is outlined, with particular attention to DM searches. The second part of this work addresses the issue of coping with the foreseen event rates of the High-Luminosity LHC (HL-LHC) phase. Indeed, optimizing the existing Geant4 simulation codes is a crucial step to alleviate the need for new and expensive hardware resources. With the objective of improving the efficiency of the simulation tools, an extensive study of different compilers, optimization levels, and build types is presented. In addition, a preliminary investigation of the geometry description of the ATLAS Transition Radiation Tracker (TRT) modules is discussed. The last part of the thesis covers the DM searches carried out by the ATLAS Trigger-object Level Analysis (TLA) group. These searches are based on the analysis of the invariant mass spectrum of di-jet events and, during LHC Run 2, were performed in the 450-1800 GeV invariant-mass range (integrated luminosity up to 29.3 fb-1 and center-of-mass energy of 13 TeV). After a review of the TLA studies, a preliminary investigation of the performance of Bayesian and frequentist statistical tools is presented. In particular, attention is focused on the interpretation and handling of systematic uncertainties on both background and DM signals. This is of particular importance in the process of finding localized excesses, which can indicate the existence of DM signals, and of setting limits on DM event cross sections.
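For reference, the quantity at the heart of such a di-jet search is the pair's invariant mass. A minimal sketch, under the common massless-jet approximation and with made-up kinematics (our own illustration, not code from the thesis):

```python
import math

def dijet_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two jets treated as massless:
    m^2 = 2 * pt1 * pt2 * (cosh(eta1 - eta2) - cos(phi1 - phi2))."""
    return math.sqrt(2.0 * pt1 * pt2 *
                     (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2)))

# Two back-to-back 500 GeV jets at small rapidity reconstruct to ~1 TeV,
# i.e. well inside the 450-1800 GeV window quoted above.
print(dijet_mass(500.0, 0.2, 0.0, 500.0, -0.2, math.pi))
```

A search then scans the resulting mass spectrum for localized excesses over a smoothly falling background.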

    Working Notes from the 1992 AAAI Workshop on Automating Software Design. Theme: Domain Specific Software Design

The goal of this workshop is to identify different architectural approaches to building domain-specific software design systems and to explore issues unique to domain-specific (vs. general-purpose) software design. Some general issues that cut across particular software design domains include: (1) knowledge representation, acquisition, and maintenance; (2) specialized software design techniques; and (3) user interaction and user interfaces.

    Holistic recommender systems for software engineering

The knowledge possessed by developers is often not sufficient to overcome a programming problem. Short of talking to teammates, when available, developers often gather additional knowledge from development artifacts (e.g., project documentation), as well as online resources. The web has become an essential component of the modern developer's daily life, providing a plethora of information from sources like forums, tutorials, Q&A websites, API documentation, and even video tutorials. Recommender Systems for Software Engineering (RSSE) assist developers in navigating the information space, automatically suggest useful items, and reduce the time required to locate the needed information. Current RSSEs treat development artifacts as containers of homogeneous information in the form of pure text. However, text is a means to represent heterogeneous information provided by, for example, natural language, source code, interchange formats (e.g., XML, JSON), and stack traces. Interpreting the information from a purely textual point of view misses the intrinsic heterogeneity of the artifacts, thus leading to a reductionist approach. We propose the concept of Holistic Recommender Systems for Software Engineering (H-RSSE), i.e., RSSEs that go beyond the textual interpretation of the information contained in development artifacts. Our thesis is that modeling and aggregating information in a holistic fashion enables novel and advanced analyses of development artifacts. To validate our thesis we developed a framework to extract, model, and analyze the information contained in development artifacts in a reusable meta-information model. We show how RSSEs benefit from a meta-information model, since it enables customized and novel analyses built on top of our framework. The information can thus be reinterpreted from a holistic point of view, preserving its multi-dimensionality and opening the path towards the concept of holistic recommender systems for software engineering.
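The core idea is easy to sketch: rather than treating an artifact as one homogeneous string, split it into fragments and tag each one with its kind before analysis. The snippet below is a deliberately rough illustration of such a meta-information model, not the thesis' actual framework; the Fragment type and the heuristics are our own assumptions.

```python
import json
import re
from dataclasses import dataclass

@dataclass
class Fragment:
    kind: str   # "json", "stack_trace", "code", or "text"
    text: str

def classify(fragment: str) -> Fragment:
    """Rough heuristics; a real extractor would use proper parsers."""
    try:
        json.loads(fragment)                      # valid JSON document?
        return Fragment("json", fragment)
    except ValueError:
        pass
    if re.search(r"^\s+at [\w.$]+\(.*\)$", fragment, re.MULTILINE):
        return Fragment("stack_trace", fragment)  # Java-style trace frame
    if re.search(r"[;{}]\s*$", fragment, re.MULTILINE):
        return Fragment("code", fragment)         # statement-like lines
    return Fragment("text", fragment)             # fall back to prose
```

Once fragments carry their kind, a recommender can route stack traces to a crash matcher, code to a clone detector, and prose to classic text retrieval, instead of flattening everything into one bag of words.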

Thermal characterization and optimization of systems-on-chip through FPGA-based emulation

Unpublished doctoral thesis, Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadores y Automática, defended on 15/06/2012. Tablets and smartphones are some of the many intelligent devices that dominate the consumer electronics market. These systems are complex to design as they must execute multiple applications (e.g., real-time video processing, 3D games, or wireless communications), while meeting additional design constraints, such as low energy consumption, reduced implementation size and, of course, a short time-to-market. Internally, they rely on Multi-processor Systems on Chip (MPSoCs) as their main processing cores to meet the tight design constraints: performance, size, power consumption, etc. In a bad design, the high logic density may generate hotspots that compromise chip reliability. This thesis introduces an FPGA-based emulation framework for easy exploration of SoC design alternatives. It provides fast and accurate estimations of performance, power, temperature, and reliability in one unified flow, to help designers tune their system architecture before going to silicon. The state of the art in embedded chip design is dominated by multi-processor systems-on-chip (MPSoCs). They are complex to design and suffer from power dissipation, temperature, and reliability problems. In this context, this thesis proposes a new emulation platform to ease the exploration of the enormous design space. The platform uses a general-purpose FPGA to accelerate the emulation, which gives it a competitive advantage over software architectural simulators, which are much slower. The data obtained from the FPGA execution are sent to a PC that hosts software libraries (models) to compute the behaviour (e.g., temperature, performance) that the final chip would exhibit. The experimental work focuses on two points: on the one hand, verifying that the system works correctly and, on the other, demonstrating the usefulness of the environment for explorations that expose long-term on-chip effects, such as the evolution of temperature, a slow phenomenon that normally requires costly software simulations.
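As a flavour of the long-term effects mentioned above, die temperature is often approximated with a first-order thermal RC model, where power is integrated against a thermal resistance and capacitance. The sketch below is our own illustration with made-up constants, not the thesis' models:

```python
# First-order thermal RC model, integrated with forward Euler:
#   dT/dt = (P - (T - T_amb) / R_th) / C_th
R_TH = 2.0    # K/W, junction-to-ambient thermal resistance (assumed)
C_TH = 0.05   # J/K, thermal capacitance of the die (assumed)
T_AMB = 45.0  # degrees C, ambient temperature in the case (assumed)

def temperature_trace(power_watts, dt=0.01, t0=T_AMB):
    t, trace = t0, []
    for p in power_watts:
        t += dt * (p - (t - T_AMB) / R_TH) / C_TH
        trace.append(t)
    return trace

# A burst of activity produces a slow exponential rise toward T_AMB + P * R_TH,
# exactly the kind of long-running transient that is expensive to simulate in SW.
print(temperature_trace([3.0] * 500 + [0.5] * 500)[-1])
```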

    Low power architectures for streaming applications


    Addressing Memory Bottlenecks for Emerging Applications

There has been a recent emergence of applications from the domains of machine learning, data mining, numerical analysis, and image processing. These are becoming the primary algorithms driving many important user-facing applications and are becoming pervasive in our daily lives. Due to their increasing usage in both mobile and datacenter workloads, it is necessary to understand the software and hardware demands of these applications, and to design techniques that match their growing needs. This dissertation studies the performance bottlenecks that arise when we try to improve the performance of these applications on current hardware systems. We observe that most of these applications are data-intensive, i.e., they operate on large amounts of data. Consequently, these applications put significant pressure on the memory. Interestingly, we notice that this pressure is not limited to one memory structure. Instead, different applications stress different levels of the memory hierarchy. For example, training Deep Neural Networks (DNNs), an emerging machine learning approach, is currently limited by the size of the GPU main memory. At the other end of the spectrum, improving DNN inference on CPUs is bottlenecked by Physical Register File (PRF) bandwidth. Concretely, this dissertation tackles four such memory bottlenecks for these emerging applications across the memory hierarchy (off-chip memory, on-chip memory, and the physical register file), presenting hardware and software techniques to address these bottlenecks and improve the performance of the emerging applications. For on-chip memory, we present two scenarios where emerging applications perform sub-optimally. First, many applications have a large number of marginal bits that do not contribute to the application accuracy, wasting unnecessary space and transfer costs. We present ACME, an asymmetric compute-memory paradigm that removes marginal bits from the memory hierarchy while performing the computation in full precision. Second, we tackle the contention in shared caches that arises in datacenters where multiple applications can share the same cache capacity. We present ShapeShifter, a runtime system that continuously monitors the runtime environment, detects changes in cache availability, and dynamically recompiles the application on the fly to efficiently utilize the cache capacity. For the physical register file, we observe that DNN inference on CPUs is primarily limited by PRF bandwidth. Increasing the number of compute units in a CPU requires increasing the number of read ports in the PRF, which quickly reaches a point where its latency constraints can no longer be met. To solve this problem, we present LEDL, locality extensions for deep learning on CPUs, which entails a rearchitected FMA and PRF design tailored to the heavy data reuse inherent in DNN inference. Finally, a significant challenge facing both researchers and industry practitioners is that, as DNNs grow deeper and larger, DNN training is limited by the size of the GPU main memory, restricting the size of the networks that GPUs can train. To tackle this challenge, we first identify the primary contributors to this heavy memory footprint, finding that the feature maps (intermediate layer outputs) are the heaviest contributors in training, as opposed to the weights in inference. Then, we present Gist, a runtime system that uses three efficient data encoding techniques to reduce the footprint of DNN training.
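To give a flavour of what such an encoding can look like, note that a ReLU backward pass needs only the sign pattern of its forward input, so the stashed feature map can be squeezed to one bit per element. The sketch below is an illustration in that spirit under our own assumptions, not Gist's actual implementation:

```python
import numpy as np

def relu_forward(x):
    """ReLU that stashes a 1-bit mask for the backward pass instead of
    the full fp32 feature map: roughly a 32x smaller training stash."""
    mask = x > 0
    stash = (np.packbits(mask), x.shape)      # 1 bit per element + shape
    return np.where(mask, x, 0.0), stash

def relu_backward(grad_out, stash):
    packed, shape = stash
    n = int(np.prod(shape))
    mask = np.unpackbits(packed, count=n).reshape(shape).astype(bool)
    return np.where(mask, grad_out, 0.0)      # zero gradients where x <= 0

x = np.random.randn(4, 8).astype(np.float32)
y, stash = relu_forward(x)
grad_in = relu_backward(np.ones_like(x), stash)
assert np.array_equal(grad_in != 0, x > 0)
```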

    A Peer Reviewed Newspaper About Research Refusal

This publication presents the outcome of an online workshop (organized by the Digital Aesthetics Research Centre, Aarhus University; the Centre for the Study of the Networked Image, London South Bank University; and transmediale festival, Berlin) with the participation of nine groups in different geographical locations, some inside and some outside the academy. Each group was selected on the basis of an open call and has taken part in a shared mailing list, creating a common list of references and discussing strategies of refusal and how these might relate to practices of research and its infrastructures: what might be refused, and in what ways; how might academic autonomy be preserved in the context of capitalist tech development, especially perhaps in the present context of online delivery and the need for alternatives to corporate platforms (e.g. Zoom, Teams, Skype, and the like); and how to refuse research itself, in its instrumental form? Following the workshop, each group was asked to produce a section of this newspaper that in different ways represents the group's abstractions on the subject. The design has been developed by Open Source Publishing, a collective renowned for a practice that questions the influence and affordances of digital tools in graphic design, and which works exclusively with free and open source software. The intention behind this publication has, in this way, been to explore the expanded possibilities of acting, sharing, and making differently, beyond the normative production of research and its dissemination. Importantly, it has also been a means to allow emerging researchers to present their ideas to the wider community of the transmediale festival in an accessible form. The newspaper will be distributed at the festival's various physical events in Berlin in the coming weeks, and is available for download on the Digital Aesthetics Research Center website. Extended versions of the participants' research can be found in APRJA, an open-access research journal that addresses digital culture.

    Human factors in the design of parallel program performance tuning tools
