    FPGA Acceleration of Domain-specific Kernels via High-Level Synthesis

    The abstract is in the attachment.

    Recent Advances in Embedded Computing, Intelligence and Applications

    The recent proliferation of Internet of Things deployments and edge computing, combined with artificial intelligence, has led to exciting new application scenarios in which embedded digital devices are essential enablers. Moreover, new powerful and efficient devices are appearing to cope with workloads formerly reserved for the cloud, such as deep learning. These devices allow processing close to where data are generated, avoiding bottlenecks due to communication limitations. The efficient integration of hardware, software, and artificial intelligence capabilities deployed in real sensing contexts empowers the edge intelligence paradigm, which will ultimately foster the offloading of processing functionalities to the edge. In this Special Issue, researchers have contributed nine peer-reviewed papers covering a wide range of topics in the area of edge intelligence, including hardware-accelerated implementations of deep neural networks, IoT platforms for extreme edge computing, neuro-evolvable and neuromorphic machine learning, and embedded recommender systems.

    Accelerating Pattern Recognition Algorithms On Parallel Computing Architectures

    The move to more parallel computing architectures places greater responsibility on the programmer to achieve high performance: the programmer must now understand both the underlying architecture and the parallelism inherent in the algorithm. Exploiting algorithmic parallelism on parallel computing architectures can be a complex task, and this dissertation demonstrates various techniques for doing so. Specifically, three pattern recognition (PR) approaches are examined for acceleration across multiple parallel computing architectures, namely field programmable gate arrays (FPGAs) and general purpose graphical processing units (GPGPUs). Phase-only filter correlation for fingerprint identification was studied as the first PR approach, and its sensitivity to angular rotations, scaling, and missing data was surveyed. A novel FPGA implementation of this algorithm was created using fixed-point computation, deep pipelining, and four computation phases, overlapping communication with computation to efficiently process large fingerprint galleries. The FPGA implementation showed approximately a 47 times speedup over a central processing unit (CPU) implementation with negligible impact on precision. For the second PR approach, a spiking neural network (SNN) algorithm for a character recognition application was examined. A novel FPGA implementation was developed around a scalable, modular SNN processing element (PE) that performs the neural computations efficiently using streaming memory, fixed-point computation, and deep pipelining. This design showed speedups of approximately 3.3 and 8.5 times over CPU implementations for networks of size 624 and 9,264, respectively, and the results indicate that the PE design could easily scale to larger networks. Finally, for the third PR approach, cellular simultaneous recurrent networks (CSRNs) were investigated for GPGPU acceleration, specifically for maze traversal and face recognition applications. Novel GPGPU implementations were developed employing varying amounts of task-level, data-level, and instruction-level parallelism to achieve efficient runtime performance. Furthermore, the performance of the face recognition application was examined across a heterogeneous cluster of multi-core and GPGPU architectures; a combination of multi-core processors and GPGPUs achieved roughly a 996 times speedup over a single-core CPU implementation. From examining these PR approaches, this dissertation presents techniques and insight applicable to other algorithms when designing parallel implementations.
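
    As an illustration of the first PR approach, the sketch below computes a phase-only correlation surface with NumPy: the cross-power spectrum of two images is normalized to unit magnitude so that only phase information remains, and a sharp peak in its inverse transform signals a match. This is a minimal floating-point sketch of the general technique; the function and variable names are ours, and the fixed-point arithmetic, deep pipelining, and four-phase structure of the dissertation's FPGA design are not modeled.

    import numpy as np

    def phase_only_correlation(probe: np.ndarray, gallery: np.ndarray) -> np.ndarray:
        """Return the phase-only correlation surface of two same-sized images."""
        F = np.fft.fft2(probe)
        G = np.fft.fft2(gallery)
        # Keep only phase information: normalize the cross-power spectrum
        # to unit magnitude (the epsilon guards against division by zero).
        cross = F * np.conj(G)
        cross /= np.abs(cross) + 1e-12
        # A sharp peak in the inverse transform indicates a match; the peak
        # position encodes the relative translation between the images.
        return np.real(np.fft.ifft2(cross))

    # Usage: the peak height serves as the match score for a gallery entry.
    rng = np.random.default_rng(0)
    img = rng.random((128, 128))
    shifted = np.roll(img, (5, 3), axis=(0, 1))
    surface = phase_only_correlation(img, shifted)
    print(np.unravel_index(surface.argmax(), surface.shape), surface.max())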

    Data Mining and Machine Learning in Astronomy

    We review the current state of data mining and machine learning in astronomy. 'Data mining' can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines; applications from a broad range of astronomy, emphasizing those in which data mining techniques directly resulted in improved science; and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.
    Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra figures, some minor additions to the text.
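
    As a hedged sketch of the kind of supervised classification this review surveys (for example, star/galaxy separation with a support vector machine), the snippet below trains scikit-learn's SVC on synthetic photometric features. The data, feature count, and parameters are placeholders of ours, not material from the paper.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    rng = np.random.default_rng(42)
    # Fake 4-band colour features for two classes, offset so the
    # classes are separable but overlapping.
    stars = rng.normal(0.0, 1.0, size=(500, 4))
    galaxies = rng.normal(1.5, 1.0, size=(500, 4))
    X = np.vstack([stars, galaxies])
    y = np.array([0] * 500 + [1] * 500)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = SVC(kernel="rbf", C=1.0)  # radial basis function kernel
    clf.fit(X_train, y_train)
    print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")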

    Parallel Implementation of the Automatic Target Detection and Classification Algorithm Using Gram-Schmidt Orthogonalization for Hyperspectral Image Analysis

    Remote observation of the Earth has always been of interest to humankind. Over the years, the methods used for this purpose have evolved to the point where, today, hyperspectral image analysis is a very active line of research, especially for monitoring and tracking fires and for preventing and tracking natural disasters, chemical spills, and other kinds of environmental pollution. Because of the way materials occur in the natural environment, it is very common for different materials to share the same portion of space, however small. As a result, most of the pixels analyzed are not made up of a single material but of several pure materials mixed at the subpixel level. Traditionally, such images are analyzed with spectral unmixing techniques that require two complex stages: the first extracts pure spectral signatures, also called endmembers, and the second estimates the fractional abundance of those endmembers at the subpixel level. Both stages carry a high computational cost, which is a problem when hyperspectral images must be analyzed in real time. Target detection and classification algorithms work on principles very similar to those of endmember extraction algorithms and are therefore commonly used for this purpose despite their high computational cost. FPGAs (Field Programmable Gate Arrays) offer sufficient performance for this processing, together with flexibility, small size, and low power consumption; they can also be radiation-hardened for use in space, which makes them a very viable option for the goal pursued here. This final-year project implements on an FPGA the target detection and classification algorithm known as ATDCA-GS (Automatic Target Detection and Classification Algorithm - Gram Schmidt), which uses the concept of orthogonal subspace projection and applies Gram-Schmidt orthogonalization to simplify complex operations. To complete the project, the algorithm was implemented in the hardware description language VHDL (Very High Speed Integrated Circuit Hardware Description Language), and a previous implementation under the OpenCL parallel programming paradigm was analyzed and optimized. Both optimized implementations were then compared in terms of accuracy and performance on FPGA-based reconfigurable hardware platforms.
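
    For the gist of the algorithm, the following is a minimal NumPy sketch of an ATDCA-style target detector built on Gram-Schmidt orthogonalization: each round selects the pixel with the largest residual energy, orthogonalizes its spectrum against the targets already found, and removes that direction from every pixel's residual. The names and the synthetic data cube are ours for illustration; the thesis's VHDL and OpenCL implementations, with their fixed-point arithmetic and hardware optimizations, are not reflected here.

    import numpy as np

    def atdca_gs(image: np.ndarray, num_targets: int) -> list[int]:
        """Return pixel indices of `num_targets` detected targets.

        `image` is a (pixels, bands) matrix of spectra.
        """
        residual = image.astype(np.float64).copy()
        basis = []    # orthonormal basis spanning the targets found so far
        targets = []
        for _ in range(num_targets):
            # The pixel with the largest residual energy is the one least
            # explained by the targets already selected.
            idx = int(np.argmax(np.sum(residual ** 2, axis=1)))
            targets.append(idx)
            # Gram-Schmidt step: orthogonalize the new target against the
            # current basis instead of rebuilding a full projector each round.
            v = image[idx].astype(np.float64)
            for q in basis:
                v -= (q @ v) * q
            norm = np.linalg.norm(v)
            if norm < 1e-12:
                break  # the image subspace is spanned; nothing new to find
            q = v / norm
            basis.append(q)
            # Remove the new direction from every pixel's residual.
            residual -= np.outer(residual @ q, q)
        return targets

    # Usage with a synthetic cube: 3 pure spectra mixed at random abundances.
    rng = np.random.default_rng(1)
    pure = rng.random((3, 50))
    abund = rng.dirichlet(np.ones(3), size=1000)
    cube = abund @ pure + 0.001 * rng.standard_normal((1000, 50))
    print(atdca_gs(cube, 3))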