Search CORE

301 research outputs found

Runtime Management of Heterogeneous Compute Resources in Embedded Systems

Author: Luís Miguel Mendes Pimentel Alves de Sousa
Publication venue
Publication date: 15/10/2021
Field of study

Repositório Aberto da Universidade do Porto

A Novel Scalable Decision Tree Implementation on SoC Based FPGAs

Author: Koraei Mostafa
Lefoka Petter
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2022
Field of study

Machine learning algorithms are rapidly growing in predictive maintenance and condition monitoring systems for valuable assets. Decision tree classification (DTC) is one of popular methods in condition monitoring systems based on vibration analysis. Due to big amount of data coming out from vibration sensors, the processing should be done on edge close to the sensors. DTC can reach high accuracy but at the same time it is computationally intensive and edge processors are not able to run it so fast. There are some FPGA implementation that work fine for small datasets but have issues when there is a real big dataset that needs deep trees. In this paper we introduce our new method of Decision Tree (DT) implementation on SoC based FPGAs. We have shown that using a combination of FPGA and processor, the DT can be implemented much faster and more scalable for trees with depth up to 50. We have used Vivado HLS to implement our DTs and connected them to the processor of SoC via AXI interfaces. We have shown that our implementation gains up to 2.27x speed up comparing with only software implementation.acceptedVersio

SINTEF Open

Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis

Author: Ben-Nun Tal
Hoefler Torsten
Publication venue
Publication date: 15/09/2018
Field of study

Deep Neural Networks (DNNs) are becoming an important tool in modern computing applications. Accelerating their training is a major challenge and techniques range from distributed algorithms to low-level circuit design. In this survey, we describe the problem from a theoretical perspective, followed by approaches for its parallelization. We present trends in DNN architectures and the resulting implications on parallelization strategies. We then review and model the different types of concurrency in DNNs: from the single operator, through parallelism in network inference and training, to distributed deep learning. We discuss asynchronous stochastic optimization, distributed system architectures, communication schemes, and neural architecture search. Based on those approaches, we extrapolate potential directions for parallelism in deep learning

arXiv.org e-Print Archive

Repository for Publications and Research Data

SoC-based FPGA architecture for image analysis and other highly demanding applications

Author: MOLINA ROMINA SOLEDAD
Publication venue: Università degli Studi di Trieste
Publication date: 01/03/2023
Field of study

Al giorno d'oggi, lo sviluppo di algoritmi si concentra su calcoli efficienti in termini di prestazioni ed efficienza energetica. Tecnologie come il field programmable gate array (FPGA) e il system on chip (SoC) basato su FPGA (FPGA/SoC) hanno dimostrato la loro capacità di accelerare applicazioni di calcolo intensive risparmiando al contempo il consumo energetico, grazie alla loro capacità di elevato parallelismo e riconfigurazione dell'architettura. Attualmente, i cicli di progettazione esistenti per FPGA/SoC sono lunghi, a causa della complessità dell'architettura. Pertanto, per colmare il divario tra le applicazioni e le architetture FPGA/SoC e ottenere un design hardware efficiente per l'analisi delle immagini e altri applicazioni altamente demandanti utilizzando lo strumento di sintesi di alto livello, vengono prese in considerazione due strategie complementari: tecniche ad hoc e stima delle prestazioni. Per quanto riguarda le tecniche ad-hoc, tre applicazioni molto impegnative sono state accelerate attraverso gli strumenti HLS: discriminatore di forme di impulso per i raggi cosmici, classificazione automatica degli insetti e re-ranking per il recupero delle informazioni, sottolineando i vantaggi quando questo tipo di applicazioni viene attraversato da tecniche di compressione durante il targeting dispositivi FPGA/SoC. Inoltre, in questa tesi viene proposto uno stimatore delle prestazioni per l'accelerazione hardware per prevedere efficacemente l'utilizzo delle risorse e la latenza per FPGA/SoC, costruendo un ponte tra l'applicazione e i domini architetturali. Lo strumento integra modelli analitici per la previsione delle prestazioni e un motore design space explorer (DSE) per fornire approfondimenti di alto livello agli sviluppatori di hardware, composto da due motori indipendenti: DSE basato sull'ottimizzazione a singolo obiettivo e DSE basato sull'ottimizzazione evolutiva multiobiettivo.Nowadays, the development of algorithms focuses on performance-efficient and energy-efficient computations. Technologies such as field programmable gate array (FPGA) and system on chip (SoC) based on FPGA (FPGA/SoC) have shown their ability to accelerate intensive computing applications while saving power consumption, owing to their capability of high parallelism and reconfiguration of the architecture. Currently, the existing design cycles for FPGA/SoC are time-consuming, owing to the complexity of the architecture. Therefore, to address the gap between applications and FPGA/SoC architectures and to obtain an efficient hardware design for image analysis and highly demanding applications using the high-level synthesis tool, two complementary strategies are considered: ad-hoc techniques and performance estimator. Regarding ad-hoc techniques, three highly demanding applications were accelerated through HLS tools: pulse shape discriminator for cosmic rays, automatic pest classification, and re-ranking for information retrieval, emphasizing the benefits when this type of applications are traversed by compression techniques when targeting FPGA/SoC devices. Furthermore, a comprehensive performance estimator for hardware acceleration is proposed in this thesis to effectively predict the resource utilization and latency for FPGA/SoC, building a bridge between the application and architectural domains. The tool integrates analytical models for performance prediction, and a design space explorer (DSE) engine for providing high-level insights to hardware developers, composed of two independent sub-engines: DSE based on single-objective optimization and DSE based on evolutionary multi-objective optimization

Archivio istituzionale della ricerca - Università di Trieste

Data Mining and Machine Learning in Astronomy

Author: Aha D. W.
Aizerman M. A.
Benjamini Y.
Bertin E.
Borne K.
Breiman L.
de Vaucouleurs G.
Dempster A.
Drake A. J.
Ebisuzaki T.
Faundez-Abans M.
Goebel J.
Karhunen K.
Levy S.
Li L.-L.
Maddox S. J.
Molinari E.
Moore G. E.
Naim A.
NICHOLAS M. BALL
P. A.
Patterson F. S.
ROBERT J. BRUNNER
Salzberg S. L.
Scaringi S.
Serra-Ricart M.
Steinhaus H.
Urunkar N.
Wells D. C.
Won E.
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 10/08/2010
Field of study

We review the current state of data mining and machine learning in astronomy. 'Data Mining' can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those where data mining techniques directly resulted in improved science, and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm, and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra figures, some minor additions to the tex

arXiv.org e-Print Archive

Crossref

Fundamentals

Author
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 30/01/2023
Field of study

Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters

Directory of Open Access Books (DOAB)

Recommended from our members

A Neural Signal Processor for Low-Latency Spike Inference

Author: Lai Chongxi
Publication venue: University of Cambridge
Publication date: 02/07/2020
Field of study

This thesis describes the development of a system that can assign identities to a population of single-units, in multi-electrode recordings, at single-spike resolution with low-latency. The system has two parts. The first is a Field-Programmable Gate Array (FPGA)-based Neural Signal Processor (NSP) that receives raw input and generates labelled spikes as output, a process referred to as real-time spike inference. The second is a piece of software (Spiketag) that runs on a PC, communicates with the NSP, and generates a spike-sorted model to guide the real-time spike inference. The NSP provides clocks and control signals to five 32-channel INTAN RHD2132 chips to manage the acquisition of 160 channels of raw neural data. In parallel, the NSP further filters, detects and extracts extracellular spike waveforms from the raw neural data recorded by tetrodes or silicon probes and assigns single-unit identity to each detected spike. A set of Python application programming interfaces (APIs) was developed in Spiketag to enable the communication between the NSP and the PC. These APIs allow the NSP to obtain a model from the PC, which holds parameters such as reference channels, spike detection thresholds, spike feature transformation matrix and vector quantized clusters generated by spike sorting a short recording session. Using the spike-sorted model, the NSP performs data acquisition and real-time spike inference simultaneously. Algorithmic modules were implemented in the FPGA and pipelined to compute during 40 ms acquisition intervals. At the output end of the FPGA NSP, the real-time assigned single-unit identity (spike-id) is packaged with the timestamp, the electrode group, and the spike features as a spike-id packet. Spike-id packets are asynchronously transmitted through a low-latency Peripheral Component Interconnect Express (PCIe) interface to the PC, producing the real-time spike trains. The real-time spike trains can be used for further processing, such as real-time decoding. Several types of ground-truth data, including intracellular/extracellular paired recordings, synthesized tetrode extracellular waveforms with ground-truth spike timing and high-channel-count silicon probe recordings with ground-truth animal positions during navigation were used to validate the low-latency (1 ms) and high-accuracy (as high as state-of-the-art offline sorting and decoding algorithms) of the NSP’s real-time spike inference and the NSP-based real-time population decoding performance

Apollo (Cambridge)