301 research outputs found
A Novel Scalable Decision Tree Implementation on SoC Based FPGAs
Machine learning algorithms are increasingly used in predictive maintenance and condition monitoring systems for valuable assets. Decision tree classification (DTC) is one of the popular methods in condition monitoring systems based on vibration analysis. Because vibration sensors produce large amounts of data, processing should be done at the edge, close to the sensors. DTC can reach high accuracy, but it is computationally intensive, and edge processors cannot run it fast enough. Some FPGA implementations work well for small datasets but have issues with large datasets that require deep trees. In this paper we introduce a new method of Decision Tree (DT) implementation on SoC-based FPGAs. We show that, using a combination of FPGA and processor, the DT can be implemented much faster and more scalably for trees with depth up to 50. We used Vivado HLS to implement our DTs and connected them to the processor of the SoC via AXI interfaces. Our implementation gains up to a 2.27x speedup compared with a software-only implementation.
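As an illustration of why deep trees map onto a bounded loop, the following is a minimal sketch of decision tree inference over a flat-array node layout. The layout, the array names, and the tiny example tree are assumptions made for illustration; the abstract does not describe the paper's actual data structures or HLS code.

```python
# Hypothetical flat-array tree layout; NOT the paper's design.
# Node i is a leaf when feature[i] == -1; value[i] then holds the class label.
feature   = [0, 1, -1, -1, -1]         # feature index tested at each node
threshold = [0.5, 1.5, 0.0, 0.0, 0.0]  # split threshold at each internal node
left      = [1, 2, -1, -1, -1]         # child taken when x[feature] <= threshold
right     = [4, 3, -1, -1, -1]         # child taken otherwise
value     = [-1, -1, 0, 1, 2]          # class label stored at leaves

MAX_DEPTH = 50  # loop bound, mirroring the depth limit quoted in the abstract


def predict(x):
    """Walk from the root to a leaf with a fixed upper bound on iterations."""
    node = 0
    for _ in range(MAX_DEPTH + 1):
        if feature[node] == -1:
            return value[node]
        if x[feature[node]] <= threshold[node]:
            node = left[node]
        else:
            node = right[node]
    raise ValueError("tree is deeper than MAX_DEPTH")


print(predict([0.2, 1.0]), predict([0.2, 2.0]), predict([0.9, 0.0]))
```

A fixed trip count and array-indexed nodes are the kind of structure high-level synthesis tools generally handle well, which is why this layout was chosen for the sketch.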
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
Deep Neural Networks (DNNs) are becoming an important tool in modern
computing applications. Accelerating their training is a major challenge and
techniques range from distributed algorithms to low-level circuit design. In
this survey, we describe the problem from a theoretical perspective, followed
by approaches for its parallelization. We present trends in DNN architectures
and the resulting implications on parallelization strategies. We then review
and model the different types of concurrency in DNNs: from the single operator,
through parallelism in network inference and training, to distributed deep
learning. We discuss asynchronous stochastic optimization, distributed system
architectures, communication schemes, and neural architecture search. Based on
those approaches, we extrapolate potential directions for parallelism in deep
learning.
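One of the simplest forms of training parallelism in this area is data parallelism: each worker computes a gradient on its own shard of the minibatch and the results are averaged. The sketch below assumes a toy one-parameter model and equal-sized shards; the function names are hypothetical and the plain average stands in for a real allreduce.

```python
def grad(w, batch):
    """Gradient of the mean squared error for the 1-D model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_step(w, batch, n_workers, lr):
    """One SGD step with the minibatch split evenly across n_workers."""
    shard = len(batch) // n_workers  # assumes the batch divides evenly
    shards = [batch[i * shard:(i + 1) * shard] for i in range(n_workers)]
    local = [grad(w, s) for s in shards]  # each would run on its own worker
    g = sum(local) / n_workers            # stands in for an allreduce average
    return w - lr * g


batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w_parallel = data_parallel_step(0.0, batch, n_workers=2, lr=0.01)
w_serial = 0.0 - 0.01 * grad(0.0, batch)
print(w_parallel, w_serial)
```

With equal-sized shards the averaged gradient equals the full-batch gradient, so the synchronous parallel step matches the serial one; asynchronous schemes relax exactly this equivalence.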
SoC-based FPGA architecture for image analysis and other highly demanding applications
Nowadays, the development of algorithms focuses on performance-efficient and energy-efficient computations. Technologies such as the field programmable gate array (FPGA) and the system on chip (SoC) based on FPGA (FPGA/SoC) have shown their ability to accelerate intensive computing applications while reducing power consumption, owing to their capability for high parallelism and reconfiguration of the architecture.
Currently, the existing design cycles for FPGA/SoC are time-consuming, owing to the complexity of the architecture. Therefore, to address the gap between applications and FPGA/SoC architectures and to obtain an efficient hardware design for image analysis and other highly demanding applications using high-level synthesis (HLS) tools, two complementary strategies are considered: ad-hoc techniques and performance estimation.
Regarding ad-hoc techniques, three highly demanding applications were accelerated through HLS tools: a pulse shape discriminator for cosmic rays, automatic pest classification, and re-ranking for information retrieval, emphasizing the benefits of applying compression techniques to this type of application when targeting FPGA/SoC devices.
Furthermore, a comprehensive performance estimator for hardware acceleration is proposed in this thesis to effectively predict the resource utilization and latency for FPGA/SoC, building a bridge between the application and architectural domains. The tool integrates analytical models for performance prediction and a design space explorer (DSE) engine that provides high-level insights to hardware developers. The DSE engine is composed of two independent sub-engines: a DSE based on single-objective optimization and a DSE based on evolutionary multi-objective optimization.
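To make the multi-objective side of design space exploration concrete, here is a minimal sketch of Pareto filtering over candidate designs, where each design is a pair of (resource utilization, latency) and both are minimized. The design points are made-up numbers and the function names are hypothetical; the thesis's estimator and its evolutionary search are not reproduced here.

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    better = any(x < y for x, y in zip(a, b))
    return no_worse and better


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative (resource units, latency in ms) pairs; not measured data.
designs = [(1200, 9.0), (1500, 6.5), (1500, 7.0), (2400, 6.5), (3000, 4.0)]
front = pareto_front(designs)
print(front)
```

A single-objective explorer would instead collapse the two objectives into one score, whereas a multi-objective one returns the whole non-dominated set and leaves the trade-off to the developer.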
Data Mining and Machine Learning in Astronomy
We review the current state of data mining and machine learning in astronomy.
'Data Mining' can have a somewhat mixed connotation from the point of view of a
researcher in this field. If used correctly, it can be a powerful approach,
holding the potential to fully exploit the exponentially increasing amount of
available data, promising great scientific advance. However, if misused, it can
be little more than the black-box application of complex computing algorithms
that may give little physical insight, and provide questionable results. Here,
we give an overview of the entire data mining process, from data collection
through to the interpretation of results. We cover common machine learning
algorithms, such as artificial neural networks and support vector machines,
applications from a broad range of astronomy, emphasizing those where data
mining techniques directly resulted in improved science, and important current
and future directions, including probability density functions, parallel
algorithms, petascale computing, and the time domain. We conclude that, so long
as one carefully selects an appropriate algorithm, and is guided by the
astronomical problem at hand, data mining can be very much the powerful tool,
and not the questionable black box.
Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra figures, some minor additions to the tex
Fundamentals
Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, summarization, and clustering to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to their resource requirements and how to enhance scalability on diverse computing architectures, ranging from embedded systems to large computing clusters.
A Neural Signal Processor for Low-Latency Spike Inference
This thesis describes the development of a system that can assign identities to a population of single units, in multi-electrode recordings, at single-spike resolution with low latency. The system has two parts. The first is a Field-Programmable Gate Array (FPGA)-based Neural Signal Processor (NSP) that receives raw input and generates labelled spikes as output, a process referred to as real-time spike inference. The second is a piece of software (Spiketag) that runs on a PC, communicates with the NSP, and generates a spike-sorted model to guide the real-time spike inference. The NSP provides clocks and control signals to five 32-channel INTAN RHD2132 chips to manage the acquisition of 160 channels of raw neural data. In parallel, the NSP further filters, detects and extracts extracellular spike waveforms from the raw neural data recorded by tetrodes or silicon probes and assigns a single-unit identity to each detected spike. A set of Python application programming interfaces (APIs) was developed in Spiketag to enable communication between the NSP and the PC. These APIs allow the NSP to obtain a model from the PC, which holds parameters such as reference channels, spike detection thresholds, the spike feature transformation matrix and the vector-quantized clusters generated by spike sorting a short recording session. Using the spike-sorted model, the NSP performs data acquisition and real-time spike inference simultaneously. Algorithmic modules were implemented in the FPGA and pipelined to compute during 40 ms acquisition intervals. At the output end of the FPGA NSP, the real-time assigned single-unit identity (spike-id) is packaged with the timestamp, the electrode group, and the spike features as a spike-id packet. Spike-id packets are asynchronously transmitted through a low-latency Peripheral Component Interconnect Express (PCIe) interface to the PC, producing the real-time spike trains.
The real-time spike trains can be used for further processing, such as real-time decoding. Several types of ground-truth data, including intracellular/extracellular paired recordings, synthesized tetrode extracellular waveforms with ground-truth spike timing, and high-channel-count silicon probe recordings with ground-truth animal positions during navigation, were used to validate the low latency (1 ms) and high accuracy (as high as state-of-the-art offline sorting and decoding algorithms) of the NSP’s real-time spike inference and the NSP-based real-time population decoding performance.
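The inference path described above (threshold check, feature transformation, assignment to a vector-quantized cluster) can be sketched in a few lines. This is a simplified software illustration under stated assumptions: the model dictionary, its keys, the negative-going threshold, and all numbers are hypothetical, and it does not reflect the Spiketag API or the FPGA implementation.

```python
def project(waveform, transform):
    """Apply the feature transformation matrix to one spike waveform."""
    return [sum(t * s for t, s in zip(row, waveform)) for row in transform]


def nearest_cluster(features, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    def dist2(c):
        return sum((f - v) ** 2 for f, v in zip(features, c))
    return min(range(len(centroids)), key=lambda k: dist2(centroids[k]))


def infer_spike(waveform, model):
    """Return a spike-id, or None when the waveform never crosses threshold."""
    if min(waveform) > model["threshold"]:  # assumes a negative-going threshold
        return None
    feats = project(waveform, model["transform"])
    return nearest_cluster(feats, model["centroids"])


# Hypothetical model; in the thesis these parameters come from spike sorting.
model = {
    "threshold": -50.0,
    "transform": [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
    "centroids": [[-60.0, 10.0], [-120.0, 40.0]],
}

print(infer_spike([-110.0, -80.0, 35.0, 5.0], model))
print(infer_spike([-5.0, 3.0, 2.0, 1.0], model))
```

Because each stage is a fixed-size multiply-accumulate or comparison, the same sequence lends itself to pipelining, which is consistent with the pipelined modules the abstract describes.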