14 research outputs found
Speck: A Smart event-based Vision Sensor with a low latency 327K Neuron Convolutional Neuronal Network Processing Pipeline
Edge computing solutions that enable the extraction of high-level information
from a variety of sensors are in increasingly high demand. This is due to the
increasing number of smart devices that require sensory processing for their
application on the edge. To tackle this problem, we present a smart vision
sensor System on Chip (SoC), featuring an event-based camera and a low-power
asynchronous spiking Convolutional Neuronal Network (sCNN) computing
architecture embedded on a single chip. By combining both sensor and processing
on a single die, we can lower unit production costs significantly. Moreover,
the simple end-to-end nature of the SoC facilitates small stand-alone
applications as well as functioning as an edge node in a larger system. The
event-driven nature of the vision sensor delivers high-speed signals in a
sparse data stream. This is reflected in the processing pipeline, which focuses
on optimising highly sparse computation and minimising latency across the 9 sCNN
layers. Overall, this results in an extremely low-latency visual
processing pipeline deployed on a small form factor with a low energy budget
and sensor cost. We present the asynchronous architecture, the individual
blocks, the sCNN processing principle, and a benchmark against other sCNN-capable
processors.
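The event-driven, sparse processing principle the abstract describes can be sketched in a few lines: each input event updates only the neurons its kernel covers, and neurons whose membrane potential crosses threshold emit output spikes for the next layer. This is a conceptual illustration only; the function name, kernel, and threshold are assumptions, not Speck's actual interface.

```python
# Minimal sketch of event-driven sparse sCNN processing (illustrative only):
# one input event touches just the kernel-sized neighbourhood of neurons,
# so computation scales with events, not with frame size.

def process_event(x, y, kernel, membrane, threshold=1.0):
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(membrane), len(membrane[0])
    spikes = []
    for dy in range(kh):
        for dx in range(kw):
            ny, nx = y + dy - kh // 2, x + dx - kw // 2
            if 0 <= ny < h and 0 <= nx < w:
                membrane[ny][nx] += kernel[dy][dx]
                if membrane[ny][nx] >= threshold:
                    membrane[ny][nx] = 0.0       # reset after firing
                    spikes.append((nx, ny))      # output events for the next layer
    return spikes

membrane = [[0.0] * 8 for _ in range(8)]
kernel = [[0.6] * 3 for _ in range(3)]
out1 = process_event(4, 4, kernel, membrane)   # 0.6 < threshold: no spikes yet
out2 = process_event(4, 4, kernel, membrane)   # 1.2 >= threshold: 9 neurons fire
```

In a multi-layer pipeline, the spikes returned here would simply become the input events of the next layer, which is what keeps the whole chain sparse.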
Proposal of a codeless platform for implementing health-promotion apps
The use of mobile devices by healthcare professionals has transformed many aspects of clinical practice, and such devices have become commonplace in healthcare settings. Various applications are now available to assist Health Care Professionals (HCPs) in important tasks. This study starts from the hypothesis that mobile health (mHealth) is not yet widely adopted because of the technological difficulty of developing health applications with individualised information and engagement elements. This difficulty stems from the high cost (in money and time) of developing such a solution with the available approaches. Although the benefits of mHealth have already been demonstrated, its use in Brazil is still incipient. The available solutions offer generic engagement and information mechanisms, with a single, predetermined flow for all users, even though the factors behind adherence or non-adherence to a health treatment are highly particular and individual to each person. Given the complexity of this proposal, this work presents a proof of concept of the codeless platform, drawing on the concepts of flow-based programming (FBP) and visual programming languages (VPL) and focusing on the rapid, code-free generation of an application containing customised engagement mechanisms, with the possibility of customising/individualising information and engagement elements for groups of patients. As a result of this work, it was possible to create a software ecosystem in which healthcare professionals can program a set of engagement elements in order to increase patients' adherence to treatment. Finally, several areas of the system could still evolve to support behaviour that reflects the complex flows created by the HCPs.
The Dataflow Computational Model And Its Evolution
The dataflow computational model is an alternative to the von Neumann model. Its most
significant characteristics are asynchronous instruction scheduling and the massive
parallelism it exposes. This thesis is a review of the dataflow computational model,
as well as of some hybrid models that lie between the pure dataflow and the von Neumann
models. Finally, it discusses dataflow principles that have been or are being adopted by
conventional machines, programming languages, and distributed computing systems.
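The core dataflow principle the thesis reviews, namely that an instruction fires as soon as all of its operand tokens have arrived, with no program counter dictating order, can be sketched as a toy interpreter (illustrative only, not any particular machine's design):

```python
# Toy dataflow interpreter: a node fires when all its input tokens are present.
# A node is (op, source_names, destination_name).
OPS = {'add': lambda x, y: x + y, 'mul': lambda x, y: x * y}

def run(graph, inputs):
    tokens = dict(inputs)                 # token store: name -> value
    pending = list(graph)
    while pending:
        fired = False
        for node in list(pending):
            op, srcs, dst = node
            if all(s in tokens for s in srcs):   # firing rule: all operands present
                tokens[dst] = OPS[op](*(tokens[s] for s in srcs))
                pending.remove(node)
                fired = True
        if not fired:
            break                          # no node can fire: missing operand
    return tokens

# (a + b) * (c + d): the two additions have no data dependence on each other,
# so a dataflow machine could fire them in parallel.
graph = [('add', ['a', 'b'], 't1'),
         ('add', ['c', 'd'], 't2'),
         ('mul', ['t1', 't2'], 'out')]
result = run(graph, {'a': 1, 'b': 2, 'c': 3, 'd': 4})   # out = 3 * 7 = 21
```

The absence of a program counter is visible in the loop: any enabled node may fire next, which is exactly where the model's massive parallelism comes from.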
Fine-grain parallelism on sequential processors
There seems to be a consensus that future Massively Parallel Architectures
will consist of a number of nodes, or processors, interconnected by a high-speed network.
A von Neumann style of processing within the nodes of a multiprocessor system
has its performance limited by the constraints imposed by the control-flow execution
model. Although the conventional control-flow model offers high performance on
sequential execution that exhibits good locality, switching between threads and
synchronizing among them cause substantial overhead. On the other hand, dataflow
architectures support rapid context switching and efficient synchronization but require
extensive hardware and do not use high-speed registers.
There have been a number of architectures proposed to combine the instruction-level
context switching capability with sequential scheduling. One such architecture
is Threaded Abstract Machine (TAM), which supports fine-grain interleaving of multiple
threads by an appropriate compilation strategy rather than through elaborate hardware.
Experiments on TAM have already shown that it is possible to implement the dataflow
execution model on conventional architectures and obtain reasonable performance.
These studies also show a basic mismatch between the requirements for fine-grain
parallelism and the underlying architecture, and that considerable improvement is possible through hardware support.
This thesis presents two design modifications to support fine-grain parallelism efficiently. First, a modification to the instruction set architecture is proposed to reduce the cost of scheduling and synchronization. The hardware modifications are kept to a minimum so as not to disturb the functionality of a conventional RISC processor. Second, a separate coprocessor is utilized to handle messages; atomicity and message handling are supported efficiently, without compromising per-processor performance or system integrity. Clock cycles per TAM instruction is used as the metric for studying the effectiveness of these changes.
Interface design and system impact analysis of a message-handling processor for fine-grain multithreading
There appears to be a broad agreement that high-performance computers of the future will be
Massively Parallel Architectures (MPAs), where all processors are interconnected by a high-speed
network. One of the major problems with MPAs is the latency observed for remote operations. One
technique to hide this latency is multithreading. In multithreading, whenever an instruction accesses a
remote location, the processor switches to the next available thread waiting for execution. There have
been a number of architectures proposed to implement multithreading. One such architecture is the
Threaded Abstract Machine (TAM). It supports fine-grain multithreading by an appropriate compilation
strategy rather than through elaborate hardware. Experiments on TAM have already shown that fine-grain
multithreading on conventional architectures can achieve reasonable performance.
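The switch-on-remote-access mechanism described above can be illustrated with Python generators standing in for threads: a thread yields whenever it issues a remote access, and the scheduler runs whichever thread is ready instead of stalling on the latency. This is a conceptual sketch, not TAM's actual scheduling mechanism.

```python
from collections import deque

def thread(name, n, trace):
    # Each simulated remote access suspends this thread (yield)
    # so the processor can run another ready thread meanwhile.
    for i in range(n):
        trace.append(f"{name}:compute{i}")
        yield                      # remote access issued: switch threads

def scheduler(threads):
    trace = []
    ready = deque(thread(name, n, trace) for name, n in threads)
    while ready:
        t = ready.popleft()
        try:
            next(t)                # run until the next remote access
            ready.append(t)        # requeue; the remote result arrives later
        except StopIteration:
            pass                   # thread finished
    return trace

trace = scheduler([('T0', 2), ('T1', 2)])
# The two threads interleave rather than stalling:
# ['T0:compute0', 'T1:compute0', 'T0:compute1', 'T1:compute1']
```

The useful work of one thread thus overlaps the remote latency of another, which is precisely the latency-hiding effect multithreading is after.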
However, a significant deficiency of the conventional design in the context of fine-grain program
execution is that the message handling is viewed as an appendix rather than as an integral, essential part
of the architecture. Considering that message handling in TAM can constitute as much as one fifth to one
half of total instructions executed, special effort must be given to support it in the underlying hardware.
This thesis presents the design modifications required to efficiently support message handling for
fine-grain parallelism on stock processors. The idea of having a separate processor is proposed and
extended to reduce the overhead due to messages. A detailed hardware design establishes the
interface between the conventional processor and the message-handling processor. At the same time, the
necessary cycle cost required to guarantee atomicity between the two processors is minimized. However,
the hardware modifications are kept to a minimum so as not to disturb the original functionality of a
conventional RISC processor. Finally, the effectiveness of the proposed architecture is analyzed in terms
of its impact on the system. The distribution of the workload between both processors is estimated to
indicate the potential speed-up that can be achieved with a separate processor to handle messages.
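The division of labour the thesis analyzes, a main processor that keeps computing while a dedicated unit services messages, can be mimicked in software with a queue between two threads. This is only a conceptual sketch under assumed names; the thesis's interface is hardware, not Python threads.

```python
import queue
import threading

# Sketch: the 'main processor' enqueues outgoing messages and keeps computing,
# while a separate 'message-handling processor' drains the queue. The queue's
# own locking stands in for the atomicity guarantee between the two units.

outbox = queue.Queue()
delivered = []

def message_handler():
    while True:
        msg = outbox.get()          # the message processor's work queue
        if msg is None:
            break                   # shutdown signal
        delivered.append(msg)       # service the message off the critical path

handler = threading.Thread(target=message_handler)
handler.start()

# The main processor just enqueues; it never blocks on message servicing:
for i in range(3):
    outbox.put(f"store-request-{i}")

outbox.put(None)
handler.join()
# delivered == ['store-request-0', 'store-request-1', 'store-request-2']
```

Measuring how much of the total work lands on each side of this queue is the software analogue of the workload-distribution estimate the abstract mentions.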