Intelligent Embedded Software: New Perspectives and Challenges
Intelligent embedded systems (IES) represent a novel and promising generation of embedded systems (ES). IES can reason about their external environments and adapt their behavior accordingly. Such systems sit at the intersection of two branches: embedded computing and intelligent computing. Meanwhile, intelligent embedded software (IESo) is becoming a large part of the engineering cost of intelligent embedded systems. IESo can include artificial intelligence (AI)-based components such as expert systems, neural networks and other sophisticated AI models to guarantee important characteristics such as self-learning, self-optimizing and self-repairing. Despite the widespread adoption of such systems, challenging design issues are arising. Designing software that is both resource-constrained and intelligent is not a trivial task, especially in a real-time context. To deal with this dilemma, embedded system researchers have profited from progress in semiconductor technology to develop specific hardware that supports AI models well and makes the integration of AI with the embedded world a reality.
hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices
Accessible machine learning algorithms, software, and diagnostic tools for
energy-efficient devices and systems are extremely valuable across a broad
range of application domains. In scientific domains, real-time near-sensor
processing can drastically improve experimental design and accelerate
scientific discoveries. To support domain scientists, we have developed hls4ml,
an open-source software-hardware codesign workflow to interpret and translate
machine learning algorithms for implementation with both FPGA and ASIC
technologies. We expand on previous hls4ml work by extending capabilities and
techniques towards low-power implementations and increased usability: new
Python APIs, quantization-aware pruning, end-to-end FPGA workflows, long
pipeline kernels for low power, and new device backends including an ASIC
workflow. Taken together, these and continued efforts in hls4ml will arm a new
generation of domain scientists with accessible, efficient, and powerful tools
for machine-learning-accelerated discovery.
Comment: 10 pages, 8 figures, TinyML Research Symposium 202
Middleware platform for distributed applications incorporating robots, sensors and the cloud
Cyber-physical systems (CPSs) in the factory of the future
will consist of cloud-hosted software governing an agile
production process executed by autonomous mobile robots
and controlled by analyzing the data from a vast number of
sensors. CPSs thus operate on a distributed production floor
infrastructure and the set-up continuously changes with each
new manufacturing task. In this paper, we present our OSGi-based
middleware that abstracts the deployment of service-based
CPS software components on the underlying distributed
platform comprising robots, actuators, sensors and the cloud.
Moreover, our middleware provides specific support to develop
components based on artificial neural networks, a technique that
recently became very popular for sensor data analytics and robot
actuation. We demonstrate a system where a robot takes actions
based on the input from sensors in its vicinity.
Comprehensive Evaluation of OpenCL-Based CNN Implementations for FPGAs
Deep learning has significantly advanced the state of the
art in artificial intelligence, gaining wide popularity from both industry
and academia. Special interest is around Convolutional Neural Networks
(CNN), which take inspiration from the hierarchical structure
of the visual cortex, to form deep layers of convolutional operations,
along with fully connected classifiers. Hardware implementations of these
deep CNN architectures are challenged by memory bottlenecks, because their
many convolutional and fully-connected layers demand a large
amount of communication for parallel computation. Multi-core CPU
based solutions have demonstrated their inadequacy for this problem
due to the memory wall and low parallelism. Many-core GPU architectures
show superior performance but they consume high power and also
have memory constraints due to inconsistencies between cache and main
memory. OpenCL is commonly used to describe these architectures for
their execution on GPGPUs or FPGAs. FPGA design solutions are also
actively being explored, which allow implementing the memory hierarchy
using embedded parallel BlockRAMs. This boosts the parallel use
of shared memory elements between multiple processing units, avoiding
data replicability and inconsistencies. This makes FPGAs potentially
powerful solutions for real-time classification with CNNs. In this
paper, the OpenCL co-design frameworks adopted by both Altera and Xilinx
for pseudo-automatic development are evaluated. A comprehensive
evaluation and comparison for a 5-layer deep CNN is presented.
Hardware resources, temporal performance and the OpenCL architecture
for CNNs are discussed. Xilinx demonstrates faster synthesis, better
FPGA resource utilization and more compact boards. Altera provides
multi-platform tools, a mature design community and better execution
times.
Ministerio de Economía y Competitividad TEC2016-77785-