4,382 research outputs found
Learning-based Application-Agnostic 3D NoC Design for Heterogeneous Manycore Systems
The rising use of deep learning and other big-data algorithms has led to an
increasing demand for hardware platforms that are computationally powerful, yet
energy-efficient. Due to the amount of data parallelism in these algorithms,
high-performance 3D manycore platforms that incorporate both CPUs and GPUs
present a promising direction. However, as systems use heterogeneity (e.g., a
combination of CPUs, GPUs, and accelerators) to improve performance and
efficiency, it becomes more pertinent to address the distinct and likely
conflicting communication requirements (e.g., CPU memory access latency or GPU
network throughput) that arise from such heterogeneity. Unfortunately, it is
difficult to quickly explore the hardware design space and choose appropriate
tradeoffs between these heterogeneous requirements. To address these
challenges, we propose the design of a 3D Network-on-Chip (NoC) for
heterogeneous manycore platforms that considers the appropriate design
objectives for a 3D heterogeneous system and explores various tradeoffs using
an efficient ML-based multi-objective optimization technique. The proposed
design space exploration considers the various requirements of its
heterogeneous components and generates a set of 3D NoC architectures that
efficiently trade off these design objectives. Our findings show that by
jointly considering these requirements (latency, throughput, temperature, and
energy), we can achieve 9.6% better Energy-Delay Product on average at nearly
iso-temperature conditions when compared to a thermally-optimized design for 3D
heterogeneous NoCs. More importantly, our results suggest that our 3D NoCs
optimized for a few applications can be generalized for unknown applications as
well. Our results show that these generalized 3D NoCs only incur a 1.8%
(36-tile system) and 1.1% (64-tile system) average performance loss compared to
application-specific NoCs.
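The abstract names an ML-based multi-objective optimization but does not spell out the algorithm. As a loose illustration of the underlying idea (scoring candidate 3D NoC configurations against several conflicting objectives and keeping only the non-dominated, i.e. Pareto-optimal, designs), consider the sketch below; the configuration encoding and the cost models are invented placeholders, not the paper's method.

```python
# Minimal sketch of multi-objective design-space exploration for a 3D NoC.
# The cost models below are placeholders; the paper's ML-guided optimizer and
# detailed simulators are not reproduced here.
import random

def evaluate(config):
    """Hypothetical surrogate returning (latency, 1/throughput, temperature, energy).
    All objectives are to be minimized."""
    vertical_links, planar_links = config
    latency = 100.0 / (1 + 0.1 * vertical_links + 0.05 * planar_links)
    inv_throughput = 1.0 / (1 + 0.02 * planar_links)
    temperature = 45.0 + 0.3 * vertical_links          # placeholder thermal penalty for TSVs
    energy = 1.0 + 0.01 * (vertical_links + planar_links)
    return (latency, inv_throughput, temperature, energy)

def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    scored = [(c, evaluate(c)) for c in candidates]
    return [(c, s) for c, s in scored
            if not any(dominates(s2, s) for _, s2 in scored if s2 != s)]

# Randomly sampled link budgets stand in for the learned search policy.
candidates = [(random.randint(4, 32), random.randint(16, 128)) for _ in range(200)]
for cfg, objs in pareto_front(candidates)[:5]:
    print(cfg, [round(o, 2) for o in objs])
```

In the actual framework, a learned search strategy would propose candidates and detailed architecture, thermal, and power simulators would replace the surrogate costs.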
Optimizing Routerless Network-on-Chip Designs: An Innovative Learning-Based Framework
Machine learning applied to architecture design presents a promising
opportunity with broad applications. Recent deep reinforcement learning (DRL)
techniques, in particular, enable efficient exploration in vast design spaces
where conventional design strategies may be inadequate. This paper proposes a
novel deep reinforcement learning framework, taking routerless networks-on-chip (NoC) as
an evaluation case study. The new framework successfully resolves problems with
prior design approaches being either unreliable due to random searches or
inflexible due to severe design space restrictions. The framework learns
(near-)optimal loop placement for routerless NoCs with various design
constraints. A deep neural network is developed using parallel threads that
efficiently explore the immense routerless NoC design space with a Monte Carlo
search tree. Experimental results show that, compared with a conventional mesh,
the proposed DRL-based routerless design achieves a
3.25x increase in throughput, 1.6x reduction in packet latency, and 5x
reduction in power. Compared with the state-of-the-art routerless NoC, DRL
achieves a 1.47x increase in throughput, 1.18x reduction in packet latency, and
1.14x reduction in average hop count albeit with slightly more power overhead.
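The abstract only names the search machinery; the toy sketch below illustrates Monte Carlo tree search over incremental loop placement, without the deep network that guides the real framework. The grid size, loop encoding, coverage-based reward, and loop budget are all invented for illustration.

```python
# Toy Monte Carlo tree search over loop placement in a routerless NoC.
# The reward, state encoding, and absence of a guiding policy/value network
# are simplifications of the framework described above.
import math, random

GRID = 4                      # 4x4 NoC; loops are axis-aligned rectangles
ALL_LOOPS = [(x1, y1, x2, y2)
             for x1 in range(GRID) for y1 in range(GRID)
             for x2 in range(x1 + 1, GRID) for y2 in range(y1 + 1, GRID)]
MAX_LOOPS = 6                 # assumed design constraint (wiring budget)

def covered_pairs(loops):
    """Node pairs that share at least one loop (placeholder connectivity objective)."""
    pairs = set()
    for x1, y1, x2, y2 in loops:
        nodes = [(x, y) for x in range(x1, x2 + 1) for y in range(y1, y2 + 1)
                 if x in (x1, x2) or y in (y1, y2)]       # loop perimeter
        pairs.update((a, b) for a in nodes for b in nodes if a < b)
    return len(pairs)

class Node:
    def __init__(self, loops):
        self.loops = loops            # tuple of loops placed so far
        self.children = {}            # action -> Node
        self.visits = 0
        self.value = 0.0

def rollout(loops):
    loops = list(loops)
    while len(loops) < MAX_LOOPS:
        loops.append(random.choice(ALL_LOOPS))
    return covered_pairs(loops)

def mcts(iterations=2000):
    root = Node(())
    for _ in range(iterations):
        node, path = root, [root]
        # Selection: descend by UCB1 while the current node is fully expanded.
        while len(node.loops) < MAX_LOOPS and len(node.children) == len(ALL_LOOPS):
            node = max(node.children.values(),
                       key=lambda c: c.value / c.visits +
                       math.sqrt(2 * math.log(node.visits) / c.visits))
            path.append(node)
        # Expansion: try one untried loop if the design is not complete yet.
        if len(node.loops) < MAX_LOOPS:
            action = random.choice([a for a in ALL_LOOPS if a not in node.children])
            node.children[action] = Node(node.loops + (action,))
            node = node.children[action]
            path.append(node)
        # Simulation and backpropagation.
        reward = rollout(node.loops)
        for n in path:
            n.visits += 1
            n.value += reward
    best = max(root.children.values(), key=lambda c: c.visits)
    return best.loops, best.value / best.visits

print(mcts())
```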
Hardware-Aware Machine Learning: Modeling and Optimization
Recent breakthroughs in Deep Learning (DL) applications have made DL models a
key component in almost every modern computing system. The increased popularity
of DL applications deployed on a wide spectrum of platforms has resulted in a
plethora of design challenges related to the constraints introduced by the
hardware itself. What is the latency or energy cost for an inference made by a
Deep Neural Network (DNN)? Is it possible to predict this latency or energy
consumption before a model is trained? If yes, how can machine learners take
advantage of these models to design the hardware-optimal DNN for deployment?
From lengthening battery life of mobile devices to reducing the runtime
requirements of DL models executing in the cloud, the answers to these
questions have drawn significant attention.
One cannot optimize what isn't properly modeled. Therefore, it is important
to understand the hardware efficiency of DL models during inference serving,
before even training the model. This key observation has motivated
the use of predictive models to capture the hardware performance or energy
efficiency of DL applications. Furthermore, DL practitioners are challenged
with the task of designing the DNN model, i.e., of tuning the hyper-parameters
of the DNN architecture, while optimizing for both accuracy of the DL model and
its hardware efficiency. Therefore, state-of-the-art methodologies have
proposed hardware-aware hyper-parameter optimization techniques. In this paper,
we provide a comprehensive assessment of state-of-the-art work and selected
results on the hardware-aware modeling and optimization for DL applications. We
also highlight several open questions that are poised to give rise to novel
hardware-aware designs in the next few years, as DL applications continue to
significantly impact associated hardware systems and platforms.
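As a minimal illustration of the two ingredients discussed here, predictive hardware-cost models and hardware-aware hyper-parameter selection, the sketch below fits a latency predictor on synthetic measurements and then scores candidate architectures by a weighted combination of a placeholder accuracy proxy and predicted latency; none of the numbers or models come from the surveyed works.

```python
# Sketch of (1) a predictive model of hardware cost trained on measured
# configurations and (2) hardware-aware hyper-parameter selection that trades
# off predicted accuracy against predicted cost. Data and models are synthetic.
import numpy as np

rng = np.random.default_rng(0)

# (1) Fit a latency predictor from (depth, width) -> measured latency (ms).
configs = np.array([(d, w) for d in range(2, 20, 2) for w in (32, 64, 128, 256)])
true_latency = 0.4 * configs[:, 0] * configs[:, 1] / 64 + rng.normal(0, 0.2, len(configs))
X = np.column_stack([configs[:, 0] * configs[:, 1], np.ones(len(configs))])
coef, *_ = np.linalg.lstsq(X, true_latency, rcond=None)

def predict_latency(depth, width):
    return coef[0] * depth * width + coef[1]

# (2) Hardware-aware selection: accuracy proxy minus a latency penalty.
def accuracy_proxy(depth, width):       # placeholder, not a trained model
    return 1.0 - np.exp(-0.05 * depth * np.log2(width))

LAMBDA = 0.02                           # assumed accuracy/latency trade-off weight
best = max(((d, w) for d in range(2, 20, 2) for w in (32, 64, 128, 256)),
           key=lambda c: accuracy_proxy(*c) - LAMBDA * predict_latency(*c))
print("selected (depth, width):", best,
      "predicted latency (ms):", round(predict_latency(*best), 2))
```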
Machine Learning and Manycore Systems Design: A Serendipitous Symbiosis
Tight collaboration between experts in machine learning and manycore system
design is necessary to create a data-driven manycore design framework that
integrates both learning and expert knowledge. Such a framework will be needed
to address the rising complexity of designing large-scale manycore systems and
machine learning techniques.
A survey on scheduling and mapping techniques in 3D Network-on-chip
Network-on-Chips (NoCs) have been widely employed in the design of
multiprocessor system-on-chips (MPSoCs) as a scalable communication solution.
NoCs enable communications between on-chip Intellectual Property (IP) cores and
allow those cores to achieve higher performance by outsourcing their
communication tasks. Mapping and scheduling methodologies are key elements in
assigning application tasks, allocating those tasks to IP cores, and organising
the communication among them to achieve specified objectives. The goal of this
paper is to present a detailed review of the state of the art in
mapping and scheduling of applications on 3D NoCs, classifying existing works
along several dimensions and outlining potential research directions.
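As a concrete, simplified instance of the mapping problem the survey covers, the sketch below greedily places a small task graph onto a 2x2x2 mesh so as to reduce hop-weighted communication; the task graph, objective, and heuristic are illustrative assumptions rather than any specific surveyed technique.

```python
# Minimal sketch of communication-aware task mapping onto a 3D mesh NoC.
# Surveyed works use many other objectives (energy, thermal, latency) and
# search strategies; this is a toy greedy heuristic on invented traffic.
import itertools

DIMS = (2, 2, 2)                                   # 2x2x2 mesh = 8 tiles
TILES = list(itertools.product(*(range(d) for d in DIMS)))

# Task communication graph: (src_task, dst_task) -> traffic volume.
TRAFFIC = {(0, 1): 80, (1, 2): 60, (2, 3): 40, (0, 4): 30,
           (4, 5): 50, (5, 6): 20, (6, 7): 70, (3, 7): 10}

def hops(a, b):                                    # Manhattan distance incl. vertical links
    return sum(abs(x - y) for x, y in zip(a, b))

def comm_cost(mapping):
    return sum(vol * hops(mapping[s], mapping[d]) for (s, d), vol in TRAFFIC.items())

def greedy_map():
    """Place tasks in decreasing order of total traffic, each onto the free
    tile that minimizes communication to the tasks already placed."""
    volume = {t: 0 for t in range(8)}
    for (s, d), vol in TRAFFIC.items():
        volume[s] += vol
        volume[d] += vol
    mapping, free = {}, set(TILES)
    for task in sorted(volume, key=volume.get, reverse=True):
        def partial_cost(tile):
            cost = 0
            for (s, d), vol in TRAFFIC.items():
                if s == task and d in mapping:
                    cost += vol * hops(tile, mapping[d])
                elif d == task and s in mapping:
                    cost += vol * hops(tile, mapping[s])
            return cost
        best = min(free, key=partial_cost)
        mapping[task] = best
        free.remove(best)
    return mapping

m = greedy_map()
print("mapping:", m, "total hop-weighted traffic:", comm_cost(m))
```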
Scalability of broadcast performance in wireless network-on-chip
Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect
the cores of a chip multiprocessor. However, conventional NoCs may not suffice
to fulfill the on-chip communication requirements of processors with hundreds
or thousands of cores. The main reason is that the performance of such networks
drops as the number of cores grows, especially in the presence of multicast and
broadcast traffic. This not only limits the scalability of current
multiprocessor architectures, but also sets a performance wall that prevents
the development of architectures that generate moderate-to-high levels of
multicast. In this paper, a Wireless Network-on-Chip (WNoC) where all cores
share a single broadband channel is presented. Such a design is conceived to
provide low latency and ordered delivery for multicast/broadcast traffic, in an
attempt to complement a wireline NoC that will transport the rest of the
communication flows. To assess the feasibility of this approach, the network
performance of WNoC is analyzed as a function of the system size and the
channel capacity, and then compared to that of wireline NoCs with embedded
multicast support. Based on this evaluation, preliminary results on the
potential performance of the proposed hybrid scheme are provided, together with
guidelines for the design of MAC protocols for WNoC.
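A back-of-the-envelope sketch of the scaling comparison, with arbitrary assumed parameters (packet size, channel capacity, injection rate, per-hop delay) rather than figures from the paper: a single shared wireless channel reaches every core in one transmission but serializes all broadcasts, while a wired mesh pays a latency that grows with the network diameter.

```python
# First-order broadcast latency estimates in the spirit of the analysis above.
# All parameter values are assumptions for illustration only.
import math

PACKET_BITS = 640          # assumed broadcast packet size
HOP_NS = 3.0               # assumed per-hop router + link delay in a wired mesh (ns)

def wireless_broadcast_ns(cores, channel_gbps, bcasts_per_core_per_s):
    """Single shared channel: one transmission reaches every core, but
    broadcasts from different cores are serialized on the channel."""
    tx_ns = PACKET_BITS / channel_gbps
    load = cores * bcasts_per_core_per_s * PACKET_BITS / (channel_gbps * 1e9)
    if load >= 1.0:
        return math.inf                      # channel saturated
    return tx_ns / (1.0 - load)              # crude queueing inflation factor

def mesh_broadcast_ns(cores):
    """Tree-based multicast in a square mesh: latency grows with the diameter.
    (Ignores contention between concurrent broadcasts, which in practice
    degrades mesh broadcast further as the core count grows.)"""
    side = math.isqrt(cores)
    return 2 * (side - 1) * HOP_NS

for n in (64, 256, 1024):
    w = wireless_broadcast_ns(n, channel_gbps=10, bcasts_per_core_per_s=10_000)
    print(f"{n:5d} cores | mesh ~{mesh_broadcast_ns(n):6.1f} ns | wireless ~{w:6.1f} ns")
```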
White Paper on Critical and Massive Machine Type Communication Towards 6G
Society as a whole, and many vertical sectors in particular, is becoming
increasingly digitalized. Machine Type Communication (MTC), encompassing its
massive and critical aspects, and ubiquitous wireless connectivity are among
the main enablers of such digitization at large. The recently introduced 5G New
Radio is natively designed to support both aspects of MTC to promote the
digital transformation of society. However, it is evident that some of the
more demanding requirements cannot be fully supported by 5G networks.
In parallel, the further development of society towards 2030 will give rise to
new and more stringent requirements on wireless connectivity in general, and
MTC in particular. Driven by the societal trends towards 2030, the next
generation (6G) will be an agile and efficient convergent network serving a set
of diverse service classes and a wide range of key performance indicators
(KPI). This white paper explores the main drivers and requirements of an
MTC-optimized 6G network, and discusses the following six key research
questions:
- Will the main KPIs of 5G continue to be the dominant KPIs in 6G, or will
new key metrics emerge?
- How to deliver different E2E service mandates with different KPI
requirements, considering joint optimization from the physical layer up to the
application layer?
- What are the key enablers towards designing ultra-low power receivers and
highly efficient sleep modes?
- How to tackle a disruptive rather than incremental joint design of a
massively scalable waveform and medium access policy for global MTC
connectivity?
- How to support new service classes characterizing mission-critical and
dependable MTC in 6G?
- What are the potential enablers of long-term, lightweight, and flexible
privacy and security schemes considering MTC device requirements?
Toward Creating Subsurface Camera
In this article, the framework and architecture of Subsurface Camera (SAMERA)
are envisioned and described for the first time. A SAMERA is a geophysical
sensor network that senses and processes geophysical sensor signals, and
computes a 3D subsurface image in-situ in real-time. The basic mechanism is:
geophysical waves propagating/reflected/refracted through the subsurface enter a
network of geophysical sensors, where a 2D or 3D image is computed and
recorded; control software may be connected to this network to allow viewing of
the 2D/3D image and adjustment of settings such as resolution, filter,
regularization, and other algorithm parameters. System prototypes based on
seismic imaging have been designed and built. SAMERA technology is envisioned
as a game changer to transform many subsurface survey and monitoring
applications, including oil/gas exploration and production, subsurface
infrastructures and homeland security, wastewater and CO2 sequestration, and
earthquake and volcano hazard monitoring.
Creating SAMERA requires interdisciplinary collaboration and a transformation
of sensor networks, signal processing, distributed computing, and geophysical
imaging.
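To make the "basic mechanism" concrete, the toy sketch below forms a subsurface image by delay-and-sum (Kirchhoff-style) stacking of synthetic sensor traces; the geometry, velocity model, and traces are invented for illustration, and the real system would distribute this computation across the sensor network.

```python
# Toy in-network imaging step: each grid point of a subsurface image is formed
# by summing sensor traces at the predicted two-way travel time. All values
# here are illustrative assumptions, not SAMERA's actual algorithms.
import numpy as np

VELOCITY = 2000.0                    # assumed wave speed, m/s
DT = 0.001                           # trace sample interval, s
SENSORS = np.array([[x, 0.0] for x in range(0, 500, 50)], dtype=float)  # surface line

# Synthetic traces: a single point scatterer at (250 m, 200 m depth).
scatterer = np.array([250.0, 200.0])
traces = np.zeros((len(SENSORS), 1000))
for i, s in enumerate(SENSORS):
    t = 2 * np.linalg.norm(s - scatterer) / VELOCITY       # two-way travel time
    traces[i, int(round(t / DT))] = 1.0

def migrate(traces, xs, zs):
    """Delay-and-sum image: stack trace amplitudes at each point's travel time.
    In SAMERA this stacking would be distributed across the sensor nodes."""
    image = np.zeros((len(zs), len(xs)))
    for iz, z in enumerate(zs):
        for ix, x in enumerate(xs):
            p = np.array([x, z])
            for s, trace in zip(SENSORS, traces):
                k = int(round(2 * np.linalg.norm(s - p) / VELOCITY / DT))
                if k < len(trace):
                    image[iz, ix] += trace[k]
    return image

img = migrate(traces, xs=np.arange(0, 500, 25), zs=np.arange(50, 400, 25))
print("brightest image point (z_idx, x_idx):", np.unravel_index(np.argmax(img), img.shape))
```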
A Survey of Neuromorphic Computing and Neural Networks in Hardware
Neuromorphic computing has come to refer to a variety of brain-inspired
computers, devices, and models that contrast with the pervasive von Neumann computer
architecture. This biologically inspired approach has created highly connected
synthetic neurons and synapses that can be used to model neuroscience theories
as well as solve challenging machine learning problems. The promise of the
technology is to create a brain-like ability to learn and adapt, but the
technical challenges are significant, ranging from developing an accurate
neuroscience model of how the brain works, to finding materials and engineering
breakthroughs to build devices that support these models, to creating
programming frameworks so the systems can learn, to creating applications with
brain-like capabilities. In this work, we provide a comprehensive survey of the
research and motivations for neuromorphic computing over its history. We begin
with a 35-year review of the motivations and drivers of neuromorphic computing,
then look at the major research areas of the field, which we define as
neuro-inspired models, algorithms and learning approaches, hardware and
devices, supporting systems, and finally applications. We conclude with a broad
discussion on the major research topics that need to be addressed in the coming
years to see the promise of neuromorphic computing fulfilled. The goals of this
work are to provide an exhaustive review of the research conducted in
neuromorphic computing since the inception of the term, and to motivate further
work by illuminating gaps in the field where new research is needed.
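For readers unfamiliar with the "synthetic neurons and synapses" mentioned above, a minimal leaky integrate-and-fire (LIF) neuron, one of the simplest spiking models implemented by many neuromorphic platforms, can be sketched as follows; the parameter values are generic textbook choices, not taken from the survey.

```python
# A minimal leaky integrate-and-fire (LIF) neuron, the kind of simple spiking
# unit many neuromorphic hardware platforms implement. Parameters are generic.
import numpy as np

def lif(input_current, dt=1e-3, tau=20e-3, v_rest=0.0, v_thresh=1.0, v_reset=0.0, r=1.0):
    """Simulate the membrane potential and return spike times (in steps)."""
    v, spikes = v_rest, []
    for step, i_in in enumerate(input_current):
        # Leaky integration toward v_rest, driven by the synaptic input current.
        v += dt / tau * (-(v - v_rest) + r * i_in)
        if v >= v_thresh:            # threshold crossing emits a spike
            spikes.append(step)
            v = v_reset              # reset after firing
    return spikes

# Constant supra-threshold input produces regular firing.
current = np.full(1000, 1.5)
print("spike count over 1 s:", len(lif(current)))
```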
- …