12,909 research outputs found
Dynamic Security-aware Routing for Zone-based data Protection in Multi-Processor System-on-Chips
In this work, we propose a NoC which enforces the
encapsulation of sensitive traffic inside the asymmetrical security
zones while using minimal and non-minimal paths. The NoC
routes guarantee that the sensitive traffic is communicated only
through the trusted nodes which belong to the security zone.
As the shape of the zones may change during operation, the
sensitive traffic must be routed through low-risk paths. We test
our proposal and we show that our solution can be an efficient
and scalable alternative for enforce the data protection inside the
MPSoC
Scalable and Sustainable Deep Learning via Randomized Hashing
Current deep learning architectures are growing larger in order to learn from
complex datasets. These architectures require giant matrix multiplication
operations to train millions of parameters. Conversely, there is another
growing trend to bring deep learning to low-power, embedded devices. The matrix
operations, associated with both training and testing of deep networks, are
very expensive from a computational and energy standpoint. We present a novel
hashing based technique to drastically reduce the amount of computation needed
to train and test deep networks. Our approach combines recent ideas from
adaptive dropouts and randomized hashing for maximum inner product search to
select the nodes with the highest activation efficiently. Our new algorithm for
deep learning reduces the overall computational cost of forward and
back-propagation by operating on significantly fewer (sparse) nodes. As a
consequence, our algorithm uses only 5% of the total multiplications, while
keeping on average within 1% of the accuracy of the original model. A unique
property of the proposed hashing based back-propagation is that the updates are
always sparse. Due to the sparse gradient updates, our algorithm is ideally
suited for asynchronous and parallel training leading to near linear speedup
with increasing number of cores. We demonstrate the scalability and
sustainability (energy efficiency) of our proposed algorithm via rigorous
experimental evaluations on several real datasets
Distributed Reconstruction of Nonlinear Networks: An ADMM Approach
In this paper, we present a distributed algorithm for the reconstruction of
large-scale nonlinear networks. In particular, we focus on the identification
from time-series data of the nonlinear functional forms and associated
parameters of large-scale nonlinear networks. Recently, a nonlinear network
reconstruction problem was formulated as a nonconvex optimisation problem based
on the combination of a marginal likelihood maximisation procedure with
sparsity inducing priors. Using a convex-concave procedure (CCCP), an iterative
reweighted lasso algorithm was derived to solve the initial nonconvex
optimisation problem. By exploiting the structure of the objective function of
this reweighted lasso algorithm, a distributed algorithm can be designed. To
this end, we apply the alternating direction method of multipliers (ADMM) to
decompose the original problem into several subproblems. To illustrate the
effectiveness of the proposed methods, we use our approach to identify a
network of interconnected Kuramoto oscillators with different network sizes
(500~100,000 nodes).Comment: To appear in the Preprints of 19th IFAC World Congress 201
NoCo: ILP-based worst-case contention estimation for mesh real-time manycores
Manycores are capable of providing the computational demands required by functionally-advanced critical applications in domains such as automotive and avionics. In manycores a network-on-chip (NoC) provides access to shared caches and memories and hence concentrates most of the contention that tasks suffer, with effects on the worst-case contention delay (WCD) of packets and tasks' WCET. While several proposals minimize the impact of individual NoC parameters on WCD, e.g. mapping and routing, there are strong dependences among these NoC parameters. Hence, finding the optimal NoC configurations requires optimizing all parameters simultaneously, which represents a multidimensional optimization problem. In this paper we propose NoCo, a novel approach that combines ILP and stochastic optimization to find NoC configurations in terms of packet routing, application mapping, and arbitration weight allocation. Our results show that NoCo improves other techniques that optimize a subset of NoC parameters.This work has been partially supported by the Spanish Ministry of Economy and Competitiveness under grant TIN2015-
65316-P and the HiPEAC Network of Excellence. It also received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (agreement No. 772773). Carles Hernández
is jointly supported by the MINECO and FEDER funds
through grant TIN2014-60404-JIN. Jaume Abella has been
partially supported by the Spanish Ministry of Economy and
Competitiveness under Ramon y Cajal postdoctoral fellowship
number RYC-2013-14717. Enrico Mezzetti has been partially
supported by the Spanish Ministry of Economy and Competitiveness
under Juan de la Cierva-Incorporaci´on postdoctoral
fellowship number IJCI-2016-27396.Peer ReviewedPostprint (author's final draft
Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable
There has been significant recent interest in parallel graph processing due
to the need to quickly analyze the large graphs available today. Many graph
codes have been designed for distributed memory or external memory. However,
today even the largest publicly-available real-world graph (the Hyperlink Web
graph with over 3.5 billion vertices and 128 billion edges) can fit in the
memory of a single commodity multicore server. Nevertheless, most experimental
work in the literature report results on much smaller graphs, and the ones for
the Hyperlink graph use distributed or external memory. Therefore, it is
natural to ask whether we can efficiently solve a broad class of graph problems
on this graph in memory.
This paper shows that theoretically-efficient parallel graph algorithms can
scale to the largest publicly-available graphs using a single machine with a
terabyte of RAM, processing them in minutes. We give implementations of
theoretically-efficient parallel algorithms for 20 important graph problems. We
also present the optimizations and techniques that we used in our
implementations, which were crucial in enabling us to process these large
graphs quickly. We show that the running times of our implementations
outperform existing state-of-the-art implementations on the largest real-world
graphs. For many of the problems that we consider, this is the first time they
have been solved on graphs at this scale. We have made the implementations
developed in this work publicly-available as the Graph-Based Benchmark Suite
(GBBS).Comment: This is the full version of the paper appearing in the ACM Symposium
on Parallelism in Algorithms and Architectures (SPAA), 201
- …