A Scalable Near-Memory Architecture for Training Deep Neural Networks on Large In-Memory Datasets
Most investigations into near-memory hardware accelerators for deep neural networks have primarily focused on inference, while the potential of accelerating training has received relatively little attention so far. Based on an in-depth analysis of the key computational patterns in state-of-the-art gradient-based training methods, we propose an efficient near-memory acceleration engine called NTX that can be used to train state-of-the-art deep convolutional neural networks at scale. Our main contributions are: (i) a loose coupling of RISC-V cores and NTX co-processors reducing offloading overhead by 7x over previously published results; (ii) an optimized IEEE 754 compliant data path for fast high-precision convolutions and gradient propagation; (iii) evaluation of near-memory computing with NTX embedded into residual area on the Logic Base die of a Hybrid Memory Cube; and (iv) a scaling analysis to meshes of HMCs in a data center scenario. We demonstrate a 2.7x energy efficiency improvement of NTX over contemporary GPUs at 4.4x less silicon area, and a compute performance of 1.2 Tflop/s for training large state-of-the-art networks with full floating-point precision. At the data center scale, a mesh of NTX achieves above 95 percent parallel and energy efficiency, while providing 2.1x energy savings or 3.1x performance improvement over a GPU-based system.
Active Topology Inference using Network Coding
Our goal is to infer the topology of a network when (i) we can send probes
between sources and receivers at the edge of the network and (ii) intermediate
nodes can perform simple network coding operations, i.e., additions. Our key
intuition is that network coding introduces topology-dependent correlation in
the observations at the receivers, which can be exploited to infer the
topology. For undirected tree topologies, we design hierarchical clustering
algorithms, building on our prior work. For directed acyclic graphs (DAGs),
first we decompose the topology into a number of two-source, two-receiver
(2-by-2) subnetwork components and then we merge these components to
reconstruct the topology. Our approach for DAGs builds on prior work on
tomography, and improves upon it by employing network coding to accurately
distinguish among all different 2-by-2 components. We evaluate our algorithms
through simulation of a number of realistic topologies and compare them to
active tomographic techniques without network coding. We also make connections
between our approach and alternatives, including passive inference, traceroute,
and packet marking.
Fuzzy based load and energy aware multipath routing for mobile ad hoc networks
Routing is a challenging task in Mobile Ad hoc Networks (MANETs) due to their dynamic topology and lack of central administration. Because the topology of such networks changes unpredictably, the routing protocols employed need to accurately capture the delay, load, available bandwidth and residual node energy at various locations of the network for effective energy and load balancing. This paper presents a fuzzy-logic-based scheme that ensures delay, load and energy aware routing to avoid congestion and minimise end-to-end delay in MANETs. In the proposed approach, forwarding delay, average load, available bandwidth and residual battery energy at a mobile node are given as inputs to a fuzzy inference engine, which determines the traffic distribution possibility from that node based on the given fuzzy rules. Based on the output from the fuzzy system, traffic is distributed over fail-safe multiple routes to reduce the load at a congested node. Through simulation results, we show that our approach reduces end-to-end delay, packet drop and average energy consumption and increases packet delivery ratio for constant bit rate (CBR) traffic when compared with the popular Ad hoc On-demand Multipath Distance Vector (AOMDV) routing protocol.
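As a rough illustration of how four node measurements could be mapped to a traffic distribution possibility, the sketch below uses a tiny zero-order Sugeno-style fuzzy system. Everything specific in it is an assumption made for illustration and is not taken from the paper: the inputs are assumed to be normalised to [0, 1], the linear "low"/"high" memberships, the three rules, and the function name `distribution_possibility` are all hypothetical.

```python
def clamp01(x):
    """Clamp a value to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def distribution_possibility(delay, load, bandwidth, energy):
    """Return a score in [0, 1]; higher means 'spread traffic away from this node'.

    Hypothetical rule base (not the paper's):
      R1: IF delay is high OR load is high           THEN distribute is high (1.0)
      R2: IF bandwidth is low OR energy is low       THEN distribute is high (1.0)
      R3: IF delay, load are low AND bandwidth,
          energy are high                            THEN distribute is low  (0.0)
    """
    delay, load = clamp01(delay), clamp01(load)
    bandwidth, energy = clamp01(bandwidth), clamp01(energy)

    # Assumed linear memberships: high(x) = x, low(x) = 1 - x.
    # Fuzzy OR is taken as max, fuzzy AND as min.
    r1 = max(delay, load)
    r2 = max(1.0 - bandwidth, 1.0 - energy)
    r3 = min(1.0 - delay, 1.0 - load, bandwidth, energy)

    strengths = [r1, r2, r3]
    outputs = [1.0, 1.0, 0.0]  # singleton consequents of R1, R2, R3

    # Defuzzify with a firing-strength-weighted average of the consequents.
    return sum(s * o for s, o in zip(strengths, outputs)) / sum(strengths)


congested = distribution_possibility(delay=0.9, load=0.8, bandwidth=0.2, energy=0.3)
idle = distribution_possibility(delay=0.1, load=0.1, bandwidth=0.9, energy=0.9)
print(f"congested node: {congested:.2f}, idle node: {idle:.2f}")
```

A routing layer could compare this score against a threshold to decide whether to split traffic over alternative routes; that threshold would again be a design choice rather than something the abstract specifies.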
On the Properties of Gromov Matrices and their Applications in Network Inference
The spanning tree heuristic is a commonly adopted procedure in network
inference and estimation. It allows one to generalize an inference method
developed for trees, which is usually based on a statistically rigorous
approach, to a heuristic procedure for general graphs by (usually randomly)
choosing a spanning tree in the graph to apply the approach developed for
trees. However, there are an intractable number of spanning trees in a dense
graph. In this paper, we represent a weighted tree with a matrix, which we call
a Gromov matrix. We propose a method that constructs a family of Gromov
matrices using convex combinations, which can be used for inference and
estimation instead of a randomly selected spanning tree. This procedure
increases the size of the candidate set and hence enhances the performance of
the classical spanning tree heuristic. On the other hand, our new scheme is
based on simple algebraic constructions using matrices, and hence is still
computationally tractable. We discuss some applications to network inference
and estimation to demonstrate the usefulness of the proposed method.
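The abstract does not spell out how the matrix is built. The sketch below assumes the usual construction from Gromov products: for a base vertex r and vertices i, j, the entry is (d(r,i) + d(r,j) - d(i,j)) / 2, which in a tree is the length of the path shared by r-to-i and r-to-j. The two example trees, the edge weights, and the helper names are hypothetical, and the entrywise convex combination is only one plausible reading of "convex combinations".

```python
from collections import defaultdict


def tree_distances(edges, source):
    """Path lengths from `source` to every vertex of a weighted tree."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = {source: 0.0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:  # a tree has exactly one path to each vertex
                dist[v] = dist[u] + w
                stack.append(v)
    return dist


def gromov_matrix(edges, root, vertices):
    """Matrix of Gromov products of `vertices` with respect to `root`."""
    d_root = tree_distances(edges, root)
    d_from = {i: tree_distances(edges, i) for i in vertices}
    return [[0.5 * (d_root[i] + d_root[j] - d_from[i][j]) for j in vertices]
            for i in vertices]


def convex_combination(m1, m2, lam):
    """Entrywise lam * m1 + (1 - lam) * m2 for 0 <= lam <= 1."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [[lam * a + (1.0 - lam) * b for a, b in zip(row1, row2)]
            for row1, row2 in zip(m1, m2)]


# Two hypothetical weighted trees on the same root "r" and leaves x, y, z.
tree1 = [("r", "a", 1.0), ("a", "x", 2.0), ("a", "y", 3.0), ("r", "z", 4.0)]
tree2 = [("r", "x", 3.0), ("r", "b", 2.0), ("b", "y", 2.0), ("b", "z", 2.0)]
leaves = ["x", "y", "z"]

g1 = gromov_matrix(tree1, "r", leaves)
g2 = gromov_matrix(tree2, "r", leaves)
mix = convex_combination(g1, g2, 0.5)
for row in mix:
    print(row)
```

Sweeping `lam` over [0, 1] gives a one-parameter family of candidate matrices between the two trees, which is the sense in which the candidate set grows beyond a single randomly chosen spanning tree.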
FastVentricle: Cardiac Segmentation with ENet
Cardiac Magnetic Resonance (CMR) imaging is commonly used to assess cardiac
structure and function. One disadvantage of CMR is that post-processing of
exams is tedious. Without automation, precise assessment of cardiac function
via CMR typically requires an annotator to spend tens of minutes per case
manually contouring ventricular structures. Automatic contouring can lower the
required time per patient by generating contour suggestions that can be lightly
modified by the annotator. Fully convolutional networks (FCNs), a variant of
convolutional neural networks, have been used to rapidly advance the
state-of-the-art in automated segmentation, which makes FCNs a natural choice
for ventricular segmentation. However, FCNs are limited by their computational
cost, which increases the monetary cost and degrades the user experience of
production systems. To combat this shortcoming, we have developed the
FastVentricle architecture, an FCN architecture for ventricular segmentation
based on the recently developed ENet architecture. FastVentricle is 4x faster
and runs with 6x less memory than the previous state-of-the-art ventricular
segmentation architecture while still maintaining excellent clinical accuracy.
Comment: 11 pages, 6 figures, Accepted to Functional Imaging and Modeling of
the Heart (FIMH) 201