865 research outputs found
Motion correlation based low complexity and low power schemes for video codec
System: new ; Report number: Kō No. 3750 ; Degree type: Doctor of Engineering ; Date conferred: 2012/11/19 ; Waseda University degree certificate number: Shin 6121 ; Waseda Universit
The Berlin Brain-Computer Interface: Progress Beyond Communication and Control
The combined effect of fundamental results about neurocognitive processes and advancements in decoding mental states from ongoing brain signals has brought forth a whole range of potential neurotechnological applications. In this article, we review our developments in this area and put them into perspective. These examples cover a wide range of maturity levels with respect to their applicability. While we assume we are still a long way away from integrating Brain-Computer Interface (BCI) technology in general interaction with computers, or from implementing neurotechnological measures in safety-critical workplaces, results have already been obtained involving a BCI as a research tool. In this article, we discuss the reasons why, in some of the prospective application domains, considerable effort is still required to make the systems ready to deal with the full complexity of the real world.
EC/FP7/611570/EU/Symbiotic Mind Computer Interaction for Information Seeking/MindSee
EC/FP7/625991/EU/Hyperscanning 2.0 Analyses of Multimodal Neuroimaging Data: Concept, Methods and Applications/HYPERSCANNING 2.0
DFG, 103586207, GRK 1589: Verarbeitung sensorischer Informationen in neuronalen Systeme
EIE: Efficient Inference Engine on Compressed Deep Neural Network
State-of-the-art deep neural networks (DNNs) have hundreds of millions of
connections and are both computationally and memory intensive, making them
difficult to deploy on embedded systems with limited hardware resources and
power budgets. While custom hardware helps the computation, fetching weights
from DRAM is two orders of magnitude more expensive than ALU operations, and
dominates the required power.
Previously proposed 'Deep Compression' makes it possible to fit large DNNs
(AlexNet and VGGNet) fully in on-chip SRAM. This compression is achieved by
pruning the redundant connections and having multiple connections share the
same weight. We propose an energy efficient inference engine (EIE) that
performs inference on this compressed network model and accelerates the
resulting sparse matrix-vector multiplication with weight sharing. Going from
DRAM to SRAM gives EIE 120x energy saving; exploiting sparsity saves 10x;
weight sharing gives 8x; skipping zero activations from ReLU saves another 3x.
Evaluated on nine DNN benchmarks, EIE is 189x and 13x faster when compared to
CPU and GPU implementations of the same DNN without compression. EIE has a
processing power of 102GOPS/s working directly on a compressed network,
corresponding to 3TOPS/s on an uncompressed network, and processes FC layers of
AlexNet at 1.88x10^4 frames/sec with a power dissipation of only 600mW. It is
24,000x and 3,400x more energy efficient than a CPU and GPU respectively.
Compared with DaDianNao, EIE has 2.9x, 19x and 3x better throughput, energy
efficiency and area efficiency.
Comment: External Links: TheNextPlatform: http://goo.gl/f7qX0L ; O'Reilly:
https://goo.gl/Id1HNT ; Hacker News: https://goo.gl/KM72SV ; Embedded-vision:
http://goo.gl/joQNg8 ; Talk at NVIDIA GTC'16: http://goo.gl/6wJYvn ; Talk at
Embedded Vision Summit: https://goo.gl/7abFNe ; Talk at Stanford University:
https://goo.gl/6lwuer. Published as a conference paper in ISCA 201
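The sparse matrix-vector multiplication with weight sharing described in the EIE abstract can be illustrated with a minimal software sketch. It assumes a column-wise (CSC-style) layout in which each nonzero weight is stored as an index into a small shared codebook; the abstract does not specify the storage format, and the function and parameter names below are hypothetical. The sketch shows the three ideas named in the abstract (pruned connections, shared weights, skipping zero activations), not the hardware design itself.

```python
def sparse_shared_matvec(col_ptr, row_idx, code_idx, codebook, activations, n_rows):
    """Compute y = W @ a for a pruned, weight-shared matrix W.

    Assumed layout (illustrative, not taken from the paper):
      col_ptr[j]..col_ptr[j+1] delimit the nonzeros of column j,
      row_idx[k] is the row of the k-th nonzero,
      code_idx[k] indexes the shared-weight codebook for that nonzero.
    """
    y = [0.0] * n_rows
    for j, a in enumerate(activations):
        if a == 0.0:
            # Zero activation (e.g. after ReLU): the whole column is skipped.
            continue
        for k in range(col_ptr[j], col_ptr[j + 1]):
            # Weight sharing: look up the real value in the codebook.
            y[row_idx[k]] += codebook[code_idx[k]] * a
    return y


# Dense equivalent of the example matrix:
#   [[ 0.0, 1.5,  0.0],
#    [-0.5, 0.0,  1.5],
#    [ 0.0, 0.0, -0.5]]
codebook = [-0.5, 1.5]
col_ptr = [0, 1, 2, 4]
row_idx = [1, 0, 1, 2]
code_idx = [0, 1, 1, 0]
activations = [2.0, 0.0, 4.0]

result = sparse_shared_matvec(col_ptr, row_idx, code_idx, codebook, activations, 3)
print(result)
```

Only the four stored nonzeros are ever touched, and the column belonging to the zero activation is never read, which is where the savings from sparsity and activation skipping come from in this simplified picture.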
C-RAN CoMP Methods for MPR Receivers
The growth in mobile network traffic due to the increase in MTC (Machine Type Communication)
applications brings a series of new challenges in traffic routing and
management. The goals are short resolution times (low delay), low energy
consumption (given that wide sensor networks, which fall into the MTC category, are
built to last for years on battery power) and extremely reliable communication
(low Packet Error Rates), in line with fifth generation (5G) mobile network
demands.
In order to deal with this type of dense traffic, several uplink strategies can be devised,
where diversity variables like space (several Base Stations deployed), time (number of
retransmissions of a given packet per user) and power spreading (power value diversity
at the receiver, introducing the concept of SIC and Power-NOMA) have to be handled
carefully to fulfill the requirements demanded in Ultra-Reliable Low-Latency Communication
(URLLC).
This thesis, while constrained by the transmission power and processing capability of a
User Equipment (UE), builds on an Iterative Block Decision Feedback Equalization
receiver that allows Multi Packet Reception to deal with the diversity types mentioned
earlier. The results of this thesis explore the possibility of fragmenting the processing
capabilities in an integrated cloud network (C-RAN) environment through an SINR estimation
at the receiver, to better understand how and where we can split and distribute
our processing needs in order to handle users near the Base Station and cell-edge users, the
latter being the hardest to deal with in dense networks like the ones deployed in an MTC
environment.
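To make the role of power diversity and SINR in SIC concrete, here is a minimal sketch of per-user SINR under idealized successive interference cancellation. It is a textbook simplification and not the thesis's IB-DFE receiver or its SINR estimator: it assumes perfect cancellation, a single receive antenna, and decoding in order of decreasing received power. The function name and inputs are hypothetical.

```python
def sic_sinrs(rx_powers, noise_power):
    """Return the linear SINR seen by each user under ideal SIC.

    Users are decoded from strongest to weakest received power. When a user
    is decoded, all stronger users are assumed to be perfectly cancelled,
    so only the weaker users and the noise remain as interference.
    The result is indexed like rx_powers.
    """
    order = sorted(range(len(rx_powers)), key=lambda i: rx_powers[i], reverse=True)
    sinr = [0.0] * len(rx_powers)
    for pos, user in enumerate(order):
        # Interference comes only from users that have not been decoded yet.
        interference = sum(rx_powers[other] for other in order[pos + 1:])
        sinr[user] = rx_powers[user] / (interference + noise_power)
    return sinr


# Three users with distinct received powers (linear scale) and unit noise power.
example = sic_sinrs([2.0, 8.0, 1.0], 1.0)
print(example)
```

In this toy setting a larger spread between received powers raises the SINR of the first-decoded user, which is the intuition behind using power value diversity at the receiver.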
CASPR: Judiciously Using the Cloud for Wide-Area Packet Recovery
We revisit a classic networking problem -- how to recover from lost packets
in the best-effort Internet. We propose CASPR, a system that judiciously
leverages the cloud to recover from lost or delayed packets. CASPR supplements
and protects best-effort connections by sending a small number of coded packets
along the highly reliable but expensive cloud paths. When receivers detect
packet loss, they recover packets with the help of the nearby data center, not
the sender, thus providing quick and reliable packet recovery for
latency-sensitive applications. Using a prototype implementation and its
deployment on the public cloud and the PlanetLab testbed, we quantify the
benefits of CASPR in providing fast, cost effective packet recovery. Using
controlled experiments, we also explore how these benefits translate into
improvements up and down the network stack
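The idea of protecting a best-effort flow with a small number of coded packets can be sketched with the simplest possible code: one XOR parity packet over a group of data packets. This is an assumption for illustration only; the abstract does not say which coding scheme CASPR uses, and the function names below are hypothetical. With one parity packet, a receiver can rebuild any single lost packet in the group without contacting the sender.

```python
def xor_parity(packets):
    """Build one parity packet as the byte-wise XOR of all packets.

    Shorter packets are treated as zero-padded to the longest length.
    """
    parity = bytearray(max(len(p) for p in packets))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received, parity, missing_len):
    """Rebuild the single missing packet from the survivors and the parity.

    missing_len is the original length of the lost packet; in this sketch it
    is assumed to be known to the receiver (e.g. carried in a header).
    """
    rebuilt = bytearray(parity)
    for packet in received:
        for i, byte in enumerate(packet):
            rebuilt[i] ^= byte
    return bytes(rebuilt[:missing_len])


packets = [b"alpha", b"be", b"gamma!"]
parity = xor_parity(packets)          # would travel over the reliable path
survivors = [packets[0], packets[2]]  # packets[1] was lost on the direct path
recovered = recover_missing(survivors, parity, len(packets[1]))
print(recovered)
```

A single parity packet tolerates only one loss per group; tolerating more losses would need a stronger erasure code, which is outside the scope of this sketch.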
Taming and Leveraging Directionality and Blockage in Millimeter Wave Communications
To cope with the demand for high-rate data transmission, millimeter wave (mmWave) is one potential solution. The short wavelength has unlocked the era of directional mobile communication. This semi-optical communication requires revolutionary thinking. To assist the research and evaluate various algorithms, we build a motion-sensitive mmWave testbed with two degrees of freedom for environmental sensing and general wireless communication.
The first part of this thesis contains two approaches to maintaining the connection in mmWave mobile communication. The first seeks to solve the beam tracking problem using the motion sensor within the mobile device. A tracking algorithm is given and integrated into the tracking protocol. Detailed experiments and numerical simulations compare several compensation schemes with an optical benchmark and demonstrate the efficiency of the overhead reduction. The second strategy, which attempts to mitigate intermittent connections during roaming, is multi-connectivity. Taking advantage of the properties of rateless erasure codes, a fountain-code-type multi-connectivity mechanism is proposed to increase link reliability with a simplified backhaul mechanism. The simulation demonstrates the efficiency and robustness of our system design with a multi-link channel record.
The second topic in this thesis explores various techniques for blockage mitigation. A fast, heartbeat-like channel with heavy blockage loss is identified in the mmWave Unmanned Aerial Vehicle (UAV) communication experiment due to propeller blockage. These blockage patterns are detected through Holm's procedure, treating the task as a multi-time-series edge detection problem. To reduce the blockage effect, an adaptive modulation and coding scheme is designed. The simulation results show that it could greatly improve the throughput given appropriately predicted patterns.
Last but not least, the blockage of directional communication also appears as a blessing, because the geometrical information and blockage events of ancillary signal paths can be utilized to predict the blockage timing for the current transmission path. A geometrical model and prediction algorithm are derived to resolve the blockage time and initiate active handovers. An experiment provides solid proof of the multipath properties, and the numerical simulation demonstrates the efficiency of the proposed algorithm.
Energy-Efficient Computing for Mobile Signal Processing
Mobile devices have rapidly proliferated, and deployment of handheld devices continues to increase at a spectacular rate. As today's devices not only support advanced signal processing of wireless communication data but also provide rich sets of applications, contemporary mobile computing requires both demanding computation and efficiency. Most mobile processors combine general-purpose processors, digital signal processors, and hardwired application-specific integrated circuits to satisfy their high-performance and low-power requirements. However, such a heterogeneous platform is inefficient in area, power and programmability. Improving the efficiency of programmable mobile systems is a critical challenge and an active area of computer systems research.
SIMD (single instruction multiple data) architectures are very effective for data-level-parallelism intense algorithms in mobile signal processing. However, new characteristics of advanced wireless/multimedia algorithms require architectural re-evaluation to achieve better energy efficiency. Therefore, fourth generation wireless protocol and high definition mobile video algorithms are analyzed to enhance a wide-SIMD architecture. The key enhancements include 1) programmable crossbar to support complex data alignment, 2) SIMD partitioning to support fine-grain SIMD computation, and 3) fused operation to support accelerating frequently used instruction pairs.
Near-threshold computation has been attractive in low-power architecture research because it balances performance and power. To further improve energy efficiency in mobile computing, near-threshold computation is applied to a wide SIMD architecture. This proposed near-threshold wide SIMD architecture, Diet SODA, presents interesting architectural design decisions such as 1) very wide SIMD datapath to compensate for degraded performance induced by near-threshold computation and 2) scatter-gather data prefetcher to exploit large latency gap between memory and the SIMD datapath. Although near-threshold computation provides excellent energy efficiency, it suffers from increased delay variations. A systematic study of delay variations in near-threshold computing is performed and simple techniques (structural duplication and voltage/frequency margining) are explored to tolerate and mitigate the delay variations in near-threshold wide SIMD architectures.
This dissertation analyzes representative wireless/multimedia mobile signal processing algorithms, proposes an energy-efficient programmable platform, and evaluates performance and power. A main theme of this dissertation is that the performance and efficiency of programmable embedded systems can be significantly improved with a combination of parallel SIMD and near-threshold computations.
Ph.D. ; Electrical Engineering ; University of Michigan, Horace H. Rackham School of Graduate Studies ; http://deepblue.lib.umich.edu/bitstream/2027.42/86356/1/swseo_1.pd
A Dynamic Reconfiguration Framework to Maximize Performance/Power in Asymmetric Multicore Processors
Recent trends in technology scaling have shifted the processing paradigm to multicores. Depending on the characteristics of the cores, the multicores can be either symmetric or asymmetric. Prior research has shown that Asymmetric Multicore Processors (AMPs) outperform their symmetric (SMP) counterparts within a given resource and power budget. But, due to the heterogeneity in core-types and time-varying workload behavior, thread-to-core assignment is always a challenge in AMPs. As the computational requirements vary significantly across different applications and with time, there is a need to dynamically allocate appropriate computational resources on demand to suit the applications’ current needs, in order to maximize the performance and minimize the energy consumption. Performance/power of the applications could be further increased by dynamically adapting the voltage and frequency of the cores to better fit the changing characteristics of the workloads. Not only can a core be forced to a low power mode when its activity level is low, but the power saved by doing so could be opportunistically re-budgeted to the other cores to boost the overall system throughput.
To this end, we propose a novel solution that seamlessly combines heterogeneity with a Dynamic Reconfiguration Framework (DRF). The proposed dynamic reconfiguration framework is equipped with Dynamic Resource Allocation (DRA) and Dynamic Voltage/Frequency Adaptation (DVFA) capabilities to adapt the core resources and operating conditions at runtime to the changing demands of the applications. As a proof of concept, we illustrate our proposed approach using a dual-core AMP and demonstrate significant performance/power benefits over various baselines.
Cross-Layer Optimization for Power-Efficient and Robust Digital Circuits and Systems
With the increasing demand for digital services, performance and power efficiency
become vital requirements for digital circuits and systems. However, the
enabling CMOS technology scaling has been facing significant challenges of
device uncertainties, such as process, voltage, and temperature variations. To
ensure system reliability, worst-case corner assumptions are usually made in
each design level. However, the over-pessimistic worst-case margin leads to
unnecessary power waste and performance loss as high as 2.2x. Since
optimizations are traditionally confined to each specific level, those safe
margins can hardly be properly exploited.
To tackle the challenge, it is therefore advised in this Ph.D. thesis to
perform a cross-layer optimization for digital signal processing circuits and
systems, to achieve a global balance of power consumption and output quality.
To conclude, the traditional over-pessimistic worst-case approach leads to
huge power waste. In contrast, the adaptive voltage scaling approach saves
power (25% for the CORDIC application) by providing a just-needed supply
voltage. The power saving is maximized (46% for CORDIC) when a more aggressive
voltage over-scaling scheme is applied. The sparsely occurring circuit errors
produced by aggressive voltage over-scaling are mitigated by higher-level
error-resilient designs. For functions like FFT and CORDIC, smart error mitigation
schemes were proposed to enhance reliability (soft-errors and timing-errors,
respectively). Applications like Massive MIMO systems are robust against lower
level errors, thanks to the intrinsically redundant antennas. This property
makes it applicable to embrace digital hardware that trades quality for power
savings.
Comment: 190 page