
    An ultra-low voltage FFT processor using energy-aware techniques

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2004. Includes bibliographical references (p. 165-169). In a number of emerging applications such as wireless sensor networks, system lifetime depends on the energy efficiency of computation and communication. The key metric in such applications is the energy dissipated per function rather than traditional ones such as clock speed or silicon area. Hardware designs are shifting focus toward energy awareness, allowing a processor to be energy-efficient across a variety of operating scenarios, in contrast to conventional low-power design, which optimizes for the worst-case scenario. Here, three energy-quality scalable hooks are designed into a real-valued FFT processor: variable FFT length (N = 128 to 1024 points), variable bit precision (8 or 16 bits), and variable voltage supply with variable clock frequency (VDD = 180 mV to 0.9 V, f = 164 Hz to 6 MHz). A variable-bit-precision, variable-FFT-length scalable FFT ASIC built from an off-the-shelf standard-cell logic library and memory only scales down to 1 V operation. Further energy savings are achieved through ultra-low-voltage operation: as performance requirements are relaxed, the supply voltage is scaled down, possibly even below the threshold voltage into the subthreshold region. When lower frequencies cause the leakage energy dissipation to exceed the active energy dissipation, there is an optimal operating point that minimizes energy consumption. Logic and memory design techniques allowing ultra-low-voltage operation are employed to study the optimal frequency/voltage operating point for the FFT. A full-custom implementation with circuit techniques optimized for deep voltage scaling into the subthreshold regime is fabricated in a standard 0.18 µm CMOS logic process and functions down to 180 mV. At the optimal operating point, where the supply voltage is 350 mV, the FFT processor dissipates 155 nJ/FFT. The custom FFT is 8x more energy-efficient than the ASIC implementation and 350x more energy-efficient than a low-power microprocessor implementation. By Alice Wang. Ph.D.
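    As an illustration of the minimum-energy analysis summarized above, the following toy model sketches why an energy-optimal supply voltage appears once leakage energy overtakes switching energy at low clock frequencies. All constants (switched capacitance, leakage current, threshold voltage, nominal clock period) are illustrative assumptions, not measurements from the thesis, and the delay model is a subthreshold-style approximation.

        import numpy as np

        # Toy energy-per-cycle model for a voltage-scaled datapath; all constants
        # are illustrative assumptions, not data from the thesis.
        C_EFF   = 100e-12   # switched capacitance per clock cycle [F]
        I_LEAK  = 0.5e-3    # total leakage current [A]
        V_T     = 0.40      # threshold voltage [V]
        N_PHI_T = 0.039     # subthreshold slope factor * thermal voltage [V]
        T0      = 20e-9     # clock period when VDD == V_T [s]

        def clock_period(vdd):
            # Subthreshold-style approximation: the cycle time stretches
            # exponentially as VDD drops below the threshold voltage.
            return T0 * np.exp((V_T - vdd) / N_PHI_T)

        def energy_per_cycle(vdd):
            e_active = C_EFF * vdd**2                    # dynamic switching energy
            e_leak   = I_LEAK * vdd * clock_period(vdd)  # leakage over the longer cycle
            return e_active + e_leak

        vdd = np.linspace(0.20, 0.60, 400)               # subthreshold/near-threshold sweep
        e   = energy_per_cycle(vdd)
        print(f"minimum-energy supply in this toy model: {vdd[e.argmin()]:.2f} V")

    The dynamic term shrinks quadratically with VDD while the leakage term grows with the exponentially lengthening clock period, so the total energy has an interior minimum well below the nominal supply, which is the effect exploited by the optimal-operating-point analysis above.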

    Cloud Radio Access Network architecture. Towards 5G mobile networks


    Towards Scalable Design of Future Wireless Networks

    Wireless operators face an ever-growing challenge to meet the throughput and processing requirements of billions of devices that are getting connected. In current wireless networks, such as LTE and WiFi, these requirements are addressed by provisioning more resources: spectrum, transmitters, and baseband processors. However, this simple add-on approach to scaling system performance is expensive and often results in resource underutilization. What, then, are the ways to efficiently scale the throughput and operational efficiency of these wireless networks? To answer this question, this thesis explores several potential designs: utilizing unlicensed spectrum to augment the bandwidth of a licensed network; coordinating transmitters to increase system throughput; and finally, centralizing wireless processing to reduce computing costs. First, we propose a solution that allows LTE, a licensed wireless standard, to co-exist with WiFi in the unlicensed spectrum. The proposed solution bridges the incompatibility between the fixed access of LTE and the random access of WiFi through channel reservation, and achieves fair LTE-WiFi co-existence despite the transmission gaps and unequal frame durations. Second, we consider a system where different MIMO transmitters coordinate to transmit data of multiple users. We present an adaptive design of the channel feedback protocol that mitigates the interference resulting from imperfect channel information. Finally, we consider a Cloud-RAN architecture where a datacenter or a cloud resource processes wireless frames. We introduce a tree-based design for real-time transport of baseband samples and provide its end-to-end schedulability and capacity analysis. We also present a processing framework that combines real-time scheduling with fine-grained parallelism. The framework reduces processing times by migrating parallelizable tasks to idle compute resources and thus decreases processing deadline misses at no additional cost. We implement and evaluate the above solutions using software-radio platforms and off-the-shelf radios, and confirm their applicability in real-world settings. Ph.D. thesis, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/133358/1/gkchai_1.pd
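    The deadline-aware task migration in the Cloud-RAN processing framework can be pictured with a small simulation. This is a hedged sketch, not the framework from the thesis: the subframe deadline, arrival rate, workload distribution, and worker count are hypothetical, and the migrated work is assumed to be fully parallelizable across two workers.

        import random

        # Minimal sketch of migrating parallelizable work to idle compute resources
        # to reduce deadline misses; all parameters are hypothetical.
        DEADLINE_US = 3000       # processing budget per subframe [us]
        N_WORKERS   = 4
        N_SUBFRAMES = 10000

        def run(migrate):
            random.seed(0)                            # same workload for both runs
            busy_until = [0.0] * N_WORKERS            # time each worker becomes free
            misses = 0
            for i in range(N_SUBFRAMES):
                arrival = i * 1000.0                  # one subframe per millisecond
                work = random.uniform(1000, 4000)     # effort varies with load/channel
                w = min(range(N_WORKERS), key=lambda k: busy_until[k])
                start = max(arrival, busy_until[w])
                idle = [k for k in range(N_WORKERS) if k != w and busy_until[k] <= start]
                # with migration, an idle worker takes half of the (parallelizable) work
                span = work / 2 if (migrate and idle) else work
                busy_until[w] = start + span
                if migrate and idle:
                    busy_until[idle[0]] = start + span
                if (start + span) - arrival > DEADLINE_US:
                    misses += 1
            return misses

        print("deadline misses without migration:", run(False))
        print("deadline misses with migration:   ", run(True))

    Handing half of an oversized subframe to an otherwise idle worker removes most deadline misses without adding compute resources, which is the effect the framework targets.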

    Processing-in-Memory in the Exascale Computing and Big Data era: a Structured Approach

    The aim of this thesis was to study Processing-in-Memory (PIM) architectures in the current Exascale Computing and Big Data era, using a structured approach and a methodology able to capture the details needed to characterize both the performance and the energy consumption of structured parallel programs targeting PIM systems.
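    As a pointer to the kind of analytical reasoning used in structured parallel programming methodologies, the sketch below models a task-farm pattern whose throughput is bounded by its slowest stage. The service times are hypothetical placeholders, not results from the thesis; the point is only that such closed-form models let one estimate how many PIM workers a pattern can usefully exploit.

        # Minimal sketch of a structured-parallel (task-farm) performance model;
        # all service times are hypothetical.
        T_EMIT    = 50e-9    # time to dispatch one task [s]
        T_WORKER  = 800e-9   # time to process one task on a single PIM unit [s]
        T_COLLECT = 60e-9    # time to gather one result [s]

        def farm_service_time(n_workers):
            """Inter-departure time of a farm: the slowest of emitter, workers, collector."""
            return max(T_EMIT, T_WORKER / n_workers, T_COLLECT)

        # How many workers before the emitter/collector becomes the bottleneck?
        for n in range(1, 20):
            if T_WORKER / n <= max(T_EMIT, T_COLLECT):
                print(f"farm saturates at ~{n} workers "
                      f"(service time {farm_service_time(n) * 1e9:.0f} ns per task)")
                break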

    Modular high maneuverability autonomous underwater vehicle

    Thesis (S.M.), Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 2009. Includes bibliographical references (p. 111-115). The design and construction of a modular test-bed autonomous underwater vehicle (AUV) is analyzed. Although a relatively common stacked-hull design is used, the state of the art is advanced through an aggressive power plant capable of supporting azimuthing thrusters and a 2-DOF front sensor assembly. Through an application of lean principles to developmental hardware, delayed differentiation is identified as a key to minimizing rework and creating essentially transparent electronic hardware. Additionally, the use of bus-modular structural and electronic interconnects facilitates reconfiguration of the vehicle across a large range of components, allowing the rapid development of new sensors, control algorithms, and mechanical hardware. Initial wet tests confirmed basic data acquisition capabilities and allowed sensor fusion of scanning-sonar returns at the beam level with data from an IMU, producing an orientation-corrected sonar mosaic. By Daniel G. Walker. S.M.
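    The beam-level fusion mentioned above amounts to rotating each sonar return by the vehicle attitude before accumulating the mosaic. The sketch below shows a planar version of that correction; the frame conventions and function names are hypothetical, not taken from the thesis.

        import math

        # Minimal sketch (hypothetical frame conventions): rotate a scanning-sonar
        # return from the vehicle frame into the world frame using the IMU heading,
        # so the accumulated mosaic is orientation-corrected.
        def sonar_return_to_world(range_m, beam_angle_rad, imu_yaw_rad, vehicle_xy):
            # point in the vehicle frame (2-D, planar approximation)
            x_v = range_m * math.cos(beam_angle_rad)
            y_v = range_m * math.sin(beam_angle_rad)
            # rotate by the IMU yaw, then translate by the vehicle position
            cy, sy = math.cos(imu_yaw_rad), math.sin(imu_yaw_rad)
            return (vehicle_xy[0] + cy * x_v - sy * y_v,
                    vehicle_xy[1] + sy * x_v + cy * y_v)

        # Example: a 10 m return on a beam at -30 deg while the vehicle heading is
        # 45 deg (angle conventions are illustrative only).
        print(sonar_return_to_world(10.0, math.radians(-30), math.radians(45), (0.0, 0.0)))

    Each accumulated point then lands in a fixed world frame even while the vehicle yaws between beams, which is what keeps the mosaic orientation-corrected.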

    Low power data-dependent transform video and still image coding

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (p. 139-144). This electronic version was submitted by the student author; the certified thesis is available in the Institute Archives and Special Collections. This work introduces the idea of data-dependent video coding for low power. Algorithms for the Discrete Cosine Transform (DCT) and its inverse are introduced which exploit statistical properties of the input data in both the space and spatial-frequency domains in order to minimize the total number of arithmetic operations. Two VLSI chips have been built as a proof of concept of data-dependent processing, implementing the DCT and its inverse (IDCT). The IDCT core processor exploits the presence of a large number of zero-valued spectral coefficients in the input stream when stimulated with MPEG-compressed video sequences. A data-driven IDCT computation algorithm, along with clock-gating techniques, is used to reduce the number of arithmetic operations for video inputs. The second chip is a DCT core processor that demonstrates two innovative techniques for reducing arithmetic operations in the DCT computation, along with standard voltage-scaling techniques such as pipelining and parallelism. The first method reduces the bitwidth of arithmetic operations in the presence of spatial correlation in the data. The second method trades off power dissipation against image compression quality (arithmetic precision). Both chips are fully functional and exhibit the lowest switched capacitance per sample among the DCT/IDCT chips reported in the literature. Their power dissipation profile shows a strong dependence on certain statistical properties of the data they operate on, in line with the design goal. By Thucydides Xanthopoulos. Ph.D.
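    The zero-coefficient skipping idea can be made concrete with a small software analogue of the separable 8x8 IDCT. This is a hedged illustration of the principle, not the chip's datapath: it assumes SciPy's one-dimensional idct is available, and it simply omits the 1-D transforms of all-zero columns and rows, which contribute nothing to the output.

        import numpy as np
        from scipy.fft import idct   # 1-D inverse DCT

        # Data-dependent operation skipping in a separable 8x8 IDCT: all-zero
        # coefficient columns (and all-zero intermediate rows) are skipped.
        def idct2_skip_zero(block):
            out = np.zeros_like(block, dtype=float)
            # column pass: all-zero coefficient columns contribute nothing
            for c in range(block.shape[1]):
                if np.any(block[:, c]):
                    out[:, c] = idct(block[:, c], norm='ortho')
            # row pass on the intermediate result
            for r in range(block.shape[0]):
                if np.any(out[r, :]):
                    out[r, :] = idct(out[r, :], norm='ortho')
            return out

        # Typical MPEG-like block: only a few low-frequency coefficients are non-zero.
        blk = np.zeros((8, 8))
        blk[0, 0], blk[0, 1], blk[1, 0] = 240.0, -12.0, 8.0
        print(np.round(idct2_skip_zero(blk), 1))

    For typical MPEG-compressed blocks, where most high-frequency coefficients are zero, the bulk of the per-block 1-D transforms is skipped, which is the software counterpart of the arithmetic-operation reduction exploited by the IDCT chip.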

    A Heterogeneous System Architecture for Low-Power Wireless Sensor Nodes in Compute-Intensive Distributed Applications

    Wireless Sensor Networks (WSNs) combine embedded sensing and processing capabilities with a wireless communication infrastructure, thus supporting distributed monitoring applications. WSNs have been investigated for more than three decades, and recent social and industrial developments, such as home automation and the Internet of Things, have increased the commercial relevance of this key technology. The communication bandwidth of the sensor nodes is limited by the transmission medium and the restricted energy budget of the nodes. To keep up with ever-increasing sensor counts and sampling rates, the basic data acquisition and collection capabilities of WSNs have been extended with decentralized smart feature extraction and data aggregation algorithms. Energy-efficient processing elements are thus required to meet the ever-growing compute demands of WSN motes within the available energy budget. The Hardware-Accelerated Low Power Mote (HaLoMote) is proposed and evaluated in this thesis to address the requirements of compute-intensive WSN applications. It is a heterogeneous system architecture that combines a Field Programmable Gate Array (FPGA) for hardware-accelerated data aggregation with an IEEE 802.15.4 based Radio Frequency System-on-Chip for network management and top-level control of the applications. To properly support Dynamic Power Management (DPM) on the HaLoMote, a Microsemi IGLOO FPGA with non-volatile configuration storage was chosen for a prototype implementation, called the Hardware-Accelerated Low Energy Wireless Embedded Sensor Node (HaLOEWEn). As for every multi-processor architecture, inter-processor communication and coordination strongly influence the efficiency of the HaLoMote. Therefore, a generic communication framework is proposed in this thesis. It is tightly coupled with the DPM strategy of the HaLoMote, which supports fast transitions between active and idle modes. Low-power sleep periods can thus be scheduled within every sampling cycle, even for sampling rates of hundreds of hertz. In addition to the development of the heterogeneous system architecture, this thesis focuses on the energy consumption trade-off between wireless data transmission and in-sensor data aggregation. The HaLOEWEn is compared with typical software processors in terms of runtime and energy efficiency in the context of three monitoring applications. The building blocks of these applications comprise hardware-accelerated digital signal processing primitives, lossless data compression, a precise wireless time synchronization protocol, and a transceiver schedule for contention-free information flooding from multiple sources to all network nodes. Most of these concepts are applicable to similar distributed monitoring applications with in-sensor data aggregation. A Structural Health Monitoring (SHM) application is used for the system-level evaluation of the HaLoMote concept. The Random Decrement Technique (RDT) is a particular SHM data aggregation algorithm, which determines the free-decay response of the monitored structure for subsequent modal identification. The hardware-accelerated RDT executed on a HaLOEWEn mote requires only 43 % of the energy that a recent ARM Cortex-M based microcontroller consumes for the same algorithm. The functionality of the overall WSN-based SHM system is shown with a laboratory-scale demonstrator. Compared to reference data acquired by a wire-bound laboratory measurement system, the HaLOEWEn network captures the structural information relevant for the SHM application with less than 1 % deviation.
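    The aggregation step of the RDT referenced above is easy to sketch in software: segments of the measured response that follow each crossing of a trigger level are averaged, so the zero-mean random excitation cancels and an estimate of the deterministic free-decay response remains. The snippet below is a hedged illustration with hypothetical parameters (trigger level, segment length, test signal); it is not the HaLOEWEn hardware implementation.

        import numpy as np

        # Minimal sketch of the Random Decrement Technique with a level-crossing
        # trigger; parameters and test signal are hypothetical.
        def random_decrement(signal, trigger_level, segment_len):
            segments = []
            for i in range(len(signal) - segment_len):
                # positive-slope crossing of the trigger level
                if signal[i] < trigger_level <= signal[i + 1]:
                    segments.append(signal[i + 1 : i + 1 + segment_len])
            if not segments:
                raise ValueError("no trigger crossings found")
            # averaging cancels the zero-mean random part of the response
            return np.mean(segments, axis=0)

        # Example: noisy 5 Hz oscillation sampled at 200 Hz (illustrative only).
        fs = 200
        t = np.arange(0, 60, 1 / fs)
        x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.random.default_rng(0).standard_normal(t.size)
        free_decay = random_decrement(x, trigger_level=0.8, segment_len=2 * fs)
        print(free_decay[:5])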

    Studies for the Commissioning of the CERN CMS Silicon Strip Tracker

    In 2008 the Large Hadron Collider (LHC) at CERN will start producing proton-proton collisions of unprecedented energy. One of its main experiments is the Compact Muon Solenoid (CMS), a general-purpose detector optimized for the search for the Higgs boson and supersymmetric particles. The discovery potential of the CMS detector relies on a high-precision tracking system, made of a pixel detector and the largest silicon strip Tracker ever built. In order to operate a device as complex as the CMS silicon strip Tracker successfully, and to fully exploit its potential, the properties of the hardware need to be characterized as precisely as possible, and the reconstruction software needs to be commissioned with physics signals. A number of issues were identified and studied to commission the detector, some of which concern the entire Tracker, while others are specific to the Tracker Outer Barrel (TOB):
    - the time evolution of the signals in the readout electronics needs to be precisely measured and correctly simulated, as it affects the expected occupancy and the data volume, which are critical issues in high-luminosity running;
    - the electronics coupling between neighbouring channels affects the cluster size and hence the hit resolution, the tracking precision, the occupancy, and the data volume (a simplified model of this effect is sketched below);
    - the mechanical structure of the Rods (the sub-assemblies of the TOB) is mostly made of carbon-fiber elements; aluminum inserts glued to the carbon-fiber frame provide efficient cooling contacts between the silicon detectors and the thin cooling pipe, made of a copper-nickel alloy; the different thermal expansion coefficients of the various components induce stresses on the structure when it is cooled down to the operating temperature, possibly causing small deformations; a detailed characterization of the geometrical precision of the Rods, and of its possible evolution with temperature, is a valuable input for track reconstruction in CMS.
    These and other issues were studied in this thesis. For this purpose, a large-scale test setup, designed to study the detector performance by tracking cosmic muons, was operated over several months. A dedicated trigger system was set up to select tracks synchronous with the fast readout electronics and to allow a precise measurement of the time evolution of the front-end signals. Data collected at room temperature and at the Tracker operating temperature of -10°C were used to test reconstruction and alignment algorithms for the Tracker, as well as to perform a detailed qualification of the geometry and functionality of the structures at different temperatures.
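    The neighbouring-channel coupling mentioned above can be pictured with a toy model in which a fraction of each strip's charge leaks capacitively onto its two neighbours, widening the apparent cluster; inverting the coupling matrix recovers the underlying signal. The sketch below is an illustration of that effect with an assumed coupling fraction; it is not the CMS Tracker calibration or reconstruction code.

        import numpy as np

        # Illustrative nearest-neighbour coupling between readout strips: a fraction
        # x of each strip's charge appears on its two neighbours (simplified model).
        def coupling_matrix(n_strips, x):
            m = np.diag(np.full(n_strips, 1.0 - 2.0 * x))
            m += np.diag(np.full(n_strips - 1, x), k=1)
            m += np.diag(np.full(n_strips - 1, x), k=-1)
            return m

        true_charge = np.zeros(11)
        true_charge[5] = 100.0                      # all charge deposited on one strip

        M = coupling_matrix(11, x=0.10)             # assumed 10 % coupling per neighbour
        measured = M @ true_charge                  # cluster appears wider than it is
        recovered = np.linalg.solve(M, measured)    # undo the coupling

        print("measured :", np.round(measured, 1))
        print("recovered:", np.round(recovered, 1))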