
    Energy-efficient acceleration of MPEG-4 compression tools

    We propose novel hardware accelerator architectures for the most computationally demanding algorithms of the MPEG-4 video compression standard: motion estimation, binary motion estimation (for shape coding), and the forward/inverse discrete cosine transforms (incorporating shape-adaptive modes). These accelerators have been designed using general low-energy design philosophies at the algorithmic/architectural abstraction levels. The themes of these philosophies are avoiding waste and trading area/performance for power and energy gains. Each core has been synthesised targeting TSMC 0.09 μm TCBN90LP technology, and the experimental results presented in this paper show that the proposed cores improve upon the prior art.

    Investigation of performance issues affecting optical circuit and packet switched WDM networks

    Optical switching represents the next step in the evolution of optical networks. This thesis describes work carried out to examine performance issues that can occur in two distinct varieties of optical switching networks. Slow optical switching, in which lightpaths are requested, provisioned and torn down when no longer required, is known as optical circuit switching (OCS). Services enabled by OCS include wavelength routing, dynamic bandwidth allocation and protection switching. With network elements such as reconfigurable optical add/drop multiplexers (ROADMs) and optical cross connects (OXCs) now being deployed along with the generalized multiprotocol label switching (GMPLS) control plane, this represents the current state of the art in commercial networks. These networks often employ erbium-doped fiber amplifiers (EDFAs) to boost the optical signal-to-noise ratio of the WDM channels. As channel configurations change, wavelength-dependent gain variations in the EDFAs can lead to channel power divergence, which can result in significant performance degradation. This issue is examined in detail using a reconfigurable wavelength division multiplexed (WDM) network testbed, and the results show the severe impact that channel reconfiguration can have on transmission performance. Following the slow switching work, the focus shifts to one of the key enabling technologies for fast optical switching, namely the tunable laser. Tunable lasers that can switch on the nanosecond timescale will be required in the transmitters and wavelength converters of optical packet switching networks. The switching times and frequency drifts of both commercially available lasers and novel devices are investigated, and performance issues that can arise due to this frequency drift are examined.
An optical packet switching transmitter based on a novel label switching technique and employing one of the fast tunable lasers is designed and employed in a dual-channel WDM packet switching system. In-depth performance evaluations of this labelling scheme and packet switching system show the detrimental impact that wavelength drift can have on such systems.

    LPSR: Novel Low Power State Retention Technique for CMOS VLSI Design

    In battery-powered mobile computing and mobile communication applications, battery life is a primary concern. Leakage power loss is critical in CMOS VLSI circuits because it drains the battery even when devices are idle. To reduce subthreshold leakage power as well as total power in CMOS logic gates and circuits, a new circuit technique called LPSR is proposed in this work. Earlier well-known techniques for leakage reduction and state retention are compared with this technique. The technique provides the maximum leakage power reduction during deep sleep mode and the maximum power reduction during dynamic (clocked) mode, and has a provision for preserving state in low-power sleep mode. All the circuits are designed and simulated, and their low-power performance is evaluated, using 90nm CMOS technology files in the Cadence Design Environment.

    Charge recycling in MTCMOS circuits: concept and analysis


    Cardiac cell modelling: Observations from the heart of the cardiac physiome project

    In this manuscript we review the state of cardiac cell modelling in the context of international initiatives such as the IUPS Physiome and Virtual Physiological Human Projects, which aim to integrate computational models across scales and physics. In particular, we focus on the relationship between experimental data and model parameterisation across a range of model types and cellular physiological systems. Finally, in the context of parameter identification and model reuse within the Cardiac Physiome, we suggest some future priority areas for this field.

    An efficient design space exploration framework to optimize power-efficient heterogeneous many-core multi-threading embedded processor architectures

    By the middle of this decade, uniprocessor architecture performance had hit a roadblock due to a combination of factors, such as excessive power dissipation due to high operating frequencies, growing memory access latencies, diminishing returns on deeper instruction pipelines, and a saturation of available instruction-level parallelism in applications. An attractive and viable alternative embraced by all the processor vendors was multi-core architectures, where throughput is improved by using micro-architectural features such as multiple processor cores, interconnects and low-latency shared caches integrated on a single chip. The individual cores are often simpler than their uniprocessor counterparts, use hardware multi-threading to exploit thread-level parallelism and latency hiding, and typically achieve better performance-power figures. The overwhelming success of multi-core microprocessors in both high performance and embedded computing platforms motivated chip architects to dramatically scale multi-core processors to many-cores, which will include hundreds of cores on-chip to further improve throughput. With such complex large-scale architectures, however, several key design issues need to be addressed. First, a wide range of micro-architectural parameters, such as L1 caches, load/store queues, shared cache structures and interconnection topologies, together with the non-linear interactions between them, define a vast non-linear multi-variate micro-architectural design space of many-core processors; the traditional method of using extensive in-loop simulation to explore the design space is simply not practical. Second, to accurately evaluate the performance (measured in cycles per instruction (CPI)) of a candidate design, the contention at the shared cache must be accounted for in addition to the cycle-by-cycle behavior of the large number of cores, which superlinearly increases the number of simulation cycles per iteration of the design exploration.
Third, single-thread performance does not scale linearly with the number of hardware threads per core and the number of cores, due to the memory wall effect. This means that at every step of the design process, designers must ensure that single-thread performance is not unacceptably slowed down while overall throughput is increased. While all these factors affect design decisions in both high performance and embedded many-core processors, the design of embedded processors required for complex embedded applications, such as networking, smart power grids, battlefield decision-making, consumer electronics and biomedical devices, to name a few, is fundamentally different from its high performance counterpart because of the need to consider (i) low power and (ii) real-time operations. This implies that the design objective for embedded many-core processors cannot be simply to maximize performance, but to improve it in such a way that overall power dissipation is minimized and all real-time constraints are met. This necessitates additional power estimation models right at the design stage to accurately measure the cost and reliability of all the candidate designs during the exploration phase. In this dissertation, a statistical machine learning (SML) based design exploration framework is presented which employs an execution-driven cycle-accurate simulator to accurately measure the power and performance of embedded many-core processors. The embedded many-core processor domain is Network Processors (NePs), used to process network IP packets. Future-generation NePs, required to operate at terabits-per-second network speeds, capture all the aspects of a complex embedded application: shared data structures, a large volume of compute-intensive and data-intensive real-time bound tasks, and a high level of task (packet) level parallelism.
Statistical machine learning (SML) is used to efficiently model the performance and power of candidate designs in terms of wide ranges of micro-architectural parameters. The method inherently minimizes the number of in-loop simulations in the exploration framework and also efficiently captures the non-linear interactions between the micro-architectural design parameters. To ensure scalability, the design space is partitioned into (i) core-level micro-architectural parameters, used to optimize single-core architectures subject to the real-time constraints, and (ii) shared-memory-level micro-architectural parameters, used to explore the shared interconnection network and shared cache memory architectures and achieve overall optimality. The cost function of our exploration algorithm is the total power dissipation, which is minimized subject to the constraints of real-time throughput (as determined from the terabit optical network router line-speed) required in the IP packet processing embedded application.
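
The exploration loop described in this abstract (a small number of simulations, a learned model that predicts power and throughput for unsimulated designs, and power minimization under a throughput constraint) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the parameter grid, the toy `simulate` function standing in for a cycle-accurate simulator, and the inverse-distance-weighted surrogate, which is a simple stand-in and not the SML model used in the dissertation.

```python
import itertools
import math
import random

# Hypothetical design space: cores, hardware threads per core, shared cache (KB).
SPACE = list(itertools.product([4, 8, 16, 32, 64], [1, 2, 4, 8], [128, 256, 512, 1024]))


def simulate(design):
    """Toy stand-in for one cycle-accurate simulation; returns (power, throughput)."""
    cores, threads, cache_kb = design
    throughput = cores * math.sqrt(threads) * (1 + cache_kb / 512)
    power = 0.5 * cores * (1 + 0.2 * threads) + 0.01 * cache_kb
    return power, throughput


def predict(design, samples):
    """Inverse-distance-weighted surrogate (an assumption, not the source's model)."""
    if design in samples:
        return samples[design]  # exact at simulated points
    weight_sum = power_sum = thr_sum = 0.0
    for known, (power, thr) in samples.items():
        # Squared distance in log2 space, since the parameters are powers of two.
        d2 = sum((math.log2(a) - math.log2(b)) ** 2 for a, b in zip(design, known))
        w = 1.0 / d2
        weight_sum += w
        power_sum += w * power
        thr_sum += w * thr
    return power_sum / weight_sum, thr_sum / weight_sum


def explore(required_throughput, n_samples=12, seed=0):
    """Return (design, (power, throughput), simulations_used), or None if infeasible."""
    rng = random.Random(seed)
    train = rng.sample(SPACE, n_samples)
    train.append(SPACE[-1])  # always simulate the largest configuration
    samples = {d: simulate(d) for d in train}
    # Rank every design by predicted power, cheapest first.
    ranked = sorted(SPACE, key=lambda d: predict(d, samples)[0])
    for design in ranked:
        if predict(design, samples)[1] < required_throughput:
            continue  # surrogate says the constraint is violated
        if design not in samples:
            samples[design] = simulate(design)  # confirm with a real run
        if samples[design][1] >= required_throughput:
            return design, samples[design], len(samples)
    return None


if __name__ == "__main__":
    print(explore(required_throughput=100.0))
```

The returned design is always confirmed by a simulation before it is accepted, so surrogate error can cost extra simulations or a suboptimal pick, but it cannot produce a design that violates the throughput constraint.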