98 research outputs found
Energy efficient enabling technologies for semantic video processing on mobile devices
Semantic object-based processing will play an increasingly important role in future multimedia systems due to the ubiquity of digital multimedia capture/playback technologies and increasing storage capacity. Although the object-based paradigm has many undeniable benefits, numerous technical challenges remain before its applications become pervasive, particularly on computationally constrained mobile devices. A fundamental issue is the ill-posed problem of semantic object segmentation. Furthermore, on battery-powered mobile computing devices, the additional algorithmic complexity of semantic object-based processing compared to conventional video processing is highly undesirable from both a real-time operation and a battery life perspective. This thesis attempts to tackle these issues by first constraining the solution space and focusing on the human face as a primary semantic concept of use to users of mobile devices. A novel face detection algorithm is proposed, which was designed from the outset to be amenable to offloading from the host microprocessor to dedicated hardware, thereby providing real-time performance and reducing power consumption. The algorithm uses an Artificial Neural Network (ANN), whose topology and weights are evolved via a genetic algorithm (GA). The computational burden of the ANN evaluation is offloaded to a dedicated hardware accelerator, which is capable of processing any evolved network topology. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry save adders, is adopted throughout the design. To tackle the increased computational costs associated with object tracking or object-based shape encoding, a novel energy efficient binary motion estimation architecture is proposed. Energy is reduced in the proposed motion estimation architecture by minimising the redundant operations inherent in the binary data. Both architectures are shown to compare favourably with the relevant prior art.
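To illustrate why binary data makes motion estimation cheap, the following is a minimal sketch of block matching on binary (shape mask) frames, where the sum of absolute differences reduces to an XOR followed by a bit count. It is not the architecture proposed in the thesis: the function names, the row-packing scheme, the full-search strategy and the early exit on a perfect match are all assumptions made for illustration.

```python
def pack_rows(frame):
    """Pack each row of 0/1 pixels into one integer (leftmost pixel = most significant bit)."""
    return [int("".join(str(p) for p in row), 2) for row in frame]


def block_rows(packed, width, x, y, size):
    """Return the row words of the size x size block whose top-left corner is (x, y)."""
    shift = width - x - size
    mask = (1 << size) - 1
    return [(packed[r] >> shift) & mask for r in range(y, y + size)]


def binary_sad(block_a, block_b):
    """For binary pixels, |a - b| equals a XOR b, so SAD is a popcount of the XOR."""
    return sum(bin(ra ^ rb).count("1") for ra, rb in zip(block_a, block_b))


def full_search(cur, ref, width, height, x, y, size, radius):
    """Find the displacement (dx, dy) within +/- radius that minimises the binary SAD.

    Returns (cost, dx, dy). Stops early when a perfect match is found (hypothetical
    shortcut, one simple way of skipping redundant work).
    """
    target = block_rows(cur, width, x, y, size)
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = x + dx, y + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block falls outside the reference frame
            cost = binary_sad(target, block_rows(ref, width, rx, ry, size))
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
                if cost == 0:
                    return best
    return best
```

Packing a row of pixels into one machine word means a single XOR compares many pixels at once, which is the property a dedicated binary datapath can exploit.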
Energy efficient hardware acceleration of multimedia processing tools
The world of mobile devices is experiencing an ongoing trend of feature enhancement and general-purpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being their limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks. Based on the survey that this thesis presents on modern video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at algorithmic level in order to design re-usable optimised hardware acceleration cores.
To prove these conclusions, the work in this thesis is focused on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high level techniques such as redundant computation elimination, parallelism and low switching computation structures. Both architectures compare favourably against the relevant prior art in the literature.
The SA-DCT/IDCT technologies are instances of a more general computation: both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early exit mechanism that achieves large search space reductions. Results show an improvement on state of the art algorithms with future potential for even greater savings.
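The observation that the SA-DCT is a constant matrix multiplication can be sketched as follows. For a column (or row) containing N active pixels, an N-point DCT matrix of fixed cosine constants is applied to the pixel vector. This is a minimal illustration, assuming the orthonormal DCT-II scaling; published SA-DCT definitions differ in normalisation, and the function names here are hypothetical.

```python
import math


def dct_matrix(n):
    """Build the n x n orthonormal DCT-II matrix; every entry is a constant for a given n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def constant_matrix_multiply(matrix, vector):
    """Multiply a constant matrix by a data vector (the CMM operation)."""
    return [sum(c * x for c, x in zip(row, vector)) for row in matrix]


def sa_dct_1d(active_pixels):
    """Transform only the active pixels of one column, using a DCT sized to their count."""
    n = len(active_pixels)
    return constant_matrix_multiply(dct_matrix(n), active_pixels)


if __name__ == "__main__":
    # A column with three active pixels uses the 3-point matrix.
    print(sa_dct_1d([4.0, 4.0, 4.0]))
```

Because the matrix entries are known at design time, each multiplication can be replaced in hardware by a fixed network of shifts and additions, which is the design space a CMM optimisation algorithm searches.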
Parallelism and the software-hardware interface in embedded systems
This thesis by publications addresses issues in the architecture and microarchitecture of next generation, high performance streaming Systems-on-Chip through quantifying the most important forms of parallelism in current and emerging embedded system workloads. The work consists of three major research tracks, relating to data level parallelism, thread level parallelism and the software-hardware interface, which together reflect the research interests of the author as they have been formed in the last nine years. Published works confirm that parallelism at the data level is widely accepted as the most important performance leverage for the efficient execution of embedded media and telecom applications and has been exploited via a number of approaches, the most efficient being vector/SIMD architectures. A further, complementary and substantial form of parallelism exists at the thread level, but this has not been researched to the same extent in the context of embedded workloads. For the efficient execution of such applications, exploitation of both forms of parallelism is of paramount importance. This calls for a new architectural approach in the software-hardware interface, as its rigidity, manifested in all desktop-based and the majority of embedded CPUs, directly affects the performance of vectorized, threaded codes. The author advocates a holistic, mature approach where parallelism is extracted via automatic means while, at the same time, the traditionally rigid hardware-software interface is optimized to match the temporal and spatial behaviour of the embedded workload. This ultimate goal calls for the precise study of these forms of parallelism for a number of applications executing on theoretical models such as instruction set simulators and parallel RAM machines, as well as the development of highly parametric microarchitectural frameworks to encapsulate that functionality.
R-DVB: Software Defined Radio implementation of DVB-T signal detection functions for digital terrestrial television
This thesis describes the implementation steps of an ETSI DVB-T compliant software defined radio bench receiver, using the GNU Radio framework.
It also analyzes the receiver's performance and suggests future optimization tasks needed to achieve real-time operation.
Characterization, modeling and simulation of 4H-SiC power diodes
2009 - 2010
This Ph.D. thesis explores the attractive electrical properties of Silicon Carbide (SiC) for power
devices; its main topic is the characterization and analysis of 4H-SiC pin diodes. In particular,
the thesis concerns the development of a self-consistent, analytical, physics-based model,
created to accurately replicate the behavior of power diodes under both on-state and
transient conditions.
At present, fabricating SiC devices with a specified performance is not straightforward,
because the physical properties of the material are still incompletely understood, especially
those related to carrier transport and their dependence on process parameters.
Among these are the degree of doping activation, the carrier lifetime in the epitaxial layers
to be employed, and the sensitivity of some physical parameters to temperature changes.
Therefore, a set of investigative tools designed specifically for SiC devices cannot be regarded
as a secondary objective. Such tools are useful both for process monitoring, where they become
essential for tuning the technological processes used to fabricate the final devices, and for
proper diagnostics of the fabricated devices. To meet this need, our research activity first
develops a predictive, static analytical model that includes temperature dependence. It explains
carrier transport in diffused regions as a function of the injection level, and it also helps
clarify how physical parameters that depend significantly on the processed material influence
device performance. The model solves the continuity equation under double-carrier conditions,
taking into account the effects of the varying doping profile of the junction, the spatial
dependence of physical parameters on both doping and injection level, and the modification
of the electric field in the region with the injection regime. The model also covers device
characterization at high temperatures, to analyze the influence of thermal issues on the overall
behavior up to a temperature of 250°C. The accuracy of the static model has been extensively
demonstrated by numerous comparisons with numerical results obtained with the SILVACO
commercial simulator.
Secondly, to properly account for the dynamic electrical behavior of a diode with a
generic structure, the static model has been incorporated into a more general, self-consistent
model that allows analysis of the device behavior when it is switched from an arbitrary
forward-bias condition. In particular, attention is focused on an abrupt variation of the diode
voltage due to an instantaneous interruption of the conduction current. This situation is of
particular interest for studying the switching behavior of diodes, and the voltage transient is
also traditionally used in various investigation techniques to extract information about the
mean carrier lifetime. This is the case, for example, in the conventional Open Circuit Voltage
Decay (OCVD) technique, where the voltage decay following the current interruption provides an
indirect measure of the minority carrier lifetime in the epitaxial layer.
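As a rough illustration of the conventional OCVD idea (not the improved method developed in the thesis), the textbook relation estimates the lifetime from the slope of the linear part of the voltage decay: tau = -n (kT/q) / (dV/dt), with n = 1 commonly used for low-level injection and n = 2 for high-level injection. The sketch below assumes the samples passed in already lie on that linear region; the function names and the least-squares slope estimate are choices made here for illustration.

```python
BOLTZMANN = 1.380649e-23            # J/K
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temperature_k):
    """Return kT/q in volts."""
    return BOLTZMANN * temperature_k / ELEMENTARY_CHARGE


def fit_slope(times, voltages):
    """Least-squares slope dV/dt of the sampled decay (assumed to be the linear region)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(voltages) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, voltages))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def ocvd_lifetime(times, voltages, temperature_k=300.0, injection_factor=2.0):
    """Estimate the carrier lifetime from tau = -n * (kT/q) / (dV/dt).

    injection_factor is n in the formula: 1 for low-level, 2 for high-level injection.
    """
    slope = fit_slope(times, voltages)
    if slope >= 0:
        raise ValueError("expected a decaying voltage (negative slope)")
    return -injection_factor * thermal_voltage(temperature_k) / slope
```

A single slope yields only one average lifetime value, which is why a model that resolves the spatial and injection-level dependence can extract more from the same transient.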
Because it depends heavily on processing, the carrier lifetime is an important parameter to
monitor, especially in bipolar devices, and it cannot be neglected. Given the existing
uncertainty about this parameter in SiC epi-layers, the OCVD method proves to be a
practical way of overcoming this limitation.
In detail, our self-consistent model, which exploits an improved version of the traditional
OCVD technique, makes it possible to characterize the carrier lifetime in the 4H-SiC epitaxial
layer of a generic diode under test, yielding the spatial distributions of the minority carrier
concentration and carrier lifetime at any injection regime. The overall model performance is
compared with both device simulations and experimental results obtained on Si and 4H-SiC
rectifier structures with various physical and electrical characteristics. The comparisons show
that the model has good predictive capabilities for describing the spatial-temporal variation of
carriers and currents along the whole epi-layer. They also confirm the validity of the
approximations used and help resolve some ambiguities reported in the literature, such as the
stated inapplicability of the OCVD method to thick epitaxial layers, the reasons for the observed
nonlinear decay of the voltage with time, and the effects of junction properties on the voltage
transient.
Finally, by imposing the appropriate boundary conditions, the versatility of the developed
model can be used to extend the analysis and gain physical insight into any arbitrary switching condition of 4H-SiC power diodes. [edited by author] IX n.s.
Exploration and Design of Power-Efficient Networked Many-Core Systems
Multiprocessing is a promising solution to meet the requirements of near future applications. To get full benefit from parallel processing, a many-core system needs an efficient on-chip communication architecture. Network-on-Chip (NoC) is a general purpose communication concept that offers high throughput and reduced power consumption, and keeps complexity in check by a regular composition of basic building blocks. This thesis presents power efficient communication approaches for networked many-core systems. We address a range of issues that are important for designing power-efficient many-core systems at two different levels: the network-level and the router-level.
From the network-level point of view, exploiting state-of-the-art concepts such as Globally Asynchronous Locally Synchronous (GALS), Voltage/Frequency Island (VFI), and 3D Networks-on-Chip approaches may be a solution to the excessive power consumption demanded by today's and future many-core systems. To this end, a low-cost 3D NoC architecture, based on high-speed GALS-based vertical channels, is proposed to mitigate high peak temperatures, power densities, and area footprints of vertical interconnects in 3D ICs. To further exploit the beneficial feature of a negligible inter-layer distance of 3D ICs, we propose a novel hybridization scheme for inter-layer communication. In addition, an efficient adaptive routing algorithm is presented which enables congestion-aware and reliable communication for the hybridized NoC architecture. An integrated monitoring and management platform on top of this architecture is also developed in order to implement more scalable power optimization techniques.
From the router-level perspective, four design styles for implementing power-efficient reconfigurable interfaces in VFI-based NoC systems are proposed. To enhance the utilization of virtual channel buffers and to manage their power consumption, a partial virtual channel sharing method for NoC routers is devised and implemented.
Extensive experiments with synthetic and real benchmarks show significant power savings and mitigated hotspots with similar performance compared to the latest NoC architectures. The thesis concludes that carefully co-designed elements from different network levels enable considerable power savings for many-core systems.
Performance and Energy Consumption Characterization and Modeling of Video Decoding on Multi-core Heterogenous SoC and their Applications
To meet the increasing complexity of mobile multimedia applications, the Systems on Chip (SoCs) equipping modern mobile devices integrate powerful heterogeneous processing elements, among which General Purpose Processors (GPP), Digital Signal Processors (DSP) and hardware accelerators are the most common. Due to the ever-growing gap between battery lifetime and hardware/software complexity, in addition to application computing power needs, energy saving becomes a crucial issue in the design of such systems. In this context, we propose a study aiming to enhance the understanding of the energy consumption behavior of video decoding on these kinds of systems. Accordingly, an end-to-end methodology for characterizing and modeling the performance and the energy consumption of video decoding on GPP and DSP is proposed. The characterization step is based on an exhaustive experimental methodology for evaluating, at different abstraction levels, the performance and the energy consumption of video decoding. It was carried out on embedded platforms on which a wide range of video decoding configurations were executed. This step highlighted the importance of considering different parameters, which may pertain to different abstraction levels, in evaluating the overall energy efficiency of a given system. The measurements obtained in this step were used to empirically build performance and energy models for video decoding on both GPP and DSP. The proposed models gave very accurate estimations (R² = 97%) of both the performance and the energy consumption of video decoding in terms of a rich set of parameters including the video quality and the processor frequency. Moreover, based on multi-level characterization and sub-model decomposition approaches, we show how the developed models, unlike classic empirical models, are easily and rapidly generalizable to other platforms. Some possible applications using the developed models, in the context of adaptive video decoding, were proposed.
In general, these consist of using the ability of the proposed performance model to predict the decoding time of a given video quality when dimensioning and scheduling the processing resources. Due to the increasing demand for High Definition (HD), the characterization methodology was extended to consider HD video decoding on both parallel multi-cores and hardware video accelerators. This part highlighted the potential of parallel video decoding to increase the energy efficiency of video decoding and pointed out some open issues in this domain.
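The kind of empirical modelling described above, where measurements are fitted and the fit quality is reported as R², can be sketched with a single-predictor least-squares line. This is only an illustration of how such a figure is computed: the thesis models use a richer set of parameters, and the predictor, function names and numbers below are synthetic placeholders rather than measurements from the work.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic placeholder data: a generic predictor and a measured quantity.
    predictor = [1.0, 2.0, 3.0, 4.0]
    measured = [1.0, 2.1, 2.9, 4.2]
    slope, intercept = fit_line(predictor, measured)
    print(slope, intercept, r_squared(predictor, measured, slope, intercept))
```

An R² close to 1 means the fitted model explains almost all of the variance in the measurements, which is what a reported value of 97% expresses.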
Efficient FPGA implementation and power modelling of image and signal processing IP cores
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Field Programmable Gate Arrays (FPGAs) are the technology of choice in a number of image and signal processing application areas such as consumer electronics, instrumentation, medical data processing and avionics due to their reasonable energy consumption, high performance, security, low design-turnaround time and reconfigurability. Low power FPGA devices are also emerging as competitive solutions for mobile and thermally constrained platforms. Most computationally intensive image and signal processing algorithms also consume a lot of power, leading to a number of issues including reduced mobility, reliability concerns and increased design cost, among others. Power dissipation has become one of the most important challenges, particularly for FPGAs. Addressing this problem requires optimisation and awareness at all levels in the design flow. The key achievements of the work presented in this thesis are summarised here. Behavioural level optimisation strategies have been used for implementing matrix product and inner product through the use of mathematical techniques such as Distributed Arithmetic (DA) and its variations including offset binary coding, sparse factorisation and novel vector level transformations. Applications to test the impact of these algorithmic and arithmetic transformations include the fast Hadamard/Walsh transforms and Gaussian mixture models. Complete design space exploration has been performed on these cores, and where appropriate, they have been shown to clearly outperform comparable existing implementations. At the architectural level, strategies such as parallelism, pipelining and systolisation have been successfully applied for the design and optimisation of a number of cores including colour space conversion, finite Radon transform, finite ridgelet transform and circular convolution. A pioneering study into the influence of supply voltage scaling for FPGA based designs, used in conjunction with performance enhancing strategies such as parallelism and pipelining, has been performed. Initial results are very promising and indicate significant potential for future research in this area.
A key contribution of this work includes the development of a novel high level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called Functional Level Power Analysis and Modelling (FLPAM). FLPAM is scalable, platform independent and compares favourably with existing approaches. A hybrid, top-down design flow paradigm integrating FLPAM with commercially available design tools for systematic optimisation of IP cores has also been developed.
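The Distributed Arithmetic technique mentioned in this abstract can be illustrated with a small software model of a multiplier-free inner product. The basic form is sketched here under stated assumptions: fixed integer coefficients, two's complement inputs of a chosen word length, and a lookup table holding every partial sum of coefficients. The thesis variants (offset binary coding, sparse factorisation) are not shown, and the function names are hypothetical.

```python
def build_lut(coeffs):
    """Precompute, for every bit pattern, the sum of the coefficients whose bit is set."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(coeffs, xs, bits=8):
    """Compute sum(c_k * x_k) bit-serially using only lookups, shifts and adds.

    Each x_k must fit in `bits`-bit two's complement. For bit position j, the j-th
    bits of all inputs form a LUT address; the looked-up value is weighted by 2**j,
    and the sign-bit slice is subtracted instead of added.
    """
    lut = build_lut(coeffs)
    mask = (1 << bits) - 1
    acc = 0
    for j in range(bits):
        addr = 0
        for k, x in enumerate(xs):
            addr |= (((x & mask) >> j) & 1) << k  # gather bit j of input k
        term = lut[addr] << j
        acc += -term if j == bits - 1 else term
    return acc


if __name__ == "__main__":
    print(da_inner_product([3, -5, 7, 2], [10, -20, 30, -40]))
```

The lookup table grows as 2^N with the number of coefficients, so practical designs usually split long inner products into several smaller tables; on an FPGA the table maps naturally onto LUT or block RAM resources.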