100 research outputs found

    Energy Saving and Virtualization Technologies in Switching

    Switching is the key functionality of many devices, such as electronic routers and switches, optical routers, and Networks on Chip (NoCs). Basically, switching is responsible for moving a data unit from one port/location to another port/location (or to multiple ones). In past years, high capacity and low delay were the main concerns when designing high-end switching units. As new demands, requirements and technologies emerge, flexibility and low power cost have come to weigh as much as throughput and delay in switching design. On the one hand, highly flexible (i.e., programmable) switching can cope with the variable needs stemming from new applications (e.g., VoIP) and popular user behavior (e.g., P2P downloading); on the other hand, reducing the energy and power dissipation of switching could not only lower bills and support a greener ecosystem but also extend component lifetime. Many research efforts have been devoted to increasing switching flexibility and reducing its power cost. In this thesis, we exploit virtualization as the main technique to build a flexible software router in the first part; in the second part, we turn our attention to energy saving in NoCs (i.e., switching fabrics designed to handle on-chip data transmission) and software routers. In the first part of the thesis, we consider virtualization inside Software Routers (SRs). SRs, i.e., routers running on commodity Personal Computers (PCs), have become an appealing alternative to traditional Proprietary Routing Devices (PRDs) for various reasons, such as cost (the multi-vendor hardware used by SRs can be cheap, while the equipment needed by PRDs is more expensive and their training cost is higher), openness (SRs can make use of a large number of open source networking applications, while PRDs are more closed) and flexibility. The forwarding performance provided by SRs has been an obstacle to their deployment in real networks. 
For this reason, we proposed to aggregate multiple routing units into a powerful SR, known as the Multistage Software Router (MSR), to overcome the performance limitation of a single SR. Our results show that the throughput can increase almost linearly with the number of internal routing devices. However, other features related to flexibility (such as power saving, programmability, router migration or easy management) have previously been investigated less than performance. We noticed that virtualization techniques have become a reality thanks to the quick development of PC architectures, which are now able to easily support several logical PCs running in parallel on the same hardware. Virtualization can provide many flexibility features, such as hardware and software decoupling, encapsulation of virtual machine state, failure recovery and security, to name a few. Virtualization makes it possible to build multiple SRs inside one physical host, as well as a multistage architecture exploiting only logical devices. By doing so, physical resources can be used more efficiently, energy saving features (switching devices on and off when needed) can be introduced, and logical resources can be rented on demand instead of being owned. Since virtualization techniques are still difficult to deploy, several challenges need to be faced when trying to integrate them into routers. The main aims of the first part of this thesis are to assess the feasibility of the virtualization approach, to build and test a virtualized SR (VSR), to implement the MSR exploiting logical, i.e., virtualized, resources, to analyze virtualized routing performance, and to propose improvement techniques for the VSR and the virtual MSR (VMSR). More specifically, we considered different virtualization solutions, such as VMware, XEN and KVM, to build the VSR and VMSR; VMware is a closed source solution with higher performance, while XEN and KVM are open source solutions. 
Firstly, we built and tested each single component of our multistage architecture (i.e., back-end router, load balancer) inside the virtual infrastructure; then we extended the performance experiments to more complex scenarios, such as multiple Back-end Routers (BRs) or Load Balancers (LBs) cooperating to route packets. Our results show that virtualization could introduce a 40% performance penalty compared with the hardware-only solution. Keeping this performance limitation in mind, we developed the whole VMSR and, as expected, obtained low throughput with 64 B packet flows. To increase the VMSR throughput, two directions could be considered: the first is to improve the performance of the single component (i.e., the VSR), and the other is to work on the topology (i.e., the best allocation of the VMs onto the hardware). For the first method, we tuned the VSR inside KVM and closely studied elements such as the Linux driver, the scheduler and the interconnect methodology, which can affect performance significantly when properly configured; then we proposed two ways of allocating VMs to physical servers to enhance the VMSR performance. Our results show that, with good tuning and allocation of VMs, we could minimize the virtualization penalty, obtain reasonable throughput for SRs running inside a virtual infrastructure, and easily add flexibility functionalities to SRs. In the second part of the thesis, we consider the energy-efficient switching design problem and focus on two main architectures, the NoC and the MSR. As many research works suggest, the energy cost of Information and Communication Technologies (ICT) is constantly increasing. Among the main ICT sectors, a large portion of the energy consumption is due to the telecommunication infrastructure and its devices, e.g., routers, switches, cell phones, IPTV set-top boxes, storage home gateways, etc. 
More in detail, the linecards, links and Systems on Chip (SoCs), including the transmitters/receivers on these various devices, are the main power-consuming units. We first present the work on power reduction of data transmission in SoCs, which is carried out by the NoC. A NoC is an approach to designing the communication subsystem between different Processing Elements (PEs) in a SoC. PEs can be different elements, such as CPUs, memories, digital/analog signal processors, etc. Different PEs perform specific tasks depending on the applications running on the chip. Different tasks need to exchange data with each other, thus flits (chopped packets with limited header information) are generated by PEs. The flits are injected into the NoC through the proper interface and routed until they reach the destination PEs. Throughout the whole procedure, the NoC behaves as a packet-switched network. Studies show that, in general, information processing in the PEs consumes only 60% of the energy, while the remaining 40% is consumed by the NoC. More importantly, following the current network design principle, the NoC capacity is dimensioned to handle the peak load. This is a clear opportunity for energy saving when the network load is low. In our work, we exploit the Dynamic Voltage and Frequency Scaling (DVFS) technique, which can jointly decrease or increase the system voltage and frequency when necessary, i.e., decrease the voltage and frequency in low-load scenarios to save energy and reduce power dissipation. More precisely, we studied two different NoC architectures for energy saving, namely the single-plane and the multi-plane chip architectures. In both cases, we impose a very strict constraint: all the links and transmitters/receivers on the same plane must work at the same frequency/voltage to avoid synchronization problems. 
This is the main difference from many existing works in the literature, which usually assume that different links can work at different frequencies, an assumption that is hard to implement in reality. For the single-plane NoC, we exploited different routing schemes combined with DVFS to reduce the power of the whole chip. Our results have been compared with the optimal value obtained by formally modeling the power saving as a quadratic programming problem. Results suggest that, just by using a simple load-balancing routing algorithm, we can save considerable energy in the single chip NoC architecture. Furthermore, we noticed that, in the single-plane NoC architecture, the bottleneck link can limit the effectiveness of DVFS. We then found that the multi-plane NoC architecture is fairly easy to implement and can help with energy saving. Thus, we focused on the multi-plane architecture and found that DVFS can be more efficient when we concentrate more traffic into one plane and send the remaining flows to the other planes. We compared load concentration and load balancing under different power models, and all simulation results show that load concentration outperforms load balancing in the multi-plane NoC architecture. Finally, we also present an energy-efficient MSR design technique, which allows the MSR to follow the day-night traffic pattern more efficiently with our on-line energy saving algorithm.
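
As a rough illustration of why the bottleneck link limits DVFS when every link on a plane must share one frequency/voltage, the sketch below picks the slowest operating point that still covers the most loaded link and evaluates the textbook dynamic power estimate C·V²·f per link. The voltage/frequency table, the capacitance value and the function names are hypothetical and are not taken from the thesis; loads are expressed as the minimum link frequency (in GHz) needed to carry them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_ghz: float
    volt: float


# Hypothetical voltage/frequency pairs (assumed values, not from the thesis).
LEVELS = (
    OperatingPoint(0.5, 0.7),
    OperatingPoint(1.0, 0.9),
    OperatingPoint(2.0, 1.2),
)


def pick_level(link_loads, levels=LEVELS):
    """Return the slowest operating point that still serves the bottleneck link."""
    bottleneck = max(link_loads)
    for op in sorted(levels, key=lambda op: op.freq_ghz):
        if op.freq_ghz >= bottleneck:
            return op
    raise ValueError("bottleneck load exceeds the fastest operating point")


def plane_dynamic_power(link_loads, levels=LEVELS, cap=1.0):
    """Dynamic power of one plane: every link runs at the same V and f."""
    op = pick_level(link_loads, levels)
    return len(link_loads) * cap * op.volt ** 2 * op.freq_ghz


balanced = [0.4, 0.4, 0.4, 0.4]   # total load 1.6, evenly spread
skewed = [0.1, 0.1, 0.1, 1.3]     # same total load, one bottleneck link

print(pick_level(balanced).freq_ghz, plane_dynamic_power(balanced))
print(pick_level(skewed).freq_ghz, plane_dynamic_power(skewed))
```

With these assumed numbers, the skewed allocation forces the whole plane to the fastest operating point even though three links are almost idle, which is the effect that load-balancing routing is meant to avoid in the single-plane case.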

    Exploiting Properties of CMP Cache Traffic in Designing Hybrid Packet/Circuit Switched NoCs

    Chip multiprocessors with a few to tens of processing cores are already commercially available. Increased scaling of technology is making it feasible to integrate even more cores on a single chip. Providing the cores with fast access to data is vital to overall system performance. When a core requires access to a piece of data, the core's private cache memory is searched first. If a miss occurs, the data is looked up in the next level(s) of the memory hierarchy, where often one or more levels of cache are shared between two or more cores. Communication between the cores and the slices of the on-chip shared cache is carried over the network-on-chip (NoC). Interestingly, the cache and the NoC mutually affect each other's operation: communication over the NoC affects the access latency of cache data, while the cache organization generates the coherence and data messages, thus affecting the communication patterns and latency over the NoC. This thesis considers hybrid packet/circuit switched NoCs, i.e., packet switched NoCs enhanced with the ability to configure circuits. The communication and performance benefits that come from using circuits are predicated on amortizing the time cost incurred for configuring the circuits. To address this challenge, NoC designs are proposed that take advantage of properties of the cache traffic, namely temporal locality and predictability, to amortize or hide the circuit configuration time cost. First, a coarse-grained circuit configuration policy is proposed that exploits the temporal locality in the cache traffic to periodically configure circuits for the heavily communicating nodes. This allows the design of a locality-aware cache that promotes temporal communication locality through data placement, while designing suitable data replacement and migration policies. 
Next, a fine-grained configuration policy, called Déjà Vu switching, is proposed for leveraging the predictability of data messages by initiating a circuit configuration as soon as a cache hit is detected and before the data becomes available. Its benefit is demonstrated for saving interconnect energy in multi-plane NoCs. Finally, a more proactive configuration policy is proposed for fast caches, where circuit reservations are initiated by request messages, which can greatly improve communication latency and system performance.

    Development and implementation of quadratically distorted (QD) grating and grisms system for 4D multi-colour microscopy imaging (MCMI)

    The recent emergence of super-resolution microscopy imaging techniques has surpassed the diffraction limit to improve image resolution. In contrast to the breakthroughs in spatial resolution, high temporal resolution remains a challenge. This dissertation demonstrates a simple, on-axis, 4D (3D + time) multi-colour microscopy imaging (MCMI) technology that delivers simultaneous 3D broadband imaging over cellular volumes, which is especially applicable to the real-time imaging of fast-moving biospecimens. A quadratically distorted (QD) grating, in the form of an off-axis Fresnel zone plate, images multiple object planes simultaneously on a single image plane. A delicate mathematical model of the 2D QD grating has been established and implemented in the design and optimization of the QD grating. A grism, a blazed grating and prism combination, achieves chromatic control in the 4D multi-plane imaging. A pair of grisms, whose separation can be varied, provides a collimated beam with a tuneable chromatic shear from a collimated polychromatic input. The optical system based on the QD grating and grisms has been simply appended to the camera port of a commercial microscope, and a few bioimaging tests have been performed, namely the 4D chromatically corrected imaging of fluorescence microspheres and of MCF-7 and HeLa cells. Further investigation of bioimaging problems is still in progress.

    LightSpeed: Light and Fast Neural Light Fields on Mobile Devices

    Real-time novel-view image synthesis on mobile devices is prohibitive due to the limited computational power and storage. Using volumetric rendering methods, such as NeRF and its derivatives, on mobile devices is not suitable due to the high computational cost of volumetric rendering. On the other hand, recent advances in neural light field representations have shown promising real-time view synthesis results on mobile devices. Neural light field methods learn a direct mapping from a ray representation to the pixel color. The current choice of ray representation is either stratified ray sampling or Plücker coordinates, overlooking the classic light slab (two-plane) representation, the preferred representation to interpolate between light field views. In this work, we find that the light slab representation is an efficient representation for learning a neural light field. More importantly, it is a lower-dimensional ray representation, enabling us to learn the 4D ray space using feature grids, which are significantly faster to train and render. Although it is mostly designed for frontal views, we show that the light-slab representation can be further extended to non-frontal scenes using a divide-and-conquer strategy. Our method offers superior rendering quality compared to previous light field methods and achieves a significantly improved trade-off between rendering quality and speed. Comment: Project Page: http://lightspeed-r2l.github.io/ . Add camera ready versio
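
To make the light slab (two-plane) idea concrete, here is a minimal sketch of the classic parameterization: a ray is described by the points where it crosses two parallel planes, giving four numbers (u, v, s, t). The plane positions `z_uv` and `z_st` and the function name are assumptions chosen for illustration; the abstract does not say how the paper places its planes or encodes the coordinates.

```python
def ray_to_light_slab(origin, direction, z_uv=0.0, z_st=1.0, eps=1e-9):
    """Map a 3D ray to (u, v, s, t): its hits on the planes z=z_uv and z=z_st."""
    ox, oy, oz = origin
    dx, dy, dz = direction
    if abs(dz) < eps:
        # A ray parallel to the planes never crosses them, which hints at why
        # a single slab suits mostly frontal views.
        raise ValueError("ray is parallel to the two planes")
    t_uv = (z_uv - oz) / dz          # ray parameter at the first plane
    t_st = (z_st - oz) / dz          # ray parameter at the second plane
    u, v = ox + t_uv * dx, oy + t_uv * dy
    s, t = ox + t_st * dx, oy + t_st * dy
    return (u, v, s, t)


ray = ray_to_light_slab(origin=(0.0, 0.0, -1.0), direction=(1.0, 0.0, 1.0))
print(ray)  # (1.0, 0.0, 2.0, 0.0)
```

The result has four components, compared with the six of a Plücker representation, which is what makes a 4D feature grid over ray space plausible.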

    Tolerating Errors in NoC: A Lightweight Region-Based Fault-Mitigation Method

    Due to transistor shrinking and the increasing number of cores in Systems-on-Chip (SoCs), fault tolerance has become essential. Faults occurring in the Networks-on-Chip (NoCs) of those systems have a significant impact, due to the large amount of data crossing the NoC for communication. However, existing fault correction approaches cannot efficiently address several permanent faults on the NoC, due to their high hardware costs. To mitigate the impact of faults, existing works shuffle the bits inside a flit, transferring the impact of faults to the least significant bits. However, such approaches are applied at a fine-grained level, providing fault mitigation efficiency but with significant hardware costs. To address this limitation, this work proposes a region-based bit-shuffling technique, applied at a coarse-grained level, that trades off fault mitigation efficiency in order to save hardware costs.
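
The general idea of moving a fault's impact onto the least significant bits can be illustrated with a deliberately simplified region swap. This is not the paper's design: the 16-bit payload, the 4-bit regions, the stuck-at-0 fault model, the function names and the assumption that both sender and receiver know which wire is faulty are all hypothetical choices made for this sketch.

```python
def swap_regions(word, region_bits, a, b):
    """Exchange region a and region b of word (region 0 holds the LSBs)."""
    mask = (1 << region_bits) - 1
    ra = (word >> (a * region_bits)) & mask
    rb = (word >> (b * region_bits)) & mask
    word &= ~((mask << (a * region_bits)) | (mask << (b * region_bits)))
    return word | (ra << (b * region_bits)) | (rb << (a * region_bits))


def transmit(word, faulty_wire, region_bits, shuffle=True):
    """Send word over a link whose wire faulty_wire is stuck at 0."""
    region = faulty_wire // region_bits
    # Sender: route the least significant region over the faulty region's wires.
    on_link = swap_regions(word, region_bits, 0, region) if shuffle else word
    on_link &= ~(1 << faulty_wire)           # the permanent fault clears one bit
    # Receiver: undo the swap (swapping twice restores the original order).
    return swap_regions(on_link, region_bits, 0, region) if shuffle else on_link


word = 0xFFFF                                # 16-bit payload, four 4-bit regions
plain = transmit(word, faulty_wire=14, region_bits=4, shuffle=False)
mitigated = transmit(word, faulty_wire=14, region_bits=4, shuffle=True)
print(hex(plain), word - plain)              # 0xbfff 16384
print(hex(mitigated), word - mitigated)      # 0xfffb 4
```

Because whole regions are swapped rather than individual bits, the error lands somewhere inside the least significant region instead of exactly on bit 0, which mirrors the trade-off described above: less mitigation efficiency in exchange for simpler hardware.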

    Study of the effects of spherical aberration and signal levels on a diffraction-based multiplane microscope and its application to evaluate the fluid shear stress around a cell

    Multifocal/multiplane microscopy (MUM) is a technique for simultaneously acquiring several planes in the sample and obtaining axially extended 4D imaging. This important characteristic makes it possible to track fast single molecules/particles three-dimensionally, in real time and over wide axial ranges (≈ 8 µm). MUM avoids the ambiguous localisations that can arise when the imaged plane is scanned to acquire a 3D volume over time. For this thesis, a diffraction multiplane system has been characterised to evaluate the impact of different levels of spherical aberration and signal, and has been applied to measure the velocity and shear stress fields due to the flow of a liquid around a cell. The spherical aberration has been quantified via sharpness curves, which can measure the amount of aberration in images. This has shown that the measured plane spacing grows as the spherical aberration increases. The influence of spherical aberration on image sharpness as a function of emitter axial position could potentially be used to generate correction factors and improve the accuracy of the recovered positions. In terms of performance, the axial range over which the expected axial positions can be calculated with accuracies of at least 100 nm has been shown to vary linearly with the signal level in the studied range. The signal-to-noise ratio (SNR) threshold below which the axial range goes to 0 µm has been calculated to be 1.23 ± 0.71. It has also been demonstrated that the axial range can potentially be increased by enlarging the plane spacing. The precision of the axial positions varies exponentially with the signal, with a decay constant of 0.51 ± 0.10 per SNR unit. This work has generated two equations to predict the expected axial range and precision, provided that the system parameters are known. 
Concerning its applications, MUM has been tested for micro-particle image velocimetry (µPIV), a technique able to reconstruct the velocity and shear stress fields imposed by a liquid flowing around a cell. The system was first tested in the absence of cells, achieving, within 10 µm of the coverslip glass, an accuracy on the calculated velocity of (0.42 ± 0.32) µm/s. This value is slightly worse than that obtained with a confocal microscope, which is (0.30 ± 0.13) µm/s. Above 10 µm, instead, MUM performance is considerably inferior to that reached with the confocal microscope. In the presence of cells, MUM has been used for the first time to capture the perturbations to the expected laminar flow, allowing the measurement of velocities of 30 µm/s and shear stresses of 3 Pa around the observed cell. The reconstructed fields show characteristics similar to those reported in the literature. However, the observation of unexpected velocity and shear stress values indicates a reduction in accuracy caused by false axial localisations.

    Tools for interfacing, extracting, and analyzing neural signals using wide-field fluorescence imaging and optogenetics in awake behaving mice

    Imaging of multiple cells has rapidly multiplied the rate of data acquisition as well as our knowledge of the complex dynamics within the mammalian brain. The process of data acquisition has been dramatically enhanced by highly affordable, sensitive image sensors that enable high-throughput detection of neural activity in intact animals. Genetically encoded calcium sensors deliver a substantial boost in signal strength and, in combination with equally critical advances in the size, speed, and sensitivity of image sensors available in scientific cameras, enable high-throughput detection of neural activity in behaving animals using traditional wide-field fluorescence microscopy. However, the tremendous increase in data flow presents challenges to the processing, analysis, and storage of captured video, prompts a reexamination of traditional routines used to process data in neuroscience, and demands improvements in both hardware and software. This project demonstrates the ease with which a dependable and affordable wide-field fluorescence imaging system can be assembled and integrated with a behavior control and monitoring system such as those found in a typical neuroscience laboratory. An open-source MATLAB toolbox is employed to efficiently analyze and visualize large imaging data sets in a manner that is both interactive and fully automated. This software package provides a library of image pre-processing routines optimized for batch-processing of continuous functional fluorescence video, and additionally automates a fast unsupervised ROI detection and signal extraction routine. Further, an extension of this toolbox is described that uses GPU programming to process streaming video, enabling the on-line identification, segmentation and extraction of neural activity signals, in which specific algorithms improve signal specificity and image quality at the single-cell level in a behaving animal. 
This project describes the strategic ingredients for transforming a large bulk flow of raw continuous video into proportionally informative images and knowledge.

    Laguerre-Gaussian mode sorter

    Light's spatial properties represent an infinite state space, making it attractive for applications requiring high dimensionality, such as quantum mechanics and classical telecommunications, but also for inherently spatial applications such as imaging and sensing. However, there is no demultiplexing device in the spatial domain comparable to a grating or calcite for the wavelength and polarisation domains, respectively. Specifically, there is no simple device capable of splitting a finite beam into a large number of discrete, spatially separated spots, each containing a single orthogonal spatial component. We demonstrate a device capable of decomposing a beam into a Cartesian grid of identical Gaussian spots, each containing a single Laguerre-Gaussian component. This is the first device capable of decomposing the azimuthal and radial components simultaneously, and it is based on a single spatial light modulator and mirror. We demonstrate over 210 spatial components, meaning it is also the highest dimensionality mode multiplexer of any kind.

    Compact realizations of optical super-resolution microscopy for the life sciences

    Sandmeyer A. Compact realizations of optical super-resolution microscopy for the life sciences. Bielefeld: Universität Bielefeld; 2019

    Contributing to Second Harmonic Manipulated Continuum Mode Power Amplifiers and On-Chip Flux Concentrators

    The current cellular network consumes a staggering 100 TWh of energy every year. In the coming years, millions of devices will be added to the existing network to realize the Internet of Things (IoT), further increasing its power consumption. An RF power amplifier typically consumes a large proportion of the DC power in a wireless transceiver, so improving its efficiency has the largest impact on the overall system. Additionally, amplifiers need to demonstrate high linearity and bandwidth, both to adhere to the constraints imposed by wireless standards and to reduce the number of amplifiers required, since an amplifier with a broader bandwidth can potentially replace several narrowband amplifiers. A typical approach to improving efficiency is to present an appropriate load at the harmonics generated by the transistor. Recently proposed continuous modes based on harmonic manipulation, such as class B/J continuum, continuous class F (CCF) and continuous class F-1 (CCF-1), have shown the capability of achieving counteracting requirements, viz. high efficiency, high linearity, and broad bandwidth (with a fractional bandwidth greater than 30%). In these classes of amplifiers, the second harmonic is manipulated by placing a reactive second harmonic load, and the reactive component of the fundamental load is adjusted while keeping the resistive component of the fundamental load fixed. The first contribution of this work is to investigate why amplifiers designed in classes B/J continuum and CCF achieve high efficiency at back-off and 1 dB compression. In this thesis, we demonstrate that the variation of the phase of the current through the non-linear intrinsic capacitances, due to the variation of the phase in the continuum of drain voltage waveforms in Class B/J/J* continuum, leads to either a reduction or an enhancement of the intrinsic drain current. 
Consequently, a subset of voltage waveforms of the class B/J/J* continuum can be used to design amplifiers with higher P1dB, and efficiency at P1dB than in Class B. A simple choice of this subset is demonstrated with a 2.6GHz Class B/J/J* amplifier, achieving a P1dB of 38.1dBm and PAE at P1dB of 54.7%, the highest output power and efficiency at P1dB amongst narrowband linear amplifiers using the CGH40010 reported to date, at a comparable peak PAE of 72%. Secondly, we propose a new formulation for high-efficiency modes of power amplifiers in which both the in-phase and out-of-phase components of the second harmonic of the current are varied, in addition to the second harmonic component of the voltage. A reduction of the in-phase component of the second harmonic of current allows reduction of the phase difference between the voltage and current waveforms, thereby increasing the power factor and efficiency. Our proposed waveforms offer a continuous design space between class B/J continuum and continuous F-1 achieving an efficiency of up to 91% in theory, but over a wider set of load impedances than continuous class F-1. These waveforms require a short at third and higher harmonic impedances, which are easier to achieve at a higher frequency. The load impedances at the second harmonic are reactive and can be of any value between -j∞ and j∞, easing the amplifier design. A trade-off between linearity and efficiency exists in the newly proposed broadband design space, but we demonstrate inherent broadband capability. The fabricated narrowband amplifier using a GaN HEMT CGH40010F demonstrates 75.9% PAE and 42.2 dBm output power at 2.6 GHz, demonstrating a comparable frequency weighted efficiency for this device to that reported in the literature. IoT devices may be deployed in critical applications such as radar or 5G transceivers of an autonomous vehicle and hence need to operate free of failure. 
Monitoring the drain current of the RF GaN MMIC would make it possible to optimize the device performance and protect it from surges in its supply current. Galvanic current sensors rely on the magnetic field generated by the current as a non-invasive method of current sensing. In this thesis, our third major contribution is a planar on-chip magnetic flux concentrator that enhances the magnetic field at the current sensor, thereby improving the current detection capability of the sensor. Our layout utilizes a discontinuity in a magnetic via, resulting in penetration of the magnetic field into the substrate. The proposed concentrator has a magnetic gain of ×1.8 in comparison to air. The required permeability of the magnetic core is 500, much lower than that reported for off-chip concentrators, which significantly eases the specifications on the material properties of the core. Additionally, we explore a novel three-dimensional spiral-shaped magnetic flux concentrator. Simulations predict that this geometry becomes a necessity for enhancing the magnetic field at increased form factors, as the magnetic field from a single planar concentrator deteriorates as its size increases.