Symmetric rearrangeable networks and algorithms
A class of symmetric rearrangeable nonblocking networks is considered in this thesis. A particular focus is on Benes networks built with 2 x 2 switching elements. Symmetric rearrangeable networks built with larger switching elements have also been considered. New applications of these networks are found in the areas of System on Chip (SoC) and Network on Chip (NoC). Deterministic routing algorithms used in NoC applications suffer from low scalability and slow execution. On the other hand, faster algorithms are blocking and thus limit throughput. This will be an acceptable trade-off for many applications where achieving "wire speed" on the on-chip network would require extensive optimisation of the attached devices. In this thesis I designed an algorithm that has much lower blocking probabilities than other suboptimal algorithms but a much faster execution time than deterministic routing algorithms. The suboptimal method uses the looping algorithm in its outermost stages and then, in the two distinct subnetworks deeper in the switch, uses a fast but suboptimal path search method to find available paths. The worst case time complexity of this new routing method is O(NlogN) using a single processor, which matches the best known results reported in the literature.
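The looping step used in the outermost stages can be illustrated with the classical looping algorithm. The following is a minimal Python sketch of that classical routine only, not of the hybrid method designed in the thesis. It assumes a full permutation, and that inputs 2k and 2k+1 share input switch k (likewise for outputs); the function names are illustrative.

```python
def looping_split(perm):
    """Assign each input of a full permutation to the upper (0) or lower (1)
    subnetwork so that the two terminals of every 2 x 2 switch differ."""
    n = len(perm)
    inverse = [0] * n
    for inp, out in enumerate(perm):
        inverse[out] = inp

    subnet = [None] * n
    for start in range(n):
        inp = start
        # Follow one loop of alternating input/output switch constraints.
        while subnet[inp] is None:
            subnet[inp] = 0                      # route this input via the upper subnetwork
            neighbour = inverse[perm[inp] ^ 1]   # input feeding the other port of the same output switch
            subnet[neighbour] = 1                # so it must use the lower subnetwork
            inp = neighbour ^ 1                  # its input-switch partner must use the upper one
    return subnet


def sub_permutations(perm, subnet):
    """Derive the two half-size permutations seen by the inner subnetworks."""
    half = len(perm) // 2
    upper, lower = [None] * half, [None] * half
    for inp, out in enumerate(perm):
        target = upper if subnet[inp] == 0 else lower
        target[inp // 2] = out // 2
    return upper, lower


if __name__ == "__main__":
    perm = [5, 2, 7, 0, 1, 6, 3, 4]   # illustrative request, input i -> output perm[i]
    split = looping_split(perm)
    print(split, sub_permutations(perm, split))
```

Applying the same split recursively to both half-size permutations routes the whole network. A hybrid scheme of the kind described above would stop after the outermost stages and hand the two sub-permutations to a faster path search.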
Disruption of the ongoing communications in this class of networks during rearrangements is an open issue. In this thesis I explored a modification of the topology of these networks which gives rise to what are termed repackable networks. A repackable topology allows paths to be rearranged without momentarily breaking the existing communication paths, and so without intermittent loss of connectivity. The repackable network structure proposed in this thesis is efficient in its use of hardware when compared to other proposals in the literature.
As most of the deterministic algorithms designed for Benes networks implement a permutation of all inputs to find the routing tags for the requested input-output pairs, I proposed a new algorithm that can work for partial permutations. If the network load is defined as ρ, the mean number of active inputs in a partial permutation is m = ρN, where N is the network size. This new method maps the network stages into a set of sub-matrices and then determines the routing tags for each pair of requests by populating the cells of the sub-matrices without creating a blocking state. Overall, the serial time complexity of this method is O(NlogN) when all N inputs are active and O(mlogN) with m < N active inputs. With minor modification to the serial algorithm, this method can be made to work in the parallel domain. The time complexity of this routing algorithm on a parallel machine with N completely connected processors is O(log^2 N). With m active requests the time complexity goes down to O(logm logN), which is better than the O(log^2 m + logN) reported in the literature for 2^(0.5((log^2 N - 4logN)^0.5 - logN)) <= ρ <= 1. I also designed multistage symmetric rearrangeable networks using larger switching elements and implemented a new routing algorithm for these classes of networks.
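The quoted range of ρ can be checked by comparing the two bounds directly. The short derivation below is a sketch under stated assumptions: logarithms are base 2, constant factors hidden by the O-notation are ignored, and N ≥ 16 so that the square root is real. The symbols L and x are shorthand introduced here.

```latex
\begin{align*}
&\text{Let } L = \log N,\quad x = \log m = \log(\rho N) = \log\rho + L. \\
&\log m \,\log N \le \log^2 m + \log N
  \iff xL \le x^2 + L
  \iff x^2 - Lx + L \ge 0. \\
&\text{The roots are } x_\pm = \tfrac{1}{2}\bigl(L \pm \sqrt{L^2 - 4L}\bigr),
  \text{ real for } L \ge 4. \\
&x \ge x_+ \iff \log\rho \ge \tfrac{1}{2}\bigl(\sqrt{L^2 - 4L} - L\bigr)
  \iff \rho \ge 2^{\,0.5\left(\sqrt{\log^2 N - 4\log N} - \log N\right)}.
\end{align*}
```

The quadratic is also non-negative for x ≤ x₋, which corresponds to very small m; the range quoted in the abstract is the upper branch, bounded above by ρ = 1 (all inputs active).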
The network topology and routing algorithms presented in this thesis should allow large scale networks of modest cost, with low setup times and moderate blocking rates, to be constructed. Such switching networks will be required to meet the bandwidth requirements of future communication networks.
Crosstalk-free Conjugate Networks for Optical Multicast Switching
High-speed photonic switching networks can switch optical signals at the rate
of several terabits per second. However, they suffer from an intrinsic
crosstalk problem when two optical signals cross at the same switch element. To
avoid crosstalk, active connections must be node-disjoint in the switching
network. In this paper, we propose a sequence of decomposition and merge
operations, called conjugate transformation, performed on each switch element
to tackle this problem. The network resulting from this transformation is
called the conjugate network. By using the numbering schemes of networks, we prove
that if the route assignments in the original network are link-disjoint, their
corresponding ones in the conjugate network would be node-disjoint. Thus,
traditional nonblocking switching networks can be transformed into
crosstalk-free optical switches in a routine manner. Furthermore, we show that
crosstalk-free multicast switches can also be obtained from existing
nonblocking multicast switches via the same conjugate transformation.
Experimental Benchmarks and Initial Evaluation of the Performance of the PASM System Prototype
The work reported here represents experiences with the PASM parallel processing system prototype during its first operational year. Most of the experiments were performed by students in the Fall semester of 1987. The first programming, and the first timing measurements, were made during the summer of 1987 by Sam Fineberg. The goal of the collection of experiments presented here was to undertake an Application-driven Architecture Study of the PASM system as a paradigm for parallel architecture evaluation in general. PASM was an excellent vehicle for experimenting with this evaluation technique due to its unique architectural features. Among these are: 1. A reconfigurable, partitionable multistage circuit-switched network. 2. Support for both SIMD and MIMD programs. 3. Ability to execute hybrid SIMD/MIMD programs. 4. An instruction queue which allows overlap of control-flow and data manipulation between micro-control (MC) units and processing elements (PE). It had been hypothesized that superlinear speed-up over the number of PEs could be attained with this feature, and experimental results verified this. 5. Support for barrier synchronization of MIMD tasks. This feature was exploited in some non-standard ways to show the ability to decouple variant length SIMD instructions into multiple MIMD streams for an overall performance benefit. This type of study is expected to continue in the future on PASM and other parallel machines at Purdue. This report should serve as a guide for this future work as well.
Optical architectures for high performance switching and routing
This thesis investigates optical interconnection networks for high performance switching and routing. Two main topics are studied.
The first topic regards the use of silicon microring resonators for short reach optical interconnects. Photonic technologies can help to overcome the intrinsic limitations of electronics when used in interconnects, short-distance transmissions and switching operations. This thesis considers the peculiar asymmetric losses of microring resonators, since they pose unprecedented challenges for the design of the architecture and for the routing algorithms. It presents new interconnection architectures, proposes modifications to classical routing algorithms and achieves better performance in terms of fabric complexity and scalability with respect to the state of the art. Subsequently, this thesis considers the wavelength dimension capabilities of microring resonators, in which wavelength reuse (i.e. crosstalk accumulation) impairs system performance. To this end, it presents different crosstalk reduction techniques, a feasibility analysis for the design of microring resonators and a novel wavelength-agile routing matrix.
The second topic regards flexible resource allocation with adaptable infrastructure for elastic optical networks. In particular, it focuses on Architecture on Demand (AoD), whereby optical node architectures can be reconfigured on the fly according to traffic requirements. This thesis includes results on the first flexible-grid optical spectrum networking field trial, carried out in collaboration with the University of Essex. Finally, it addresses several challenges posed by the novel AoD concept by means of modeling and simulation. This thesis proposes an algorithm to perform automatic architecture synthesis, and reports AoD scalability and power consumption results obtained with the proposed synthesis algorithm. Such results validate AoD as a flexible node concept that provides power efficiency and high switching capacity.
On chip interconnects for multiprocessor turbo decoding architectures
A low-cost high-speed twin-prefetching DSP-based shared-memory system for real-time image processing applications
This dissertation introduces, investigates, and evaluates a low-cost high-speed twin-prefetching DSP-based bus-interconnected shared-memory system for real-time image processing applications. The proposed architecture can effectively support 32 DSPs in contrast to a maximum of 4 DSPs supported by existing DSP-based bus-interconnected systems. This significant enhancement is achieved by introducing two small programmable fast memories (Twins) between the processor and the shared bus interconnect. While one memory is transferring data from/to the shared memory, the other is supplying the core processor with data. The elimination of the traditional direct linkage of the shared bus and processor data bus makes feasible the utilization of a wider shared bus, i.e., shared bus width becomes independent of the data bus width of the processors. The fast prefetching memories and the wider shared bus provide additional bus bandwidth into the system, which eliminates large memory latencies; such memory latencies constitute the major drawback for the performance of shared-memory multiprocessors. Furthermore, in contrast to existing DSP-based uniprocessor or multiprocessor systems, the proposed architecture does not require all data to be placed on on-chip or off-chip expensive fast memory in order to reach or maintain peak performance. Further, it can maintain peak performance regardless of whether the processed image is small or large.
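The benefit of letting one memory transfer while the other feeds the processor can be seen in a simple timing model. The sketch below is an idealised illustration and not the dissertation's performance model: it assumes a single processor, equal-sized data blocks with constant transfer and compute times, no write-back traffic, and no bus contention. The function and parameter names are hypothetical.

```python
def single_buffer_time(blocks, t_transfer, t_compute):
    """Each block is fetched, then processed; nothing overlaps."""
    return blocks * (t_transfer + t_compute)


def twin_buffer_time(blocks, t_transfer, t_compute):
    """Two alternating buffers: while the processor works on block k from one
    buffer, block k+1 is fetched into the other."""
    if blocks == 0:
        return 0
    # The first fetch and the last compute cannot be overlapped with anything;
    # in between, each step costs whichever of the two activities is slower.
    return t_transfer + (blocks - 1) * max(t_transfer, t_compute) + t_compute


if __name__ == "__main__":
    # Illustrative numbers only (arbitrary time units).
    print(single_buffer_time(4, 2, 3), twin_buffer_time(4, 2, 3))
```

In this model the transfer time is hidden whenever it does not exceed the compute time, which is the regime a wider shared bus is meant to reach.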
The performance of the proposed architecture has been extensively investigated by executing computationally intensive applications such as real-time high-resolution image processing. The effect of a wide variety of hardware design parameters on performance has been examined. More specifically, tables and graphs comprehensively analyze the performance of 1, 2, 4, 8, 16, 32 and 64 DSP-based systems, for a wide variety of shared data interconnect widths such as 32, 64, 128, 256 and 512. In addition, the effect of the wide variance of temporal and spatial locality (present in different applications) on the multiprocessor's execution time is investigated and analyzed. Finally, the prefetching cache size was varied from a few kilobytes to 4 Mbytes and the corresponding effect on the execution time was investigated. Our performance analysis has clearly shown that the execution time converges to a shallow minimum, i.e., it is not sensitive to the size of the prefetching cache. The significance of this observation is that near optimum performance can be achieved with a small (16 to 300 Kbytes) amount of prefetching cache.
Energy Efficient High Port Count Optical Switches
The advance of internet applications, such as video streaming, big data and cloud computing, is reshaping the telecommunication and internet industries. Bandwidth demands in datacentres have been boosted by these emerging data-hungry internet applications. Regarding inter- and intra-datacentre communications, fine-grained data need to be exchanged across a large shared memory space.
Large-scale high-speed optical switches tend to use a rearrangeably non-blocking architecture as this limits the number of switching elements required. However, this comes at the expense of requiring more sophisticated route selection within the switch and also some forms of time-slotted protocols. The looping algorithm is the classical routing algorithm to set up paths in rearrangeably non-blocking switches. It was born in the electronic switch era, where all links in the switches are equal. It is, therefore, not able to accommodate loss difference between optical paths due to the different length of waveguides and distinct numbers of crossings, and bends, leading to sub-optimal performance.
We, therefore, propose an advanced path-selection algorithm based on the looping algorithm that minimises the path-dependent loss. It explores all possible set-ups for a given connection assignment and selects the optimal one. It guarantees that no individual path has an excessively high loss, thereby improving the overall performance of the switch. The performance of the proposed algorithm has been assessed by modelling switches using the VPI simulator. An 8×8 Clos-tree switch demonstrates a 2.7 dB decrease in loss and a 1.9 dB improvement in IPDR with a 1.5 dB penalty for the worst case. An 8×8 dilated Beneš shows more than 4 dB loss reduction for the lossiest path and a 1.4 dB IPDR improvement for a 1 dB power penalty. The improved algorithm can be run once for each switch design and its output stored in a compact lookup table, enabling rapid switch reconfiguration.
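The selection step described above amounts to a min-max choice: among all set-ups that realise a connection assignment, keep the one whose lossiest path is least lossy. The Python sketch below illustrates only that selection rule. The element names, loss values and candidate set-ups are made up for illustration and are not taken from the thesis, and how the candidates are enumerated is left out.

```python
def path_loss(path, element_loss_db):
    """Total loss of one path: the sum of the losses of the elements it crosses."""
    return sum(element_loss_db[element] for element in path)


def worst_path_loss(setup, element_loss_db):
    """Loss of the lossiest path in a set-up (a list of paths)."""
    return max(path_loss(path, element_loss_db) for path in setup)


def select_setup(setups, element_loss_db):
    """Return the name of the set-up that minimises the worst-path loss."""
    return min(setups, key=lambda name: worst_path_loss(setups[name], element_loss_db))


# Hypothetical per-element losses in dB (illustrative values only).
ELEMENT_LOSS_DB = {"short_guide": 0.5, "long_guide": 1.0, "crossing": 0.25}

# Two hypothetical set-ups realising the same connection assignment.
SETUPS = {
    "A": [["short_guide", "short_guide"], ["long_guide", "long_guide", "crossing"]],
    "B": [["short_guide", "long_guide"], ["long_guide", "short_guide", "crossing"]],
}

if __name__ == "__main__":
    best = select_setup(SETUPS, ELEMENT_LOSS_DB)
    print(best, worst_path_loss(SETUPS[best], ELEMENT_LOSS_DB))
```

Because the result depends only on the switch design and the connection assignment, the chosen set-up per assignment could be precomputed and kept in a dictionary, in the spirit of the lookup table mentioned above.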
Microelectromechanical systems (MEMS) based optical switches have been fabricated with over 1,000 ports which meet the port count requirements in data centre networks. However, the reconfiguration speed of the MEMS switches is limited to the millisecond to microsecond timescale, which is not sufficient for packet switching in datacentres. Opto-electronic devices, such as Mach-Zehnder Interferometers (MZIs) and semiconductor optical amplifiers (SOAs) with nanosecond response time show the potential to fulfil the requirements of packet switching. However, the scalability of MZI switches is inherently limited by insertion loss and accumulated crosstalk, while the scalability of SOA switches is restricted by accumulated noise and distortion.
We, therefore, have proposed a dilated Beneš hybrid MZI-SOA design, where MZIs are implemented as 1×2 or 2×1 low-loss switching elements, minimising crosstalk by using a single input, and where short SOAs are included as gain or absorption units, offering either loss compensation or crosstalk suppression while adding only minimal noise and distortion. A 4×4 device has been fabricated and exhibits a mere 1.3 dB loss, an extinction ratio of 47 dB, and more than 13 dB IPDR for a 0.5 dB power penalty. When operating at 10 Gb/s per port, 6 pJ/bit energy consumption is demonstrated, a 20% reduction compared with SOA-based switches. The switch is very tolerant of inaccuracy in the current control. Within a 5 mA bias current range, the power penalty can be maintained below 0.2 dB for 8 dB IPDR, and within a 12 mA range it stays below 0.5 dB for 10 dB IPDR. The excellent crosstalk and power penalty performance demonstrated by this chip enables the scalability of this hybrid approach. The performance of a 16×16 port dilated Beneš hybrid switch is experimentally assessed by cascading 4×4 switch chips, demonstrating an IPDR of 15 dB at a 1 dB penalty with a 0.6 dB power penalty floor. For switches with port counts larger than 16×16, the power penalty performance has been analysed with physical layer simulations fitted with state-of-the-art data. We assess the feasibility of three potential topologies, with different architectural optimisations: dilated Beneš, Beneš and Clos-Beneš. Quantitative analysis for switches with up to 2048 ports is presented, achieving a 1.15 dB penalty for a BER of 10^-3, compatible with soft-decision forward error correction.
Cambridge Overseas Trust; China Scholarship Council