Optically-Connected Memory: Architectures and Experimental Characterizations
Growing demands on future data centers and high-performance computing systems are driving the development of processor-memory interconnects with greater performance and flexibility than can be provided by existing electronic interconnects. A redesign of the systems' memory devices and architectures will be essential to enabling high-bandwidth, low-latency, resilient, energy-efficient memory systems that can meet the challenges of exascale systems and beyond. By leveraging an optics-based approach, this thesis presents the design and implementation of an optically-connected memory system that exploits both the bandwidth density and distance-independent energy dissipation of photonic transceivers, in combination with the flexibility and scalability offered by optical networks. By replacing the electronic memory bus with an optical interconnection network, novel memory architectures can be created that are otherwise infeasible. With remote optically-connected memory nodes accessible to processors as if they are local, programming models can be designed to utilize and efficiently share greater amounts of data. Processors that would otherwise be idle, being starved for data while waiting for scarce memory resources, can instead operate at high utilizations, leading to drastic improvements in the overall system performance. This work presents a prototype optically-connected memory module and a custom processor-based optical-network-aware memory controller that communicate transparently and all-optically across an optical interconnection network. The memory modules and controller are optimized to facilitate memory accesses across the optical network using a packet-switched, circuit-switched, or hybrid packet-and-circuit-switched approach. The novel memory controller is experimentally demonstrated to be compatible with existing processor-memory access protocols, with the memory controller acting as the optics-computing interface to render the optical network transparent. 
Additionally, the flexibility of the optical network enables additional performance benefits including increased memory bandwidth through optical multicasting. This optically-connected architecture can further enable more resilient memory system realizations by expanding on current error detection and correction memory protocols. The integration of optics with memory technology constitutes a critical step for both optics and computing. The scalability challenges facing main memory systems today, especially concerning bandwidth and power consumption, are well matched to the strengths of optical communications-based systems. Additionally, ongoing efforts focused on developing low-cost optical components and subsystems that are suitable for computing environments may benefit from the high-volume memory market. This work therefore takes the first step in merging the areas of optics and memory, developing the necessary architectures and protocols to interface the two technologies, and demonstrating potential benefits while identifying areas for future work. Future computing systems will undoubtedly benefit from this work through the deployment of high-performance, flexible, energy-efficient optically-connected memory architectures.
Design and analysis of a 3-dimensional cluster multicomputer architecture using optical interconnection for petaFLOP computing
In this dissertation, the design and analysis of an extremely scalable distributed
multicomputer architecture, using optical interconnects, that has the potential to
deliver on the order of petaFLOP performance are presented in detail. The design
takes advantage of optical technologies, harnessing the features inherent in optics,
to produce a 3D stack that implements efficiently a large, fully connected system of
nodes forming a true 3D architecture. To adopt optics in large-scale multiprocessor
cluster systems, efficient routing and scheduling techniques are needed. To this
end, novel self-routing strategies for all-optical packet switched networks and on-line
scheduling methods that can result in collision free communication and achieve real
time operation in high-speed multiprocessor systems are proposed. The system is designed
to allow failed/faulty nodes to stay in place without appreciable performance
degradation. The approach is to develop a dynamic communication environment that
will be able to effectively adapt and evolve with a high density of missing units or
nodes. A joint CPU/bandwidth controller that maximizes the resource allocation in
this dynamic computing environment is introduced with an objective to optimize the
distributed cluster architecture, preventing performance/system degradation in the
presence of failed/faulty nodes. A thorough analysis, feasibility study and description of the characteristics of a 3-Dimensional multicomputer system capable of achieving
100 teraFLOP performance are discussed in detail. Included in this dissertation is a
throughput analysis of the routing schemes, using methods from discrete-time queuing
systems and computer simulation results for the different proposed algorithms. A
prototype of the 3D architecture proposed is built and a test bed developed to obtain
experimental results to further prove the feasibility of the design, validate initial assumptions,
algorithms, simulations and the optimized distributed resource allocation
scheme. Finally, as a prelude to further research, an efficient data routing strategy
for highly scalable distributed mobile multiprocessor networks is introduced.
PAM Performance Analysis in Multicast-Enabled Wavelength-Routing Data Centers
Multilevel pulse amplitude modulation (M-PAM) is gaining momentum for high-capacity and power-efficient cloud computing. Compared to the classic on-off keying (OOK) modulation, high-order PAM yields better spectral efficiency but is also more susceptible to physical layer degradation effects. We develop a cross-layer analysis framework to examine the PAM transmission performance in data center network environments supporting both optical multicasting and wavelength routing. Our analysis is conducted on a switch architecture based on an arrayed-waveguide grating (AWG) core and distributed broadcast domains, exhibiting different physical paths, and random, uncontrolled crosstalk noise. Reed-Solomon coding with rate adaptation is incorporated into PAM transceivers to compensate for impairments. Our Monte Carlo simulations point to the significant impact of AWG crosstalk on higher order PAM in wavelength-reuse architectures and the importance of code rate adaptation for signals traversing multiple routing stages. According to our study, 8-PAM offers the highest effective bit rates for signals terminating in one broadcast domain but performs poorly when considering interdomain connectivity. On the other hand, the impairment-induced degradation of interdomain capacity for 4-PAM can be limited to 20.7%, making it better suited for connections spanning two broadcast domains and a crosstalk-rich stage. Our results call for software-defined PAM transceiver designs in support of both modulation order and code rate adaptation.
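The trade-off between modulation order and code rate described in this abstract can be illustrated with a minimal sketch. It assumes only the textbook relations that an M-PAM symbol carries log2(M) bits and that a Reed-Solomon RS(n, k) code has rate k/n. The symbol rate and the specific code rates below are hypothetical values chosen for illustration; they are not taken from the paper, and the function names are not from any library.

```python
def bits_per_symbol(m: int) -> int:
    """Bits carried by one M-PAM symbol; M must be a power of two (OOK is M = 2)."""
    if m < 2 or m & (m - 1):
        raise ValueError("M must be a power of two >= 2")
    return m.bit_length() - 1  # equals log2(M) for powers of two


def effective_bit_rate(symbol_rate_baud: float, m: int, code_rate: float) -> float:
    """Net information rate after coding overhead: baud * log2(M) * (k / n)."""
    if not 0.0 < code_rate <= 1.0:
        raise ValueError("code rate must be in (0, 1]")
    return symbol_rate_baud * bits_per_symbol(m) * code_rate


# Hypothetical operating points (illustrative only, not results from the paper).
SYMBOL_RATE = 25e9               # assumed symbol rate in baud
LIGHT_CODE = 239 / 255           # RS(255, 239), a commonly used high-rate code
HEAVY_CODE = 0.5                 # assumed low-rate code for a badly impaired path

pam4_light = effective_bit_rate(SYMBOL_RATE, 4, LIGHT_CODE)
pam8_light = effective_bit_rate(SYMBOL_RATE, 8, LIGHT_CODE)
pam8_heavy = effective_bit_rate(SYMBOL_RATE, 8, HEAVY_CODE)

for label, rate in [("4-PAM, light code", pam4_light),
                    ("8-PAM, light code", pam8_light),
                    ("8-PAM, heavy code", pam8_heavy)]:
    print(f"{label}: {rate / 1e9:.2f} Gb/s")
```

With the same code rate, 8-PAM always yields a higher net rate than 4-PAM. Once a more impaired path forces 8-PAM onto a much lower code rate, 4-PAM with a light code can come out ahead, which is the kind of crossover that motivates adapting both modulation order and code rate.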
Wireless Communication in Data Centers: A Survey
Data centers (DCs) are increasingly becoming an integral part of the computing infrastructure of most enterprises. Therefore, the concept of DC networks (DCNs) is receiving increased attention in the network research community. Most DCNs deployed today can be classified as wired DCNs, as copper and optical fiber cables are used for intra- and inter-rack connections in the network. Despite recent advances, wired DCNs face two inevitable problems: cabling complexity and hotspots. To address these problems, recent research works suggest the incorporation of wireless communication technology into DCNs. Wireless links can be used either to augment conventional wired DCNs or to realize a pure wireless DCN. As the design spectrum of DCs broadens, so does the need for a clear classification to differentiate various design options. In this paper, we analyze free space optical (FSO) communication and 60 GHz radio frequency (RF), the two key candidate technologies for implementing wireless links in DCNs. We present a generic classification scheme that can be used to classify current and future DCNs based on the communication technology used in the network. The proposed classification is then used to review and summarize major research in this area. We also discuss open questions and future research directions in the area of wireless DCs.
Broadcast-oriented wireless network-on-chip : fundamentals and feasibility
UPC Extraordinary Doctoral Award, academic year 2015-2016, ICT Engineering field.
Recent years have seen the emergence and ubiquitous adoption of Chip Multiprocessors (CMPs), which rely on the coordinated operation of multiple execution units or cores. Successive CMP generations integrate a larger number of cores seeking higher performance with a reasonable cost envelope. For this trend to continue, however, important scalability issues need to be solved at different levels of design. Scaling the interconnect fabric is a grand challenge by itself, as new Network-on-Chip (NoC) proposals need to overcome the performance hurdles found when dealing with the increasingly variable and heterogeneous communication demands of manycore processors. Fast and flexible NoC solutions are needed to prevent communication from becoming a performance bottleneck, a situation that would severely limit the design space at the architectural level and eventually lead to the use of software frameworks that are slow, inefficient, or less programmable.
The emergence of novel interconnect technologies has opened the door to a plethora of new NoCs promising greater scalability and architectural flexibility. In particular, wireless on-chip communication has garnered considerable attention due to its inherent broadcast capabilities, low latency, and system-level simplicity. Most of the resulting Wireless Network-on-Chip (WNoC) proposals have set the focus on leveraging the latency advantage of this paradigm by creating multiple wireless channels to interconnect far-apart cores. This strategy is effective as the complement of wired NoCs at moderate scales, but is likely to be overshadowed at larger scales by technologies such as nanophotonics unless bandwidth is unrealistically improved.
This dissertation presents the concept of Broadcast-Oriented Wireless Network-on-Chip (BoWNoC), a new approach that attempts to foster the inherent simplicity, flexibility, and broadcast capabilities of the wireless technology by integrating one on-chip antenna and transceiver per processor core. This paradigm is part of a broader hybrid vision where the BoWNoC serves latency-critical and broadcast traffic, tightly coupled to a wired plane oriented to large flows of data. By virtue of its scalable broadcast support, BoWNoC may become the key enabler of a wealth of unconventional hardware architectures and algorithmic approaches, eventually leading to a significant improvement of the performance, energy efficiency, scalability and programmability of manycore chips.
The present work aims not only to lay the fundamentals of the BoWNoC paradigm, but also to demonstrate its viability from the electronic implementation, network design, and multiprocessor architecture perspectives. An exploration at the physical level of design validates the feasibility of the approach at millimeter-wave bands in the short term, and then suggests the use of graphene-based antennas in the terahertz band in the long term. At the link level, this thesis provides an insightful context analysis that is then used to drive the design of a lightweight protocol that reliably serves broadcast traffic with substantial latency improvements over state-of-the-art NoCs. At the network level, our hybrid vision is evaluated with emphasis on the flexibility provided at the network interface level, showing outstanding speedups for a wide set of traffic patterns. At the architecture level, the potential impact of the BoWNoC paradigm on the design of manycore chips is not only qualitatively discussed in general, but also quantitatively assessed in a particular architecture for fast synchronization. Results demonstrate that the impact of BoWNoC can go beyond simply improving the network performance, thereby representing a possible game changer in the manycore era.