10 research outputs found

    High-Level Modelling of Optical Integrated Networks-Based Systems with the Provision of a Low Latency Controller

    Get PDF
    RÉSUMÉ La tendance du marché dans la conception des architectures multiprocesseurs de la prochaine génération consiste à intégrer de plus en plus de cœurs dans la même puce. Cette concentra-tion des cœurs dans la même puce exige l’amélioration des politiques d’intercommunication. L’une des solutions proposées dans ce contexte consiste à utiliser les réseaux sur puce vu qu’ils présentent une amélioration considérable en termes de la bande passante, l’évolutivité et de l’extensibilité. Néanmoins, vu la croissance exponentielle en nombres de cœurs sur puce, les interconnexions électriques dans les réseaux sur puce peuvent devenir un goulet d’étranglement dans la performance du système. Par conséquent, des nouvelles techniques et technologies doivent être adoptées pour remédier à ces problèmes. Les réseaux optiques intégrés (OIN venant de l’anglais Optical Integrated Networks) sont actuellement considérés comme l’un des paradigmes les plus prometteurs dans ce contexte. Les OINs o˙rent une plus grande bande passante, une plus faible consommation d’énergie et moins de latence lors de l’échange des données. Plusieurs travaux récents démontrent la faisabilité des OIN avec les technologies de fabrication disponibles et compatibles avec CMOS. Cependant, les concepteurs des OINs font face à plusieurs défis : Actuellement, les contrôleurs représentent le principal goulot d’étranglement de la com-munication et présentent l’un des facteurs minimisant l’eÿcacité des OINs. Alors, la proposition des nouvelles solutions de contrôle à faible latence est de plus en plus pri-mordiale pour en tirer profit. Le manque d’outils de modélisation et de validation des OINs. La plupart des travaux se concentrent sur la conception des dispositifs et l’amélioration des performances des composants de base, tout en laissant le système sans assistance. Dans ce contexte, afin de faciliter le déploiement de systèmes basés sur les OINs, cette thèse se focalise sur les trois contributions majeures suivantes: (1) le développement d’un ensemble de méthodes précises de modélisation qui va permettre par la suite de réaliser une plateforme de simulation au niveau du système ; (2) la définition et le développement d’une approche de contrôle eÿcace pour les systèmes basés sur les OINs; (3) l’évaluation de l’approche de contrôle proposée.----------ABSTRACT Design trends for next-generation Multi-Processor Systems point to the integration of a large number of processing cores, requiring high-performance interconnects. One solution being applied to improve the communication infrastructure in such systems is the usage of Networks-on-Chip as they present considerable improvement in the bandwidth and scaleabil-ity. Still as the number of integrated cores continues to increase and the system scales, the metallic interconnects in Networks-on-Chip can become a performance bottleneck. As a result, a new strategy must be adopted in order for those issues to be remedied. Optical Integrated Networks (OINs) are currently considered to be one of the most promising paradigm in this design context: they present higher bandwidth, lower power consumption and lower latency to broadcast information. Also, the latest work demonstrates the feasibility of OINs with their fabrication technologies being available and CMOS compatible. However, OINs’ designers face several challenges: Currently, controllers represent the main communication bottleneck and are one of the factors limiting the usage of OINs. Therefore, new controlling solutions with low latency are required. Designers lack tools to model and validate OINs. Most research nowadays is focused on designing devices and improving basic components performance, leaving system unattended. In this context, in order to ease the deployment of OIN-based systems, this PhD project focuses on three main contributions: (1) the development of accurate system-level modelling study to realize a system-level simulation platform; (2) the definition and development of an eÿcient control approach for OIN-based systems, and; (3) the system-level evaluation of the proposed control approach using the defined modelling

    Towards Terabit Carrier Ethernet and Energy Efficient Optical Transport Networks

    Get PDF

    A Dynamically Reconfigurable Parallel Processing Framework with Application to High-Performance Video Processing

    Get PDF
    Digital video processing demands have and will continue to grow at unprecedented rates. Growth comes from ever increasing volume of data, demand for higher resolution, higher frame rates, and the need for high capacity communications. Moreover, economic realities force continued reductions in size, weight and power requirements. The ever-changing needs and complexities associated with effective video processing systems leads to the consideration of dynamically reconfigurable systems. The goal of this dissertation research was to develop and demonstrate the viability of integrated parallel processing system that effectively and efficiently apply pre-optimized hardware cores for processing video streamed data. Digital video is decomposed into packets which are then distributed over a group of parallel video processing cores. Real time processing requires an effective task scheduler that distributes video packets efficiently to any of the reconfigurable distributed processing nodes across the framework, with the nodes running on FPGA reconfigurable logic in an inherently Virtual\u27 mode. The developed framework, coupled with the use of hardware techniques for dynamic processing optimization achieves an optimal cost/power/performance realization for video processing applications. The system is evaluated by testing processor utilization relative to I/O bandwidth and algorithm latency using a separable 2-D FIR filtering system, and a dynamic pixel processor. For these applications, the system can achieve performance of hundreds of 640x480 video frames per second across an eight lane Gen I PCIe bus. Overall, optimal performance is achieved in the sense that video data is processed at the maximum possible rate that can be streamed through the processing cores. This performance, coupled with inherent ability to dynamically add new algorithms to the described dynamically reconfigurable distributed processing framework, creates new opportunities for realizable and economic hardware virtualization.\u2

    The effect of an optical network on-chip on the performance of chip multiprocessors

    Get PDF
    Optical networks on-chip (ONoC) have been proposed to reduce power consumption and increase bandwidth density in high performance chip multiprocessors (CMP), compared to electrical NoCs. However, as buffering in an ONoC is not viable, the end-to-end message path needs to be acquired in advance during which the message is buffered at the network ingress. This waiting latency is therefore a combination of path setup latency and contention and forms a significant part of the total message latency. Many proposed ONoCs, such as Single Writer, Multiple Reader (SWMR), avoid path setup latency at the expense of increased optical components. In contrast, this thesis investigates a simple circuit-switched ONoC with lower component count where nodes need to request a channel before transmission. To hide the path setup latency, a coherence-based message predictor is proposed, to setup circuits before message arrival. Firstly, the effect of latency and bandwidth on application performance is thoroughly investigated using full-system simulations of shared memory CMPs. It is shown that the latency of an ideal NoC affects the CMP performance more than the NoC bandwidth. Increasing the number of wavelengths per channel decreases the serialisation latency and improves the performance of both ONoC types. With 2 or more wavelengths modulating at 25 Gbit=s , the ONoCs will outperform a conventional electrical mesh (maximal speedup of 20%). The SWMR ONoC outperforms the circuit-switched ONoC. Next coherence-based prediction techniques are proposed to reduce the waiting latency. The ideal coherence-based predictor reduces the waiting latency by 42%. A more streamlined predictor (smaller than a L1 cache) reduces the waiting latency by 31%. Without prediction, the message latency in the circuit-switched ONoC is 11% larger than in the SWMR ONoC. Applying the realistic predictor reverses this: the message latency in the SWMR ONoC is now 18% larger than the predictive circuitswitched ONoC

    マルチレベル並列化とアプリケーション指向データレイアウトを用いるハードウェアアクセラレータの設計と実装

    Get PDF
    学位の種別: 課程博士審査委員会委員 : (主査)東京大学教授 稲葉 雅幸, 東京大学教授 須田 礼仁, 東京大学教授 五十嵐 健夫, 東京大学教授 山西 健司, 東京大学准教授 稲葉 真理, 東京大学講師 中山 英樹University of Tokyo(東京大学

    Dynamic reconfiguration frameworks for high-performance reliable real-time reconfigurable computing

    Get PDF
    The sheer hardware-based computational performance and programming flexibility offered by reconfigurable hardware like Field-Programmable Gate Arrays (FPGAs) make them attractive for computing in applications that require high performance, availability, reliability, real-time processing, and high efficiency. Fueled by fabrication process scaling, modern reconfigurable devices come with ever greater quantities of on-chip resources, allowing a more complex variety of applications to be developed. Thus, the trend is that technology giants like Microsoft, Amazon, and Baidu now embrace reconfigurable computing devices likes FPGAs to meet their critical computing needs. In addition, the capability to autonomously reprogramme these devices in the field is being exploited for reliability in application domains like aerospace, defence, military, and nuclear power stations. In such applications, real-time computing is important and is often a necessity for reliability. As such, applications and algorithms resident on these devices must be implemented with sufficient considerations for real-time processing and reliability. Often, to manage a reconfigurable hardware device as a computing platform for a multiplicity of homogenous and heterogeneous tasks, reconfigurable operating systems (ROSes) have been proposed to give a software look to hardware-based computation. The key requirements of a ROS include partitioning, task scheduling and allocation, task configuration or loading, and inter-task communication and synchronization. Existing ROSes have met these requirements to varied extents. However, they are limited in reliability, especially regarding the flexibility of placing the hardware circuits of tasks on device’s chip area, the problem arising more from the partitioning approaches used. Indeed, this problem is deeply rooted in the static nature of the on-chip inter-communication among tasks, hampering the flexibility of runtime task relocation for reliability. This thesis proposes the enabling frameworks for reliable, available, real-time, efficient, secure, and high-performance reconfigurable computing by providing techniques and mechanisms for reliable runtime reconfiguration, and dynamic inter-circuit communication and synchronization for circuits on reconfigurable hardware. This work provides task configuration infrastructures for reliable reconfigurable computing. Key features, especially reliability-enabling functionalities, which have been given little or no attention in state-of-the-art are implemented. These features include internal register read and write for device diagnosis; configuration operation abort mechanism, and tightly integrated selective-area scanning, which aims to optimize access to the device’s reconfiguration port for both task loading and error mitigation. In addition, this thesis proposes a novel reliability-aware inter-task communication framework that exploits the availability of dedicated clocking infrastructures in a typical FPGA to provide inter-task communication and synchronization. The clock buffers and networks of an FPGA use dedicated routing resources, which are distinct from the general routing resources. As such, deploying these dedicated resources for communication sidesteps the restriction of static routes and allows a better relocation of circuits for reliability purposes. For evaluation, a case study that uses a NASA/JPL spectrometer data processing application is employed to demonstrate the improved reliability brought about by the implemented configuration controller and the reliability-aware dynamic communication infrastructure. It is observed that up to 74% time saving can be achieved for selective-area error mitigation when compared to state-of-the-art vendor implementations. Moreover, an improvement in overall system reliability is observed when the proposed dynamic communication scheme is deployed in the data processing application. Finally, one area of reconfigurable computing that has received insufficient attention is security. Meanwhile, considering the nature of applications which now turn to reconfigurable computing for accelerating compute-intensive processes, a high premium is now placed on security, not only of the device but also of the applications, from loading to runtime execution. To address security concerns, a novel secure and efficient task configuration technique for task relocation is also investigated, providing configuration time savings of up to 32% or 83%, depending on the device; and resource usage savings in excess of 90% compared to state-of-the-art

    Mu2e Technical Design Report

    Full text link
    The Mu2e experiment at Fermilab will search for charged lepton flavor violation via the coherent conversion process mu- N --> e- N with a sensitivity approximately four orders of magnitude better than the current world's best limits for this process. The experiment's sensitivity offers discovery potential over a wide array of new physics models and probes mass scales well beyond the reach of the LHC. We describe herein the preliminary design of the proposed Mu2e experiment. This document was created in partial fulfillment of the requirements necessary to obtain DOE CD-2 approval.Comment: compressed file, 888 pages, 621 figures, 126 tables; full resolution available at http://mu2e.fnal.gov; corrected typo in background summary, Table 3.

    Online scheduling for real-time multitasking on reconfigurable hardware devices

    Get PDF
    Nowadays the ever increasing algorithmic complexity of embedded applications requires the designers to turn towards heterogeneous and highly integrated systems denoted as SoC (System-on-a-Chip). These architectures may embed CPU-based processors, dedicated datapaths as well as recon gurable units. However, embedded SoCs are submitted to stringent requirements in terms of speed, size, cost, power consumption, throughput, etc. Therefore, new computing paradigms are required to ful l the constraints of the applications and the requirements of the architecture. Recon gurable Computing is a promising paradigm that provides probably the best trade-o between these requirements and constraints. Dynamically recon gurable architectures are their key enabling technology. They enable the hardware to adapt to the application at runtime. However, these architectures raise new challenges in SoC design. For example, on one hand, designing a system that takes advantage of dynamic recon guration is still very time consuming because of the lack of design methodologies and tools. On the other hand, scheduling hardware tasks di ers from classical software tasks scheduling on microprocessor or multiprocessors systems, as it bears a further complicated placement problem. This thesis deals with the problem of scheduling online real-time hardware tasks on Dynamically Recon gurable Hardware Devices (DRHWs). The problem is addressed from two angles : (i) Investigating novel algorithms for online real-time scheduling/placement on DRHWs. (ii) Scheduling/Placement algorithms library for RTOS-driven Design Space Exploration (DSE). Regarding the first point, the thesis proposes two main runtime-aware scheduling and placement techniques and assesses their suitability for online real-time scenarios. The first technique discusses the impact of synthesizing, at design time, several shapes and/or sizes per hardware task (denoted as multi-shape task), in order to ease the online scheduling process. The second technique combines a looking-ahead scheduling approach with a slots-based recon gurable areas management that relies on a 1D placement. The results show that in both techniques, the scheduling and placement quality is improved without signi cantly increasing the algorithm time complexity. Regarding the second point, in the process of designing SoCs embedding recon gurable parts, new design paradigms tend to explore and validate as early as possible, at system level, the architectural design space. Therefore, the RTOS (Real-Time Operating System) services that manage the recon gurable parts of the SoC can be re fined. In such a context, gathering numerous hardware tasks scheduling and placement algorithms of various complexity vs performance trade-o s in a kind of library is required. In this thesis, proposed algorithms in addition to some existing ones are purposely implemented in C++ language, in order to insure the compatibility with any C++/SystemC based SoC design methodology.EThOS - Electronic Theses Online ServiceGBUnited Kingdo
    corecore