
    High-Level Modelling of Optical Integrated Networks-Based Systems with the Provision of a Low Latency Controller

    Design trends for next-generation multiprocessor systems point to the integration of an ever larger number of processing cores on the same chip, requiring high-performance interconnects. One solution applied to improve the communication infrastructure in such systems is the use of Networks-on-Chip, as they bring considerable improvements in bandwidth, scalability and extensibility. Still, as the number of integrated cores continues to grow and the system scales, the metallic interconnects in Networks-on-Chip can become a performance bottleneck, so new techniques and technologies must be adopted to remedy these issues. Optical Integrated Networks (OINs) are currently considered one of the most promising paradigms in this design context: they offer higher bandwidth, lower power consumption and lower latency when exchanging data. Several recent works also demonstrate the feasibility of OINs, with fabrication technologies that are available and CMOS compatible. However, OIN designers face two main challenges. First, controllers currently represent the main communication bottleneck and are one of the factors limiting the efficiency of OINs, so new low-latency control solutions are increasingly essential. Second, designers lack tools to model and validate OINs: most work focuses on device design and on improving the performance of basic components, leaving the system level unassisted. In this context, to ease the deployment of OIN-based systems, this PhD project focuses on three main contributions: (1) the development of a set of accurate modelling methods that subsequently enable a system-level simulation platform; (2) the definition and development of an efficient control approach for OIN-based systems; and (3) the system-level evaluation of the proposed control approach using the defined modelling methods.

    CROSS-LAYER DESIGN, OPTIMIZATION AND PROTOTYPING OF NoCs FOR THE NEXT GENERATION OF HOMOGENEOUS MANY-CORE SYSTEMS

    This thesis provides a comprehensive set of design methods to enable and manage the runtime heterogeneity of feature-rich, industry-ready tile-based Networks-on-Chip at different abstraction layers (architecture design, network assembly, NoC testing, and runtime operation). The key idea is to preserve the functionality of the original layers while improving architectural performance through joint optimization and cross-layer coordination. In general-purpose systems, we address the microarchitectural challenges by co-designing and co-optimizing feature-rich architectures. In application-specific NoCs, we emphasize event notification so that the platform remains continuously under control. At the network assembly level, this thesis proposes a Hold Time Robustness technique to tackle the hold-time issue in synchronous NoCs. At the network architectural level, the choice of a suitable synchronization paradigm requires an enhanced synthesis flow as well as coexistence with DVFS. On one hand, this implies the coexistence of mesochronous synchronizers in the network with dual-clock FIFOs at network boundaries; on the other hand, dual-clock FIFOs may be placed across inter-switch links, removing the need for mesochronous synchronizers altogether. This thesis studies the implications of both approaches for the design flow and for the performance and power quality metrics of the network. Once the manycore system is assembled, the issue of testing it arises; this thesis takes on that challenge and engineers various testing infrastructures. At the upper abstraction layer, the thesis addresses the issue of managing the fully operational system and proposes a congestion management technique named HACS. Some of the ideas of this thesis are also validated through FPGA prototyping. Finally, we provide support for emerging technology by characterizing the power consumption of optical NoC interfaces.

    Cloudy in guifi.net: Establishing and sustaining a community cloud as open commons

    Commons are natural or human-made resources that are managed cooperatively. The guifi.net community network is a successful example of a digital infrastructure, a computer network, managed as an open commons. Inspired by the guifi.net case and its commons governance model, we claim that a computing cloud, another digital infrastructure, can also be managed as an open commons if the appropriate tools are put in place. In this paper, we explore the feasibility and sustainability of community clouds as open commons: open user-driven clouds formed by community-managed computing resources. We propose organising the infrastructure as a service (IaaS) and platform as a service (PaaS) cloud service layers as common-pool resources (CPR) for enabling a sustainable cloud service provision. On this basis, we have outlined a governance framework for community clouds, and we have developed Cloudy, a cloud software stack that comprises a set of tools and components to build and operate community cloud services. Cloudy is tailored to the needs of the guifi.net community network, but it can be adopted by other communities. We have validated the feasibility of community clouds in a deployment in guifi.net of some 60 devices running Cloudy for over two years. To gain insight into the capacity of end-user services to generate enough value and utility to sustain the whole cloud ecosystem, we have developed a file storage application and tested it with a group of 10 guifi.net users. The experimental results and the experience from the action research confirm the feasibility and potential sustainability of the community cloud as an open commons.

    On the simulation and design of manycore CMPs

    The progression of Moore's Law has resulted in both embedded and high-performance computing systems which use an ever-increasing number of processing cores integrated in a single chip. Commercial systems are now available which provide hundreds of cores, and academics have proposed architectures for up to 1024 cores. Embedded multicores are increasingly popular, as it is easier to guarantee hard real-time constraints using individual cores dedicated to tasks than with traditional time-multiplexed processing. However, finding the optimal hardware configuration to meet these requirements at minimum cost requires extensive trial-and-error exploration of the design space. This thesis tackles the problems encountered in the design of these large-scale multicore systems by first addressing the problem of fast, detailed micro-architectural simulation. Initially targeting embedded systems, this work exploits the lack of hardware cache-coherence support in many deeply embedded systems to increase the available parallelism in the simulation. Then, by partitioning the NoC and using packet counting and cycle skipping, it reduces the amount of computation required to accurately model the NoC interconnect. In combination, these techniques enable simulation speeds significantly higher than the state of the art while maintaining lower error, compared to real hardware, than any similar simulator. Simulation speeds reach up to 370 MIPS (million (target) instructions per second), or 110 MHz, which is better than typical FPGA prototypes and approaches final ASIC production speeds. This is achieved while maintaining an error of only 2.1%, significantly lower than other similar simulators. The thesis continues by scaling the simulator beyond large embedded systems, up to 64-1024-core processors, adding support for coherent architectures using the same packet-counting techniques along with low-overhead context switching to enable the simulation of such large systems with their stricter synchronisation requirements. The new interconnect model was partitioned to enable parallel simulation, further improving simulation speeds without sacrificing accuracy. These innovations were leveraged to investigate significant novel energy-saving optimisations to the coherency protocol, processor ISA, and processor micro-architecture. By introducing a new instruction, named wait-on-address, the energy spent during spin-wait-style synchronisation events can be significantly reduced. It works by putting the core into a low-power idle state while the cache line of the indicated address is monitored for coherency action. Upon an update or invalidation (or a traditional timer or external interrupt) the core resumes execution, but the active energy of running the core pipeline and repeatedly accessing the data and instruction caches is effectively reduced to static idle power. The thesis also shows that existing combined software-hardware schemes to track data regions which do not require coherency can adequately address the directory-associativity problem, and introduces a new coherency sharer encoding which reduces the energy consumed by sharer invalidations when sharers are grouped closely together, as would be the case in a system running many tasks with a small degree of parallelism in each.
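    As an illustration of the wait-on-address mechanism, the following is a minimal Python sketch of how a cycle-level simulator might model the instruction; the Core class, the power figures and the 64-byte line size are assumptions for illustration, not the thesis's actual code.

        IDLE_POWER_W = 0.02    # assumed static idle power of a sleeping core
        ACTIVE_POWER_W = 0.45  # assumed active power of a spinning core

        class Core:
            """Simplified simulated core supporting wait-on-address."""
            def __init__(self):
                self.asleep = False
                self.watched_line = None  # cache line being monitored

            def wait_on_address(self, addr, line_size=64):
                # Enter a low-power idle state until the cache line holding
                # `addr` sees a coherency update or invalidation.
                self.watched_line = addr // line_size
                self.asleep = True

            def on_coherence_event(self, line):
                # An update or invalidation of the watched line (timer and
                # external interrupts would be modelled similarly) wakes the core.
                if self.asleep and line == self.watched_line:
                    self.asleep = False
                    self.watched_line = None

            def power(self):
                # While asleep, the pipeline and caches are not exercised, so a
                # spin-wait costs only static idle power rather than active power.
                return IDLE_POWER_W if self.asleep else ACTIVE_POWER_W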
    The research concludes by using the extremely fast simulation speeds developed above to produce a large set of training data, collecting various runtime and energy statistics for a wide range of embedded applications across a large and diverse range of potential MPSoC designs. This data was used to train a series of machine-learning models, which were then evaluated on their capacity to predict performance characteristics of unseen workload combinations across the explored MPSoC design space using only two sample simulations, with promising results from some of the machine-learning techniques. The models were then used to produce a ranking of predicted performance across the design space; on average, Random Forest was able to predict a best design within 89% of the runtime performance of the actual best tested design, and better than 93% of the alternative design space. When predicting for a weighted metric of energy, delay and area, Random Forest on average produced results within 93% of the optimum. In summary, this thesis improves upon the state of the art for cycle-accurate multicore simulation, introduces novel energy-saving changes to the ISA and microarchitecture of future multicore processors, and demonstrates the viability of machine-learning techniques to significantly accelerate the design-space exploration required to bring a new manycore design to market.
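    As an illustration of the ranking step, the following is a hedged scikit-learn sketch; the design parameters, the synthetic runtime metric and all names are invented stand-ins for the thesis's real training data and models.

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(0)
        # Stand-in features: e.g. core count, cache size (KB), NoC link width.
        designs = rng.integers(1, 65, size=(500, 3)).astype(float)
        # Fake runtime metric; the thesis collects real statistics in simulation.
        runtime = 1e3 / np.sqrt(designs[:, 0]) + rng.normal(0.0, 5.0, 500)

        model = RandomForestRegressor(n_estimators=100, random_state=0)
        model.fit(designs, runtime)

        # Rank unseen candidate designs by predicted runtime; lowest is best.
        candidates = rng.integers(1, 65, size=(200, 3)).astype(float)
        order = np.argsort(model.predict(candidates))
        print("predicted-best design:", candidates[order[0]])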

    Behavior Flexibility for Autonomous Unmanned Aerial Systems

    Autonomous unmanned aerial systems (UAS) could supplement and eventually subsume a substantial portion of the mission set currently executed by remote pilots, making UAS more robust, responsive, and numerous than permitted by teleoperation alone. Unfortunately, the development of robust autonomous systems is difficult, costly, and time-consuming, and the resulting systems often make little reuse of proven software components and offer limited adaptability for new tasks. This work presents a development platform for UAS which promotes behavioral flexibility. The platform incorporates the Unified Behavior Framework (UBF, a modular, extensible autonomy framework), the Robot Operating System (ROS, a robotic software framework), and PX4 (an open-source flight controller). Simulations of UBF agents identify a combination of reactive robotic control strategies that is effective for small-scale navigation tasks performed by a UAS in the presence of obstacles. Finally, flight tests provide a partial validation of the simulated results. The development platform presented in this work offers robust and responsive behavioral flexibility for UAS agents in simulation and reality, and lays the foundation for further development of a unified autonomous UAS platform supporting advanced planning algorithms and inter-agent communication by providing a behavior-flexible framework in which to implement, execute, extend, and reuse behaviors.
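    The modular behavior/arbiter pattern behind frameworks like the UBF can be sketched as follows; this minimal Python illustration uses assumed class names, state keys and a winner-takes-all vote, not UBF's actual API.

        from dataclasses import dataclass

        @dataclass
        class Action:
            heading: float  # desired heading change (rad)
            speed: float    # desired forward speed (m/s)
            vote: float     # how strongly the behavior wants control, 0..1

        class GoToWaypoint:
            def act(self, state):
                return Action(heading=state["bearing_to_goal"], speed=2.0, vote=0.4)

        class AvoidObstacle:
            def act(self, state):
                if state["obstacle_range"] < 5.0:                     # reactive trigger
                    return Action(heading=1.57, speed=1.0, vote=0.9)  # steer away
                return Action(0.0, 0.0, 0.0)                          # abstain

        def arbitrate(behaviors, state):
            # Winner-takes-all arbiter: the highest vote drives the vehicle;
            # behaviors can be added, swapped or reused without touching the rest.
            return max((b.act(state) for b in behaviors), key=lambda a: a.vote)

        print(arbitrate([GoToWaypoint(), AvoidObstacle()],
                        {"bearing_to_goal": 0.3, "obstacle_range": 3.2}))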

    On packet switch design


    Modelling and characterisation of distributed hardware acceleration

    Hardware acceleration has become more commonly utilised in networked computing systems. The growing complexity of applications means that traditional CPU architectures can no longer meet stringent latency constraints, while alternative computing architectures such as GPUs and FPGAs are increasingly available, along with simpler, more software-like development flows. The work presented in this thesis characterises the overheads associated with these accelerator architectures; a holistic view encompassing both computation and communication latency must be considered. Experimental results obtained through this work show that network-attached accelerators scale better than server-hosted deployments, and that host ingestion overheads are comparable to network traversal times in some cases. Along with the choice of processing platforms, it is becoming more important to consider how workloads are partitioned and where in the network tasks are performed. Manual allocation and evaluation of tasks to network nodes does not scale with network and workload complexity. A mathematical formulation of this problem is therefore presented in this thesis that takes into account all relevant performance metrics. Unlike other works, this model accounts for growing hardware heterogeneity and workload complexity, and is generalisable to a range of scenarios. The model can be used in an optimisation that generates lower-cost placements with latency performance close to theoretical maximums, compared to naive placement approaches. Together with the experimental results that characterise hardware accelerator overheads, the work presented in this thesis can be used to make informed design decisions about where to allocate tasks, where to deploy accelerators in the network, and the associated costs.
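    A formulation of this kind could, for example, place each task t on a network node n through binary variables x_{t,n}; the sketch below is an assumed illustration in the spirit of the abstract, not the thesis's exact model.

        % Assumed symbols: c_{t,n} placement cost, r_t task demand,
        % R_n node capacity, L_max latency budget.
        \begin{align}
        \min_{x} \quad & \sum_{t \in T}\sum_{n \in N} c_{t,n}\,x_{t,n}
            && \text{total placement cost} \\
        \text{s.t.} \quad & \sum_{n \in N} x_{t,n} = 1 \quad \forall t \in T
            && \text{each task placed exactly once} \\
        & \sum_{t \in T} r_{t}\,x_{t,n} \le R_{n} \quad \forall n \in N
            && \text{node resource capacity} \\
        & \ell_{\text{comp}}(x) + \ell_{\text{net}}(x) \le L_{\max}
            && \text{end-to-end latency bound}
        \end{align}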

    Rapport annuel 2016


    SYSTEM-ON-A-CHIP (SOC)-BASED HARDWARE ACCELERATION FOR HUMAN ACTION RECOGNITION WITH CORE COMPONENTS

    Today, the implementation of machine vision algorithms on embedded platforms or in portable systems is growing rapidly due to the demand for machine vision in daily human life. Among the applications of machine vision, human action and activity recognition has become an active research area, and market demand for integrated smart security systems is growing rapidly. Among the available approaches, embedded vision is in the top tier; however, current embedded platforms may not be able to fully exploit the potential performance of machine vision algorithms, especially at low power consumption. Complex algorithms can impose immense computation and communication demands, especially action recognition algorithms, which require various stages of preprocessing, processing and machine-learning blocks that need to operate concurrently, while the market demands embedded platforms that operate on only a few watts. Attempts have been made to improve the performance of traditional embedded approaches by adding more powerful processors; this may solve the computation problem but increases power consumption. System-on-a-chip field-programmable gate arrays (SoC-FPGAs) have emerged as a major architectural approach for improving power efficiency while increasing computational performance: an embedded processor and an FPGA serving as an accelerator are fabricated on the same die to simultaneously improve power consumption and performance. Still, current SoC-FPGA-based vision implementations either shy away from supporting complex and adaptive vision algorithms or operate at very limited resolutions due to the immense communication and computation demands. The aim of this research is to develop a SoC-based hardware acceleration workflow for the realization of advanced vision algorithms. Hardware acceleration can improve performance for highly complex mathematical calculations or frequently repeated functions, so the performance of a SoC system can be improved by accelerating the element that incurs the highest performance overhead. The outcome of this research could be used for the implementation of various vision algorithms, such as face recognition, object detection or object tracking, on embedded platforms. The contributions of SoC-based hardware acceleration for hardware-software co-design platforms include the following: (1) development of frameworks for complex human action recognition in both 2D and 3D; (2) realization of a framework with four main implemented IPs, namely foreground/background subtraction (foreground probability), human detection, 2D/3D point-of-interest detection and feature extraction, and OS-ELM as a machine-learning algorithm for action identification; (3) use of an FPGA-based hardware acceleration method to resolve system bottlenecks and improve system performance; and (4) measurement and analysis of system specifications such as the acceleration factor, power consumption, and resource utilization. Experimental results show that the proposed SoC-based hardware acceleration approach provides better performance in terms of acceleration factor, resource utilization and power consumption than all recent comparable works. In addition, a comparison of the accuracy of the framework running on the proposed embedded platform (SoC-FPGA) with that of other PC-based frameworks shows that the proposed approach outperforms most of them.
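    Since OS-ELM is named as the machine-learning block, the following compact numpy sketch shows the standard OS-ELM initialisation and sequential update equations; the hidden-layer size, tanh activation and class design are illustrative assumptions, not the thesis's implementation.

        import numpy as np

        class OSELM:
            """Online sequential extreme learning machine (standard equations)."""
            def __init__(self, n_in, n_hidden, seed=0):
                rng = np.random.default_rng(seed)
                self.W = rng.normal(size=(n_in, n_hidden))  # fixed random weights
                self.b = rng.normal(size=n_hidden)          # fixed random biases
                self.beta = None                            # learned output weights

            def _h(self, X):
                return np.tanh(X @ self.W + self.b)         # hidden activations

            def fit_initial(self, X0, T0):
                # Batch initialisation; needs at least n_hidden samples.
                H0 = self._h(X0)
                self.P = np.linalg.inv(H0.T @ H0)
                self.beta = self.P @ H0.T @ T0

            def update(self, X, T):
                # Recursive least-squares update on a new chunk of samples,
                # so the classifier keeps learning on the embedded platform.
                H = self._h(X)
                K = np.linalg.inv(np.eye(len(X)) + H @ self.P @ H.T)
                self.P = self.P - self.P @ H.T @ K @ H @ self.P
                self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

            def predict(self, X):
                return self._h(X) @ self.beta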