68 research outputs found

    VThreads: A novel VLIW chip multiprocessor with hardware-assisted PThreads

    We discuss VThreads, a novel VLIW CMP with hardware-assisted shared-memory thread support. VThreads supports Instruction Level Parallelism via static multiple-issue and Thread Level Parallelism via hardware-assisted POSIX Threads, along with extensive customization. It allows the instantiation of tightly coupled streaming accelerators and supports up to 7-address Multiple-Input, Multiple-Output instruction extensions. VThreads is designed in technology-independent Register-Transfer-Level VHDL and prototyped on 40 nm and 28 nm field-programmable gate arrays. It was evaluated against a PThreads-based multiprocessor based on the Sparc-V8 ISA. On a 65 nm ASIC implementation, VThreads achieves up to a 7.2x performance increase on synthetic benchmarks, 5x on a parallel Mandelbrot implementation, 66% better performance on a threaded JPEG implementation, 79% better on an edge-detection benchmark and an improvement of about 13% on DES, compared to the Leon3MP CMP. In the range of 2 to 8 cores, VThreads demonstrates a post-route (statistical) power reduction of between 65% and 57%, at an area increase of 1.2%-10% for 1-8 cores, compared to a similarly configured Leon3MP CMP. This combination of micro-architectural features, scalability, extensibility, hardware support for low-latency PThreads, power efficiency and area makes the processor an attractive proposition for low-power, deeply embedded applications requiring minimum OS support.

    Driving the Network-on-Chip Revolution to Remove the Interconnect Bottleneck in Nanoscale Multi-Processor Systems-on-Chip

    The sustained demand for faster, more powerful chips has been met by chip manufacturing processes that allow increasing numbers of computation units to be integrated onto a single die. The outcome, especially in the embedded domain, is often called a System-on-Chip (SoC) or Multi-Processor System-on-Chip (MPSoC). MPSoC design brings a large number of challenges to the foreground, one of the most prominent of which is the design of the chip interconnect. With the number of on-chip blocks presently in the tens and quickly approaching the hundreds, the question of how best to provide on-chip communication resources is clearly felt. Networks-on-Chip (NoCs) are the most comprehensive and scalable answer to this design concern. By bringing large-scale networking concepts to the on-chip domain, they guarantee a structured answer to present and future communication requirements. The point-to-point connection and packet-switching paradigms they involve also help greatly in minimizing wiring overhead and physical routing issues. However, as with any technology of recent inception, NoC design is still an evolving discipline. Several main areas of interest require deep investigation for NoCs to become viable solutions:
    • The design of the NoC architecture needs to strike the best tradeoff among performance, features and the tight area and power constraints of the on-chip domain.
    • Simulation and verification infrastructure must be put in place to explore, validate and optimize NoC performance.
    • NoCs offer a huge design space, thanks to their extreme customizability in terms of topology and architectural parameters. Design tools are needed to prune this space and pick the best solutions.
    • Even more so given their global, distributed nature, it is essential to evaluate the physical implementation of NoCs to assess their suitability for next-generation designs and their area and power costs.
This dissertation performs a design space exploration of network-on-chip architectures in order to point out the trade-offs associated with the design of each individual network building block and with the design of the network topology overall. The design space exploration is preceded by a comparative analysis of state-of-the-art interconnect fabrics with one another and with early network-on-chip prototypes. The ultimate objective is to point out the key advantages that NoC realizations provide with respect to state-of-the-art communication infrastructures, and the challenges that lie ahead in making this new interconnect technology a reality. Among the latter, technology-related challenges are emerging that call for dedicated design techniques at all levels of the design hierarchy, in particular leakage power dissipation and the containment of process variations and their effects. These objectives were achieved by means of a NoC simulation environment for cycle-accurate modelling and simulation, and a back-end facility for the study of NoC physical implementation effects. Overall, all the results provided by this work have been validated on actual silicon layout.

    The Maunakea Spectroscopic Explorer Book 2018

    (Abridged) This is the Maunakea Spectroscopic Explorer 2018 book. It is intended as a concise reference guide to all aspects of the scientific and technical design of MSE, for the international astronomy and engineering communities, and related agencies. The current version is a status report of MSE's science goals and their practical implementation, following the System Conceptual Design Review, held in January 2018. MSE is a planned 10-m class, wide-field, optical and near-infrared facility, designed to enable transformative science, while filling a critical gap in the emerging international network of large-scale astronomical facilities. MSE is completely dedicated to multi-object spectroscopy of samples of between thousands and millions of astrophysical objects. It will lead the world in this arena, due to its unique design capabilities: it will boast a large (11.25 m) aperture and wide (1.52 sq. degree) field of view; it will have the capabilities to observe at a wide range of spectral resolutions, from R2500 to R40,000, with massive multiplexing (4332 spectra per exposure, with all spectral resolutions available at all times), and an on-target observing efficiency of more than 80%. MSE will unveil the composition and dynamics of the faint Universe and is designed to excel at precision studies of faint astrophysical phenomena. It will also provide critical follow-up for multi-wavelength imaging surveys, such as those of the Large Synoptic Survey Telescope, Gaia, Euclid, the Wide Field Infrared Survey Telescope, the Square Kilometre Array, and the Next Generation Very Large Array. Comment: 5 chapters, 160 pages, 107 figures

    Energy efficient hardware acceleration of multimedia processing tools

    The world of mobile devices is experiencing an ongoing trend of feature enhancement and general-purpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks. Based on the survey that this thesis presents on modern video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at the algorithmic level in order to design re-usable, optimised hardware acceleration cores. To prove these conclusions, the work in this thesis focuses on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high-level techniques such as redundant computation elimination, parallelism and low-switching computation structures. Both architectures compare favourably against the relevant prior art in the literature. The SA-DCT/IDCT technologies are instances of a more general computation: both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early exit mechanism that achieves large search space reductions. Results show an improvement on state-of-the-art algorithms, with future potential for even greater savings.

    Precise Point Positioning Augmentation for Various Grades of Global Navigation Satellite System Hardware

    The next generation of low-cost, dual-frequency, multi-constellation GNSS receivers, boards, chips and antennas is now quickly entering the market, offering to disrupt portions of the precise GNSS positioning industry with much lower-cost hardware and promising to provide precise positioning to a wide range of consumers. The presented work provides a timely, novel and thorough investigation into this promised positioning performance. A systematic and rigorous set of experiments has been carried out, collecting measurements from a wide array of low-cost, dual-frequency, multi-constellation GNSS boards, chips and antennas introduced in late 2018 and early 2019. These sensors range from dual-frequency, multi-constellation chips in smartphones to stand-alone chips and boards. In order to be comprehensive and realistic, these experiments were conducted in a number of static and kinematic benign, typical, suburban and urban environments. The raw measurements from these sensors were processed in the Precise Point Positioning (PPP) GNSS measurement processing mode. PPP has become the de facto GNSS positioning and navigation technique for scientific and engineering applications that require dm- to cm-level positioning in remote areas with few obstructions and provides for very efficient worldwide, wide-array augmentation corrections. To enhance solution accuracy, novel contributions were made through atmospheric constraints and the use of dual- and triple-frequency measurements to significantly reduce the PPP convergence period. Applying PPP correction augmentations to smartphones and recently released low-cost equipment, novel analyses were made with significantly improved solution accuracy. Significant customization of the York-PPP GNSS measurement processing engine was necessary, especially in the quality control and residual analysis functions, in order to successfully process these datasets.
Results for new smartphone sensors show positioning performance is typically at the few-dm level with a convergence period of approximately 40 minutes, which is 1 to 2 orders of magnitude better than standard point positioning. The GNSS chips and boards combined with higher-quality antennas produce positioning performance approaching geodetic quality. Under ideal conditions, carrier-phase ambiguities are resolvable. The results presented show a novel perspective and are very promising for the use of PPP (as well as RTK) in next-generation GNSS sensors for various applications in smartphones, autonomous vehicles, the Internet of Things (IoT), etc.

    Contribution au domaine de la conception d’objets communicants embarqués basse consommation et autonomes en énergie

    This manuscript presents a synthesis of my research and teaching activities. Since September 2008, when I was appointed associate professor (Maître de Conférences) at the University of Nice Sophia Antipolis, I have conducted my research in the MCSOC team (Modélisation, Conception Système d’Objets Communicants) of the LEAT laboratory (University of Nice Sophia Antipolis, UMR CNRS 7248). For nearly 15 years, my work has focused on the design of embedded communicating objects, with a strong shift towards high-level approaches that make it possible, early in the design flow, to model and optimize both performance and energy consumption. These system-level approaches have grown steadily in importance over the last few years and are now an essential solution for designing efficient embedded systems. My more specific work on the energy autonomy of these systems, through energy harvesting, brings an original contribution to the domain and has a national and international impact. This document is organized in two parts: the first part is a synthesis of my research and teaching activities, while the second presents my research work in detail, highlighting its contributions and innovative aspects. The manuscript ends with a scientific overview as well as some research perspectives.

    CROSS-LAYER DESIGN, OPTIMIZATION AND PROTOTYPING OF NoCs FOR THE NEXT GENERATION OF HOMOGENEOUS MANY-CORE SYSTEMS

    This thesis provides a complete set of design methods to enable and manage the runtime heterogeneity of feature-rich, industry-ready tile-based Networks-on-Chip at different abstraction layers (architecture design, network assembly, NoC testing, runtime operation). The key idea is to maintain the functionality of the original layers and to improve the performance of architectures by allowing joint optimization and coordination across layers. In general-purpose systems, we address the microarchitectural challenges by co-designing and co-optimizing feature-rich architectures. In application-specific NoCs, we emphasize event notification, so that the platform is continuously under control. At the network assembly level, this thesis proposes a Hold Time Robustness technique to tackle the hold time issue in synchronous NoCs. At the network architectural level, the choice of a suitable synchronization paradigm requires enhancements to the synthesis flow as well as coexistence with DVFS. On the one hand, this implies the coexistence of mesochronous synchronizers in the network with dual-clock FIFOs at the network boundaries. On the other hand, dual-clock FIFOs may be placed across inter-switch links, removing the need for mesochronous synchronizers. This thesis studies the implications of these approaches on the design flow and on the performance and power quality metrics of the network. Once the many-core system has been assembled, the issue of testing it arises. This thesis takes on this challenge and engineers various testing infrastructures. At the upper abstraction layer, the thesis addresses the issue of managing the fully operational system and proposes a congestion management technique named HACS. Moreover, some of the ideas of this thesis will undergo FPGA prototyping. Finally, we provide some features for emerging technology by characterizing the power consumption of Optical NoC interfaces.