18,798 research outputs found

    High-Level Synthesis Hardware Design for FPGA-Based Accelerators: Models, Methodologies, and Frameworks

    Get PDF
    Hardware accelerators based on field programmable gate array (FPGA) and system on chip (SoC) devices have gained attention in recent years. One of the main reasons is that these devices contain reconfigurable logic, which makes them feasible for boosting the performance of applications. High-level synthesis (HLS) tools facilitate the creation of FPGA code from a high level of abstraction using different directives to obtain an optimized hardware design based on performance metrics. However, the complexity of the design space depends on different factors such as the number of directives used in the source code, the available resources in the device, and the clock frequency. Design space exploration (DSE) techniques comprise the evaluation of multiple implementations with different combinations of directives to obtain a design with a good compromise between different metrics. This paper presents a survey of models, methodologies, and frameworks proposed for metric estimation, FPGA-based DSE, and power consumption estimation on FPGA/SoC. The main features, limitations, and trade-offs of these approaches are described. We also present the integration of existing models and frameworks in diverse research areas and identify the different challenges to be addressed

    Highly efficient Bayesian joint inversion for receiver-based data and its application to lithospheric structure beneath the southern Korean Peninsula

    Get PDF
    With the deployment of extensive seismic arrays, systematic and efficient parameter and uncertainty estimation is of increasing importance and can provide reliable, regional models for crustal and upper-mantle structure.We present an efficient Bayesian method for the joint inversion of surface-wave dispersion and receiver-function data that combines trans-dimensional (trans-D) model selection in an optimization phase with subsequent rigorous parameter uncertainty estimation. Parameter and uncertainty estimation depend strongly on the chosen parametrization such that meaningful regional comparison requires quantitative model selection that can be carried out efficiently at several sites. While significant progress has been made for model selection (e.g. trans-D inference) at individual sites, the lack of efficiency can prohibit application to large data volumes or cause questionable results due to lack of convergence. Studies that address large numbers of data sets have mostly ignored model selection in favour of more efficient/simple estimation techniques (i.e. focusing on uncertainty estimation but employing ad-hoc model choices). Our approach consists of a two-phase inversion that combines trans-D optimization to select the most probable parametrization with subsequent Bayesian sampling for uncertainty estimation given that parametrization. The trans-D optimization is implemented here by replacing the likelihood function with the Bayesian information criterion (BIC). The BIC provides constraints on model complexity that facilitate the search for an optimal parametrization. Parallel tempering (PT) is applied as an optimization algorithm. After optimization, the optimal model choice is identified by the minimum BIC value from all PT chains. Uncertainty estimation is then carried out in fixed dimension. Data errors are estimated as part of the inference problem by a combination of empirical and hierarchical estimation. Data covariance matrices are estimated from data residuals (the difference between prediction and observation) and periodically updated. In addition, a scaling factor for the covariance matrix magnitude is estimated as part of the inversion. The inversion is applied to both simulated and observed data that consist of phase- and group-velocity dispersion curves (Rayleigh wave), and receiver functions. The simulation results show that model complexity and important features are well estimated by the fixed dimensional posterior probability density. Observed data for stations in different tectonic regions of the southern Korean Peninsula are considered. The results are consistent with published results, but important features are better constrained than in previous regularized inversions and are more consistent across the stations. For example, resolution of crustal and Moho interfaces, and absolute values and gradients of velocities in lower crust and upper mantle are better constrained

    A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning

    Full text link
    We present a tutorial on Bayesian optimization, a method of finding the maximum of expensive cost functions. Bayesian optimization employs the Bayesian technique of setting a prior over the objective function and combining it with evidence to get a posterior function. This permits a utility-based selection of the next observation to make on the objective function, which must take into account both exploration (sampling from areas of high uncertainty) and exploitation (sampling areas likely to offer improvement over the current best observation). We also present two detailed extensions of Bayesian optimization, with experiments---active user modelling with preferences, and hierarchical reinforcement learning---and a discussion of the pros and cons of Bayesian optimization based on our experiences

    Development of Energy Models for Design Space Exploration of Embedded Many-Core Systems

    Full text link
    This paper introduces a methodology to develop energy models for the design space exploration of embedded many-core systems. The design process of such systems can benefit from sophisticated models. Software and hardware can be specifically optimized based on comprehensive knowledge about application scenario and hardware behavior. The contribution of our work is an automated framework to estimate the energy consumption at an arbitrary abstraction level without the need to provide further information about the system. We validated our framework with the configurable many-core system CoreVA-MPSoC. Compared to a simulation of the CoreVA-MPSoC on gate level in a 28nm FD-SOI standard cell technology, our framework shows an average estimation error of about 4%.Comment: Presented at HIP3ES, 201

    Embodied Artificial Intelligence through Distributed Adaptive Control: An Integrated Framework

    Full text link
    In this paper, we argue that the future of Artificial Intelligence research resides in two keywords: integration and embodiment. We support this claim by analyzing the recent advances of the field. Regarding integration, we note that the most impactful recent contributions have been made possible through the integration of recent Machine Learning methods (based in particular on Deep Learning and Recurrent Neural Networks) with more traditional ones (e.g. Monte-Carlo tree search, goal babbling exploration or addressable memory systems). Regarding embodiment, we note that the traditional benchmark tasks (e.g. visual classification or board games) are becoming obsolete as state-of-the-art learning algorithms approach or even surpass human performance in most of them, having recently encouraged the development of first-person 3D game platforms embedding realistic physics. Building upon this analysis, we first propose an embodied cognitive architecture integrating heterogenous sub-fields of Artificial Intelligence into a unified framework. We demonstrate the utility of our approach by showing how major contributions of the field can be expressed within the proposed framework. We then claim that benchmarking environments need to reproduce ecologically-valid conditions for bootstrapping the acquisition of increasingly complex cognitive skills through the concept of a cognitive arms race between embodied agents.Comment: Updated version of the paper accepted to the ICDL-Epirob 2017 conference (Lisbon, Portugal

    Automatic synthesis and optimization of chip multiprocessors

    Get PDF
    The microprocessor technology has experienced an enormous growth during the last decades. Rapid downscale of the CMOS technology has led to higher operating frequencies and performance densities, facing the fundamental issue of power dissipation. Chip Multiprocessors (CMPs) have become the latest paradigm to improve the power-performance efficiency of computing systems by exploiting the parallelism inherent in applications. Industrial and prototype implementations have already demonstrated the benefits achieved by CMPs with hundreds of cores.CMP architects are challenged to take many complex design decisions. Only a few of them are:- What should be the ratio between the core and cache areas on a chip?- Which core architectures to select?- How many cache levels should the memory subsystem have?- Which interconnect topologies provide efficient on-chip communication?These and many other aspects create a complex multidimensional space for architectural exploration. Design Automation tools become essential to make the architectural exploration feasible under the hard time-to-market constraints. The exploration methods have to be efficient and scalable to handle future generation on-chip architectures with hundreds or thousands of cores.Furthermore, once a CMP has been fabricated, the need for efficient deployment of the many-core processor arises. Intelligent techniques for task mapping and scheduling onto CMPs are necessary to guarantee the full usage of the benefits brought by the many-core technology. These techniques have to consider the peculiarities of the modern architectures, such as availability of enhanced power saving techniques and presence of complex memory hierarchies.This thesis has several objectives. The first objective is to elaborate the methods for efficient analytical modeling and architectural design space exploration of CMPs. The efficiency is achieved by using analytical models instead of simulation, and replacing the exhaustive exploration with an intelligent search strategy. Additionally, these methods incorporate high-level models for physical planning. The related contributions are described in Chapters 3, 4 and 5 of the document.The second objective of this work is to propose a scalable task mapping algorithm onto general-purpose CMPs with power management techniques, for efficient deployment of many-core systems. This contribution is explained in Chapter 6 of this document.Finally, the third objective of this thesis is to address the issues of the on-chip interconnect design and exploration, by developing a model for simultaneous topology customization and deadlock-free routing in Networks-on-Chip. The developed methodology can be applied to various classes of the on-chip systems, ranging from general-purpose chip multiprocessors to application-specific solutions. Chapter 7 describes the proposed model.The presented methods have been thoroughly tested experimentally and the results are described in this dissertation. At the end of the document several possible directions for the future research are proposed
    corecore