8 research outputs found

    RPPM : Rapid Performance Prediction of Multithreaded workloads on multicore processors

    Analytical performance modeling is a useful complement to detailed cycle-level simulation for quickly exploring the design space in an early design stage. Mechanistic analytical modeling is particularly interesting because it provides deep insight and, unlike empirical modeling, does not require expensive offline profiling. Previous work in mechanistic analytical modeling, unfortunately, is limited to single-threaded applications running on single-core processors. This work proposes RPPM, a mechanistic analytical performance model for multi-threaded applications on multicore hardware. RPPM collects microarchitecture-independent characteristics of a multi-threaded workload to predict its performance on a previously unseen multicore architecture. The profile needs to be collected only once to predict performance across a range of processor architectures. We evaluate RPPM's accuracy against simulation and report an average performance prediction error of 11.2% (23% max). We demonstrate RPPM's usefulness for conducting design space exploration experiments as well as for analyzing parallel application performance.
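
    To make the flavor of mechanistic analytical modeling concrete, here is a minimal interval-style CPI sketch in Python. The profile fields, penalty values and the model itself are illustrative assumptions, not RPPM's actual equations.

```python
# Hypothetical mechanistic model: cycles = base dispatch time plus fixed
# penalties per miss event. All field names and numbers are made up.

def predict_cycles(profile, uarch):
    """Estimate execution cycles from a microarchitecture-independent profile."""
    base = profile["instructions"] / uarch["dispatch_width"]  # steady-state issue
    branch = profile["branch_mispredictions"] * uarch["branch_penalty"]
    memory = profile["llc_misses"] * uarch["memory_latency"]
    return base + branch + memory

profile = {"instructions": 1_000_000,
           "branch_mispredictions": 5_000,
           "llc_misses": 2_000}
uarch = {"dispatch_width": 4, "branch_penalty": 15, "memory_latency": 200}

# The same profile can be replayed against many candidate cores,
# which is what makes the one-time profiling step attractive.
print(predict_cycles(profile, uarch))
```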

    Processor design space exploration and performance prediction

    The use of simulation is well established in processor design research for evaluating architectural design trade-offs. In particular, cycle-accurate simulation is widely used to evaluate new processor designs because of its accurate and detailed performance measurement. However, only configurations in a subspace of the design space can be simulated in practice, due to long simulation times and limited resources, leading to suboptimal conclusions that might not hold for the larger design space. In this thesis, we propose a performance prediction approach that employs state-of-the-art techniques from experimental design, machine learning and data mining. Our model is trained on cycle-accurate simulation results and can then predict processor performance across the entire design space. In our experiments, the model predicts the performance of a single-core processor with a median percentage error ranging from 0.32% to 3.01% over a design space of about 15 million points, using only 5,000 independently sampled design points as a training set. For CMPs, the median percentage error ranges from 0.50% to 1.47% over a design space of about 9.7 million points, again using only 5,000 independently sampled design points for training. The model also provides quantitative interpretation tools such as the variable importance and partial dependence of the design parameters.
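
    As a rough illustration of the sample-then-predict workflow, the sketch below trains a regressor on a small sample of "simulated" points and predicts the rest of the space. It assumes scikit-learn; the synthetic labels stand in for cycle-accurate simulation results, and the feature encoding is hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Stand-in design space: each row encodes one configuration
# (e.g., issue width, cache size, ROB size) as numbers.
design_space = rng.integers(1, 9, size=(100_000, 3)).astype(float)

# 1. Sample a small training set; in the thesis these labels would
#    come from cycle-accurate simulation, here they are synthetic.
idx = rng.choice(len(design_space), size=5_000, replace=False)
train_x = design_space[idx]
train_y = train_x @ np.array([2.0, 0.5, 1.5]) + rng.normal(0, 0.1, len(idx))

# 2. Fit once, then predict performance for the entire design space.
model = GradientBoostingRegressor().fit(train_x, train_y)
predictions = model.predict(design_space)
print(predictions[:5])
```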

    Accurate and efficient processor performance prediction via regression tree based modeling

    Computer architects usually evaluate new designs using cycle-accurate processor simulation. This approach provides detailed insight into processor performance, power consumption and complexity. However, only configurations in a subspace of the design space can be simulated in practice, due to long simulation times and limited resources, leading to suboptimal conclusions that might not hold for the larger design space. In this paper, we propose a performance prediction approach that employs state-of-the-art techniques from experimental design, machine learning and data mining. In our experiments on single- and multi-core processors, the prediction model generates highly accurate estimates for unsampled points in the design space and is robust in its worst-case predictions. Moreover, the model provides quantitative interpretation tools that help investigators efficiently tune design parameters and remove performance bottlenecks.
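
    The interpretation tools mentioned above can be sketched as follows, assuming scikit-learn; the tree ensemble, feature names and data are illustrative stand-ins for the paper's actual model and design space.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import partial_dependence

rng = np.random.default_rng(1)
X = rng.uniform(size=(2_000, 3))        # [issue_width, l2_size, rob_size]
y = 3 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 0.05, 2_000)

model = RandomForestRegressor(n_estimators=100).fit(X, y)

# Variable importance: which design parameters drive predicted performance.
print(dict(zip(["issue_width", "l2_size", "rob_size"],
               model.feature_importances_)))

# Partial dependence: predicted performance as one parameter sweeps its range.
pd = partial_dependence(model, X, features=[0])
print(pd["average"][0][:5])
```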

    Prédiction de performance de matériel graphique dans un contexte avionique par apprentissage automatique

    Within the strongly regulated avionics engineering field, conventional desktop graphics hardware and software APIs cannot be used because they do not conform to the DO-254 and DO-178B certifications. We observe the need for better avionic graphical hardware, but system engineers lack system design tools related to graphical hardware. Endorsing an optimal hardware architecture by estimating the performance of graphical software, when a stable rendering engine does not yet exist, is a major challenge. There is also high potential for development cost reduction, by enabling developers to obtain a first estimate of their graphical engine's performance at low cost. In this paper, we propose to replace expensive development platforms by predictive software running on a desktop. More precisely, we present a system design tool that helps predict the rendering performance of graphical hardware based on the OpenGL SC API. First, we create non-parametric models of the underlying hardware with machine learning, by analyzing the instantaneous frames-per-second (FPS) of the rendering of a synthetic 3D scene, drawn multiple times while varying characteristics typically found in synthetic vision applications (vertex count, scene size, texture size, etc.). The number of characteristic combinations used during this supervised training phase is only a small subset of all possible combinations; the goal is to predict the missing ones by extrapolation. To validate our models, we render an industrial scene with characteristic combinations not used during the training phase and compare the predictions to the measured values. We find a median prediction error of less than 4 FPS.
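
    A minimal sketch of this training-and-extrapolation loop, assuming scikit-learn: a non-parametric regressor is fit on (scene characteristics, measured FPS) pairs and queried for an unseen combination. The characteristic names follow the abstract; the data and the choice of a k-nearest-neighbors model are assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(2)

# Training data: (vertex_count, scene_size, texture_size) -> measured FPS.
# The FPS values here are synthetic; in the thesis they come from timing
# OpenGL SC renders of a synthetic 3D scene.
scenes = rng.uniform([1e4, 1.0, 256], [1e6, 100.0, 4096], size=(500, 3))
fps = 1e8 / (scenes[:, 0] * scenes[:, 2]) + rng.normal(0, 0.5, 500)

model = KNeighborsRegressor(n_neighbors=5).fit(scenes, fps)

# Query a characteristic combination not seen during training.
print(model.predict([[5e5, 50.0, 1024.0]]))
```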

    Modeling Data Center Co-Tenancy Performance Interference

    A multi-core machine allows several applications to execute simultaneously. Those jobs are scheduled on different cores and compete for shared resources such as the last-level cache and memory bandwidth. Such competition can cause performance degradation. Data centers often utilize virtualization to provide a certain level of performance isolation, but some of the shared resources cannot be partitioned, even in a virtualized system, to ensure complete isolation. If the performance degradation under co-tenancy is not known to the cloud administrator, a data center often has to dedicate a whole machine to a latency-sensitive application to guarantee its quality of service. Co-run scheduling attempts to make good use of resources by scheduling compatible jobs onto one machine while maintaining their service-level agreements. An ideal co-run scheduling scheme requires accurate contention modeling. Recent studies of co-run modeling and scheduling have made steady progress in predicting the performance of two co-run applications sharing a specific system. This thesis advances co-tenancy modeling in three aspects. First, starting from an accurate co-run model for one system, we propose a regression model to transfer that knowledge and create a model for a new system with a different hardware configuration. Second, by examining the programs that yield high prediction errors, we leverage clustering techniques to create a model for each group of applications that show similar behavior; clustering improves the prediction accuracy for those pathological cases. Third, whereas existing research typically focuses on modeling two-application co-runs, we extend a two-core model to three- and four-core models by introducing a lightweight micro-kernel that emulates a complicated benchmark through program instrumentation. Our experimental evaluation shows that our cross-architecture model achieves an average prediction error of less than 2% for pairwise co-runs across the SPEC CPU2006 benchmark suite. For co-tenancy of more than two applications, we show that our model is more scalable and achieves an average prediction error of 2-3%.
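
    The per-cluster modeling idea (the thesis's second contribution) can be sketched as below, assuming scikit-learn. The application features, slowdown data and model choices are synthetic stand-ins, not the thesis's SPEC CPU2006 measurements.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)

# Per-application solo-run features, e.g. (LLC misses per kilo-instruction,
# memory bandwidth utilization), plus the observed co-run slowdown.
solo_profiles = rng.uniform(size=(60, 2))
slowdown = 1 + 0.5 * solo_profiles[:, 0] + rng.normal(0, 0.02, 60)

# 1. Group applications with similar behavior.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(solo_profiles)

# 2. Fit one interference model per cluster to tame pathological cases.
models = {}
for c in range(3):
    mask = km.labels_ == c
    models[c] = LinearRegression().fit(solo_profiles[mask], slowdown[mask])

# Predict degradation for a new application via its cluster's model.
app = np.array([[0.3, 0.7]])
print(models[km.predict(app)[0]].predict(app))
```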

    Prédiction de performance d'algorithmes de traitement d'images sur différentes architectures hardwares

    In computer vision, the choice of a computing architecture is becoming more difficult for image processing experts. The number of architectures capable of running image processing algorithms keeps growing, as does the number of computer vision applications constrained by computing capacity, power consumption and size. Selecting a hardware architecture, such as a CPU, GPU or FPGA, is therefore an important issue for computer vision applications. The main goal of this study is to predict system performance at the beginning of a computer vision project: for a manufacturer, or even a researcher, the computing architecture should be selected as early as possible to minimize the impact on development. A large variety of methods and tools has been developed to predict the performance of computing systems, but they rarely target a specific domain and cannot predict performance without analyzing the code or running benchmarks on the candidate architectures. In this work, we focus on predicting the performance of computer vision algorithms without any benchmarking, by splitting image processing algorithms into primitive blocks. In this context, a new paradigm based on decomposing every image processing algorithm into primitive blocks has been developed, together with a method to model each primitive block as a function of software and hardware parameters. The decomposition into primitive blocks and their modeling were shown to be feasible: experiments on different architectures, with real data, using algorithms such as convolution and wavelets, validated the proposed paradigm. This approach is a first step towards a tool that helps choose a hardware architecture and optimize image processing algorithms.
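
    The primitive-block paradigm suggests a simple additive cost model: predict an algorithm's runtime by summing modeled block times under given hardware parameters. The sketch below is a hypothetical illustration; the block names and cost formulas are invented, not the study's calibrated models.

```python
# Per-block cost models: time as a function of pixel count and a coarse
# hardware throughput parameter. Both are illustrative assumptions.
block_models = {
    "convolution": lambda px, hw: px * 9 / hw["flops_per_s"],  # 3x3 kernel
    "wavelet":     lambda px, hw: px * 4 / hw["flops_per_s"],
    "threshold":   lambda px, hw: px * 1 / hw["flops_per_s"],
}

def predict_runtime(pipeline, pixels, hw):
    """Sum the modeled time of every primitive block in the pipeline."""
    return sum(block_models[block](pixels, hw) for block in pipeline)

pipeline = ["convolution", "wavelet", "threshold"]
for name, hw in [("CPU", {"flops_per_s": 5e9}), ("GPU", {"flops_per_s": 80e9})]:
    print(name, predict_runtime(pipeline, 1920 * 1080, hw))  # modeled seconds
```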

    Microarchitecture-independent analytical branch behavior and multi-threaded performance modeling
