1,383 research outputs found

    BriskStream: Scaling Data Stream Processing on Shared-Memory Multicore Architectures

    Full text link
    We introduce BriskStream, an in-memory data stream processing system (DSPSs) specifically designed for modern shared-memory multicore architectures. BriskStream's key contribution is an execution plan optimization paradigm, namely RLAS, which takes relative-location (i.e., NUMA distance) of each pair of producer-consumer operators into consideration. We propose a branch and bound based approach with three heuristics to resolve the resulting nontrivial optimization problem. The experimental evaluations demonstrate that BriskStream yields much higher throughput and better scalability than existing DSPSs on multi-core architectures when processing different types of workloads.Comment: To appear in SIGMOD'1

    Improving early design stage timing modeling in multicore based real-time systems

    Get PDF
    This paper presents a modelling approach for the timing behavior of real-time embedded systems (RTES) in early design phases. The model focuses on multicore processors - accepted as the next computing platform for RTES - and in particular it predicts the contention tasks suffer in the access to multicore on-chip shared resources. The model presents the key properties of not requiring the application's source code or binary and having high-accuracy and low overhead. The former is of paramount importance in those common scenarios in which several software suppliers work in parallel implementing different applications for a system integrator, subject to different intellectual property (IP) constraints. Our model helps reducing the risk of exceeding the assigned budgets for each application in late design stages and its associated costs.This work has received funding from the European Space Agency under Project Reference AO=17722=13=NL=LvH, and has also been supported by the Spanish Ministry of Science and Innovation grant TIN2015-65316-P. Jaume Abella has been partially supported by the MINECO under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717.Peer ReviewedPostprint (author's final draft

    Randomized Caches Can Be Pretty Useful to Hard Real-Time Systems

    Get PDF
    Cache randomization per se, and its viability for probabilistic timing analysis (PTA) of critical real-time systems, are receiving increasingly close attention from the scientific community and the industrial practitioners. In fact, the very notion of introducing randomness and probabilities in time-critical systems has caused strenuous debates owing to the apparent clash that this idea has with the strictly deterministic view traditionally held for those systems. A paper recently appeared in LITES (Reineke, J. (2014). Randomized Caches Considered Harmful in Hard Real-Time Systems. LITES, 1(1), 03:1-03:13.) provides a critical analysis of the weaknesses and risks entailed in using randomized caches in hard real-time systems. In order to provide the interested reader with a fuller, balanced appreciation of the subject matter, a critical analysis of the benefits brought about by that innovation should be provided also. This short paper addresses that need by revisiting the array of issues addressed in the cited work, in the light of the latest advances to the relevant state of the art. Accordingly, we show that the potential benefits of randomized caches do offset their limitations, causing them to be - when used in conjunction with PTA - a serious competitor to conventional designs

    Modelling Contention in Multicore Hardware Resources during Early Design Stages of Real-Time Systems

    Get PDF
    This thesis presents a modelling approach for the timing behavior of real-time embedded systems in early design phases. The model focuses on multicore processors and it predicts the contention tasks suffer in the access to multicore on-chip shared resources

    Exploiting Graphics Processing Units for Massively Parallel Multi-Dimensional Indexing

    Get PDF
    Department of Computer EngineeringScientific applications process truly large amounts of multi-dimensional datasets. To efficiently navigate such datasets, various multi-dimensional indexing structures, such as the R-tree, have been extensively studied for the past couple of decades. Since the GPU has emerged as a new cost-effective performance accelerator, now it is common to leverage the massive parallelism of the GPU in various applications such as medical image processing, computational chemistry, and particle physics. However, hierarchical multi-dimensional indexing structures are inherently not well suited for parallel processing because their irregular memory access patterns make it difficult to exploit massive parallelism. Moreover, recursive tree traversal often fails due to the small run-time stack and cache memory in the GPU. First, we propose Massively Parallel Three-phase Scanning (MPTS) R-tree traversal algorithm to avoid the irregular memory access patterns and recursive tree traversal so that the GPU can access tree nodes in a sequential manner. The experimental study shows that MPTS R-tree traversal algorithm consistently outperforms traditional recursive R-Tree search algorithm for multi-dimensional range query processing. Next, we focus on reducing the query response time and extending n-ary multi-dimensional indexing structures - R-tree, so that a large number of GPU threads cooperate to process a single query in parallel. Because the number of submitted concurrent queries in scientific data analysis applications is relatively smaller than that of enterprise database systems and ray tracing in computer graphics. Hence, we propose a novel variant of R-trees Massively Parallel Hilbert R-Tree (MPHR-Tree), which is designed for a novel parallel tree traversal algorithm Massively Parallel Restart Scanning (MPRS). The MPRS algorithm traverses the MPHR-Tree in mostly contiguous memory access patterns without recursion, which offers more chances to optimize the parallel SIMD algorithm. Our extensive experimental results show that the MPRS algorithm outperforms the other stackless tree traversal algorithms, which are designed for efficient ray tracing in computer graphics community. Furthermore, we develop query co-processing scheme that makes use of both the CPU and GPU. In this approach, we store the internal and leaf nodes of upper tree in CPU host memory and GPU device memory, respectively. We let the CPU traverse internal nodes because the conditional branches in hierarchical tree structures often cause a serious warp divergence problem in the GPU. For leaf nodes, the GPU scans a large number of leaf nodes in parallel based on the selection ratio of a given range query. It is well known that the GPU is superior to the CPU for parallel scanning. The experimental results show that our proposed multi-dimensional range query co-processing scheme improves the query response time by up to 12x and query throughput by up to 4x compared to the state-of-the-art GPU tree traversal algorithm.ope

    A Survey of Probabilistic Timing Analysis Techniques for Real-Time Systems

    Get PDF
    This survey covers probabilistic timing analysis techniques for real-time systems. It reviews and critiques the key results in the field from its origins in 2000 to the latest research published up to the end of August 2018. The survey provides a taxonomy of the different methods used, and a classification of existing research. A detailed review is provided covering the main subject areas: static probabilistic timing analysis, measurement-based probabilistic timing analysis, and hybrid methods. In addition, research on supporting mechanisms and techniques, case studies, and evaluations is also reviewed. The survey concludes by identifying open issues, key challenges and possible directions for future research

    Enabling caches in probabilistic timing analysis

    Get PDF
    Hardware and software complexity of future critical real-time systems challenges the scalability of traditional timing analysis methods. Measurement-Based Probabilistic Timing Analysis (MBPTA) has recently emerged as an industrially-viable alternative technique to deal with complex hardware/software. Yet, MBPTA requires certain timing properties in the system under analysis that are not satisfied in conventional systems. In this thesis, we introduce, for the first time, hardware and software solutions to satisfy those requirements as well as to improve MBPTA applicability. We focus on one of the hardware resources with highest impact on both average performance and Worst-Case Execution Time (WCET) in current real-time platforms, the cache. In this line, the contributions of this thesis follow three different axes: hardware solutions and software solutions to enable MBPTA, and MBPTA analysis enhancements in systems featuring caches. At hardware level, we set the foundations of MBPTA-compliant processor designs, and define efficient time-randomised cache designs for single- and multi-level hierarchies of arbitrary complexity, including unified caches, which can be time-analysed for the first time. We propose three new software randomisation approaches (one dynamic and two static variants) to control, in an MBPTA-compliant manner, the cache jitter in Commercial off-the-shelf (COTS) processors in real-time systems. To that end, all variants randomly vary the location of programs' code and data in memory across runs, to achieve probabilistic timing properties similar to those achieved with customised hardware cache designs. We propose a novel method to estimate the WCET of a program using MBPTA, without requiring the end-user to identify worst-case paths and inputs, improving its applicability in industry. We also introduce Probabilistic Timing Composability, which allows Integrated Systems to reduce their WCET in the presence of time-randomised caches. With the above contributions, this thesis pushes the limits in the use of complex real-time embedded processor designs equipped with caches and paves the way towards the industrialisation of MBPTA technology.La complejidad de hardware y software de los sistemas críticos del futuro desafía la escalabilidad de los métodos tradicionales de análisis temporal. El análisis temporal probabilístico basado en medidas (MBPTA) ha aparecido últimamente como una solución viable alternativa para la industria, para manejar hardware/software complejo. Sin embargo, MBPTA requiere ciertas propiedades de tiempo en el sistema bajo análisis que no satisfacen los sistemas convencionales. En esta tesis introducimos, por primera vez, soluciones hardware y software para satisfacer estos requisitos como también mejorar la aplicabilidad de MBPTA. Nos centramos en uno de los recursos hardware con el máximo impacto en el rendimiento medio y el peor caso del tiempo de ejecución (WCET) en plataformas actuales de tiempo real, la cache. En esta línea, las contribuciones de esta tesis siguen 3 ejes distintos: soluciones hardware y soluciones software para habilitar MBPTA, y mejoras de el análisis MBPTA en sistemas usado caches. A nivel de hardware, creamos las bases del diseño de un procesador compatible con MBPTA, y definimos diseños de cache con tiempo aleatorio para jerarquías de memoria con uno y múltiples niveles de cualquier complejidad, incluso caches unificadas, las cuales pueden ser analizadas temporalmente por primera vez. Proponemos tres nuevos enfoques de aleatorización de software (uno dinámico y dos variedades estáticas) para manejar, en una manera compatible con MBPTA, la variabilidad del tiempo (jitter) de la cache en procesadores comerciales comunes en el mercado (COTS) en sistemas de tiempo real. Por eso, todas nuestras propuestas varían aleatoriamente la posición del código y de los datos del programa en la memoria entre ejecuciones del mismo, para conseguir propiedades de tiempo aleatorias, similares a las logradas con diseños hardware personalizados. Proponemos un nuevo método para estimar el WCET de un programa usando MBPTA, sin requerir que el usuario dentifique los caminos y las entradas de programa del peor caso, mejorando así la aplicabilidad de MBPTA en la industria. Además, introducimos la composabilidad de tiempo probabilística, que permite a los sistemas integrados reducir su WCET cuando usan caches de tiempo aleatorio. Con estas contribuciones, esta tesis empuja los limites en el uso de diseños complejos de procesadores empotrados en sistemas de tiempo real equipados con caches y prepara el terreno para la industrialización de la tecnología MBPTA
    corecore