328 research outputs found

    CAWET: Context-Aware Worst-Case Execution Time Estimation Using Transformers

    This paper presents CAWET, a hybrid worst-case program timing estimation technique. CAWET identifies the longest execution path using static techniques, whereas the worst-case execution time (WCET) of basic blocks is predicted using an advanced language processing technique called Transformer-XL. By employing Transformer-XL in CAWET, the execution context formed by previously executed basic blocks is taken into account, allowing for consideration of the micro-architecture of the processor pipeline without explicit modeling. Through a series of experiments on the TacleBench benchmarks, using different target processors (Arm Cortex-M4, Cortex-M7, and Cortex-A53), our method is demonstrated to never underestimate WCETs and is shown to be less pessimistic than its competitors.
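The hybrid structure described in the CAWET abstract (a static longest-path search combined with per-block WCET estimates) can be illustrated with a minimal sketch. It assumes an acyclic control-flow graph and a fixed cost per basic block; the block names and cycle counts are hypothetical, and the context-dependent Transformer-XL predictions of the actual tool are not modeled here.

```python
from graphlib import TopologicalSorter


def wcet_longest_path(successors, block_wcet, entry, exit_block):
    """Return (bound, path) for the costliest entry-to-exit path in an acyclic CFG.

    successors: block -> list of successor blocks (hypothetical CFG).
    block_wcet: block -> per-block WCET estimate in cycles (in a hybrid
                approach these would come from a learned model).
    """
    # TopologicalSorter expects node -> predecessors, so invert the successor map.
    predecessors = {block: set() for block in block_wcet}
    for block, succs in successors.items():
        for succ in succs:
            predecessors[succ].add(block)

    best = {entry: block_wcet[entry]}  # best[b]: max cost of any path entry..b
    parent = {entry: None}
    for block in TopologicalSorter(predecessors).static_order():
        if block not in best:
            continue  # not reachable from the entry block
        for succ in successors.get(block, ()):
            candidate = best[block] + block_wcet[succ]
            if candidate > best.get(succ, -1):
                best[succ] = candidate
                parent[succ] = block

    # Walk the parent links back from the exit to recover the worst-case path.
    path = []
    node = exit_block
    while node is not None:
        path.append(node)
        node = parent[node]
    return best[exit_block], path[::-1]


# Hypothetical diamond-shaped CFG: A branches to B or C, both rejoin at D.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
costs = {"A": 10, "B": 25, "C": 40, "D": 5}
bound, path = wcet_longest_path(cfg, costs, "A", "D")
print(bound, path)  # 55 ['A', 'C', 'D']
```

Real programs contain loops, so a practical tool would first bound loop iterations (or use an ILP formulation) before a path search like this applies.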

    Schedulability Analysis of Real-Time Systems with Uncertain Worst-Case Execution Times

    Schedulability analysis determines whether a given set of real-time software tasks is schedulable, i.e., whether task executions always complete before their specified deadlines. It is an important activity at both early design and late development stages of real-time systems. Schedulability analysis requires as input the estimated worst-case execution times (WCET) for software tasks. However, in practice, engineers often cannot provide precise point WCET estimates and prefer to provide plausible WCET ranges. Given a set of real-time tasks with such ranges, we provide an automated technique to determine for what WCET values the system is likely to meet its deadlines, and hence operate safely. Our approach combines a search algorithm for generating worst-case scheduling scenarios with polynomial logistic regression for inferring safe WCET ranges. We evaluated our approach by applying it to a satellite on-board system. Our approach efficiently and accurately estimates safe WCET ranges within which deadlines are likely to be satisfied with high confidence.
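To make the idea of a "safe WCET range" concrete, the sketch below swaps in a much simpler technique than the one in the abstract: classical response-time analysis for fixed-priority preemptive scheduling, swept over an integer WCET range for one task. It is not the search-plus-logistic-regression method of the paper, and the task set, function names, and range are hypothetical.

```python
import math


def response_time(tasks, i):
    """Worst-case response time of task i, or None if it exceeds the deadline.

    tasks: list of (wcet, period, deadline) tuples, highest priority first.
    Iterates R = C_i + sum(ceil(R / T_j) * C_j) over higher-priority tasks j.
    """
    c_i, _, d_i = tasks[i]
    r = c_i
    while True:
        interference = sum(math.ceil(r / t_j) * c_j for c_j, t_j, _ in tasks[:i])
        r_next = c_i + interference
        if r_next > d_i:
            return None  # deadline miss
        if r_next == r:
            return r  # fixed point reached
        r = r_next


def schedulable(tasks):
    """True if every task meets its deadline under response-time analysis."""
    return all(response_time(tasks, i) is not None for i in range(len(tasks)))


def max_safe_wcet(tasks, index, wcet_range):
    """Largest integer WCET in [lo, hi] for task `index` that keeps the set schedulable."""
    lo, hi = wcet_range
    safe = None
    for c in range(lo, hi + 1):
        trial = list(tasks)
        _, period, deadline = trial[index]
        trial[index] = (c, period, deadline)
        if not schedulable(trial):
            break  # response times only grow with WCET, so larger values fail too
        safe = c
    return safe


# Hypothetical task set: (wcet, period, deadline), highest priority first.
tasks = [(1, 4, 4), (2, 6, 6), (3, 12, 12)]
print(response_time(tasks, 2))          # 10
print(max_safe_wcet(tasks, 2, (1, 6)))  # 5
```

Here the lowest-priority task can take a WCET of up to 5 time units before it misses its deadline of 12; a probabilistic method such as the one in the abstract would instead report ranges with an associated confidence.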

    DeepPicar: A Low-cost Deep Neural Network-based Autonomous Car

    We present DeepPicar, a low-cost deep neural network based autonomous car platform. DeepPicar is a small-scale replication of a real self-driving car called DAVE-2 by NVIDIA. DAVE-2 uses a deep convolutional neural network (CNN), which takes images from a front-facing camera as input and produces car steering angles as output. DeepPicar uses the same network architecture (9 layers, 27 million connections and 250K parameters) and can drive itself in real time using a web camera and a Raspberry Pi 3 quad-core platform. Using DeepPicar, we analyze the Pi 3's computing capabilities to support end-to-end deep learning based real-time control of autonomous vehicles. We also systematically compare other contemporary embedded computing platforms using DeepPicar's CNN-based real-time control workload. We find that all tested platforms, including the Pi 3, are capable of supporting the CNN-based real-time control, from 20 Hz up to 100 Hz, depending on the hardware platform. However, we find that shared resource contention remains an important issue that must be considered when applying CNN models on shared-memory embedded computing platforms; we observe up to an 11.6X execution time increase in the CNN-based control loop due to shared resource contention. To protect the CNN workload, we also evaluate state-of-the-art cache partitioning and memory bandwidth throttling techniques on the Pi 3. We find that cache partitioning is ineffective, while memory bandwidth throttling is an effective solution. Comment: To be published as a conference paper at RTCSA 201

    Providing Information by Resource-Constrained Data Analysis

    The Collaborative Research Center SFB 876 (Providing Information by Resource-Constrained Data Analysis) brings together the research fields of data analysis (Data Mining, Knowledge Discovery in Databases, Machine Learning, Statistics) and embedded systems and enhances their methods such that information from distributed, dynamic masses of data becomes available anytime and anywhere. The research center approaches these problems with new algorithms that respect the resource constraints in the different scenarios. This Technical Report presents the work of the members of the integrated graduate school.

    Incorporating temporal-bounded CBR techniques in real-time agents

    Nowadays, the MAS paradigm tries to move computation to a new level of abstraction: computation as interaction, where large complex systems are seen in terms of the services they offer, and consequently in terms of the entities or agents providing or consuming services. However, MAS technology is found to be lacking in some critical environments, such as real-time environments. An interaction-based vision of a real-time system involves the purchase of a responsibility by any entity or agent for the accomplishment of a required service under possibly hard or soft temporal conditions. This vision notably increases the complexity of these kinds of systems. The main problem in developing agent architectures for real-time environments lies in the deliberation process, where it is difficult to integrate complex bounded deliberative processes for decision-making in a simple and efficient way. Accordingly, this work presents a temporal-bounded deliberative case-based behaviour as an anytime solution. More specifically, the work proposes a new temporal-bounded CBR algorithm that facilitates deliberative processes for agents in real-time environments, which need both real-time and deliberative capabilities. The paper also presents an application example for the automated management simulation of internal and external mail in a department plant. This example made it possible to evaluate the proposal by investigating the performance of the system and of the temporal-bounded deliberative case-based behaviour. 2010 Elsevier Ltd. All rights reserved. This work is supported by TIN2006-14630-C03-01 projects of the Spanish government, GVPRE/2008/070 project, FEDER funds and CONSOLIDER-INGENIO 2010 under Grant CSD2007-00022. Navarro Llácer, M.; Heras Barberá, SM.; Julian Inglada, VJ.; Botti Navarro, VJ. (2011). Incorporating temporal-bounded CBR techniques in real-time agents. Expert Systems with Applications. 38(3):2783-2796. https://doi.org/10.1016/j.eswa.2010.08.070

    WE-HML: hybrid WCET estimation using machine learning for architectures with caches

    Modern processors raise a challenge for WCET estimation, since detailed knowledge of the processor microarchitecture is not available. This paper proposes a novel hybrid WCET estimation technique, WE-HML, in which the longest path is estimated using static techniques, whereas machine learning (ML) is used to determine the WCET of basic blocks. In contrast to existing literature using ML techniques for WCET estimation, WE-HML (i) operates on binary code for improved precision of learning, as compared to the related techniques operating at source code or intermediate code level; (ii) trains the ML algorithms on a large set of automatically generated programs for improved quality of learning; (iii) proposes a technique to take into account data caches. Experiments on an ARM Cortex-A53 processor show that for all benchmarks, WCET estimates obtained by WE-HML are larger than all possible execution times. Moreover, the cache modeling technique of WE-HML allows an improvement of 65 percent on average of WCET estimates compared to its cache-agnostic equivalent.

    CleanET: enabling timing validation for complex automotive systems

    Timing validation for automotive systems occurs in late integration stages, when it is hard to control how the instances of software tasks overlap in time. To make things worse, in complex software systems, like those for autonomous driving, the task schedule has a strong event-driven nature, which further complicates relating the task-overlapping scenarios (TOS) captured during software timing budgeting to those observed during validation phases. This paper proposes CleanET, an approach to derive the dilation factor r caused by the simultaneous execution of multiple tasks. To that end, CleanET builds on the TOS captured during testing and predicts how task execution times react under untested TOS (e.g. full overlap), hence acting as a means of robust testing. CleanET also provides additional evidence for certification about the derived timing budgets for every task. We apply CleanET to a commercial autonomous driving framework, Apollo, where task measurements can only be reasonably collected under 'arbitrary' TOS. Our results show that CleanET successfully derives the dilation factor and allows assessing whether execution times for the different tasks adhere to their respective deadlines for unobserved scenarios. This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant TIN2015-65316-P, the SuPerCom European Research Council (ERC) project under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 772773), and the HiPEAC Network of Excellence. MINECO partially supported Jaume Abella under Ramon y Cajal postdoctoral fellowship (RYC-2013-14717). Peer Reviewed. Postprint (author's final draft)

    Static worst-case execution time optimization using DPSO for ASIP architecture

    Introduction: The application of specific instructions significantly improves the energy, performance, and code size of configurable processors. These instructions are designed by converting patterns related to application-specific operations into effective complex instructions. This research was presented at the ICITKM Conference, University of Delhi, India, in 2017. Methods: Static analysis was a prominent research method during the late 1980s. However, end-to-end measurements are a standard approach in industrial settings. Static analysis tools work at a high level to determine the program structure, operating either on source code or on a disassembled executable binary. It is possible to work at a low level if the real hardware timing information for the executable task has the desired features. Results: We experimented, tested and evaluated using an H.264 encoder application that uses nine CIs, covering most of the computation-intensive kernels. Multimedia applications are frequently subject to hard real-time constraints in the field of computer vision. The H.264 encoder has a complicated control flow with a large number of decisions and nested loops. The parameters evaluated were different numbers of A partitions (300 slices each on a Xilinx Virtex 7), reconfiguration bandwidths, and ratios of CPU frequency to fabric frequency, f_CPU/f_fabric. f_fabric remains constant at 100 MHz, and we selected multiples of its value for f_CPU that resemble realistic values. Note that while we anticipate the WCET in seconds (WCET cycles / f_CPU) to be lower (better) with higher f_CPU, the WCET in cycles increases (at a constant f_fabric) because hardware CIs perform fewer computations on the reconfigurable fabric within one CPU cycle. Conclusions: The method is similar to a hybridization of tree-based and path-based methods, which are less precise, and the global IPET method, which is more precise. The optimization is evaluated with the discrete particle swarm optimization (DPSO) algorithm for WCET. For several real-world applications involving embedded processors, the proposed technique produces improved instruction sets compared with the native instruction sets. Originality: WCET estimation must consider the flow analysis, low-level analysis, and calculation phases of the program. The flow analysis (high-level analysis) phase helps extract the dynamic behavior of the program, providing information about which functions are called, the number of loop iterations, the dependencies between if statements, and so on. This is necessary because the analysis does not know in advance which execution path corresponds to the longest execution time. Limitations: This path is executed within one iteration of the kernel and depends on the type of MB, either I-MB or P-MB, as determined by the motion estimation kernel. That is, its input depends on the I-MB and P-MB paths, which also contain separate CIs, leading to instability of the worst-case path; in other words, adding more partitions to the current worst-case path can cause the other path to become the worst case. The pipeline stalls during the reconfiguration delay and resumes upon entering the kernel as soon as the reconfiguration process finishes
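The relation noted in the abstract above, WCET in seconds = WCET cycles / f_CPU, can be illustrated with a short arithmetic sketch. The cycle counts below are hypothetical and chosen only to show the trend described (cycles rising with f_CPU at a constant f_fabric while wall-clock WCET still falls); they are not results from the paper.

```python
def wcet_seconds(wcet_cycles: int, f_cpu_hz: float) -> float:
    """Convert a WCET bound from CPU cycles to seconds."""
    return wcet_cycles / f_cpu_hz


# Hypothetical (f_CPU in Hz, WCET in cycles) pairs; f_CPU values are
# multiples of a 100 MHz fabric frequency.
scenarios = [(100e6, 1_000_000), (200e6, 1_400_000), (400e6, 2_200_000)]

for f_cpu, cycles in scenarios:
    millis = wcet_seconds(cycles, f_cpu) * 1e3
    print(f"f_CPU = {f_cpu / 1e6:.0f} MHz: {cycles} cycles -> {millis:.2f} ms")
```

In this made-up data the cycle count more than doubles across the sweep, yet the bound in seconds still shrinks because f_CPU grows faster than the cycle count.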