14 research outputs found

    Fine grain analysis of simulators accuracy for calibrating performance models

    Get PDF
    In embedded system design, the tuning and validation of a cycle accurate simulator is a difficult task. The designer has to assure that the estimation error of the simulator meets the design constraints on every application. If an application is not correctly estimated, the designer has to identify on which parts of the application the simulator introduces an estimation error and consequently fix the simulator. However, detecting which are the mispredicted parts of a very large application can be a difficult process which requires a lot of time. In this paper we propose a methodology which helps the designer to fast and automatically isolate the portions of the application mispredicted by a simulator. This is accomplished by recursively analyzing the application source code trace highlighting the mispredicted sections of source code. The results obtained applying the methodology to the TSIM simulator show how our methodology is able to fast analyze large applications isolating small portions of mispredicted code

    SSI Revisited

    Get PDF
    The static single information (SSI) form, proposed by Ananian, then in a more general form by Singer, is an extension of the static single assignment (SSA) form. The latter is a well-established compiler intermediate representation that has been successfully used for numerous compiler analysis and optimizations. Several interesting results have also been shown for SSI concerning liveness analysis and representation of live-ranges of variables, which could make SSI appealing for just-in-time compilation. Unfortunately, previous literature on the SSI form is sparse and appears to be partly incorrect. Our paper corrects some of the mistakes that have been made. Our main result is a complete proof that, even for the most general definition of SSI, basic blocks, and thus program points, can be totally ordered so that live-ranges of variables correspond to intervals. This corrects the erroneous proof of Brisk and Sarrafzadeh

    Performance Estimation for Task Graphs Combining Sequential Path Profiling and Control Dependence Regions

    Get PDF
    The speed-up estimation of parallelized code is crucial to efficiently compare different parallelization techniques or task graph transformations. Unfortunately, most of the time, during the parallelization of a specification, the information that can be extracted by profiling the corresponding sequential code (e.g. the most executed paths) are not properly taken into account. In particular, correlating sequential path profiling with the corresponding parallelized code can help in the identification of code hot spots, opening new possibilities for automatic parallelization. For this reason, starting from a well-known profiling technique, the Efficient Path Profiling, we propose a methodology that estimates the speed-up of a parallelized specification, just using the corresponding hierarchical task graph representation and the information coming from the dynamic profiling of the initial sequential specification. Experimental results show that the proposed solution outperforms existing approaches

    Exploiting Vectorization in High Level Synthesis of Nested Irregular Loops

    Get PDF
    Synthesis of DoAll loops is a key aspect of High Level Synthesis since they allow to easily exploit the potential parallelism provided by programmable devices. This type of parallelism can be implemented in several ways: by duplicating the implementation of body loop, by exploiting loop pipelining or by applying vectorization. In this paper a methodology for the synthesis of nested irregular DoAll loops based on outer vectorization is proposed. The methodology transforms the intermediate representation of the DoAll loop to introduce vectorization and it can be easily integrated in existing state of the art High Level Synthesis flows since does not require any modification in the rest of the flow. Vectorization is not limited to perfectly nested countable loops: conditional constructs and loops with variable number of iterations are supported. Experimental results on parallel benchmarks show that the generated parallel accelerators have significant speed-up with limited penalties in terms of resource usage and frequency decrement

    Using Efficient Path Profiling to Optimize Memory Consumption of On-Chip Debugging for High-Level Synthesis

    Get PDF
    High-Level Synthesis (HLS) for FPGAs is attracting popularity and is increasingly used to handle complex systems with multiple integrated components. To increase performance and efficiency, HLS flows now adopt several advanced optimization techniques. Aggressive optimizations and system level integration can cause the introduction of bugs that are only observable on-chip. Debugging support for circuits generated with HLS is receiving a considerable attention. Among the data that can be collected on chip for debugging, one of the most important is the state of the Finite State Machines (FSM) controlling the components of the circuit. However, this usually requires a large amount of memory to trace the behavior during the execution. This work proposes an approach that takes advantage of the HLS information and of the structure of the FSM to compress control flow traces and to integrate optimized components for on-chip debugging. The generated checkers analyze the FSM execution on-fly, automatically notifying when a bug is detected, localizing it and providing data about its cause. The traces are compressed using a software profiling technique, called Efficient Path Profiling (EPP), adapted for the debugging of hardware accelerators generated with HLS. With this technique, the size of the memory used to store control flow traces can be reduced up to 2 orders of magnitude, compared to state-of-the-art

    Transparent Parallelization of Binary Code

    Get PDF
    International audienceThis paper describes a system that applies automatic parallelization techniques to binary code. The system works by raising raw executable code to an intermediate representation that exhibits all memory accesses and relevant register definitions, but outlines detailed computations that are not relevant for parallelization. It then uses an off-the-shelf polyhedral parallelizer, first applying appropriate enabling transformations if necessary. The last phase lowers the internal representation into a new executable fragment, re-injecting low-level instructions into the transformed code. The system is shown to leverage the power of polyhedral parallelization techniques in the absence of source code, with performance approaching those of source-to-source tools

    Performance Estimation of Task Graphs Based on Path Profiling

    Get PDF
    Correctly estimating the speed-up of a parallel embedded application is crucial to efficiently compare different parallelization techniques, task graph transformations or mapping and scheduling solutions. Unfortunately, especially in case of control-dominated applications, task correlations may heavily affect the execution time of the solutions and usually this is not properly taken into account during performance analysis. We propose a methodology that combines a single profiling of the initial sequential specification with different decisions in terms of partitioning, mapping, and scheduling in order to better estimate the actual speed-up of these solutions. We validated our approach on a multi-processor simulation platform: experimental results show that our methodology, effectively identifying the correlations among tasks, significantly outperforms existing approaches for speed-up estimation. Indeed, we obtained an absolute error less than 5 % in average, even when compiling the code with different optimization levels

    WS-Pro: a Petri net based performance-driven service composition framework

    Get PDF
    As an emerging area gaining prevalence in the industry, Web Services was established to satisfy the needs for better flexibility and higher reliability in web applications. However, due to the lack of reliable frameworks and difficulties in constructing versatile service composition platform, web developers encountered major obstacles in large-scale deployment of web services. Meanwhile, performance has been one of the major concerns and a largely unexplored area in Web Services research. There is high demand for researchers to conceive and develop feasible solutions to design, monitor, and deploy web service systems that can adapt to failures, especially performance failures. Though many techniques have been proposed to solve this problem, none of them offers a comprehensive solution to overcome the difficulties that challenge practitioners. Central to the performance-engineering studies, performance analysis and performance adaptation are of paramount importance to the success of a software project. The industry learned through many hard lessons the significance of well-founded and well-executed performance engineering plans. An important fact is that it is too expensive to tackle performance evaluation, mostly through performance testing, after the software is developed. This is especially true in recent decades when software complexity has risen sharply. After the system is deployed, performance adaptation is essential to maintaining and improving software system reliability. Performance adaptation provides techniques to mitigate the consequence of performance failures and therefore is an important research issue. Performance adaptation is particularly meaningful for mission-critical software systems and software systems with inevitable frequent performance failures, such as Web Services. This dissertation focuses on Web Services framework and proposes a performance-driven service composition scheme, called WS-Pro, to support both performance analysis and performance adaptation. A formalism of transformation from WS-BPEL to Petri net is first defined to enable the analysis of system properties and facilitate quality prediction. A state-transition based proof is presented to show that the transformed Petri net model correctly simulates the behavior of the WS-BPEL process. The generated Petri net model was augmented using performance data supplied by both historical data and runtime data. Results of executing the Petri nets suggest that optimal composition plans can be achieved based on the proposed method. The performance of service composition procedure is an important research issue which has not been sufficiently treated by researchers. However, such an issue is critical for dynamic service composition, where re-planning must be done in a timely manner. In order to improve the performance of service composition procedure and enhance performance adaptation, this dissertation presents an algorithm to remove loops in the reachability graphs so that a large portion of the computation time of service composition can be moved to a pre-processing unit; hence the response time is shortened during runtime. We also extended the WS-Pro to the ubiquitous computing area to improve fault-tolerance

    Acceleration and semantic foundations of embedded Java platforms

    Get PDF
    Tableau d'honneur de la Faculté des études supérieures et postdoctorales, 2006-200
    corecore