36 research outputs found

    Simulation of the timing and energy consumption of accelerated computing (Kiihdytetyn laskennan ajoituksen ja energiankulutuksen simulointi)

    As the increase in the sequential processing performance of general-purpose central processing units has slowed down dramatically, computer systems have been moving towards increasingly parallel and heterogeneous architectures. Modern graphics processing units have emerged as one of the first affordable platforms for data-parallel processing. Due to their closed nature, it has been difficult for software developers to observe the performance and energy efficiency characteristics of applications executing on graphics processing units. In this thesis, we have explored different tools and methods for observing the execution of accelerated processing on graphics processing units. We have found that hardware vendors provide, to some extent, interfaces for observing the timing of events that occur on the host platform and aggregated performance metrics of execution on the graphics processing units. However, more fine-grained details of execution are currently available only by using graphics processing unit simulators. As a proof of concept, we have studied a functional graphics processing unit simulator as a tool for understanding the energy efficiency of accelerated processing. The presented energy estimation model and simulation method have been validated against a face detection application. The difference between the estimated and measured dynamic energy consumption in this case was found to be 5.4%. Functional simulators appear to be accurate enough to be used for observing the energy efficiency of graphics processing unit accelerated processing in certain use cases.
As the growth in the sequential performance of processors slows, computer systems are moving towards parallel computing and heterogeneous architectures. Modern graphics processing units have become common as the first affordable platforms for general-purpose accelerated data-parallel computing. Graphics processing units are often closed platforms, which makes it difficult for software developers to observe the finer details of execution related to computing performance and energy consumption. This thesis studies different tools and methods for observing the accelerated execution of programs on graphics processing units. Hardware vendors provide some interfaces for observing the timing of events both on the host platform and on the graphics processing unit. More detailed observation of the computation, however, often requires graphics processing unit simulators. The experimental part of the thesis studies the use of functional graphics processing unit simulators as a tool for estimating the energy efficiency of computing accelerated on a graphics processing unit. The thesis presents a model for estimating the energy consumption of a graphics processing unit. A face detection application was used to validate the estimate. In the measurements, the difference between the estimated and measured energy consumption was 5.4%. Based on our measurements, functional simulators are accurate enough in certain use cases for estimating the energy efficiency of computing accelerated on a graphics processing unit.

    On-board computer software development for PocketQubes

    Because the entry barrier to conventional small-satellite initiatives has been rising for both universities and researchers, smaller and cheaper spacecraft such as PocketQubes have emerged as a much more affordable option for public entities. Against this background, this thesis was developed as part of a project undertaken by the UPC NanoSat Lab, which started as an IEEE GRSS initiative named PoCat. The objective of the PoCat project is to design, develop and test three single-unit picosatellites, each featuring a different payload: a Video Graphics Array (VGA) camera, a Radio Frequency Interference (RFI) monitoring system at L-band, and an RFI monitoring system at K-band. The three satellites, however, share the same avionics core, composed of the On-Board Computer (OBC), Communication System (COMMS), Attitude Determination and Control System (ADCS) and Electrical Power Supply System (EPS). Given this scenario, the purpose of this report is to give insights into on-board software development for PocketQubes, whose standardization is, at the time of writing, still open to debate. Conceptually, the OBC system acts as the brain of the satellite. As for the technology, the OBC is built upon the STM32L476 microcontroller because of its memory storage, power efficiency and processing power. The main objectives of the OBC are to perform power management and to respond to events. To do the latter, the flight software is based on a Real-Time Operating System (RTOS) called FreeRTOS, which was selected over other operating systems (OS) for its predictability, compliance with the required deadlines, support, available documentation, compatibility with previous projects, licensing costs, and certifications. Following the FreeRTOS model, the program is structured into independent tasks, each with a priority in line with its criticality.
The project comprises eight tasks: five operate the subsystems of the satellite, one handles memory writes, and the remaining two implement FreeRTOS software timers. Furthermore, tasks can communicate with each other by means of task notifications or event groups, both of which are software tools provided by the operating system.

    Systematic Design Space Exploration of Dynamic Dataflow Programs for Multi-core Platforms

    The limitations of clock frequency and power dissipation of deep sub-micron CMOS technology have led to the development of massively parallel computing platforms. They consist of dozens or hundreds of processing units and offer a high degree of parallelism. Taking advantage of that parallelism and transforming it into high program performance requires the use of appropriate parallel programming models and paradigms. Currently, a common practice is to develop parallel applications using methods evolving directly from sequential programming models. However, these methods lack the abstractions to properly express the concurrency of the processes. An alternative approach is to implement dataflow applications, where the algorithms are described in terms of streams and operators, so that their parallelism is directly exposed. Since algorithms are described in an abstract way, they can be easily ported to different types of platforms. Several dataflow models of computation (MoCs) have been formalized so far. They differ in terms of their expressiveness (ability to handle dynamic behavior) and complexity of analysis. So far, most of the research efforts have focused on the simpler cases of static dataflow MoCs, where many analyses are possible at compile time and several optimization problems are greatly simplified. At the same time, for dynamic dataflow (DDF), the most expressive and the most difficult to analyze, there is still a dearth of tools supporting a systematic and automated analysis that minimizes the programming effort of the designer. The objective of this thesis is to provide a complete framework to analyze, evaluate and refactor DDF applications expressed using the RVC-CAL language. The methodology relies on a systematic design space exploration (DSE) examining different design alternatives in order to optimize the chosen objective function while satisfying the constraints. The research contributions start from a rigorous DSE problem formulation.
This provides a basis for the definition of a complete and novel analysis methodology enabling systematic performance improvements of DDF applications. Different stages of the methodology include exploration heuristics, performance estimation and identification of refactoring directions. All of the stages are implemented as appropriate software tools. The contributions are substantiated by several experiments performed with complex dynamic applications on different types of physical platforms.

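    Co-design of architecture and algorithms for mobile robot localization and model-based obstacle detection (Kodizajn arhitekture i algoritama za lokalizaciju mobilnih robota i detekciju prepreka baziranih na modelu)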

    This thesis proposes SoPC (System on a Programmable Chip) architectures for efficient embedding of vision-based localization and obstacle detection tasks in a navigational pipeline on autonomous mobile robots. The results obtained are equivalent to or better than the state of the art. For localization, an efficient hardware architecture that supports EKF-SLAM's local map management with seven-dimensional landmarks in real time is developed. For obstacle detection, a novel object recognition method is proposed: a detection-by-identification framework based on a single detection window scale. This framework allows adequate algorithmic precision and execution speed on embedded hardware platforms.
This thesis deals with the design of SoPC (System on a Programmable Chip) architectures and algorithms for the efficient implementation of vision-based localization and obstacle detection tasks in the context of autonomous robot navigation. For localization, an efficient computing architecture was developed for the EKF-SLAM algorithm, supporting real-time storage and processing of the seven-dimensional landmarks of the local map. For obstacle detection, a new method is proposed for recognizing objects in an image using a detection window of fixed size, which enables faster execution of the detection algorithm on dedicated computing platforms.

    Ravenscar cross compiler for the Gurkh Project

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. Includes bibliographical references (leaves 54-56). Concurrency has greatly simplified the design of embedded software, but the gain in design simplicity is offset by the complexity of system implementation. The Ravenscar profile of Ada95 defines safe tasking constructs that enable the use of deterministic concurrency. The translation of these high-level constructs by the compiler to deterministic object code is dependent on both the underlying operating system and the system operation platform. The commonly available open-source development tools for compiling Ravenscar-compliant Ada95 assume that the operating system is implemented as software. A hardware-implemented run-time kernel requires a radical rethink of the execution architecture because operating system calls have to be routed from the host processor running the tasks to the hardware-implemented kernel RavenHaRT. The redesigned compiler pGNAT is based on the open-source GNAT compiler and uses the GCC back end to cross compile application code to PowerPC object code. The GNAT run-time library (GNARL) is modified to support the use of RavenHaRT. This thesis presents the technical challenges faced and the modifications carried out for generating RavenHaRT-compatible, Ravenscar-compliant object code. By Pee Seeumpornroj. M.Eng.

    Safe and Precise WCET Determination by Abstract Interpretation of Pipeline Models

    Failure of computer software in a hard real-time system leads to severe consequences and must be avoided by proving the correctness of the system's software. A prerequisite for this is the determination of an upper bound for the worst-case execution times (WCET) of the tasks in the system. We show that for modern CPUs, WCETs can be obtained by static program analysis methods even for CPUs with execution-history-sensitive components like caches and pipelines. This is the first time that complex CPU features (out-of-order execution, speculation, etc.) have been included in a comprehensive and safe analysis. The approach presented in this thesis is able to handle the analysis of very complex architectures (PowerPC 755) by first modeling the CPU and peripherals of the system and then using abstractions on some components of the system to obtain an analysis. The analysis computes WCET for the basic blocks of the program by simulating the abstract system model. The correctness of the approach is shown. A tool has been built based on this approach, which was evaluated under real-life industry conditions by Airbus France in the course of the DAEDALUS project, showing the practical applicability of the methodology.
Faulty behavior of the software of a hard real-time system can have catastrophic consequences. To prevent such behavior, the correctness of the system's programs must be proven beforehand. A prerequisite for this is knowledge of upper bounds on the execution time of the programs (WCET). For modern CPUs, such bounds can in practice be obtained reliably only by static analysis methods, because execution times depend heavily on context-sensitive components (caches, pipelines). Until now, complex features of modern CPUs (out-of-order execution, speculation) were considered not efficiently analyzable by static means. This thesis presents an approach that is able to handle very complex architectures (such as the PowerPC 755). First, a model of the processor and the system's peripherals is built, whose components can then be suitably abstracted to obtain an analysis. The analysis computes WCET for the basic blocks of a program by simulating the abstracted processor model. The correctness of the analysis is guaranteed by the use of the theory of abstract interpretation. A tool was developed with this approach and evaluated under industrial conditions by Airbus France in the course of the DAEDALUS project. This clearly demonstrated the practical applicability of the presented approach.

    A Computer Network: Structure and Protocols of the RPCNET

    A brief description of the RPCNET architecture is given at the beginning of this manual; the protocols and packet formats of RPCNET are then described. More precisely, Chapter 1 gives a general description of RPCNET, Chapter 2 deals with the 1st level protocol (Line and Reconfiguration protocols), Chapter 3 deals with the 2nd level protocol (End-to-End protocol), and Chapter 4 deals with User-Level protocols, including the description of RNAM, the generalized Access Method supplied by RPCNET. Appendix A is the hardware scheme of the BSC-modified line connection, Appendix B gives an example of how the reconfiguration protocol works, and Appendix C describes the packet formats.