44 research outputs found

    An accurate analysis for guaranteed performance of multiprocessor streaming applications

    Get PDF
    Already for more than a decade, consumer electronic devices have been available for entertainment, educational, or telecommunication tasks based on multimedia streaming applications, i.e., applications that process streams of audio and video samples in digital form. Multimedia capabilities are expected to become more and more commonplace in portable devices. This leads to challenges with respect to cost efficiency and quality. This thesis contributes models and analysis techniques for improving the cost efficiency, and therefore also the quality, of multimedia devices. Portable consumer electronic devices should feature flexible functionality on the one hand and low power consumption on the other hand. Those two requirements are conflicting. Therefore, we focus on a class of hardware that represents a good trade-off between those two requirements, namely on domain-specific multiprocessor systems-on-chip (MP-SoC). Our research work contributes to dynamic (i.e., run-time) optimization of MP-SoC system metrics. The central question in this area is how to ensure that real-time constraints are satisfied and the metric of interest such as perceived multimedia quality or power consumption is optimized. In these cases, we speak of quality-of-service (QoS) and power management, respectively. In this thesis, we pursue real-time constraint satisfaction that is guaranteed by the system by construction and proven mainly based on analytical reasoning. That approach is often taken in real-time systems to ensure reliable performance. Therefore the performance analysis has to be conservative, i.e. it has to use pessimistic assumptions on the unknown conditions that can negatively influence the system performance. We adopt this hypothesis as the foundation of this work. Therefore, the subject of this thesis is the analysis of guaranteed performance for multimedia applications running on multiprocessors. It is very important to note that our conservative approach is essentially different from considering only the worst-case state of the system. Unlike the worst-case approach, our approach is dynamic, i.e. it makes use of run-time characteristics of the input data and the environment of the application. The main purpose of our performance analysis method is to guide the run-time optimization. Typically, a resource or quality manager predicts the execution time, i.e., the time it takes the system to process a certain number of input data samples. When the execution times get smaller, due to dependency of the execution time on the input data, the manager can switch the control parameter for the metric of interest such that the metric improves but the system gets slower. For power optimization, that means switching to a low-power mode. If execution times grow, the manager can set parameters so that the system gets faster. For QoS management, for example, the application can be switched to a different quality mode with some degradation in perceived quality. The real-time constraints are then never violated and the metrics of interest are kept as good as possible. Unfortunately, maintaining system metrics such as power and quality at the optimal level contradicts with our main requirement, i.e., providing performance guarantees, because for this one has to give up some quality or power consumption. Therefore, the performance analysis approach developed in this thesis is not only conservative, but also accurate, so that the optimization of the metric of interest does not suffer too much from conservativity. This is not trivial to realize when two factors are combined: parallel execution on multiple processors and dynamic variation of the data-dependent execution delays. We achieve the goal of conservative and accurate performance estimation for an important class of multiprocessor platforms and multimedia applications. Our performance analysis technique is realizable in practice in QoS or power management setups. We consider a generic MP-SoC platform that runs a dynamic set of applications, each application possibly using multiple processors. We assume that the applications are independent, although it is possible to relax this requirement in the future. To support real-time constraints, we require that the platform can provide guaranteed computation, communication and memory budgets for applications. Following important trends in system-on-chip communication, we support both global buses and networks-on-chip. We represent every application as a homogeneous synchronous dataflow (HSDF) graph, where the application tasks are modeled as graph nodes, called actors. We allow dynamic datadependent actor execution delays, which makes HSDF graphs very useful to express modern streaming applications. Our reason to consider HSDF graphs is that they provide a good basic foundation for analytical performance estimation. In this setup, this thesis provides three major contributions: 1. Given an application mapped to an MP-SoC platform, given the performance guarantees for the individual computation units (the processors) and the communication unit (the network-on-chip), and given constant actor execution delays, we derive the throughput and the execution time of the system as a whole. 2. Given a mapped application and platform performance guarantees as in the previous item, we extend our approach for constant actor execution delays to dynamic datadependent actor delays. 3. We propose a global implementation trajectory that starts from the application specification and goes through design-time and run-time phases. It uses an extension of the HSDF model of computation to reflect the design decisions made along the trajectory. We present our model and trajectory not only to put the first two contributions into the right context, but also to present our vision on different parts of the trajectory, to make a complete and consistent story. Our first contribution uses the idea of so-called IPC (inter-processor communication) graphs known from the literature, whereby a single model of computation (i.e., HSDF graphs) are used to model not only the computation units, but also the communication unit (the global bus or the network-on-chip) and the FIFO (first-in-first-out) buffers that form a ‘glue’ between the computation and communication units. We were the first to propose HSDF graph structures for modeling bounded FIFO buffers and guaranteed throughput network connections for the network-on-chip communication in MP-SoCs. As a result, our HSDF models enable the formalization of the on-chip FIFO buffer capacity minimization problem under a throughput constraint as a graph-theoretic problem. Using HSDF graphs to formalize that problem helps to find the performance bottlenecks in a given solution to this problem and to improve this solution. To demonstrate this, we use the JPEG decoder application case study. Also, we show that, assuming constant – worst-case for the given JPEG image – actor delays, we can predict execution times of JPEG decoding on two processors with an accuracy of 21%. Our second contribution is based on an extension of the scenario approach. This approach is based on the observation that the dynamic behavior of an application is typically composed of a limited number of sub-behaviors, i.e., scenarios, that have similar resource requirements, i.e., similar actor execution delays in the context of this thesis. The previous work on scenarios treats only single-processor applications or multiprocessor applications that do not exploit all the flexibility of the HSDF model of computation. We develop new scenario-based techniques in the context of HSDF graphs, to derive the timing overlap between different scenarios, which is very important to achieve good accuracy for general HSDF graphs executing on multiprocessors. We exploit this idea in an application case study – the MPEG-4 arbitrarily-shaped video decoder, and demonstrate execution time prediction with an average accuracy of 11%. To the best of our knowledge, for the given setup, no other existing performance technique can provide a comparable accuracy and at the same time performance guarantees

    Predicting software performance in symmetric multi-core and multiprocessor Environments

    Get PDF
    With today\u27s rise of multi-core processors, concurrency becomes a ubiquitous challenge in software development.Performance prediction methods have to reflect the influence of multiprocessing environments on software performance in order to help software architects to find potential performance problems during early development phases. In this thesis, we address the influence of the operating system scheduler on software performance in symmetric multiprocessing environments

    Effective Compile-Time Analysis for Data Prefetching In Java

    Get PDF
    The memory hierarchy in modern architectures continues to be a major performance bottleneck. Many existing techniques for improving memory performance focus on Fortran and C programs, but memory latency is also a barrier to achieving high performance in object-oriented languages. Existing software techniques are inadequate for exposing optimization opportunities in object-oriented programs. One key problem is the use of high-level programming abstractions which make analysis difficult. Another challenge is that programmers use a variety of data structures, including arrays and linked structures, so optimizations must work on a broad range of programs. We develop a new unified data-flow analysis for identifying accesses to arrays and linked structures called recurrence analysis. Prior approaches that identify these access patterns are ad hoc, or treat arrays and linked structures independently. The data-flow analysis is intra- and inter-procedural, which is important in Java programs that use encapsulation to hide implementation details. We sho

    Management, Technology and Learning for Individuals, Organisations and Society in Turbulent Environments

    Get PDF
    This book presents the collection of fifty papers which were presented in the Second International Conference on BUSINESS SUSTAINABILITY 2011 - Management, Technology and Learning for Individuals, Organisations and Society in Turbulent Environments , held in Póvoa de Varzim, Portugal, from 22ndto 24thof June, 2011.The main motive of the meeting was growing awareness of the importance of the sustainability issue. This importance had emerged from the growing uncertainty of the market behaviour that leads to the characterization of the market, i.e. environment, as turbulent. Actually, the characterization of the environment as uncertain and turbulent reflects the fact that the traditional technocratic and/or socio-technical approaches cannot effectively and efficiently lead with the present situation. In other words, the rise of the sustainability issue means the quest for new instruments to deal with uncertainty and/or turbulence. The sustainability issue has a complex nature and solutions are sought in a wide range of domains and instruments to achieve and manage it. The domains range from environmental sustainability (referring to natural environment) through organisational and business sustainability towards social sustainability. Concerning the instruments for sustainability, they range from traditional engineering and management methodologies towards “soft” instruments such as knowledge, learning, and creativity. The papers in this book address virtually whole sustainability problems space in a greater or lesser extent. However, although the uncertainty and/or turbulence, or in other words the dynamic properties, come from coupling of management, technology, learning, individuals, organisations and society, meaning that everything is at the same time effect and cause, we wanted to put the emphasis on business with the intention to address primarily companies and their businesses. Due to this reason, the main title of the book is “Business Sustainability 2.0” but with the approach of coupling Management, Technology and Learning for individuals, organisations and society in Turbulent Environments. Also, the notation“2.0” is to promote the publication as a step further from our previous publication – “Business Sustainability I” – as would be for a new version of software. Concerning the Second International Conference on BUSINESS SUSTAINABILITY, its particularity was that it had served primarily as a learning environment in which the papers published in this book were the ground for further individual and collective growth in understanding and perception of sustainability and capacity for building new instruments for business sustainability. In that respect, the methodology of the conference work was basically dialogical, meaning promoting dialog on the papers, but also including formal paper presentations. In this way, the conference presented a rich space for satisfying different authors’ and participants’ needs. Additionally, promoting the widest and global learning environment and participation, in accordance with the Conference's assumed mission to promote Proactive Generative Collaborative Learning, the Conference Organisation shares/puts open to the community the papers presented in this book, as well as the papers presented on the previous Conference(s). These papers can be accessed from the conference webpage (http://labve.dps.uminho.pt/bs11). In these terms, this book could also be understood as a complementary instrument to the Conference authors’ and participants’, but also to the wider readerships’ interested in the sustainability issues. The book brought together 107 authors from 11 countries, namely from Australia, Belgium, Brazil, Canada, France, Germany, Italy, Portugal, Serbia, Switzerland, and United States of America. The authors “ranged” from senior and renowned scientists to young researchers providing a rich and learning environment. At the end, the editors hope, and would like, that this book to be useful, meeting the expectation of the authors and wider readership and serving for enhancing the individual and collective learning, and to incentive further scientific development and creation of new papers. Also, the editors would use this opportunity to announce the intention to continue with new editions of the conference and subsequent editions of accompanying books on the subject of BUSINESS SUSTAINABILITY, the third of which is planned for year 2013.info:eu-repo/semantics/publishedVersio

    Evolving an efficient and effective off-the-shelf computing infrastructure for schools in rural areas of South Africa

    Get PDF
    Upliftment of rural areas and poverty alleviation are priorities for development in South Africa. Information and knowledge are key strategic resources for social and economic development and ICTs act as tools to support them, enabling innovative and more cost effective approaches. In order for ICT interventions to be possible, infrastructure has to be deployed. For the deployment to be effective and sustainable, the local community needs to be involved in shaping and supporting it. This study describes the technical work done in the Siyakhula Living Lab (SLL), a long-term ICT4D experiment in the Mbashe Municipality, with a focus on the deployment of ICT infrastructure in schools, for teaching and learning but also for use by the communities surrounding the schools. As a result of this work, computing infrastructure was deployed, in various phases, in 17 schools in the area and a “broadband island” connecting them was created. The dissertation reports on the initial deployment phases, discussing theoretical underpinnings and policies for using technology in education as well various computing and networking technologies and associated policies available and appropriate for use in rural South African schools. This information forms the backdrop of a survey conducted with teachers from six schools in the SLL, together with experimental work towards the provision of an evolved, efficient and effective off-the-shelf computing infrastructure in selected schools, in order to attempt to address the shortcomings of the computing infrastructure deployed initially in the SLL. The result of the study is the proposal of an evolved computing infrastructure model for use in rural South African schools

    NASA Tech Briefs, October 1990

    Get PDF
    Topics: New Product Ideas; NASA TU Services; Electronic Components and Circuits; Electronic Systems; Physical' Sciences; Materials; Computer Programs; Mechanics; Machinery; Fabrication Technology; Mathematics and Information Sciences; Life Sciences

    Aspects of Code Generation and Data Transfer Techniques for Modern Parallel Architectures

    Get PDF
    Im Bereich der Prozessorarchitekturen hat sich der Fokus neuer Entwicklungen von immer höheren Taktfrequenzen hin zu immer mehr Kernen auf einem Chip verschoben. Eine hohe Kernanzahl ermöglicht es unterschiedlich leistungsfĂ€hige Kerne anzubieten, und sogar dedizierte Kerne mit speziellen BefehlssĂ€tzen. Die Entwicklung fĂŒr solch heterogene Plattformen ist herausfordernd und benötigt entsprechende UnterstĂŒtzung von Entwicklungswerkzeugen, wie beispielsweise Übersetzern. Neben ihrer heterogenen Kernstruktur gibt es eine zweite Dimension, die die Entwicklung fĂŒr solche Architekturen anspruchsvoll macht: ihre Speicherstruktur. Die Aufrechterhaltung von globaler Cache-KohĂ€renz erschwert das Erreichen hoher Kernzahlen. Hardwarebasierte Cache-KohĂ€renz-Protokolle skalieren entweder schlecht, oder sind kompliziert und fĂŒhren zu Problemen bei AusfĂŒhrungszeit und Energieeffizienz. Eine radikale Lösung dieses Problems stellt die Abschaffung der globalen Cache-KohĂ€renz dar. Jedoch ist es schwierig, bestehende Programmiermodelle effizient auf solch eine Hardware-Architektur mit schwachen Garantien abzubilden. Der erste Teil dieser Dissertation beschĂ€ftigt sich Datentransfertechniken fĂŒr nicht-cache-kohĂ€rente Architekturen mit gemeinsamem Speicher. Diese Architekturen bieten einen gemeinsamen physikalischen Adressraum, implementieren aber keine hardwarebasierte KohĂ€renz zwischen allen Caches des Systems. Die logische Partitionierung des gemeinsamen Speichers ermöglicht die sichere Programmierung einer solchen Plattform. Im Allgemeinen erzeugt dies die Notwendigkeit Daten zwischen Speicherpartitionen zu kopieren. Wir untersuchen die Übersetzung fĂŒr invasive Architekturen, einer Familie von nicht-cache-kohĂ€renten Vielkernarchitekturen. Wir betrachten die effiziente Implementierung von Datentransfers sowohl einfacher als auch komplexer Datenstrukturen auf invasiven Architekturen. Insbesondere schlagen wir eine neuartige Technik zum Kopieren komplexer verzeigerter Datenstrukturen vor, die ohne Serialisierung auskommt. Hierzu verallgemeinern wir den Objekt-Klon-Ansatz mit ĂŒbersetzergesteuerter automatischer software-basierter KohĂ€renz, sodass er auch im Kontext nicht-kohĂ€renter Caches funktioniert. Wir prĂ€sentieren Implementierungen mehrerer Datentransfertechniken im Rahmen eines existierenden Übersetzers und seines Laufzeitsystems. Wir fĂŒhren eine ausfĂŒhrliche Auswertung dieser Implementierungen auf einem FPGA-basierten Prototypen einer invasiven Architektur durch. Schließlich schlagen wir vor, HardwareunterstĂŒtzung fĂŒr bereichsbasierte Cache-Operationen hinzuzufĂŒgen und beschreiben und bewerten mögliche Implementierungen und deren Kosten. Der zweite Teil dieser Dissertation befasst sich mit der Beschleunigung von Shuffle-Code, der bei der Registerzuteilung auftritt, durch die Verwendung von Permutationsbefehlen. Die Aufgabe der Registerzuteilung wĂ€hrend der ProgrammĂŒbersetzung ist die Abbildung von Programmvariablen auf Maschinenregister. WĂ€hrend der Registerzuteilung erzeugt der Übersetzer Shuffle-Code, der aus Kopier- und Tauschbefehlen besteht, um Werte zwischen Registern zu transferieren. AbhĂ€ngig von der QualitĂ€t der Registerzuteilung und der Zahl der verfĂŒgbaren Register kann eine große Menge an Shuffle-Code erzeugt werden. Wir schlagen vor, die AusfĂŒhrung von Shuffle-Code mit Hilfe von neuartigen Permutationsbefehlen zu beschleunigen, die die Inhalte von einigen Registern in einem Taktzyklus beliebig permutieren. Um die Machbarkeit dieser Idee zu demonstrieren, erweitern wir zunĂ€chst ein bestehendes RISC-Befehlsformat um Permutationsbefehle. Anschließend beschreiben wir, wie die vorgeschlagenen Permutationsbefehle in einer bestehenden RISC-Architektur implementiert werden können. Dann entwickeln wir zwei Verfahren zur Codeerzeugung, die die Permutationsbefehle ausnutzen, um Shuffle-Code zu beschleunigen: eine schnelle Heuristik und einen auf dynamischer Programmierung basierenden optimalen Ansatz. Wir beweisen QualitĂ€ts- und Korrektheitseingeschaften beider AnsĂ€tze und zeigen die OptimalitĂ€t des zweiten Ansatzes. Im Folgenden implementieren wir beide Codeerzeugungsverfahren in einem Übersetzer und untersuchen sowie vergleichen deren CodequalitĂ€t ausfĂŒhrlich mit Hilfe standardisierter Benchmarks. ZunĂ€chst messen wir die genaue Zahl der dynamisch ausgefĂŒhrten Befehle, welche wir folgend validieren, indem wir Programmlaufzeiten auf einer FPGA-basierten Prototypimplementierung der um Permutationsbefehle erweiterten RISC-Architektur messen. Schließlich argumentieren wir, dass Permutationsbefehle auf modernen Out-Of-Order-Prozessorarchitekturen, die bereits Registerumbenennung unterstĂŒtzen, mit wenig Aufwand implementierbar sind

    Using MapReduce Streaming for Distributed Life Simulation on the Cloud

    Get PDF
    Distributed software simulations are indispensable in the study of large-scale life models but often require the use of technically complex lower-level distributed computing frameworks, such as MPI. We propose to overcome the complexity challenge by applying the emerging MapReduce (MR) model to distributed life simulations and by running such simulations on the cloud. Technically, we design optimized MR streaming algorithms for discrete and continuous versions of Conway’s life according to a general MR streaming pattern. We chose life because it is simple enough as a testbed for MR’s applicability to a-life simulations and general enough to make our results applicable to various lattice-based a-life models. We implement and empirically evaluate our algorithms’ performance on Amazon’s Elastic MR cloud. Our experiments demonstrate that a single MR optimization technique called strip partitioning can reduce the execution time of continuous life simulations by 64%. To the best of our knowledge, we are the first to propose and evaluate MR streaming algorithms for lattice-based simulations. Our algorithms can serve as prototypes in the development of novel MR simulation algorithms for large-scale lattice-based a-life models.https://digitalcommons.chapman.edu/scs_books/1014/thumbnail.jp

    Recent Advances in Wireless Communications and Networks

    Get PDF
    This book focuses on the current hottest issues from the lowest layers to the upper layers of wireless communication networks and provides "real-time" research progress on these issues. The authors have made every effort to systematically organize the information on these topics to make it easily accessible to readers of any level. This book also maintains the balance between current research results and their theoretical support. In this book, a variety of novel techniques in wireless communications and networks are investigated. The authors attempt to present these topics in detail. Insightful and reader-friendly descriptions are presented to nourish readers of any level, from practicing and knowledgeable communication engineers to beginning or professional researchers. All interested readers can easily find noteworthy materials in much greater detail than in previous publications and in the references cited in these chapters
    corecore