15 research outputs found

    Software-Oriented Data Access Characterization for Chip Multiprocessor Architecture Optimizations

    The integration of an increasing amount of on-chip hardware in Chip-Multiprocessors (CMPs) poses the challenge of efficiently utilizing on-chip resources to maximize performance. Prior research proposals largely rely on additional hardware support to achieve desirable tradeoffs. However, these purely hardware-oriented mechanisms typically result in more generic but less efficient approaches. A new trend is to design adaptive systems that exploit application-level information. In this work, a wide range of applications is analyzed, and notable data access behaviors and patterns are identified as useful for architectural and system optimizations. In particular, this dissertation introduces software-based techniques that can be used to extract data access characteristics for cross-layer optimizations of performance and scalability. The collected information is used to guide cache data placement, network configuration, coherence operations, address translation, memory configuration, etc. Specifically, an approach is proposed to classify data blocks into different categories to optimize an on-chip coherent cache organization. For applications with compile-time deterministic data access localities, a compiler technique is proposed to determine data partitions that guide last-level cache data placement and communication patterns for network configuration. A page-level data classification is also demonstrated to improve address translation performance. The successful utilization of data access characteristics on traditional CMP architectures demonstrates that the proposed approach is promising and generic and can potentially be applied to future CMP architectures with emerging technologies such as spin-transfer torque RAM (STT-RAM).

    Optimization and validation of discontinuous Galerkin Code for the 3D Navier-Stokes equations

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2011. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student submitted PDF version of thesis. Includes bibliographical references (p. 165-170). From residual and Jacobian assembly to the linear solve, the components of a high-order, Discontinuous Galerkin Finite Element Method (DGFEM) for the Navier-Stokes equations in 3D are presented. Emphasis is given to residual and Jacobian assembly, since these are rarely discussed in the literature; in particular, this thesis focuses on code optimization. Performance properties of DG methods are identified, including key memory bottlenecks. A detailed overview of the memory hierarchy on modern CPUs is given along with discussion on optimization suggestions for utilizing the hierarchy efficiently. Other programming suggestions are also given, including the process for rewriting residual and Jacobian assembly using matrix-matrix products. Finally, a validation of the performance of the 3D, viscous DG solver is presented through a series of canonical test cases. By Eric Hung-Lin Liu. S.M.

    MxTasks: a novel processing model to support data processing on modern hardware

    The hardware landscape has changed rapidly in recent years. Modern hardware in today's servers is characterized by many CPU cores, multiple sockets, and vast amounts of main memory structured in NUMA hierarchies. In order to benefit from these highly parallel systems, the software has to adapt and actively engage with newly available features. However, the processing models forming the foundation for many performance-oriented applications have remained essentially unchanged. Threads, which serve as the central processing abstractions, can be considered a "black box" that hardly allows any transparency between the application and the system underneath. On the one hand, applications are aware of the knowledge that could assist the system in optimizing the execution, such as accessed data objects and access patterns. On the other hand, the limited opportunities for information exchange cause operating systems to make assumptions about the applications' intentions to optimize their execution, e.g., for local data access. Applications, on the contrary, implement optimizations tailored to specific situations, such as sophisticated synchronization mechanisms and hardware-conscious data structures. This work presents MxTasking, a task-based runtime environment that assists the design of data structures and applications for contemporary hardware. MxTasking rethinks the interfaces between performance-oriented applications and the execution substrate, streamlining the information exchange between both layers. By breaking patterns of processing models designed with past generations of hardware in mind, MxTasking creates novel opportunities to manage resources in a hardware- and application-conscious way. Accordingly, we question the granularity of "conventional" threads and show that fine-granular MxTasks are a viable abstraction unit for characterizing and optimizing the execution in a general way. 
Using various demonstrators in the context of database management systems, we illustrate the practical benefits and explore how challenges like memory access latencies and error-prone synchronization of concurrency can be addressed straightforwardly and effectively.

    Timing-predictable memory allocation in hard real-time systems

    For hard real-time applications, tight provable bounds on the application's worst-case execution time (WCET) must be derivable. Employing dynamic memory allocation, in general, significantly decreases an application's timing predictability. As a consequence, current hard real-time applications rely on static memory management. This thesis studies how the predictability issues of dynamic memory allocation can be overcome and how dynamic memory allocation can be enabled for hard real-time applications. We give a detailed analysis of the predictability challenges imposed on current state-of-the-art timing analyses by dynamic memory allocation. We propose two approaches to overcome these issues and enable dynamic memory allocation for hard real-time systems: automatically transforming dynamic into static allocation and using a novel, cache-aware and predictable memory allocator. Statically transforming dynamic into static memory allocation allows for very precise WCET bounds as all accessed memory addresses are completely known. However, this approach requires much information about the application's allocation behavior to be available statically. For programs where a static precomputation of a suitable allocation scheme is not applicable, we investigate approaches to construct predictable dynamic memory allocators to replace the standard, general-purpose allocators in real-time applications. We present evaluations of the proposed approaches to evidence their practical applicability.
    Hard real-time systems require provable upper bounds on their maximum execution time. Using dynamic memory management (DMM) within an application generally degrades its timing predictability considerably. Consequently, only static memory management is currently found in such systems. This work investigates ways to overcome the predictability problems of applications that result from the use of DMM. Building on an analysis of the problems that DMM poses for timing analyses, we develop two approaches. Our first approach automatically transforms a given DMM into static management. This approach requires sufficiently precise information about the application's memory requests and the lifetimes of the requested memory blocks. For applications where this first approach is not applicable, we investigate novel algorithms for implementing predictable dynamic memory management. Memory managers based on these algorithms can, where needed, replace the general-purpose memory managers that are unsuitable for real-time systems. We further demonstrate the practical applicability of the methods we propose.
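
To make the idea of a predictable allocator concrete, here is a minimal sketch of a fixed-size block pool, a generic textbook technique whose allocate and free operations take a constant number of steps. It is not the cache-aware allocator proposed in the thesis; the class name `FixedBlockPool` and its methods are hypothetical, and Python is used only for readability (a real-time system would implement this in a language with deterministic timing).

```python
class FixedBlockPool:
    """Fixed-size block pool (illustrative only, not the thesis's allocator)."""

    def __init__(self, block_size, num_blocks):
        self.block_size = block_size
        # All memory is reserved once, up front, so the address range is known statically.
        self.memory = bytearray(block_size * num_blocks)
        # Stack of free block indices; block 0 is handed out first.
        self.free_blocks = list(range(num_blocks - 1, -1, -1))

    def alloc(self):
        """Return the byte offset of a free block, or None if the pool is exhausted."""
        if not self.free_blocks:
            return None
        return self.free_blocks.pop() * self.block_size

    def free(self, offset):
        """Return a block to the pool; offset must come from an earlier alloc()."""
        self.free_blocks.append(offset // self.block_size)
```

Because every request is served from the same preallocated region without searching, the work per call does not depend on the allocation history, which is the property a timing analysis needs.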

    A Survey of Techniques for Architecting TLBs

    A translation lookaside buffer (TLB) caches virtual-to-physical address translation information and is used in systems ranging from embedded devices to high-end servers. Since the TLB is accessed very frequently and a TLB miss is extremely costly, prudent management of the TLB is important for improving the performance and energy efficiency of processors. In this paper, we present a survey of techniques for architecting and managing TLBs. We characterize the techniques across several dimensions to highlight their similarities and distinctions. We believe that this paper will be useful for chip designers, computer architects and system engineers.
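
As a rough illustration of what a TLB does, the sketch below models a small fully associative TLB with least-recently-used replacement in front of a simulated page table. The 4 KiB page size, the LRU policy, and the names `TLB`, `translate`, and `page_table` are assumptions made for this example; they are not taken from the survey.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumption: 4 KiB pages


class TLB:
    """Tiny fully associative TLB with LRU replacement (illustrative only)."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table   # dict: virtual page number -> physical frame number
        self.entries = OrderedDict()   # cached translations, least recently used first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        """Translate a virtual address to a physical address."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)          # mark as most recently used
            pfn = self.entries[vpn]
        else:
            self.misses += 1
            pfn = self.page_table[vpn]             # miss: consult the (simulated) page table
            self.entries[vpn] = pfn
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)   # evict the least recently used entry
        return pfn * PAGE_SIZE + offset
```

In hardware the miss path is a multi-level page-table walk, which is why a miss costs far more than a hit; here it is reduced to a dictionary lookup so that only the caching behavior is visible.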

    Performance Optimization Strategies for Transactional Memory Applications

    This thesis presents tools for Transactional Memory (TM) applications that cover multiple TM systems (software, hardware, and hybrid TM) and use information from all layers of the TM software stack. To this end, the thesis addresses a number of challenges in extracting static information, information about run-time behavior, and expert-level knowledge in order to develop these new methods and strategies for the optimization of TM applications.

    Diseño de una ciudad inteligente para redes vehiculares

    English: Road safety has become a main issue for governments and car manufacturers in the last twenty years. The development of new vehicular technologies has encouraged companies, researchers and institutions to focus their efforts on improving road safety. During the last decades, the evolution of wireless technologies has allowed researchers to design communication systems where vehicles participate in the communication networks. Thus, the concept of Intelligent Transportation Systems (ITS) appeared. This concept is used when talking about communication technologies between vehicles and infrastructure that improve transport safety, its management, environmental performance, etc. Due to the high economic cost of real-life tests and experimentation, the use of simulators becomes really useful when developing ITS. Nonetheless, simulators do not always include all the capabilities needed to simulate these kinds of networks. Thus, in this project the NCTUns simulator is modified in order to add new capabilities that allow users to simulate ITS. Furthermore, smart city scenarios are simulated in order to evaluate how the use of these networks allows real-time statistics collection and calculation, and how the modifications made in NCTUns work.
    Spanish: Road safety has become a major problem for governments and car manufacturers in recent years. The development of new vehicular technologies has allowed companies, researchers and institutions to focus their efforts on improving road safety. Over the last decades, the evolution of wireless communication technology has allowed researchers to design communication systems in which vehicles form part of the communication network. In this way, the concept of the Intelligent Transportation System (ITS) was created, a concept used when talking about communication technologies between vehicles and infrastructure that improve road safety in transport, its management, environmental efficiency, etc. Due to the high economic cost of testing ITS in real situations, the use of simulators is really useful when developing this type of system. Thus, in this project the NCTUns simulator has been modified with the aim of adding new capabilities to the simulator that help to design ITS. In addition, a smart city scenario has been simulated with the aim of evaluating how the use of these networks enables the collection and calculation of statistics in real time, as well as checking how the changes made to the simulator work.
    Catalan: Road safety has become a major problem for governments and car manufacturers in recent years. The development of new vehicle technologies has encouraged companies, researchers and institutions to focus their efforts on improving road safety. Over the last decades, the evolution of wireless technologies has allowed researchers to design communication systems in which vehicles can participate in the communication networks. In this way, the concept of the Intelligent Transportation System (ITS) was created, a concept used when talking about communication technologies between vehicles and infrastructure that improve road safety in transport, its management, environmental efficiency, etc. Due to the high economic cost of testing ITS in real situations, the use of simulators is really useful when developing ITS. Thus, in this project the NCTUns simulator has been modified with the aim of adding new capabilities to the simulator that help future users to design ITS. In addition, a smart city scenario has been simulated with the aim of evaluating how the use of the network enables the collection and calculation of statistics in real time, as well as checking how the changes made to the simulator work.

    A hierarchical group model for programming sensor networks

    A hierarchical group model that decouples computation from hardware can characterize and aid in the construction of sensor network software with minimal overhead. Future sensor network applications will move beyond static, homogeneous deployments to include dynamic, heterogeneous elements. These sensor networks will also gain new users, including casual users who will expect intuitive interfaces to interact with sensor networks. To address these challenges, a new computational model and a system implementing the model are presented. This model ensures that computations can be readily reassigned as sensor nodes are introduced or removed. The model includes methods for communication to accommodate these dynamic elements. This dissertation presents a detailed description and design of a computational model that resolves these challenges using a hierarchical group mechanism. In this model, computation is tasked to logical groups and split into collective and local components that communicate hierarchically. Local computation is primarily used for data production and publishes data to the collective computation. Similarly, collective computation is primarily used for data aggregation and pushes results back to the local computation. Finally, the model includes data-processing functions interposed between local and collective functions that are responsible for data conversion. This dissertation also presents implementations and applications of the model. Implementations include Kensho, a C-based implementation of the hierarchical group model, that can be used for a variety of user applications. Another implementation, Tables, presents a spreadsheet-inspired view of the sensor network that takes advantage of hierarchical groups for both computation and communication. Users are able to specify both local and collective functions that execute on the sensor network via the spreadsheet interface. Applications of the model are also explored. 
One application, FUSN, provides a set of methods for constructing filesystem-based interfaces for sensor networks. This demonstrates the general applicability of the model as applied to sensor network programming and management interfaces. Finally, the model is applied to a novel privacy algorithm to demonstrate that the model isn't strictly limited to programming interfaces.
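
The split into local, data-processing, and collective functions described above can be sketched for a single group as follows. This is only an illustration under stated assumptions: the class `Group` and its methods are hypothetical names, not the API of Kensho, Tables, or FUSN, and the hierarchy of nested groups is omitted.

```python
class Group:
    """One logical group of sensor nodes (illustrative only, single level)."""

    def __init__(self, local_fn, convert_fn, collective_fn):
        self.local_fn = local_fn            # runs per node and produces data
        self.convert_fn = convert_fn        # data conversion between local and collective
        self.collective_fn = collective_fn  # aggregates the converted data
        self.nodes = {}                     # node id -> latest raw reading

    def add_node(self, node_id, reading):
        self.nodes[node_id] = reading

    def remove_node(self, node_id):
        self.nodes.pop(node_id, None)

    def run(self):
        """Publish local data, aggregate it, and push the result back to every node."""
        if not self.nodes:
            return {}
        published = [self.convert_fn(self.local_fn(r)) for r in self.nodes.values()]
        result = self.collective_fn(published)
        return {node_id: result for node_id in self.nodes}
```

Because the computation is attached to the group rather than to particular nodes, calling `add_node` or `remove_node` changes only the membership; the next `run` reassigns the work without any change to the functions themselves.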

    Managing Smartphone Testbeds with SmartLab

    The explosive growth in the number of smartphones with ever-growing sensing and computing capabilities has brought a paradigm shift to many traditional domains of the computing field. Re-programming smartphones and instrumenting them for application testing and data gathering at scale is currently a tedious and time-consuming process that poses significant logistical challenges. In this paper, we make three major contributions. First, we propose a comprehensive architecture, coined SmartLab, for managing a cluster of both real and virtual smartphones that are either wired to a private cloud or connected over a wireless link. Second, we propose and describe a number of Android management optimizations (e.g., command pipelining, screen-capturing, file management), which can be useful to the community for building similar functionality into their systems. Third, we conduct extensive experiments and microbenchmarks to support our design choices, providing qualitative evidence on the expected performance of each module comprising our architecture. This paper also gives an overview of experiences of using SmartLab in a research-oriented setting, as well as ongoing and future development efforts.