113 research outputs found

    Parallel architectures and runtime systems co-design for task-based programming models

    Get PDF
    The increasing parallelism levels in modern computing systems has extolled the need for a holistic vision when designing multiprocessor architectures taking in account the needs of the programming models and applications. Nowadays, system design consists of several layers on top of each other from the architecture up to the application software. Although this design allows to do a separation of concerns where it is possible to independently change layers due to a well-known interface between them, it is hampering future systems design as the Law of Moore reaches to an end. Current performance improvements on computer architecture are driven by the shrinkage of the transistor channel width, allowing faster and more power efficient chips to be made. However, technology is reaching physical limitations were the transistor size will not be able to be reduced furthermore and requires a change of paradigm in systems design. This thesis proposes to break this layered design, and advocates for a system where the architecture and the programming model runtime system are able to exchange information towards a common goal, improve performance and reduce power consumption. By making the architecture aware of runtime information such as a Task Dependency Graph (TDG) in the case of dataflow task-based programming models, it is possible to improve power consumption by exploiting the critical path of the graph. Moreover, the architecture can provide hardware support to create such a graph in order to reduce the runtime overheads and making possible the execution of fine-grained tasks to increase the available parallelism. Finally, the current status of inter-node communication primitives can be exposed to the runtime system in order to perform a more efficient communication scheduling, and also creates new opportunities of computation and communication overlap that were not possible before. An evaluation of the proposals introduced in this thesis is provided and a methodology to simulate and characterize the application behavior is also presented.El aumento del paralelismo proporcionado por los sistemas de cómputo modernos ha provocado la necesidad de una visión holística en el diseño de arquitecturas multiprocesador que tome en cuenta las necesidades de los modelos de programación y las aplicaciones. Hoy en día el diseño de los computadores consiste en diferentes capas de abstracción con una interfaz bien definida entre ellas. Las limitaciones de esta aproximación junto con el fin de la ley de Moore limitan el potencial de los futuros computadores. La mayoría de las mejoras actuales en el diseño de los computadores provienen fundamentalmente de la reducción del tamaño del canal del transistor, lo cual permite chips más rápidos y con un consumo eficiente sin apenas cambios fundamentales en el diseño de la arquitectura. Sin embargo, la tecnología actual está alcanzando limitaciones físicas donde no será posible reducir el tamaño de los transistores motivando así un cambio de paradigma en la construcción de los computadores. Esta tesis propone romper este diseño en capas y abogar por un sistema donde la arquitectura y el sistema de tiempo de ejecución del modelo de programación sean capaces de intercambiar información para alcanzar una meta común: La mejora del rendimiento y la reducción del consumo energético. Haciendo que la arquitectura sea consciente de la información disponible en el modelo de programación, como puede ser el grafo de dependencias entre tareas en los modelos de programación dataflow, es posible reducir el consumo energético explotando el camino critico del grafo. Además, la arquitectura puede proveer de soporte hardware para crear este grafo con el objetivo de reducir el overhead de construir este grado cuando la granularidad de las tareas es demasiado fina. Finalmente, el estado de las comunicaciones entre nodos puede ser expuesto al sistema de tiempo de ejecución para realizar una mejor planificación de las comunicaciones y creando nuevas oportunidades de solapamiento entre cómputo y comunicación que no eran posibles anteriormente. Esta tesis aporta una evaluación de todas estas propuestas, así como una metodología para simular y caracterizar el comportamiento de las aplicacionesPostprint (published version

    Reducing synchronization in distributed parallel programs

    Get PDF
    Developers of scalable libraries and applications for distributed-memory parallel systems face many challenges to attaining high performance. These challenges include communication latency, critical path delay, suboptimal scheduling, load imbalance, and system noise. These challenges are often defined and measured relative to points of broad synchronization in the program’s execution. Given the way in which many algorithms are defined and systems are implemented, gauging the above challenges at synchronization points is not unreasonable. In this thesis, I attempt to demonstrate that in many cases, those synchronization points are themselves the core issue behind these challenges. In some cases, the synchronizing operations cause a program to incur the costs from these challenges. In other cases, the presence of synchronization potentially exacerbates these problems. Through a simple performance model, I demonstrate that making synchronization less frequent can greatly mitigate performance issues. My work and several results in the literature show that many motifs and whole applications can be successfully redesigned to operate with asymptotically less synchronization than their naïve starting points. In exploring these issues, I have identified recurrent patterns across many applications and multiple environments that can guide future efforts more directly toward synchronization-avoiding designs. Thus, I attempt to offer developers the beginnings of a high-level play-book to follow rather than having to rediscover application-specific instances of the patterns

    Exploring Fog of War Concepts in Wargame Scenarios

    Get PDF
    This thesis explores fog of war concepts through three submitted journal articles. The Department of Defense and U.S. Air Force are attempting to analyze war scenarios to aid the decision-making process; fog modeling improves realism in these wargame scenarios. The first article Navigating an Enemy Contested Area with a Parallel Search Algorithm [1] investigates a parallel algorithm\u27s speedup, compared to the sequential implementation, with varying map configurations in a tile-based wargame. The parallel speedup tends to exceed 50 but in certain situations. The sequential algorithm outperforms it depending on the configuration of enemy location and amount on the map. The second article Modeling Fog of War Effects in AFSIM [2] introduces the FAT for the AFSIM to introduce and manipulate fog in wargame scenarios. FAT integrates into AFSIM version 2.7.0 and scenario results verify the tool\u27s fog effects for positioning error, hits, and probability affect the success rate. The third article Applying Fog Analysis Tool to AFSIM Multi-Domain CLASS scenarios [3] furthers the verification of FAT to introduce fog across all war fighting domains using a set of CLASS scenarios. The success rate trends with fog impact for each domain scenario support FAT\u27s effectiveness in disrupting the decision-making process for multi-domain operations. The three articles demonstrate fog can affect search, tasking, and decision-making processes for various types of wargame scenarios. The capabilities introduced in this thesis support wargame analysts to improve decision-making in AFSIM military scenarios

    Methods and Applications of Synthetic Data Generation

    Get PDF
    The advent of data mining and machine learning has highlighted the value of large and varied sources of data, while increasing the demand for synthetic data captures the structural and statistical characteristics of the original data without revealing personal or proprietary information contained in the original dataset. In this dissertation, we use examples from original research to show that, using appropriate models and input parameters, synthetic data that mimics the characteristics of real data can be generated with sufficient rate and quality to address the volume, structural complexity, and statistical variation requirements of research and development of digital information processing systems. First, we present a progression of research studies using a variety of tools to generate synthetic network traffic patterns, enabling us to observe relationships between network latency and communication pattern benchmarks at all levels of the network stack. We then present a framework for synthesizing large scale IoT data with complex structural characteristics in a scalable extraction and synthesis framework, and demonstrate the use of generated data in the benchmarking of IoT middleware. Finally, we detail research on synthetic image generation for deep learning models using 3D modeling. We find that synthetic images can be an effective technique for augmenting limited sets of real training data, and in use cases that benefit from incremental training or model specialization, we find that pretraining on synthetic images provided a usable base model for transfer learning

    Parallel and Distributed Computing

    Get PDF
    The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. Particularly, the topics that are addressed are programmable and reconfigurable devices and systems, dependability of GPUs (General Purpose Units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peertopeer networks, largescale network simulation, and parallel routines and algorithms. In this way, the articles included in this book constitute an excellent reference for engineers and researchers who have particular interests in each of these topics in parallel and distributed computing

    Extensible Performance-Aware Runtime Integrity Measurement

    Get PDF
    Today\u27s interconnected world consists of a broad set of online activities including banking, shopping, managing health records, and social media while relying heavily on servers to manage extensive sets of data. However, stealthy rootkit attacks on this infrastructure have placed these servers at risk. Security researchers have proposed using an existing x86 CPU mode called System Management Mode (SMM) to search for rootkits from a hardware-protected, isolated, and privileged location. SMM has broad visibility into operating system resources including memory regions and CPU registers. However, the use of SMM for runtime integrity measurement mechanisms (SMM-RIMMs) would significantly expand the amount of CPU time spent away from operating system and hypervisor (host software) control, resulting in potentially serious system impacts. To be a candidate for production use, SMM RIMMs would need to be resilient, performant and extensible. We developed the EPA-RIMM architecture guided by the principles of extensibility, performance awareness, and effectiveness. EPA-RIMM incorporates a security check description mechanism that allows dynamic changes to the set of resources to be monitored. It minimizes system performance impacts by decomposing security checks into shorter tasks that can be independently scheduled over time. We present a performance methodology for SMM to quantify system impacts, as well as a simulator that allows for the evaluation of different methods of scheduling security inspections. Our SMM-based EPA-RIMM prototype leverages insights from the performance methodology to detect host software rootkits at reduced system impacts. EPA-RIMM demonstrates that SMM-based rootkit detection can be made performance-efficient and effective, providing a new tool for defense

    Investigation of spacecraft cluster autonomy through an acoustic imaging interferometric testbed

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 1999.Includes bibliographical references (p. 169-173).The development and use of a novel testbed architecture is presented. Separated spacecraft interferometers have been proposed for applications in sparse aperture radar or astronomical observations. Modeled after these systems, an integrated hardware and software interferometry testbed is developed. Utilizing acoustic sources and sensors as a simplified analog to radio or optical systems, the Acoustic Imaging Testbed's simplest function is that of a Michelson interferometer. Robot arms control the motion of microphones. Through successive measurements an acoustic image can be formed. On top of this functionality, a layered software architecture is developed. This software creates a virtual environment that mimics the command, control and communications functions appropriate to a space interferometer. Autonomous spacecraft agents interact within this environment as the logical equivalent of distributed satellites. Optimal imaging configurations are validated. A scalable approach to cluster autonomy is discussed.by John Enright.S.M

    Design and implementation of a telemetry platform for high-performance computing environments

    Get PDF
    A new generation of high-performance and distributed computing applications and services rely on adaptive and dynamic architectures and execution strategies to run efficiently, resiliently, and at scale in today’s HPC environments. These architectures require insights into their execution behaviour and the state of their execution environment at various levels of detail, in order to make context-aware decisions. HPC telemetry provides this information. It describes the continuous stream of time series and event data that is generated on HPC systems by the hardware, operating systems, services, runtime systems, and applications. Current HPC ecosystems do not provide the conceptual models, infrastructure, and interfaces to collect, store, analyse, and integrate telemetry in a structured and efficient way. Consequently, applications and services largely depend on one-off solutions and custom-built technologies to achieve these goals; introducing significant development overheads that inhibit portability and mobility. To facilitate a broader mix of applications, more efficient application development, and swift adoption of adaptive architectures in production, a comprehensive framework for telemetry management and analysis must be provided as part of future HPC ecosystem designs. This thesis provides the blueprint for such a framework: it proposes a new approach to telemetry management in HPC: the Telemetry Platform concept. Departing from the observation that telemetry data and the corresponding analysis, and integration pat- terns on modern multi-tenant HPC systems have a lot of similarities to the patterns observed in large-scale data analytics or “Big Data” platforms, the telemetry platform concept takes the data platform paradigm and architectural approach and applies them to HPC telemetry. The result is the blueprint for a system that provides services for storing, searching, analysing, and integrating telemetry data in HPC applications and other HPC system services. It allows users to create and share telemetry data-driven insights using everything from simple time-series analysis to complex statistical and machine learning models while at the same time hiding many of the inherent complexities of data management such as data transport, clean-up, storage, cataloguing, access management, and providing appropriate and scalable analytics and integration capabilities. The main contributions of this research are (1) the application of the data platform concept to HPC telemetry data management and usage; (2) a graph-based, time-variant telemetry data model that captures structures and properties of platform and applications and in which telemetry data can be organized; (3) an architecture blueprint and prototype of a concrete implementation and integration architecture of the telemetry platform; and (4) a proposal for decoupled HPC application architectures, separating telemetry data management, and feedback-control-loop logic from the core application code. First experimental results with the prototype implementation suggest that the telemetry platform paradigm can reduce overhead and redundancy in the development of telemetry-based application architectures, and lower the barrier for HPC systems research and the provisioning of new, innovative HPC system services
    corecore