282 research outputs found

    Evaluating techniques for parallelization tuning in MPI, OmpSs and MPI/OmpSs

    Get PDF
    Parallel programming is used to partition a computational problem among multiple processing units and to define how they interact (communicate and synchronize) in order to guarantee the correct result. The performance that is achieved when executing the parallel program on a parallel architecture is usually far from the optimal: computation unbalance and excessive interaction among processing units often cause lost cycles, reducing the efficiency of parallel computation. In this thesis we propose techniques oriented to better exploit parallelism in parallel applications, with emphasis in techniques that increase asynchronism. Theoretically, this type of parallelization tuning promises multiple benefits. First, it should mitigate communication and synchronization delays, thus increasing the overall performance. Furthermore, parallelization tuning should expose additional parallelism and therefore increase the scalability of execution. Finally, increased asynchronism would provide higher tolerance to slower networks and external noise. In the first part of this thesis, we study the potential for tuning MPI parallelism. More specifically, we explore automatic techniques to overlap communication and computation. We propose a speculative messaging technique that increases the overlap and requires no changes of the original MPI application. Our technique automatically identifies the application’s MPI activity and reinterprets that activity using optimally placed non-blocking MPI requests. We demonstrate that this overlapping technique increases the asynchronism of MPI messages, maximizing the overlap, and consequently leading to execution speedup and higher tolerance to bandwidth reduction. However, in the case of realistic scientific workloads, we show that the overlapping potential is significantly limited by the pattern by which each MPI process locally operates on MPI messages. In the second part of this thesis, we study the potential for tuning hybrid MPI/OmpSs parallelism. We try to gain a better understanding of the parallelism of hybrid MPI/OmpSs applications in order to evaluate how these applications would execute on future machines and to predict the execution bottlenecks that are likely to emerge. We explore how MPI/OmpSs applications could scale on the parallel machine with hundreds of cores per node. Furthermore, we investigate how this high parallelism within each node would reflect on the network constraints. We especially focus on identifying critical code sections in MPI/OmpSs. We devised a technique that quickly evaluates, for a given MPI/OmpSs application and the selected target machine, which code section should be optimized in order to gain the highest performance benefits. Also, this thesis studies techniques to quickly explore the potential OmpSs parallelism inherent in applications. We provide mechanisms to easily evaluate potential parallelism of any task decomposition. Furthermore, we describe an iterative trialand-error approach to search for a task decomposition that will expose sufficient parallelism for a given target machine. Finally, we explore potential of automating the iterative approach by capturing the programmers’ experience into an expert system that can autonomously lead the search process. Also, throughout the work on this thesis, we designed development tools that can be useful to other researchers in the field. The most advanced of these tools is Tareador – a tool to help porting MPI applications to MPI/OmpSs programming model. Tareador provides a simple interface to propose some decomposition of a code into OmpSs tasks. Tareador dynamically calculates data dependencies among the annotated tasks, and automatically estimates the potential OmpSs parallelization. Furthermore, Tareador gives additional hints on how to complete the process of porting the application to OmpSs. Tareador already proved itself useful, by being included in the academic classes on parallel programming at UPC.La programación paralela consiste en dividir un problema de computación entre múltiples unidades de procesamiento y definir como interactúan (comunicación y sincronización) para garantizar un resultado correcto. El rendimiento de un programa paralelo normalmente está muy lejos de ser óptimo: el desequilibrio de la carga computacional y la excesiva interacción entre las unidades de procesamiento a menudo causa ciclos perdidos, reduciendo la eficiencia de la computación paralela. En esta tesis proponemos técnicas orientadas a explotar mejor el paralelismo en aplicaciones paralelas, poniendo énfasis en técnicas que incrementan el asincronismo. En teoría, estas técnicas prometen múltiples beneficios. Primero, tendrían que mitigar el retraso de la comunicación y la sincronización, y por lo tanto incrementar el rendimiento global. Además, la calibración de la paralelización tendría que exponer un paralelismo adicional, incrementando la escalabilidad de la ejecución. Finalmente, un incremente en el asincronismo proveería una tolerancia mayor a redes de comunicación lentas y ruido externo. En la primera parte de la tesis, estudiamos el potencial para la calibración del paralelismo a través de MPI. En concreto, exploramos técnicas automáticas para solapar la comunicación con la computación. Proponemos una técnica de mensajería especulativa que incrementa el solapamiento y no requiere cambios en la aplicación MPI original. Nuestra técnica identifica automáticamente la actividad MPI de la aplicación y la reinterpreta usando solicitudes MPI no bloqueantes situadas óptimamente. Demostramos que esta técnica maximiza el solapamiento y, en consecuencia, acelera la ejecución y permite una mayor tolerancia a las reducciones de ancho de banda. Aún así, en el caso de cargas de trabajo científico realistas, mostramos que el potencial de solapamiento está significativamente limitado por el patrón según el cual cada proceso MPI opera localmente en el paso de mensajes. En la segunda parte de esta tesis, exploramos el potencial para calibrar el paralelismo híbrido MPI/OmpSs. Intentamos obtener una comprensión mejor del paralelismo de aplicaciones híbridas MPI/OmpSs para evaluar de qué manera se ejecutarían en futuras máquinas. Exploramos como las aplicaciones MPI/OmpSs pueden escalar en una máquina paralela con centenares de núcleos por nodo. Además, investigamos cómo este paralelismo de cada nodo se reflejaría en las restricciones de la red de comunicación. En especia, nos concentramos en identificar secciones críticas de código en MPI/OmpSs. Hemos concebido una técnica que rápidamente evalúa, para una aplicación MPI/OmpSs dada y la máquina objetivo seleccionada, qué sección de código tendría que ser optimizada para obtener la mayor ganancia de rendimiento. También estudiamos técnicas para explorar rápidamente el paralelismo potencial de OmpSs inherente en las aplicaciones. Proporcionamos mecanismos para evaluar fácilmente el paralelismo potencial de cualquier descomposición en tareas. Además, describimos una aproximación iterativa para buscar una descomposición en tareas que mostrará el suficiente paralelismo en la máquina objetivo dada. Para finalizar, exploramos el potencial para automatizar la aproximación iterativa. En el trabajo expuesto en esta tesis hemos diseñado herramientas que pueden ser útiles para otros investigadores de este campo. La más avanzada es Tareador, una herramienta para ayudar a migrar aplicaciones al modelo de programación MPI/OmpSs. Tareador proporciona una interfaz simple para proponer una descomposición del código en tareas OmpSs. Tareador también calcula dinámicamente las dependencias de datos entre las tareas anotadas, y automáticamente estima el potencial de paralelización OmpSs. Por último, Tareador da indicaciones adicionales sobre como completar el proceso de migración a OmpSs. Tareador ya se ha mostrado útil al ser incluido en las clases de programación de la UPC

    SimuBoost: Scalable Parallelization of Functional System Simulation

    Get PDF
    Für das Sammeln detaillierter Laufzeitinformationen, wie Speicherzugriffsmustern, wird in der Betriebssystem- und Sicherheitsforschung häufig auf die funktionale Systemsimulation zurückgegriffen. Der Simulator führt dabei die zu untersuchende Arbeitslast in einer virtuellen Maschine (VM) aus, indem er schrittweise Instruktionen interpretiert oder derart übersetzt, sodass diese auf dem Zustand der VM arbeiten. Dieser Prozess ermöglicht es, eine umfangreiche Instrumentierung durchzuführen und so an Informationen zum Laufzeitverhalten zu gelangen, die auf einer physischen Maschine nicht zugänglich sind. Obwohl die funktionale Systemsimulation als mächtiges Werkzeug gilt, stellt die durch die Interpretation oder Übersetzung resultierende immense Ausführungsverlangsamung eine substanzielle Einschränkung des Verfahrens dar. Im Vergleich zu einer nativen Ausführung messen wir für QEMU eine 30-fache Verlangsamung, wobei die Aufzeichnung von Speicherzugriffen diesen Faktor verdoppelt. Mit Simulatoren, die umfangreichere Instrumentierungsmöglichkeiten mitbringen als QEMU, kann die Verlangsamung um eine Größenordnung höher ausfallen. Dies macht die funktionale Simulation für lang laufende, vernetzte oder interaktive Arbeitslasten uninteressant. Darüber hinaus erzeugt die Verlangsamung ein unrealistisches Zeitverhalten, sobald Aktivitäten außerhalb der VM (z. B. Ein-/Ausgabe) involviert sind. In dieser Arbeit stellen wir SimuBoost vor, eine Methode zur drastischen Beschleunigung funktionaler Systemsimulation. SimuBoost führt die zu untersuchende Arbeitslast zunächst in einer schnellen hardwaregestützten virtuellen Maschine aus. Dies ermöglicht volle Interaktivität mit Benutzern und Netzwerkgeräten. Während der Ausführung erstellt SimuBoost periodisch Abbilder der VM (engl. Checkpoints). Diese dienen als Ausgangspunkt für eine parallele Simulation, bei der jedes Intervall unabhängig simuliert und analysiert wird. Eine heterogene deterministische Wiederholung (engl. heterogeneous deterministic Replay) garantiert, dass in dieser Phase die vorherige hardwaregestützte Ausführung jedes Intervalls exakt reproduziert wird, einschließlich Interaktionen und realistischem Zeitverhalten. Unser Prototyp ist in der Lage, die Laufzeit einer funktionalen Systemsimulation deutlich zu reduzieren. Während mit herkömmlichen Verfahren für die Simulation des Bauprozesses eines modernen Linux über 5 Stunden benötigt werden, schließt SimuBoost die Simulation in nur 15 Minuten ab. Dies sind lediglich 16% mehr Zeit, als der Bau in einer schnellen hardwaregestützten VM in Anspruch nimmt. SimuBoost ist imstande, diese Geschwindigkeit auch bei voller Instrumentierung zur Aufzeichnung von Speicherzugriffen beizubehalten. Die vorliegende Arbeit ist das erste Projekt, welches das Konzept der Partitionierung und Parallelisierung der Ausführungszeit auf die interaktive Systemvirtualisierung in einer Weise anwendet, die eine sofortige parallele funktionale Simulation gestattet. Wir ergänzen die praktische Umsetzung mit einem mathematischen Modell zur formalen Beschreibung der Beschleunigungseigenschaften. Dies erlaubt es, für ein gegebenes Szenario die voraussichtliche parallele Simulationszeit zu prognostizieren und gibt eine Orientierung zur Wahl der optimalen Intervalllänge. Im Gegensatz zu bisherigen Arbeiten legt SimuBoost einen starken Fokus auf die Skalierbarkeit über die Grenzen eines einzelnen physischen Systems hinaus. Ein zentraler Schlüssel hierzu ist der Einsatz moderner Checkpointing-Technologien. Im Rahmen dieser Arbeit präsentieren wir zwei neuartige Methoden zur effizienten und effektiven Kompression von periodischen Systemabbildern

    NASA Tech Briefs, February 2001

    Get PDF
    The topics include: 1) Application Briefs; 2) National Design Engineering Show Preview; 3) Marketing Inventions to Increase Income; 4) A Personal-Computer-Based Physiological Training System; 5) Reconfigurable Arrays of Transistors for Evolvable Hardware; 6) Active Tactile Display Device for Reading by a Blind Person; 7) Program Automates Management of IBM VM Computer Systems; 8) System for Monitoring the Environment of a Spacecraft Launch; 9) Measurement of Stresses and Strains in Muscles and Tendons; 10) Optical Measurement of Temperatures in Muscles and Tendons; 11) Small Low-Temperature Thermometer With Nanokelvin Resolution; 12) Heterodyne Interferometer With Phase-Modulated Carrier; 13) Rechargeable Batteries Based on Intercalation in Graphite; 14) Signal Processor for Doppler Measurements in Icing Research; 15) Model Optimizes Drying of Wet Sheets; 16) High-Performance POSS-Modified Polymeric Composites; 17) Model Simulates Semi-Solid Material Processing; 18) Modular Cryogenic Insulation; 19) Passive Venting for Alleviating Helicopter Tail-Boom Loads; 20) Computer Program Predicts Rocket Noise; 21) Process for Polishing Bare Aluminum to High Optical Quality; 22) External Adhesive Pressure-Wall Patch; 23) Java Implementation of Information-Sharing Protocol; 24) Electronic Bulletin Board Publishes Schedules in Real Time; 25) Apparatus Would Extract Water From the Martian Atmosphere; 26) Review of Research on Supercritical vs Subcritical Fluids; 27) Hybrid Regenerative Water-Recycling System; 28) Study of Fusion-Driven Plasma Thruster With Magnetic Nozzle; 29) Liquid/Vapor-Hydrazine Thruster Would Produce Small Impulses; and 30) Thruster Based on Sublimation of Solid Hydrazin

    Pareidolia

    Get PDF
    Pareidolia is a poetic cross-genre (creative writing) manuscript in three parts that interrogates techno-colonial-capitalism’s and the medical and wellness industrial complexes’ individualizing constructions of mental health and illness. The first part is comprised of lyric poems that conceptually foreground the mediated and/or dissociated psychology of the speaker in the digital age. The second part draws on the form of the feminist surrealist short story to defamiliarize themes such as nature, medicalization, and neoliberal routine. Finally, the third part hybridizes poetry and the essay in a case study of three monuments from 1962 that I see as co-featuring in creating our dominant societal affective atmosphere: the International Style building, Place Ville Marie, erected as an “archetype of technological know-how” in its “physical, aesthetic, and emotional presence” (Place Ville Marie: Montreal’s Shining Landmark, Vanlaethem, France et al., 9), and the concurrent publication of two psychological texts by major emotion theorists, Silvan Tomkins’s Affect Imagery Consciousness and Magda Arnold’s Story Sequence Analysis.    With references, mainly, to affect theory and various figures of the modernist canon, I plumb the material and ideological origins of our cruelly optimistic state of political and climate crisis. I examine, further, the technological prescriptions and products that have become ubiquitous. How does our dependence on such products and diagnostic and commodifying formulas of thinking shape our emotional and physical dynamics with ourselves, others, and the land? How do our products and formulas alter or distort the way we project ourselves in the past, present, and future? Pareidolia—inspired by contemporary thinkers like Lisa Robertson and Dionne Brand as much as Sianne Ngai and Anna Tsing—attends to the porous and temporally fluid relationship between the body, mind, technology, and environment

    Understanding hazard perception in filmed and simulated environments

    Get PDF
    Each year millions of people around the world are killed or injured due to being involved in collisions while driving on the roads, with young and inexperienced drivers most likely to be killed or injured. Numerous studies have found a link between the likelihood of a driver being involved in a collision and their hazard perception (HP) ability, with young and inexperienced drivers having inferior HP abilities compared to older and more experienced drivers. This thesis presents a series of studies that investigate the factors that affect HP performance as well as comparing different HP testing methods. The traditional video-based method is explored as well as new methods utilising a high-fidelity driving simulator (used as an analogue of real-world driving) in order to see if there are ways of making HP testing more representative of detecting and responding to hazards while driving on the road. This thesis also explores the use of functional near-infrared spectroscopy (fNIR) as a portable and flexible means of measuring dorsolateral prefrontal cortex (DLPFC) activity during driving. Comparisons between video-based HP testing (similar to that used in the UK HP test) and simulator-based HP testing revealed significant differences in psychophysiological and behavioural responses, indicating that video-based methods may not be representative of HP while driving on the road. fNIR was found to primarily measure task workload and to be a reliable means of recording DLPFC activity across a range of different driving tasks. The fNIR results also demonstrated that DLPFC was similar when performing a simulator-based HP test and a hybrid simulator method using simulation replays, suggesting that introducing elements of driving simulation may help bridge the gap between current video-based HP testing methods and HP while driving on the road. These findings have important implications for the theoretical and practical aspects of HP testing and the use of fNIR in driving research

    Cinemas of Endurance

    Get PDF
    Cinemas of Endurance begins by questioning the way in which critics and scholars have addressed art cinema over the last decade, specifically the films referred to as “slow cinema.” These films have garnered widespread attention since the start of the 21st century for how they deploy what many believe to be anachronistic and redundant formal techniques, often discussed in terms of nostalgia and pastiche. This dissertation argues that these films have been unfairly couched within this discourse that largely judges their validity based on their stylistic similarities to the art cinema of the 1960s and 1970s. Breaking from this direction, this project proposes we take these films and their aesthetic seriously, not for stubbornly refuting the prevailing formal trends in filmmaking, but for how it creates a critical optic that grants us a greater capacity to recognize some of the most prevailing social, political, and economic issues of the last decade. Further, by using stylistic techniques that can often register as out of place, or protracted, these films can help us to understand the way our physical, mental, and affective coordinates have shifted in this historical moment. Each chapter of this dissertation takes an exemplary film from this subset of art cinema and addresses how the aesthetic works against established modes of viewing to render visible modalities of life that often escape critical ire because they are expected. This project relies on the theories and methodologies of film and media studies, aesthetic theory, realism, materialism, accelerationism, cultural studies, continental philosophy, and political philosophy. The films and filmmakers analyzed include: The Limits of Control (2009), dir. Jim Jarmusch; Ossos (1997), In Vanda’s Room (2000), and Colossal Youth (2006), dir. Pedro Costa; Dogville (2003), dir. Lars von Trier; and, 2046 (2004), dir. Wong Kar-wai

    Witness: The Modern Writer as Witness

    Full text link
    Editor\u27s Note [Excerpt] Magic can mean many different things, especially for writers. Magic can be an illusion, a sleight of hand designed to trick onlookers into believing the impossible. Or magic can be a supernatural force in a world of harsh reality, a set of beliefs that sits just outside the realms of organized religion and advanced technology. Wizards and demons, Las Vegas entertainers and houngans --they all practice a kind of sorcery. For poets and prose writers, though, magic affords an opportunity for us to stretch the limitations of the physical world in search of new themes, settings, and characters. Magic is a door we eagerly walk through to reach new lands. We at Witness have thoroughly enjoyed the process of selecting the themed works we have collected here, mainly because the idea of enchantment is inspiring. There is the possibility of positive charms; there is a chance for dark witchery. And sometimes the spell cast by a character is nebulous, difficult to categorize. It’s arguable that we cherish these incantations the most, since they leave us in a state of wonderment bordering on disorientation. Yes, magic can also leave us bewildered and thankful for the bewilderment.https://digitalscholarship.unlv.edu/witness/1001/thumbnail.jp

    Architecting Efficient Data Centers.

    Full text link
    Data center power consumption has become a key constraint in continuing to scale Internet services. As our society’s reliance on “the Cloud” continues to grow, companies require an ever-increasing amount of computational capacity to support their customers. Massive warehouse-scale data centers have emerged, requiring 30MW or more of total power capacity. Over the lifetime of a typical high-scale data center, power-related costs make up 50% of the total cost of ownership (TCO). Furthermore, the aggregate effect of data center power consumption across the country cannot be ignored. In total, data center energy usage has reached approximately 2% of aggregate consumption in the United States and continues to grow. This thesis addresses the need to increase computational efficiency to address this grow- ing problem. It proposes a new classes of power management techniques: coordinated full-system idle low-power modes to increase the energy proportionality of modern servers. First, we introduce the PowerNap server architecture, a coordinated full-system idle low- power mode which transitions in and out of an ultra-low power nap state to save power during brief idle periods. While effective for uniprocessor systems, PowerNap relies on full-system idleness and we show that such idleness disappears as the number of cores per processor continues to increase. We expose this problem in a case study of Google Web search in which we demonstrate that coordinated full-system active power modes are necessary to reach energy proportionality and that PowerNap is ineffective because of a lack of idleness. To recover full-system idleness, we introduce DreamWeaver, architectural support for deep sleep. DreamWeaver allows a server to exchange latency for full-system idleness, allowing PowerNap-enabled servers to be effective and provides a better latency- power savings tradeoff than existing approaches. Finally, this thesis investigates workloads which achieve efficiency through methodical cluster provisioning techniques. Using the popular memcached workload, this thesis provides examples of provisioning clusters for cost-efficiency given latency, throughput, and data set size targets.Ph.D.Computer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/91499/1/meisner_1.pd
    corecore