    Low-latency message passing over gigabit ethernet clusters

    As Ethernet hardware bandwidth increased to Gigabit speeds, it became evident that conventional messaging protocols struggled to deliver this performance to the application layer. Kernel-based protocols such as TCP/IP impose a significant load on the host processor, which must service incoming packets and pass them up to the application layer. Under heavy load, the host processor can become fully occupied processing incoming messages, starving host applications of CPU time. Inter-process communication using small messages also suffers from the latency of memory-to-memory copying in layered protocols and from the slow context switches that kernel-level schedulers require to service incoming interrupts. These pressures on messaging software led to the development of several lower-latency user-level protocols specifically adapted to high-performance networks (see U-Net[18], EMP[16], VIA[3], QsNET[15], Active Messages[19], GM[13], FM[14]). The aim of this paper is to investigate the issues involved in building high-performance cluster messaging systems. We also review some of the more prominent work in the area and propose a low-overhead, low-latency messaging system for a cluster of commodity platforms running over Gigabit Ethernet. We propose to use programmable Netgear GA620-T NICs and modify their firmware to implement a lightweight, reliable OS-bypass protocol for message passing, using zero-copy and polling techniques to keep host CPU utilization to a minimum while obtaining the maximum possible bandwidth.
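    The polling, zero-copy receive path described above can be illustrated with a minimal sketch. It assumes a hypothetical single-producer, single-consumer ring of preallocated slots shared between the NIC and the host process; `RxRing`, `nic_deliver`, `poll` and `release` are illustrative names, not the GA620-T firmware interface or the paper's actual design. The host checks the ring indices instead of waiting for an interrupt, and `poll` returns a `memoryview` into the shared buffer, so the payload is not copied again on the host after the (simulated) NIC write.

```python
from typing import Optional


class RxRing:
    """Hypothetical receive ring shared by a NIC (producer) and a host process (consumer)."""

    def __init__(self, slots: int, slot_size: int) -> None:
        self.slots = slots
        self.slot_size = slot_size
        self.buf = bytearray(slots * slot_size)  # one preallocated, shared region
        self.lengths = [0] * slots               # payload length stored per slot
        self.head = 0  # next slot the NIC will fill
        self.tail = 0  # next slot the host will consume

    def nic_deliver(self, payload: bytes) -> bool:
        """Simulated NIC write into the next free slot; drops if full or oversized."""
        if self.head - self.tail == self.slots or len(payload) > self.slot_size:
            return False
        slot = self.head % self.slots
        start = slot * self.slot_size
        self.buf[start:start + len(payload)] = payload  # equal-length write, no resize
        self.lengths[slot] = len(payload)
        self.head += 1
        return True

    def poll(self) -> Optional[memoryview]:
        """Host-side poll: a view of the next message without copying it, or None."""
        if self.tail == self.head:
            return None  # nothing arrived; the host simply tries again later
        slot = self.tail % self.slots
        start = slot * self.slot_size
        return memoryview(self.buf)[start:start + self.lengths[slot]]

    def release(self) -> None:
        """Return the consumed slot to the NIC."""
        self.tail += 1
```

    Because the view aliases the ring buffer, the host must call `release` only after it has finished with the message; otherwise the NIC could overwrite data still in use.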

    Design and development of deadline based scheduling mechanisms for multiprocessor systems

    Multiprocessor systems are nowadays the de facto standard for both personal computers and server workstations, and in the next few years embedded devices and cellular phones will also benefit from multicore technology. Linux, as a General Purpose Operating System (GPOS), must support many different hardware platforms, from workstations to mobile devices. Unfortunately, Linux was not designed to be a Real-Time Operating System (RTOS). As a consequence, time-sensitive applications (e.g. audio/video players) or simply real-time interactive applications may suffer degradations in their QoS. In this thesis we extend the implementation of the “Earliest Deadline First” algorithm in the Linux kernel from single-processor to multicore systems, allowing process migration among the CPUs. We also discuss the design choices and present experimental results that show the potential of our work.
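    Extending EDF to several CPUs with migration corresponds to a global EDF policy: at each scheduling decision, the m ready tasks with the earliest absolute deadlines run, and a task may move to a different CPU. The minimal sketch below illustrates only that selection rule. `Task` and `global_edf` are hypothetical names; the sketch ignores preemption timing, runtime budgets and locking, and it is not the thesis's Linux kernel implementation. As an assumed design choice, a selected task keeps its previous CPU when possible, so only the remaining tasks migrate.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Task:
    name: str
    deadline: float            # absolute deadline
    cpu: Optional[int] = None  # CPU the task last ran on, if any


def global_edf(ready: List[Task], m: int) -> Dict[int, Task]:
    """Run the m earliest-deadline ready tasks on CPUs 0..m-1."""
    chosen = heapq.nsmallest(m, ready, key=lambda t: t.deadline)
    assignment: Dict[int, Task] = {}
    pending: List[Task] = []
    for task in chosen:
        # Keep a task on its previous CPU if that CPU is still free.
        if task.cpu is not None and 0 <= task.cpu < m and task.cpu not in assignment:
            assignment[task.cpu] = task
        else:
            pending.append(task)
    free = [c for c in range(m) if c not in assignment]
    for cpu, task in zip(free, pending):
        assignment[cpu] = task  # task starts on, or migrates to, this CPU
    for cpu, task in assignment.items():
        task.cpu = cpu
    return assignment
```

    Keeping already-placed tasks where they are avoids needless migrations; a real scheduler must additionally decide when a newly released task should preempt the running task with the latest deadline.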

    Dungeons and Data: A Large-Scale NetHack Dataset

    Recent breakthroughs in the development of agents to solve challenging sequential decision-making problems such as Go [50], StarCraft [58], or DOTA [3] have relied on both simulated environments and large-scale datasets. However, progress on this research has been hindered by the scarcity of open-source datasets and the prohibitive computational cost of working with them. Here we present the NetHack Learning Dataset (NLD), a large and highly scalable dataset of trajectories from the popular game of NetHack, which is both extremely challenging for current methods and very fast to run [23]. NLD consists of three parts: 10 billion state transitions from 1.5 million human trajectories collected on the NAO public NetHack server from 2009 to 2020; 3 billion state-action-score transitions from 100,000 trajectories collected from the symbolic bot winner of the NetHack Challenge 2021; and accompanying code for users to record, load and stream any collection of such trajectories in a highly compressed form. We evaluate a wide range of existing algorithms, including online and offline RL as well as learning from demonstrations, showing that significant research advances are needed to fully leverage large-scale datasets for challenging sequential decision-making tasks.

    GeantV: Results from the prototype of concurrent vector particle transport simulation in HEP

    Full detector simulation was among the largest CPU consumers in all CERN experiment software stacks during the first two runs of the Large Hadron Collider (LHC). In the early 2010s, projections indicated that simulation demands would scale linearly with the luminosity increase, compensated only partially by growth in computing resources. Extending fast simulation approaches to more use cases, covering a larger fraction of the simulation budget, is only part of the solution because of their intrinsic precision limitations. The remainder requires speeding up the simulation software by several factors, which is out of reach with simple optimizations of the current code base. In this context, the GeantV R&D project was launched, aiming to redesign the legacy particle transport codes so that they benefit from fine-grained parallelism features such as vectorization, as well as from increased code and data locality. This paper presents in detail the results and achievements of this R&D, as well as the conclusions and lessons learnt from the beta prototype. Comment: 34 pages, 26 figures, 24 tables

    Go Left, Young Folk : Meridel Le Sueur’s Radical Children’s Stories Invoke the Spirit of the Red, White, and True

    It is no secret to scholars of American literary Communism that left-wing authors blacklisted by adult and textbook publishers that caved in to government pressure during the Communist witch-hunts of the McCarthy era often survived by writing children’s books. However, by accepting this overly simplified explanation, we risk ignoring a genre vital to recovering a link in American literary and cultural history that a right-of-center government attempted to erase. In my thesis I will explore how left-wing writer Meridel Le Sueur, in her children’s books, Little Brother of the Wilderness: The Story of Johnny Appleseed, Nancy Hanks of Wilderness Road: A Story of Abraham Lincoln’s Mother, Sparrow Hawk, Chanticleer of Wilderness Road: A Story of Davy Crockett, and The River Road: A Story of Abraham Lincoln, countered the government-induced hysteria that domestic Communism was a threat to American society by reclaiming American history for the left, placing it at the heart of American traditions and myths. In Le Sueur’s wilderness book series, I will identify how she paid homage to the defunct Popular Front’s attempt to reclaim bourgeois institutions and traditions through American folklore, and how she called on the nation’s own folk heroes to validate its revolutionary roots. Beyond that, I will demonstrate how Le Sueur looked even further back, to the precursors of American folklore, to recover the socialist nature of Native American society, egalitarian and genderless, grounded in a fusion of democratic and communal spirit. By criticizing even her beloved American Communist Party, Le Sueur nearly got herself blacklisted by American Communist publishers as well, revealing that her commitment to the working class overrode her commitment to the party line. I will also explore how Le Sueur’s powerful female protagonists reflect the Popular Front’s move to recruit mothers to influence the next generation.
    Finally, I will examine how her interpretation of American folklore teaches children, adolescent boys in particular, that a revolution forged in imagination, diversity, cooperation, and love offers a lasting alternative to violence in creating a new egalitarian society. For Le Sueur, children were at the center of the Communist writer’s hallmark message of hope. I will argue that her stories were in fact a call to action, challenging children to use their words as weapons in her peaceful revolution to end oppression.

    Doctor of Philosophy

    With the explosion of chip transistor counts, the semiconductor industry has struggled to continue scaling computing performance in line with historical trends. In recent years, the de facto solution for using excess transistors has been to increase the size of the on-chip data cache, allowing fast access to a larger portion of main memory. These large caches allowed the continued scaling of single-thread performance, which had not yet reached the limit of instruction-level parallelism (ILP). As we approach the potential limits of parallelism within a single-threaded application, new approaches such as chip multiprocessors (CMPs) have become popular for scaling performance by exploiting thread-level parallelism (TLP). This dissertation identifies the operating system as a ubiquitous area whose single-threaded and multithreaded performance has often been ignored by computer architects. We propose that novel hardware and OS co-design has the potential to significantly improve current chip multiprocessor designs, enabling increased performance and improved power efficiency. We show that the operating system contributes a nontrivial overhead to even the most computationally intense workloads and that this OS contribution grows to a significant fraction of total instructions when executing several common datacenter applications. We demonstrate that architectural improvements have had little to no effect on OS performance over the last 15 years, leaving ample room for improvement. We specifically consider three potential solutions to improve OS execution on modern processors. First, we consider a separate operating system processor (OSP) operating concurrently with general purpose processors (GPPs) in a chip multiprocessor organization, with several specialized structures acting as efficient conduits between these processors.
    Second, we consider segregating existing caching structures to decrease cache interference between the OS and the application. Third, we propose that some components within the OS itself should be refactored to be both multithreaded and cache-topology aware, which in turn improves the performance and scalability of many-threaded applications.