5 research outputs found

    Noise inspector tool

    Get PDF
    The operating system noise can interfere with normal execution programs. This behavior is becoming especially important when scaling parallel programs and amplified with global synchronizations. This work presents a tool to detect in a non-intrusive way the alien programs that share resources with current running applications in a multicore cluster.The authors acknowledge the support of the BSC (Barcelona Supercomputing Centre). We would like to thank the anonymous reviewers for their comments, which helped us to improve the manuscript. This work has been supported by the Spanish Ministry of Science and Innovation (contract TIN2015-65316), the Spanish Ministry of Economy, Industry and Competitiveness (HAR2014-57776-P) and the Generalitat de Catalunya (2014 SGR-1051).Peer ReviewedPostprint (author's final draft

    System noise, OS clock ticks, and fine-grained parallel applications

    Full text link
    As parallel jobs get bigger in size and finer in granularity, “system noise ” is increasingly becoming a problem. In fact, fine-grained jobs on clusters with thousands of SMP nodes run faster if a processor is intentionally left idle (per node), thus enabling a separation of “system noise ” from the com-putation. Paying a cost in average processing speed at a node for the sake of eliminating occasional processes delays is (unfortunately) beneficial, as such delays are enormously magnified when one late process holds up thousands of peers with which it synchronizes. We provide a probabilistic argument showing that, under certain conditions, the effect of such noise is linearly pro-portional to the size of the cluster (as is often empirically observed). We then identify a major source of noise to be indirect overhead of periodic OS clock interrupts (“ticks”), that are used by all general-purpose OSs as a means of main-taining control. This is shown for various grain sizes, plat-forms, tick frequencies, and OSs. To eliminate such noise, we suggest replacing ticks with an alternative mechanism we call “smart timers”. This turns out to also be in line with needs of desktop and mobile computing, increasing the chances of the suggested change to be accepted. 1

    Improving Packet Predictability of Scalable Network-on-Chip Designs without Priority Pre-emptive Arbitration

    Get PDF
    The quest for improving processing power and efficiency is spawning research into many-core systems with hundreds or thousands of cores. With communication being forecast as the foremost performance bottleneck, Network-on-Chips are the favoured communication infrastructure in the context mainly due to reasons like scalability and power efficiency. However, contention between non-preemptive NoC packets can result in variation in packet latencies thus potentially limiting the overall utilisation of the many-core system. Typical latency predictability enhancement techniques like Virtual Channels or Time Division Multiplexing are usually hardware expensive or non-scalable or both. This research explores the use of dynamic and scalable techniques in Network-on-Chip routers to improve packet predictability by countering Head-of-line blocking (blocked low priority packet blocking a high priority packet) and tailbacking (low priority packet utilising the link that is required by a high priority packet) of non-preemptive packets. The Priority forwarding and tunnelling technique introduced is designed to detect Head-of-line blocking situations so that its internal arbitration parameters can be altered (by forwarding packet parameters down the line) to resolve such issues. The Selective packet splitting technique presented allows resolution of tailbacking by emulating the effect of preemption of packets (by splitting packets) by using a low overhead alternative that manipulates packets. Finally, the thesis presents an architecture that allows the routers to have a notion of timeliness in data packets thus enabling packet arbitration based on application-supplied priority and timeliness thus improving the quality of service given to lower priority packets. Furthermore, the techniques presented in the thesis do not require additional hardware with the increase in size of the NoC. This enables the techniques to be scalable, as the size of the NoC or the number of packet priorities the NoC has to handle does not affect the functionality and operation of the techniques
    corecore