27 research outputs found

    PARALiA: a performance aware runtime for auto-tuning linear algebra on heterogeneous systems

    Dense linear algebra operations appear very frequently in high-performance computing (HPC) applications, rendering their performance crucial for achieving optimal scalability. As many modern HPC clusters contain multi-GPU nodes, BLAS operations are frequently offloaded to GPUs, necessitating the use of optimized libraries to ensure good performance. Unfortunately, multi-GPU systems pose two significant optimization challenges: data transfer bottlenecks, and splitting and scheduling the problem across multiple workers (GPUs) with distinct memories. We demonstrate that current multi-GPU BLAS methods for tackling these challenges target very specific problem and data characteristics, resulting in serious performance degradation for any slightly deviating workload. Additionally, an even more critical decision is omitted because it cannot be addressed with current scheduler-based approaches: determining which devices should be used for a given routine invocation. To address these issues we propose a model-based approach: using performance estimation to provide problem-specific autotuning at runtime. We integrate this autotuning into an end-to-end BLAS framework named PARALiA. This framework couples autotuning with an optimized task scheduler, leading to near-optimal data distribution and performance-aware resource utilization. We evaluate PARALiA on an HPC testbed with 8 NVIDIA V100 GPUs over a large and diverse dataset, improving the average performance of GEMM by 1.7× and energy efficiency by 2.5× over the state of the art, and demonstrating the adaptability of our performance-aware approach to future heterogeneous systems.
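    The device-selection decision the abstract highlights can be sketched with a toy performance model: estimate each candidate configuration's time as compute time plus transfer time, and pick the device subset with the lowest predicted time. All names, numbers, and the roofline-style model below are illustrative assumptions, not PARALiA's actual model.

    ```python
    # Hypothetical sketch of model-based device selection for a multi-GPU GEMM.
    # The cost model and GPU parameters are invented for illustration.

    def predicted_time(flops, bytes_moved, gpu):
        """Roofline-style estimate: compute time + host-device transfer time."""
        return flops / gpu["peak_flops"] + bytes_moved / gpu["link_bw"]

    def select_devices(flops, bytes_moved, gpus):
        """Pick the GPU subset with the lowest predicted time.

        Splitting across k GPUs divides both compute and transfers here,
        but a GPU on a slower link can still dominate the critical path."""
        best = None
        for k in range(1, len(gpus) + 1):
            subset = gpus[:k]  # assume GPUs pre-sorted by link bandwidth
            t = max(predicted_time(flops / k, bytes_moved / k, g) for g in subset)
            if best is None or t < best[1]:
                best = (subset, t)
        return best

    gpus = [
        {"name": "gpu0", "peak_flops": 7e12, "link_bw": 12e9},
        {"name": "gpu1", "peak_flops": 7e12, "link_bw": 12e9},
        {"name": "gpu2", "peak_flops": 7e12, "link_bw": 6e9},  # slower link
    ]
    # Square GEMM, n = 8192, double precision: 2n^3 flops, 3n^2 * 8 bytes moved.
    devices, t = select_devices(flops=2 * 8192**3, bytes_moved=3 * 8192**2 * 8, gpus=gpus)
    ```

    With these made-up parameters the model decides that even the GPU behind the slower link is worth using; change the link bandwidths and the chosen subset shrinks, which is exactly the routine-invocation-level decision the paper argues schedulers alone cannot make.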

    Parallelization of General Linkage Analysis Problems

    We describe a parallel implementation of a genetic linkage analysis program that achieves good speedups, even for analyses on a single pedigree and with a single starting recombination fraction vector. Our parallel implementation has been run on three different platforms: an Ethernet network of workstations, a higher-bandwidth Asynchronous Transfer Mode (ATM) network of workstations, and a shared-memory multiprocessor. The same program, written in a shared-memory programming style, is used on all platforms. On the workstation networks, the hardware does not provide shared memory, so the program executes on a distributed shared memory system that implements shared memory in software. These three platforms represent different points on the price/performance scale. Ethernet networks are cheap and omnipresent. ATM networks are an emerging technology that offers higher bandwidth, and shared-memory multiprocessors offer the best performance because communication is implemented entirely in hardware. On 8 processors and for the longer runs, we achieve speedups between 3.5 and 5 on the Ethernet network and between 4.8 and 6 on the ATM network. On the shared-memory multiprocessor, we achieve speedups in the 5.5 to 6.5 range for all runs.
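    The reported speedups translate directly into parallel efficiency (speedup divided by processor count), which makes the platform comparison concrete. A minimal worked example using only the figures quoted in the abstract:

    ```python
    # Parallel efficiency implied by the reported speedups on 8 processors.
    # efficiency = speedup / processor count; ranges are (low, high) from the text.

    def efficiency(speedup, nprocs):
        return speedup / nprocs

    reported = {"ethernet": (3.5, 5.0), "atm": (4.8, 6.0), "smp": (5.5, 6.5)}
    eff = {net: (efficiency(lo, 8), efficiency(hi, 8))
           for net, (lo, hi) in reported.items()}
    # The shared-memory runs reach roughly 69-81% efficiency, the Ethernet
    # runs as low as ~44%, reflecting the cost of shared memory in software.
    ```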

    Exploiting the Hard-Working DWARF: Trojan and Exploit Techniques Without Native Executable Code

    The study of vulnerabilities and exploitation is one of finding mechanisms affecting the flow of computation and of finding new means to perform unexpected computation. In this paper we show the extent to which exception handling mechanisms, as implemented and used by GCC, can be used to control program execution. We show that the data structures used to store exception handling information on UNIX-like systems actually contain Turing-complete bytecode, which is executed by a virtual machine during the course of exception unwinding and handling. We discuss how a malicious attacker could gain control over these structures and how such an attacker could utilize them once control has been achieved.
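    The "bytecode executed by a virtual machine" claim becomes tangible once you see that DWARF expressions are programs for a stack machine. The toy interpreter below handles a four-opcode subset; the opcode values follow the DWARF specification, but the interpreter itself is a deliberately minimal sketch, not GCC's unwinder.

    ```python
    # Toy interpreter for a small subset of DWARF expression opcodes, to
    # illustrate why unwinding metadata amounts to bytecode for a stack VM.
    # Opcode numbers are the real DWARF values; everything else is a sketch.

    DW_OP_const1u, DW_OP_dup, DW_OP_plus, DW_OP_mul = 0x08, 0x12, 0x22, 0x1E

    def eval_dwarf_expr(code):
        stack, pc = [], 0
        while pc < len(code):
            op = code[pc]; pc += 1
            if op == DW_OP_const1u:        # push one unsigned literal byte
                stack.append(code[pc]); pc += 1
            elif op == DW_OP_dup:          # duplicate top of stack
                stack.append(stack[-1])
            elif op == DW_OP_plus:         # pop two operands, push their sum
                b, a = stack.pop(), stack.pop(); stack.append(a + b)
            elif op == DW_OP_mul:          # pop two operands, push their product
                b, a = stack.pop(), stack.pop(); stack.append(a * b)
            else:
                raise ValueError(f"unhandled opcode {op:#x}")
        return stack[-1]

    # (3 * 3) + 4 encoded as DWARF-style expression bytecode:
    expr = bytes([DW_OP_const1u, 3, DW_OP_dup, DW_OP_mul, DW_OP_const1u, 4, DW_OP_plus])
    ```

    The full opcode set includes loads, register reads, and conditional branches, which is what pushes the language to Turing-completeness and makes attacker-controlled unwind tables dangerous.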

    Policy network assisted Monte Carlo Tree search for intelligent service function chain deployment

    This is the author accepted manuscript; the final version is available from IEEE via the DOI in this record. Network function virtualization (NFV) simplifies the configuration and management of security services by migrating network security functions from dedicated hardware devices to software middleboxes that run on commodity servers. Under the NFV paradigm, the service function chain (SFC), consisting of a series of ordered virtual network security functions, is becoming a mainstream form for carrying network security services. Allocating the underlying physical network resources to the demands of SFCs under given constraints over time is known as the SFC deployment problem, and it is a crucial issue for infrastructure providers. However, SFC deployment faces new challenges in trading off between pursuing a high revenue-to-cost ratio and making decisions in an online manner. In this paper, we investigate the use of reinforcement learning to guide online deployment decisions for SFC requests and propose a Policy network Assisted Monte Carlo Tree search approach named PACT to address this challenge, aiming to maximize the average revenue-to-cost ratio. PACT combines the strengths of the policy network, which evaluates the placement potential of physical servers, and Monte Carlo Tree Search, which can tackle problems with large state spaces. Extensive experimental results demonstrate that PACT achieves the best performance, outperforming other algorithms by up to 30% and 23.8% in average revenue-to-cost ratio and acceptance rate, respectively. Funding: Major Special Program for Technical Innovation & Application Development of Chongqing Science & Technology Commission; National NSFC; Chongqing Research Program of Basic Research and Frontier Technology; Natural Science Foundation of Jiangsu; Leading Technology of Jiangsu Basic Research Plan; European Union Horizon 2020.
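    The way a policy network "assists" tree search is usually through prior-weighted node selection: the policy's score for each candidate server biases which branch the search expands next. The PUCT-style scoring below is one standard way to do this; the constant, the node fields, and the server names are illustrative assumptions, not PACT's actual formulation.

    ```python
    # Minimal sketch of policy-prior-guided node selection (PUCT-style), the
    # mechanism by which a policy network focuses Monte Carlo Tree Search on
    # placements it rates highly. All names and values are illustrative.
    import math

    def puct_score(child, parent_visits, c=1.4):
        # Exploitation: mean value of simulations through this child.
        q = child["value"] / child["visits"] if child["visits"] else 0.0
        # Exploration: policy prior, decayed as the child accumulates visits.
        u = c * child["prior"] * math.sqrt(parent_visits) / (1 + child["visits"])
        return q + u

    def select_child(children, parent_visits):
        return max(children, key=lambda ch: puct_score(ch, parent_visits))

    # Three candidate physical servers, scored by a hypothetical policy network:
    children = [
        {"server": "s1", "prior": 0.6, "visits": 10, "value": 4.0},
        {"server": "s2", "prior": 0.3, "visits": 2,  "value": 1.2},
        {"server": "s3", "prior": 0.1, "visits": 0,  "value": 0.0},
    ]
    best = select_child(children, parent_visits=12)
    ```

    Note how the already-well-explored s1 loses to s2 despite its higher prior: the visit count in the denominator is what lets the search revise the policy network's initial ranking.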

    MLIP: using multiple processors to compute the posterior probability of linkage

    Background: Localization of complex traits by genetic linkage analysis may involve exploration of a vast multidimensional parameter space. The posterior probability of linkage (PPL), a class of statistics for complex trait genetic mapping in humans, is designed to model the trait model complexity represented by the multidimensional parameter space in a mathematically rigorous fashion. However, the method requires the evaluation of integrals with no functional form, making it difficult to compute, and thus to further test, develop and apply. This paper describes MLIP, a multiprocessor two-point genetic linkage analysis system that supports statistical calculations, such as the PPL, based on the full parameter space implicit in the linkage likelihood.
    Results: The fundamental question we address here is whether the use of additional processors effectively reduces total computation time for a PPL calculation. We use a variety of data, both simulated and real, to explore the question "how close can we get?" to linear speedup. Empirical results of our study show that MLIP does significantly speed up two-point log-likelihood ratio calculations over a grid space of model parameters.
    Conclusion: Observed performance of the program is dependent on characteristics of the data, including granularity of the parameter grid space being explored and pedigree size and structure. While work continues to further optimize performance, the current version of the program can already be used to efficiently compute the PPL. Thanks to MLIP, full multidimensional genome scans are now routinely being completed at our centers with runtimes on the order of days, not months or years.
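    The parallelization strategy the abstract describes is naturally data-parallel: each point of the trait-model parameter grid can be evaluated independently and the grid partitioned across processors. The sketch below illustrates that structure with a placeholder likelihood; the real pedigree likelihood, grid axes, and any MLIP interfaces are not reproduced here.

    ```python
    # Sketch of grid-parallel likelihood evaluation in the spirit of MLIP.
    # The "likelihood" is a stand-in function; only the parallel structure
    # (independent grid points fanned out to worker processes) is the point.
    from itertools import product
    from multiprocessing import Pool

    def log_likelihood(point):
        theta, penetrance = point  # placeholder for a real pedigree likelihood
        return -((theta - 0.1) ** 2 + (penetrance - 0.8) ** 2)

    def scan_grid(thetas, penetrances, nprocs=4):
        grid = list(product(thetas, penetrances))
        with Pool(nprocs) as pool:             # embarrassingly parallel map
            scores = pool.map(log_likelihood, grid)
        return max(zip(scores, grid))          # best (score, parameter point)

    if __name__ == "__main__":
        best_score, best_point = scan_grid(
            thetas=[i / 20 for i in range(11)],       # recombination fraction
            penetrances=[i / 10 for i in range(11)],  # penetrance
        )
    ```

    Because grid points are independent, speedup is limited mainly by load imbalance (pedigree likelihoods vary in cost per point), which matches the abstract's observation that performance depends on grid granularity and pedigree structure.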

    Branch and Bound pattern in FastFlow

    The thesis describes the development of a FastFlow skeleton implementing a Branch and Bound parallel framework that can be specialized by the user to solve any problem amenable to a Branch and Bound approach. To this end, the skeleton leaves to the user the task of specifying the application-specific code, while providing all the objects needed to implement the Branch and Bound framework.
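    The division of labor the abstract describes can be sketched language-neutrally: the skeleton owns the branch-and-bound loop (frontier, incumbent, pruning) and the user plugs in branch/bound/value callbacks. FastFlow itself is a C++ library; the sketch below is an illustrative sequential rendering with a tiny 0/1 knapsack as the user-supplied problem, not the thesis's actual API.

    ```python
    # Skeleton side: the framework owns the search loop and pruning logic.
    def branch_and_bound(root, branch, bound, value, is_leaf):
        best_val, best_node, frontier = float("-inf"), None, [root]
        while frontier:
            node = frontier.pop()
            if bound(node) <= best_val:
                continue                      # prune: cannot beat the incumbent
            if is_leaf(node):
                if value(node) > best_val:
                    best_val, best_node = value(node), node
            else:
                frontier.extend(branch(node))
        return best_val, best_node

    # User side: a tiny 0/1 knapsack, items are (weight, profit), capacity 5.
    items, cap = [(2, 3), (3, 4), (4, 5)], 5
    # A node is (next_item_index, weight_used, profit_so_far).
    def branch(n):
        i, w, p = n
        kids = [(i + 1, w, p)]                                    # skip item i
        if w + items[i][0] <= cap:
            kids.append((i + 1, w + items[i][0], p + items[i][1]))  # take item i
        return kids
    bound = lambda n: n[2] + sum(pr for _, pr in items[n[0]:])  # optimistic bound
    value = lambda n: n[2]
    is_leaf = lambda n: n[0] == len(items)

    best, _ = branch_and_bound((0, 0, 0), branch, bound, value, is_leaf)
    ```

    In the FastFlow version the frontier would be fed to a farm of workers, but the user-visible contract, supplying the problem-specific branch and bound functions, is the same.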

    X-linked mental retardation in S.E. Scotland


    A simulation framework for traffic information dissemination in ubiquitous vehicular ad hoc networks

    The ongoing efforts to apply advanced technologies to help solve transportation problems have advanced the growing trend of integrating mobile wireless communications into transportation systems. In particular, vehicular ad hoc networks (VANETs) allow vehicles to constitute a decentralized traffic information system on roadways and to share their own information. This research focused on the development of an integrated transportation and communication simulation framework to build a more realistic environment with which to study VANETs, as compared to previous studies. This research implemented a VANET-based information model in an integrated transportation and communication simulation framework in which these independent simulation tools were tightly coupled and finely synchronized. A traffic information system was built as a VANET application and demonstrated on the simulation framework developed in this research. In this system, vehicles record their own travel time data, share these data via an ad hoc network, and reroute at split sections based on stored travel time data. Dissemination speeds of traffic information broadcast over a real roadway network were obtained. In this research, traffic information propagated at speeds ranging approximately from the road speed limit at low traffic density, where updates were mostly carried by vehicles traveling in the opposite direction, up to half the transmission range (250/2 meters) per second at high traffic density, where updates were carried by vehicles traveling in the same direction. Successful dynamic routing based on stored travel time data was demonstrated with and without an incident in this framework. In both cases, the benefits of dynamic routing were shown even at low market penetration.
    It is believed that a wide range of VANET applications can be designed and assessed using methodologies influenced by, and contributed to by, the simulation framework and other methods developed in this dissertation.
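    The rerouting step described above, comparing stored travel times along alternative routes at a split section, reduces to a shortest-path query over link travel times. The sketch below uses Dijkstra's algorithm on a toy network with an incident on one link; the graph, travel times, and node names are illustrative, not the dissertation's road network.

    ```python
    # Sketch of VANET-assisted rerouting: pick the route with the lowest
    # stored travel time. Link times are the data vehicles share; the
    # network and values below are invented for illustration.
    import heapq

    def shortest_time(graph, src, dst):
        """Dijkstra over directed link travel times (seconds)."""
        dist, heap = {src: 0.0}, [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if u == dst:
                return d
            if d > dist.get(u, float("inf")):
                continue                      # stale heap entry
            for v, t in graph.get(u, []):
                nd = d + t
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return float("inf")

    # Shared travel times; an incident reported via the VANET slows link B->D.
    graph = {
        "A": [("B", 60.0), ("C", 90.0)],
        "B": [("D", 300.0)],   # incident: normally fast, now 300 s
        "C": [("D", 80.0)],
    }
    t = shortest_time(graph, "A", "D")   # reroutes via C: 90 + 80 = 170 s
    ```

    Without the shared incident report the vehicle would take A→B→D; with it, the split-section decision at A switches to the C route, which is the behavior the framework demonstrates with and without an incident.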