
    Design and evaluation of a Thread-Level Speculation runtime library

    In the coming years, machines with hundreds or even thousands of processors are more than likely to become commonplace. To take advantage of such machines, and given the difficulty of parallel programming, it would be desirable to have compilation or runtime systems able to extract all the available parallelism from existing applications. Many parallelization techniques have accordingly been proposed in recent years. However, most of them focus on simple codes, that is, codes with no dependences among their instructions. Speculative parallelization emerges as a solution for complex codes, making it possible to execute any kind of code, with or without dependences. This technique optimistically assumes that the parallel execution of any kind of code will not lead to errors, and it therefore needs a mechanism to detect any collision. To this end, it relies on a monitor that constantly checks that the execution is not erroneous, ensuring that the results obtained in parallel match those of a sequential execution. If the execution turns out to be erroneous, the offending threads are stopped and restarted so that the execution follows sequential semantics. Our contributions to this field include (1) a new, easy-to-use speculative execution runtime library; (2) new proposals that significantly reduce the number of accesses required by speculative operations, together with guidelines for reducing memory usage; (3) proposals to improve scheduling methods, focused on the dynamic management of the blocks of iterations used in speculative executions; (4) a hybrid solution that uses transactional memory to implement the critical sections of a speculative parallelization library; and (5) an analysis of speculative techniques on one of the most cutting-edge devices of the moment, the Intel Xeon Phi coprocessor. As we have been able to verify, speculative parallelization is an active research field. Our results show that this technique delivers performance improvements in a large number of applications. We hope this work will help ease the adoption of speculative solutions in commercial compilers and/or shared-memory parallel programming models. Departamento de Informática (Arquitectura y Tecnología de Computadores, Ciencias de la Computación e Inteligencia Artificial, Lenguajes y Sistemas Informáticos)
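
    To make the mechanism concrete, here is a minimal Python sketch of the idea under our own simplifying assumptions (in-order chunk commits, a dictionary as shared state); it is not the library's actual API, which the abstract does not detail. Each thread runs a chunk of iterations against private buffers; the monitor validates the chunk's reads at commit time and squashes and re-executes it on a conflict.

    import threading

    class SpecMonitor:
        """Toy thread-level speculation monitor (illustrative only)."""
        def __init__(self):
            self.cv = threading.Condition()
            self.shared = {}        # committed program state
            self.next_commit = 0    # chunks commit in original loop order

        def run_chunk(self, chunk_id, iters, body):
            while True:                       # retry loop: squash and re-execute
                reads, writes = {}, {}        # private speculative buffers
                snapshot = dict(self.shared)  # state observed at chunk start
                for i in iters:
                    body(i, snapshot, reads, writes)
                with self.cv:
                    # wait for our turn so commits keep sequential semantics
                    self.cv.wait_for(lambda: self.next_commit == chunk_id)
                    # validate: every value we read must still be current
                    if all(self.shared.get(k) == v for k, v in reads.items()):
                        self.shared.update(writes)
                        self.next_commit += 1
                        self.cv.notify_all()
                        return
                # a predecessor overwrote one of our reads: restart the chunk

    One thread per chunk would call run_chunk with consecutive chunk ids; the body records what it reads and writes in the dictionaries passed to it.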

    Energy-Efficient Acceleration of Asynchronous Programs.

    Asynchronous or event-driven programming has become the dominant programming model in the last few years. In this model, computations are posted as events to an event queue, from which they are processed asynchronously by the application. A huge fraction of computing systems built today use asynchronous programming. All the Web 2.0 JavaScript applications (e.g., Gmail, Facebook) use asynchronous programming. There are now more than two million mobile applications available between the Apple App Store and Google Play, all written using asynchronous programming. Distributed servers (e.g., Twitter, LinkedIn, PayPal) built using actor-based languages (e.g., Scala) and platforms such as node.js rely on asynchronous events for scalable communication. Internet-of-Things (IoT), embedded systems, sensor networks, desktop GUI applications, etc., all rely on the asynchronous programming model. Despite the ubiquity of asynchronous programs, their unique execution characteristics have been largely ignored by conventional processor architectures, which remain heavily optimized for synchronous programs. Asynchronous programs are characterized by short events executing varied tasks. This results in a large instruction footprint with little cache locality, severely degrading cache performance. Event execution also has few repeatable patterns, causing poor branch prediction. This thesis proposes novel processor optimizations that exploit the unique execution characteristics of asynchronous programs for performance and energy efficiency. These optimizations make the underlying hardware aware of discrete events and then exploit the latent Event-Level Parallelism present in these applications. Through speculative pre-execution of future events, cache addresses and branch outcomes are recorded and later used to improve cache and branch predictor performance. A hardware instruction prefetcher specialized for asynchronous programs is also proposed as a comparative design direction. PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/120780/1/gauravc_1.pd
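
    As a minimal illustration of the execution model described above (a generic sketch, not tied to any of the systems mentioned), events are popped from a queue and handled by short tasks that may post further events:

    from collections import deque

    def run_event_loop(handlers, initial_events):
        """Toy single-threaded event loop: computations are posted as
        (kind, payload) events and processed asynchronously in order."""
        queue = deque(initial_events)
        while queue:
            kind, payload = queue.popleft()
            # handlers are short, varied tasks; independent queued events
            # are the latent event-level parallelism a processor could
            # exploit by speculatively pre-executing future events
            queue.extend(handlers[kind](payload))

    # usage: two independent "click" events; pre-executing the second
    # while the first runs could warm caches and the branch predictor
    handlers = {"click": lambda p: [("log", p)], "log": lambda p: []}
    run_event_loop(handlers, [("click", "a"), ("click", "b")])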

    Multiprocessor Image-Based Control: Model-Driven Optimisation

    Over the last few years, cameras have become an integral component of modern cyber-physical systems due to their versatility, relatively low cost and multi-functionality. Camera sensors form the backbone of modern applications like advanced driver assistance systems (ADASs), visual servoing, telerobotics, autonomous systems, electron microscopes, surveillance and augmented reality. Image-based control (IBC) systems refer to a class of data-intensive feedback control systems whose feedback is provided by the camera sensor(s). IBC systems have become popular with the advent of efficient image-processing algorithms, low-cost complementary metal–oxide semiconductor (CMOS) cameras with high resolution and embedded multiprocessor computing platforms with high performance. The combination of the camera sensor(s) and image-processing algorithms can detect a rich set of features in an image. These features help to compute the states of the IBC system, such as relative position, distance, or depth, and support tracking of the object-of-interest. Modern industrial compute platforms offer high performance by allowing parallel and pipelined execution of tasks on their multiprocessors. The challenge, however, is that the image-processing algorithms are compute-intensive and result in an inherent, relatively long sensing delay. State-of-the-art design methods do not fully exploit the IBC system characteristics and the advantages of multiprocessor platforms for optimising the sensing delay. The sensing delay of an IBC system is moreover variable, with a significant degree of variation between the best-case and worst-case delay due to application-specific image-processing workload variations and the impact of platform resources. A long variable sensing delay degrades system performance and stability. A tight, predictable sensing delay is required to optimise IBC system performance and to guarantee the stability of the IBC system. Analytical computation of the sensing delay is often pessimistic due to image-dependent workload variations or challenging platform timing analysis. Therefore, this thesis explores techniques to cope with the long variable sensing delay by considering application-specific IBC system characteristics and exploiting the benefits of multiprocessor platforms. Effectively handling the long variable sensing delay helps to optimise IBC system performance while guaranteeing IBC system stability
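
    As a back-of-the-envelope illustration of why pipelining helps (our own toy model, not the thesis's analysis): splitting the image-processing task into n balanced stages on n processors leaves the per-frame sensing delay unchanged but shortens the effective sampling period.

    def pipelined_timing(sense_delay, n_stages):
        """Toy model: with n balanced pipeline stages, a new frame can
        enter every sense_delay / n_stages seconds, even though each
        measurement still arrives sense_delay seconds old."""
        period = sense_delay / n_stages   # effective sampling period
        return period, sense_delay        # (throughput, per-frame latency)

    # e.g. a 40 ms image-processing delay pipelined over 4 cores gives
    # a new measurement every 10 ms, each still 40 ms old
    print(pipelined_timing(0.040, 4))     # -> (0.01, 0.04)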

    Application of Remote Sensing to the Chesapeake Bay Region. Volume 2: Proceedings

    A conference was held on the application of remote sensing to the Chesapeake Bay region. Copies of the papers, resource contributions, panel discussions, and reports of the working groups are presented

    High-performance cluster computing, algorithms, implementations and performance evaluation for computation-intensive applications to promote complex scientific research on turbulent flows

    Large-scale high-performance computing is a rapidly growing field of research that plays a vital role in the advance of science, engineering, and modern industrial technology. Increasing sophistication in research has led to a need for bigger and faster computers or computer clusters, and high-performance computer systems are themselves stimulating the redevelopment of the methods of computation. Computing is fast becoming the most frequently used technique to explore new questions. We have developed a high-performance computer simulation and modeling software system for turbulent flows. Five papers, selected from the dozens published in our efforts on complex software system development and knowledge discovery through computer simulation, are presented here. The first paper describes the end-to-end computer simulation system development and simulation results that help understand the nature of complex shelterbelt turbulent flows. The second paper deals specifically with high-performance algorithm design and implementation on a cluster of computers. The third paper discusses the twelve design processes of parallel algorithms and software systems, as well as theoretical performance modeling and characterization of cluster computing. The fourth paper is about the computing framework of drag and pressure coefficients. The fifth paper is about simulated evapotranspiration and energy partition of inhomogeneous ecosystems. We discuss end-to-end computer simulation system software development, distributed parallel computing performance modeling, and system performance characterization. We design and compare several parallel implementations of our computer simulation system and show that performance depends on algorithm design, communication patterns, and coding strategies, which significantly impact load balancing, speedup, and computing efficiency. For given cluster communication characteristics and a given problem complexity, there exists an optimal number of nodes. With this computer simulation system, we resolved many historically controversial issues and important problems
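
    The existence of an optimal node count follows from the usual compute-versus-communication trade-off; a toy model of this U-shaped curve (our own illustration, not the papers' performance model):

    def runtime(n, work, comm_per_node):
        """Toy cluster model: compute time shrinks as work/n while
        communication cost grows with n, so runtime is U-shaped in n."""
        return work / n + comm_per_node * n

    # sweep node counts to locate the minimum of the curve
    work, comm = 1000.0, 0.5   # arbitrary illustrative units
    best = min(range(1, 129), key=lambda n: runtime(n, work, comm))
    print(best, runtime(best, work, comm))   # -> 45 nodes for these numbers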

    Large-scale variational inference for Bayesian joint regression modelling of high-dimensional genetic data

    Genetic association studies have become increasingly important in understanding the molecular bases of complex human traits. The specific analysis of intermediate molecular traits, via quantitative trait locus (QTL) studies, has recently received much attention, prompted by the advance of high-throughput technologies for quantifying gene, protein and metabolite levels. Of great interest is the detection of weak trans-regulatory effects between a genetic variant and a distal gene product. In particular, hotspot genetic variants, which remotely control the levels of many molecular outcomes, may initiate decisive functional mechanisms underlying disease endpoints. This thesis proposes a Bayesian hierarchical approach for joint analysis of QTL data on a genome-wide scale. We consider a series of parallel sparse regressions combined in a hierarchical manner to flexibly accommodate high-dimensional responses (molecular levels) and predictors (genetic variants), and we present new methods for large-scale inference. Existing approaches have limitations. Conventional marginal screening does not account for local dependencies and association patterns common to multiple outcomes and genetic variants, whereas joint modelling approaches are restricted to relatively small datasets by computational constraints. Our novel framework allows information-sharing across outcomes and variants, thereby enhancing the detection of weak trans and hotspot effects, and implements tailored variational inference procedures that allow simultaneous analysis of data for an entire QTL study, comprising hundreds of thousands of predictors, and thousands of responses and samples. The present work also describes extensions to leverage spatial and functional information on the genetic variants, for example, using predictor-level covariates such as epigenomic marks. Moreover, we augment variational inference with simulated annealing and parallel expectation-maximisation schemes in order to enhance exploration of highly multimodal spaces and allow efficient empirical Bayes estimation. Our methods, publicly available as packages implemented in R and C++, are extensively assessed in realistic simulations. Their advantages are illustrated in several QTL applications, including a large-scale proteomic QTL study on two clinical cohorts that highlights novel candidate biomarkers for metabolic disorders
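
    To fix ideas, here is a schematic coordinate-ascent variational inference loop for a single spike-and-slab regression, using the standard textbook updates (in the spirit of Carbonetto & Stephens, 2012); the hierarchical model and tailored procedures in the thesis are considerably richer.

    import numpy as np

    def cavi_spike_slab(X, y, tau=1.0, pi=0.1, sigma2=1.0, n_iter=50):
        """Schematic CAVI for y = Xb + e with a spike-and-slab prior on b:
        q(b_j) = gamma_j N(mu_j, s2_j) + (1 - gamma_j) delta_0."""
        n, p = X.shape
        xtx = (X ** 2).sum(axis=0)
        gamma, mu = np.zeros(p), np.zeros(p)
        r = y - X @ (gamma * mu)              # residual with all effects removed
        s2 = sigma2 / (xtx + sigma2 / tau)    # posterior slab variances
        logit_pi = np.log(pi / (1 - pi))
        for _ in range(n_iter):
            for j in range(p):
                r = r + X[:, j] * (gamma[j] * mu[j])    # take effect j out
                mu[j] = s2[j] / sigma2 * (X[:, j] @ r)
                u = logit_pi + 0.5 * np.log(s2[j] / tau) + mu[j] ** 2 / (2 * s2[j])
                gamma[j] = 1.0 / (1.0 + np.exp(-u))     # inclusion probability
                r = r - X[:, j] * (gamma[j] * mu[j])    # put updated effect back
        return gamma, mu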

    Final Technical Report: Integrated Restoration Strategies Towards Weed Control on Western Rangelands

    Invasive species are having severe ecological (Mack et al. 2000) and economic (Pimentel et al. 2005) impacts on ecosystems around the world. Invasive species can alter many ecosystem processes (Crooks 2002, Walker & Smith 1997) including: water and nutrient availability, such as the form and amount of N in the soil (Evans et al. 2001, Sperry et al. 2006); primary productivity, through shifts in growth rates or efficiency of resource use; disturbance regimes, including the type, frequency, and severity of disturbances such as fire (D’Antonio 2002); and community dynamics, such as species replacements (Alvarez & Cushman 2002). The economic losses and damages caused by invasive plants are estimated at ~$34 billion in the US and ~$95 billion worldwide (Pimentel et al. 2005). Although trade and human migrations are among the most important vectors for introducing invasive plants (Mack et al. 2000), a similar consensus on the causal mechanism for invasiveness is lacking (Dietz & Edwards 2006). Many different hypotheses have been proposed to explain why species are invasive. Some, such as the vacant niche hypothesis, are conceptually appealing but lack concrete evidence to support them (Mack et al. 2000). Others, such as the allelopathy hypothesis (Callaway & Aschehoug 2000, Bais et al. 2003), have strong evidence in specific cases but are unlikely to be important for most plants. Understanding why a species is invasive is important because it provides insight into how to control the invasion. Because no causal mechanism universally applicable to all plants has been identified to date, careful attention must be paid to the biological and ecological characteristics of the plants and communities of interest if control strategies are to be implemented

    Computer Aided Verification

    This open access two-volume set LNCS 13371 and 13372 constitutes the refereed proceedings of the 34th International Conference on Computer Aided Verification, CAV 2022, which was held in Haifa, Israel, in August 2022. The 40 full papers presented together with 9 tool papers and 2 case studies were carefully reviewed and selected from 209 submissions. The papers were organized in the following topical sections: Part I: invited papers; formal methods for probabilistic programs; formal methods for neural networks; software verification and model checking; hyperproperties and security; formal methods for hardware, cyber-physical, and hybrid systems. Part II: probabilistic techniques; automata and logic; deductive verification and decision procedures; machine learning; synthesis and concurrency. This is an open access book

    Nuclear dependence of proton-induced Drell-Yan dimuon production at 120 GeV at SeaQuest

    A measurement of the atomic mass (A) dependence of Drell-Yan (p + A → μ+μ− + X) dimuon production by 120 GeV protons is presented here. The data were taken by the SeaQuest experiment at Fermilab using a proton beam extracted from its Main Injector. Over 61,000 dimuon pairs were recorded with invariant mass between 4.2 and 10 GeV and target parton momentum fraction x2 between 0.1 and 0.5, for the nuclear targets H, d (deuterium), C, Fe, and W. The ratio of Drell-Yan dimuon yields per nucleon (Y) for heavy nuclei over deuterium, R = Y(A)/Y(d), is approximately equal to the ratio of the anti-up quark distributions in the two nuclei. This measurement is therefore sensitive to modifications of the anti-quark sea distributions in nuclei in the case of proton-induced Drell-Yan. The data analyzed here, together with future SeaQuest data, will provide tighter constraints on the various models that attempt to explain the anomalous nuclear modification seen in deep inelastic lepton scattering, a phenomenon generally known as the EMC effect
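
    Schematically, the per-nucleon ratio is formed as follows (hypothetical numbers, with the acceptance, efficiency, and background corrections of the real analysis omitted):

    def per_nucleon_ratio(counts_A, lum_A, A, counts_d, lum_d):
        """R = Y(A)/Y(d), with Y = corrected counts per (luminosity x
        nucleons); approximately ubar_A(x2) / ubar_d(x2)."""
        y_A = counts_A / (lum_A * A)   # yield per nucleon, heavy target
        y_d = counts_d / (lum_d * 2)   # deuterium has two nucleons
        return y_A / y_d

    # hypothetical inputs for illustration only (e.g. an iron target)
    print(per_nucleon_ratio(5320, 1.0, 56, 3000, 15.0))   # -> 0.95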