20,365 research outputs found

    Thrashing

    A theoretical foundation for program transformations to reduce cache thrashing due to true data sharing

    Cache thrashing due to true data sharing can degrade the performance of parallel programs significantly. Our previous work showed that parallel task alignment via program transformations can be quite effective for the reduction of such cache thrashing. In this paper, we present a theoretical foundation for such program transformations. Based on linear algebra and the theory of numbers, our work analyzes the data dependences among the tasks created by a fork-join parallel program and determines at compile time how these tasks should be assigned to processors in order to reduce cache thrashing due to true data sharing. Our analysis and program transformations can be easily performed by compilers for parallel computers.

    Proteins Wriggle

    We propose an algorithmic strategy for improving the efficiency of Monte Carlo searches for the low-energy states of proteins. Our strategy is motivated by a model of how proteins alter their shapes. In our model, when proteins fold under physiological conditions, their backbone dihedral angles change synchronously in groups of four or more so as to avoid steric clashes and respect the kinematic conservation laws. They wriggle; they do not thrash. We describe a simple algorithm that can be used to incorporate wriggling in Monte Carlo simulations of protein folding. We have tested this wriggling algorithm against a code in which the dihedral angles are varied independently (thrashing). Our standard of success is the average root-mean-square distance (rmsd) between the alpha-carbons of the folding protein and those of its native structure. After 100,000 Monte Carlo sweeps, the relative decrease in the mean rmsd, as one switches from thrashing to wriggling, rises from 11% for the protein 3LZM with 164 amino acids (aa) to 40% for the protein 1A1S with 313 aa and 47% for the protein 16PK with 415 aa. These results suggest that wriggling is useful and that its utility increases with the size of the protein. One may implement wriggling on a parallel computer or a computer farm. Comment: 12 pages, 2 figures, JHEP late
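
    The success metric in this abstract can be illustrated with a minimal sketch. It assumes the two structures are given as equal-length lists of alpha-carbon coordinates that are already expressed in a common frame (no superposition step is performed here), and the function names `rmsd` and `relative_decrease` are hypothetical, not taken from the paper. The numbers in the usage note are made up for illustration and are not the paper's results.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square distance between two equal-length lists of (x, y, z) points.

    Assumption: both structures are already in the same coordinate frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def relative_decrease(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline
```

    With made-up inputs, `rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)])` is the square root of 2, and `relative_decrease(10.0, 6.0)` gives 0.4, i.e. a 40% decrease in the sense used in the abstract.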

    Using Negotiation to Reduce Redundant Autonomous Mobile Program Movements

    Distributed load managers exhibit thrashing, where tasks are repeatedly moved between locations due to incomplete global load information. This paper shows that systems of Autonomous Mobile Programs (AMPs) exhibit the same behaviour, identifying two types of redundant movement and terming them greedy effects. AMPs are unusual in that, in place of some external load management system, each AMP periodically recalculates network and program parameters and may independently move to a better execution environment. Load management emerges from the behaviour of collections of AMPs. The paper explores the extent of greedy effects by simulation, and then proposes negotiating AMPs (NAMPs) to ameliorate the problem. We present the design of AMPs with a competitive negotiation scheme (cNAMPs), and compare their performance with AMPs by simulation.

    Improving the Performance and Energy Efficiency of GPGPU Computing through Adaptive Cache and Memory Management Techniques

    Department of Computer Science and Engineering. As the performance and energy-efficiency requirements of GPGPUs have risen, memory management techniques of GPGPUs have improved to meet the requirements by employing hardware caches and utilizing heterogeneous memory. These techniques can improve GPGPUs by providing lower latency and higher bandwidth of the memory. However, these methods do not always guarantee improved performance and energy efficiency due to the small cache size and heterogeneity of the memory nodes. While prior works have proposed various techniques to address this issue, relatively little work has been done to investigate holistic support for memory management techniques. In this dissertation, we analyze performance pathologies and propose various techniques to improve memory management techniques. First, we investigate the effectiveness of advanced cache indexing (ACI) for high-performance and energy-efficient GPGPU computing. Specifically, we discuss the designs of various static and adaptive cache indexing schemes and present an implementation for GPGPUs. We then quantify and analyze the effectiveness of the ACI schemes based on a cycle-accurate GPGPU simulator. Our quantitative evaluation shows that ACI schemes achieve significant performance and energy-efficiency gains over the baseline conventional indexing scheme. We also analyze the performance sensitivity of ACI to key architectural parameters (i.e., capacity, associativity, and ICN bandwidth) and the cache indexing latency. We also demonstrate that ACI continues to achieve high performance in various settings. Second, we propose IACM, integrated adaptive cache management for high-performance and energy-efficient GPGPU computing. Based on the performance pathology analysis of GPGPUs, we integrate state-of-the-art adaptive cache management techniques (i.e., cache indexing, bypassing, and warp limiting) in a unified architectural framework to eliminate performance pathologies. Our quantitative evaluation demonstrates that IACM significantly improves the performance and energy efficiency of various GPGPU workloads over the baseline architecture (i.e., 98.1% and 61.9% on average, respectively) and achieves considerably higher performance than the state-of-the-art technique (i.e., 361.4% at maximum and 7.7% on average). Furthermore, IACM delivers significant performance and energy efficiency gains over the baseline GPGPU architecture even when enhanced with advanced architectural technologies (e.g., higher capacity, associativity). Third, we propose bandwidth- and latency-aware page placement (BLPP) for GPGPUs with heterogeneous memory. BLPP analyzes the characteristics of an application and determines the optimal page allocation ratio between the GPU and CPU memory. Based on the optimal page allocation ratio, BLPP dynamically allocates pages across the heterogeneous memory nodes. Our experimental results show that BLPP considerably outperforms the baseline and state-of-the-art technique (i.e., 13.4% and 16.7%) and performs similarly to the static-best version (i.e., 1.2% difference), which requires extensive offline profiling.

    Redundant movements in autonomous mobility: experimental and theoretical analysis

    Distributed load balancers exhibit thrashing, where tasks are repeatedly moved between locations due to incomplete global load information. This paper shows that systems of autonomous mobile programs (AMPs) exhibit the same behaviour, and identifies two types of redundant movement (greedy effects). AMPs are unusual in that, in place of some external load management system, each AMP periodically recalculates network and program parameters and may independently move to a better execution environment. Load management emerges from the behaviour of collections of AMPs. The paper explores the extent of greedy effects by simulating collections of AMPs and proposes negotiating AMPs (NAMPs) to ameliorate the problem. We present the design of AMPs with a competitive negotiation scheme (cNAMPs), and compare their performance with AMPs by simulation. We establish new properties of balanced networks of AMPs, and use these to provide a theoretical analysis of greedy effects.

    Anticipatory adjustments to being picked up in infancy

    Anticipation of the actions of others is often used as a measure of action understanding in infancy. In contrast to studies of action understanding which set infants up as observers of actions directed elsewhere, in the present study we explored anticipatory postural adjustments made by infants to one of the most common adult actions directed to them: picking them up. We observed infant behavioural changes and recorded their postural shifts on a pressure mat in three phases: (i) a prior Chat phase, (ii) from the onset of Approach of the mother's arms, and (iii) from the onset of Contact. In Study 1, eighteen 3-month-old infants showed systematic global postural changes during Approach and Contact, but not during Chat. There was an increase in specific adjustments of the arms (widening or raising) and legs (stiffening and extending or tucking up) during Approach and a decrease in thrashing/general movements during Contact. Shifts in postural stability were evident immediately after onset of Approach and more slowly after Contact, with no regular shifts during Chat. In Study 2 we followed ten infants at 2, 3 and 4 months of age. Anticipatory behavioural adjustments during Approach were present at all ages, but with greater differentiation from a prior Chat phase only at 3 and 4 months. Global postural shifts were also more phase-differentiated in older infants. Moreover, there was significantly greater gaze to the mother's hands during Approach at 4 months. Early anticipatory adjustments to being picked up suggest that infants' awareness of actions directed to the self may occur earlier than of those directed elsewhere, and thus enable infants' active participation in joint actions from early in life.

    Fast, automated measurement of nematode swimming (thrashing) without morphometry

    Background: The "thrashing assay", in which nematodes are placed in liquid and the frequency of lateral swimming ("thrashing") movements estimated, is a well-established method for measuring motility in the genetic model organism Caenorhabditis elegans as well as in parasitic nematodes. It is used as an index of the effects of drugs, chemicals or mutations on motility and has proved useful in identifying mutants affecting behaviour. However, the method is laborious, subject to experimenter error, and therefore does not permit high-throughput applications. Existing automation methods usually involve analysis of worm shape, but this is computationally demanding and error-prone. Here we present a novel, robust and rapid method of automatically counting the thrashing frequency of worms that avoids morphometry but nonetheless gives a direct measure of thrashing frequency. Our method uses principal components analysis to remove the background, followed by computation of a covariance matrix of the remaining image frames from which the interval between statistically similar frames is estimated. Results: We tested the performance of our covariance method in measuring thrashing rates of worms using mutations that affect motility and found that it accurately substituted for laborious, manual measurements over a wide range of thrashing rates. The algorithm used also enabled us to determine a dose-dependent inhibition of thrashing frequency by the anthelmintic drug, levamisole, illustrating the suitability of the system for assaying the effects of drugs and chemicals on motility. Furthermore, the algorithm successfully measured the actions of levamisole on a parasitic nematode, Haemonchus contortus, which undergoes complex contorted shapes whilst swimming, without alterations in the code or of any parameters, indicating that it is applicable to different nematode species, including parasitic nematodes. Our method is capable of analyzing a 30 s movie in less than 30 s and can therefore be deployed in rapid screens. Conclusion: We demonstrate that a covariance-based method yields a fast, reliable, automated measurement of C. elegans motility which can replace the far more time-consuming, manual method. The absence of a morphometry step means that the method can be applied to any nematode that swims in liquid and, together with its speed, this simplicity lends itself to deployment in large-scale chemical and genetic screens.
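
    The covariance idea in this abstract can be sketched roughly as follows. This is not the authors' code: it assumes each frame is a flat list of pixel intensities, it replaces the PCA background-removal step with a simple per-pixel mean subtraction, and it takes the first local maximum of the lag-averaged frame covariance as the interval between similar frames. The function names `frame_covariance` and `estimate_period` are hypothetical.

```python
def frame_covariance(frames):
    """Covariance between every pair of frames after removing a static background.

    Assumption: the background is approximated by the per-pixel mean over time
    (a stand-in for the PCA step described in the abstract).
    """
    n_pixels = len(frames[0])
    background = [sum(f[p] for f in frames) / len(frames) for p in range(n_pixels)]
    centred = [[f[p] - background[p] for p in range(n_pixels)] for f in frames]
    return [
        [sum(a * b for a, b in zip(fi, fj)) / n_pixels for fj in centred]
        for fi in centred
    ]


def estimate_period(frames):
    """Return the lag (in frames) at which frames first become similar again."""
    cov = frame_covariance(frames)
    n = len(frames)
    # Average the covariance matrix along each diagonal: one value per lag.
    by_lag = [
        sum(cov[i][i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(n // 2)
    ]
    # The first local maximum after lag 0 marks one full movement cycle.
    for lag in range(1, len(by_lag) - 1):
        if by_lag[lag] > by_lag[lag - 1] and by_lag[lag] >= by_lag[lag + 1]:
            return lag
    return None
```

    Given a frame rate in frames per second, dividing it by the returned lag gives a movement frequency in cycles per second. Real movies would need the noise handling and background model that this sketch leaves out.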