
    Assessing molecular variability in cancer genomes

    The dynamics of tumour evolution are not well understood. In this paper we provide a statistical framework for evaluating the molecular variation observed in different parts of a colorectal tumour. A multi-sample version of the Ewens Sampling Formula forms the basis for our modelling of the data, and we provide a simulation procedure for use in obtaining reference distributions for the statistics of interest. We also describe the large-sample asymptotics of the joint distributions of the variation observed in different parts of the tumour. While actual data should be evaluated with reference to the simulation procedure, the asymptotics serve to provide theoretical guidelines, for instance in the choice of possible statistics.

    Comment: 22 pages, 1 figure. Chapter 4 of "Probability and Mathematical Genetics: Papers in Honour of Sir John Kingman" (Editors N.H. Bingham and C.M. Goldie), Cambridge University Press, 2010
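
    For readers unfamiliar with it, the Ewens Sampling Formula gives the probability of an allelic partition (a_1, ..., a_n) of n genes, where a_j allelic types are carried by exactly j genes, as P(a_1, ..., a_n) = (n! / θ_(n)) Π_{j=1}^{n} (θ/j)^{a_j} / a_j!, with θ_(n) = θ(θ+1)···(θ+n−1). The paper's multi-sample procedure is not spelled out in the abstract; the sketch below is only a minimal single-sample illustration, drawing ESF partitions via the standard Chinese restaurant construction to build a reference distribution for a statistic such as the number of distinct types.

```python
import random
from collections import Counter

def sample_esf_partition(n, theta, rng=random.Random(0)):
    """Draw one allelic partition of n genes under the Ewens Sampling
    Formula via the Chinese restaurant process: gene k+1 founds a new
    allelic type with probability theta / (theta + k), otherwise it
    copies the type of a uniformly chosen earlier gene."""
    types = []                                 # type label of each gene so far
    next_label = 0
    for k in range(n):
        if rng.random() < theta / (theta + k):
            types.append(next_label)           # mutation: new allelic type
            next_label += 1
        else:
            types.append(rng.choice(types))    # copy an earlier gene
    sizes = Counter(types).values()            # size of each allelic type
    return Counter(sizes)                      # maps j -> a_j

# Reference distribution for one candidate statistic: the number of
# distinct allelic types K = sum_j a_j.
draws = [sum(sample_esf_partition(50, theta=2.0).values())
         for _ in range(10_000)]
```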

    Parallelising wavefront applications on general-purpose GPU devices

    Pipelined wavefront applications form a large portion of the high performance scientific computing workloads at supercomputing centres. This paper investigates the viability of graphics processing units (GPUs) for the acceleration of these codes, using NVIDIA's Compute Unified Device Architecture (CUDA). We identify the optimisations suitable for this new architecture and quantify the characteristics of those wavefront codes that are likely to experience speedups.
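
    To make the dependency structure concrete: in a pipelined wavefront code, cell (i, j) is updated from its already-updated neighbours (i-1, j) and (i, j-1), so the only cells that can be processed simultaneously are those on the same anti-diagonal i + j = const. The sketch below is a generic Python illustration of that sweep order, not the paper's CUDA code; on a GPU, each inner loop would become one kernel launch with a thread per cell.

```python
import numpy as np

def wavefront_sweep(a):
    """Generic 2D wavefront: a[i, j] depends on a[i-1, j] and a[i, j-1].
    Cells on the same anti-diagonal d = i + j are mutually independent,
    so each diagonal maps to one parallel step on a GPU; here the inner
    loop stands in for that parallel step."""
    n, m = a.shape
    for d in range(2, n + m - 1):            # sweep anti-diagonals in order
        i_lo = max(1, d - m + 1)
        i_hi = min(d - 1, n - 1)
        for i in range(i_lo, i_hi + 1):      # independent cells: parallel
            j = d - i
            a[i, j] += 0.5 * (a[i - 1, j] + a[i, j - 1])
    return a

grid = wavefront_sweep(np.ones((8, 8)))
```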

    On the acceleration of wavefront applications using distributed many-core architectures

    In this paper we investigate the use of distributed graphics processing unit (GPU)-based architectures to accelerate pipelined wavefront applications, a ubiquitous class of parallel algorithms used in a number of scientific and engineering applications. Specifically, we employ a recently developed port of the LU solver (from the NAS Parallel Benchmark suite) to investigate the performance of these algorithms on high-performance computing solutions from NVIDIA (Tesla C1060 and C2050) as well as on traditional clusters (AMD/InfiniBand and IBM BlueGene/P). Benchmark results are presented for problem classes A to C, and a recently developed performance model is used to provide projections for problem classes D and E, the latter of which represents a billion-cell problem. Our results demonstrate that while the theoretical performance of GPU solutions far exceeds that of many traditional technologies, the sustained application performance is currently comparable for scientific wavefront applications. Finally, a breakdown of the GPU solution is conducted, exposing PCIe overheads and decomposition constraints. A new k-blocking strategy is proposed to improve the future performance of this class of algorithm on GPU-based architectures; a sketch of the idea follows.
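
    The abstract does not spell out the k-blocking strategy, so the following is only one plausible reading, sketched under that assumption: rather than exchanging boundary faces after every k-plane of the 3D sweep, each process computes a block of kb planes and then communicates once, trading a longer pipeline fill for fewer, larger PCIe and MPI transfers. The parameter kb and all helper callables here are hypothetical stand-ins.

```python
import numpy as np

def sweep_kblocked(local, kb, recv_faces, compute_block, send_faces):
    """Hypothetical k-blocked sweep loop (not the paper's code): pull
    upstream boundary data for a block of kb k-planes, compute the block
    (on the GPU, in the real setting), then forward downstream faces.
    kb = 1 maximises pipelining; larger kb amortises transfer latency."""
    nz = local.shape[2]
    for k0 in range(0, nz, kb):
        block = slice(k0, min(k0 + kb, nz))
        recv_faces(block)             # e.g. MPI receives + host-to-device copy
        compute_block(local, block)   # wavefront kernel(s) over kb planes
        send_faces(block)             # device-to-host copy + MPI sends

# Stand-in usage with no-op communication and a trivial "kernel":
def touch(g, b):
    g[:, :, b] = 1.0

grid = np.zeros((64, 64, 128))
sweep_kblocked(grid, kb=8,
               recv_faces=lambda b: None,
               compute_block=touch,
               send_faces=lambda b: None)
```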

    An investigation of the performance portability of OpenCL

    This paper reports on the development of an MPI/OpenCL implementation of LU, an application-level benchmark from the NAS Parallel Benchmark Suite. An account of the design decisions addressed during the development of this code is presented, demonstrating the importance of memory arrangement and work-item/work-group distribution strategies when applications are deployed on different device types. The resulting platform-agnostic, single-source application is benchmarked on a number of different architectures, and is shown to be 1.3–1.5× slower than native FORTRAN 77 or CUDA implementations on a single node and 1.3–3.1× slower on multiple nodes. We also explore the potential performance gains of OpenCL’s device fissioning capability, demonstrating up to a 3× speed-up over our original OpenCL implementation.
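
    Device fission partitions an OpenCL device into sub-devices, so that work can be confined to subsets of compute units. It began as the cl_ext_device_fission extension and was later standardised in OpenCL 1.2; the fragment below is our illustration through PyOpenCL's 1.2-style interface, not the paper's code, and whether it succeeds depends on the device and driver.

```python
import pyopencl as cl

platform = cl.get_platforms()[0]
device = platform.get_devices()[0]

try:
    # Partition into as many sub-devices as possible with four compute
    # units each (CL_DEVICE_PARTITION_EQUALLY semantics), then give
    # each sub-device its own context/queue so kernels can be pinned
    # to a subset of compute units.
    subs = device.create_sub_devices(
        [cl.device_partition_property.EQUALLY, 4])
    ctx = cl.Context(subs)
    queues = [cl.CommandQueue(ctx, sub) for sub in subs]
    print("created", len(queues), "sub-device queues")
except cl.Error as err:
    print("device fission unavailable:", err)
```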

    Experiences with porting and modelling wavefront algorithms on many-core architectures

    We are currently investigating the viability of many-core architectures for the acceleration of wavefront applications and this report focuses on graphics processing units (GPUs) in particular. To this end, we have implemented NASA’s LU benchmark – a real world production-grade application – on GPUs employing NVIDIA’s Compute Unified Device Architecture (CUDA). This GPU implementation of the benchmark has been used to investigate the performance of a selection of GPUs, ranging from workstation-grade commodity GPUs to the HPC “Tesla” and “Fermi” GPUs. We have also compared the performance of the GPU solution at scale to that of traditional high performance computing (HPC) clusters based on a range of multi-core CPUs from a number of major vendors, including Intel (Nehalem), AMD (Opteron) and IBM (PowerPC). In previous work we have developed a predictive “plug-and-play” performance model of this class of application running on such clusters, in which CPUs communicate via the Message Passing Interface (MPI). By extending this model to also capture the performance behaviour of GPUs, we are able to: (1) comment on the effects that architectural changes will have on the performance of single-GPU solutions, and (2) make projections regarding the performance of multi-GPU solutions at larger scale.
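
    The model itself is not given in the abstract, but the general shape of such wavefront models is standard. The toy estimate below is our simplification, not the authors' model; it captures the two terms that dominate: the pipeline-fill delay before the sweep front reaches the last process, and the steady-state cost per block of work.

```python
def wavefront_runtime(px, py, n_blocks, t_block, t_comm):
    """Simplified pipelined-wavefront estimate (not the paper's model):
    on a px-by-py logical process grid the sweep front takes
    (px + py - 2) steps to reach the far-corner process (pipeline fill),
    which then works through its n_blocks blocks; each step costs
    compute time t_block plus boundary-exchange time t_comm."""
    return (px + py - 2 + n_blocks) * (t_block + t_comm)

# e.g. 16x16 process grid, 64 k-blocks, 2 ms compute + 0.5 ms comm/step
print(wavefront_runtime(16, 16, 64, 2.0e-3, 0.5e-3))  # seconds
```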

    WMTrace: a lightweight memory allocation tracker and analysis framework

    The diverging gap between processor and memory performance has been a well-discussed aspect of computer architecture literature for some years. The use of multi-core processor designs has, however, brought new problems to the design of memory architectures - increased core density without matched improvement in memory capacity is reducing the available memory per parallel process. Multiple cores accessing memory simultaneously degrades performance as a result of resource contention for memory channels and physical DIMMs. These issues combine to ensure that memory remains an on-going challenge in the design of parallel algorithms which scale. In this paper we present WMTrace, a lightweight tool to trace and analyse memory allocation events in parallel applications. This tool is able to dynamically link to pre-existing application binaries, requiring no source code modification or recompilation. A post-execution analysis stage enables in-depth analysis of traces to be performed, allowing memory allocations to be analysed by time, size or function. The second half of this paper features a case study in which we apply WMTrace to five parallel scientific applications and benchmarks, demonstrating its effectiveness at recording high-water mark memory consumption as well as per-function memory use over time. An in-depth analysis is provided for an unstructured mesh benchmark which reveals significant memory allocation imbalance across its participating processes.
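
    WMTrace itself interposes on the allocator at load time; the post-execution analysis it enables can be illustrated independently. The sketch below is our illustration of that analysis stage, not the tool: it replays a trace of allocation events and recovers the two quantities the paper reports, high-water-mark consumption and per-function memory use.

```python
from collections import defaultdict

def analyse_trace(events):
    """Replay a WMTrace-style event log (illustrative sketch, not the
    tool itself). `events` is an iterable of (time, kind, size, func)
    tuples with kind in {"alloc", "free"}; returns the high-water mark
    in bytes and the live bytes attributed to each function."""
    live = 0
    high_water = 0
    per_func = defaultdict(int)
    for time, kind, size, func in sorted(events):   # replay in time order
        delta = size if kind == "alloc" else -size
        live += delta
        per_func[func] += delta
        high_water = max(high_water, live)
    return high_water, dict(per_func)

events = [(0.1, "alloc", 1 << 20, "read_mesh"),
          (0.2, "alloc", 4 << 20, "partition"),
          (0.3, "free",  1 << 20, "read_mesh"),
          (0.4, "alloc", 2 << 20, "solve")]
hwm, by_func = analyse_trace(events)   # hwm == 6 MiB
```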

    Emotive computing may have a role in telecare

    This brief paper sets out arguments for the introduction of new technologies into telecare and lifestyle monitoring that can detect and monitor the emotive state of patients. The significantly increased use of computers by older people will enable elements of emotive computing to be integrated with features such as keyboards and webcams, to provide additional information on emotional state. When this is combined with other data, there will be significant opportunities for system enhancement and the identification of changes in user status, and hence of need. The ubiquity of home computing makes the keyboard a very attractive, economical and non-intrusive means of data collection and analysis.

    Money and happiness: rank of income, not income, affects life satisfaction

    Does money buy happiness, or does happiness come indirectly from the higher rank in society that money brings? Here we test a rank hypothesis, according to which people gain utility from the ranked position of their income within a comparison group. The rank hypothesis contrasts with traditional reference income hypotheses, which suggest that utility from income depends on comparison to a social group reference norm. We find that the ranked position of an individual’s income predicts general life satisfaction, while absolute income and reference income have no effect. Furthermore, individuals weight upward comparisons more heavily than downward comparisons. According to the rank hypothesis, income and utility are not directly linked: increasing an individual’s income will only increase their utility if their ranked position also increases, and it will necessarily reduce the utility of others who lose rank.
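
    A two-line computation makes the hypothesis concrete: normalised rank depends only on position within the comparison group, so scaling everyone's income leaves it, and on this hypothesis life satisfaction, unchanged. This is illustrative arithmetic, not the paper's estimation procedure.

```python
def income_rank(income, group):
    """Normalised ranked position of `income` within its comparison
    group (a list of incomes that includes the individual's own):
    0 = poorest member, 1 = richest. The rank hypothesis says life
    satisfaction tracks this quantity, not absolute or mean-reference
    income. Assumes the group has at least two members."""
    below = sum(1 for x in group if x < income)
    return below / (len(group) - 1)

group = [20_000, 30_000, 40_000, 50_000, 60_000]
print(income_rank(40_000, group))                   # 0.5
print(income_rank(80_000, [2 * x for x in group]))  # still 0.5: rank unchanged
```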

    Processing asymmetry of transitions between order and disorder in human auditory cortex

    Purpose: To develop an algorithm to resolve intrinsic problems with dose calculations using pencil beams when particles involved in each beam overreach a lateral density interface or detour in a laterally heterogeneous medium. Method and Materials: A finding on a Gaussian distribution, namely that it can be approximately decomposed into multiple narrower, shifted, and scaled ones, was applied to dynamic splitting of pencil beams implemented in a dose calculation algorithm for proton and ion beams. The method was tested in an experiment with a range-compensated carbon-ion beam. Its effectiveness and efficiency were evaluated for carbon-ion and proton beams in a heterogeneous phantom model. Results: The splitting dose calculation reproduced the detour effect observed in the experiment, which amounted to about 10% at maximum, as large as the lateral particle-disequilibrium effect. The proton-beam dose generally showed large scattering effects, including the overreach and detour effects. The overall computational times were 9 s and 45 s for non-splitting and splitting carbon-ion beams, and 15 s and 66 s for non-splitting and splitting proton beams. Conclusions: The beam-splitting method was developed and verified to resolve the intrinsic size limitation of the Gaussian pencil-beam model in dose calculation algorithms. The computational speed slowed down by a factor of 5, which would be tolerable for a dose accuracy improvement of up to 10% in our test case.

    Comment: AAPM Annual Meeting 200
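
    The mathematical fact the method rests on is easy to check numerically: a Gaussian of width sigma equals a narrower Gaussian of width sigma_n convolved with one of width sqrt(sigma^2 - sigma_n^2), so discretising that envelope yields shifted, scaled copies of the narrow kernel. The check below is our illustration of the stated property, not the paper's dose algorithm.

```python
import numpy as np

def gauss(x, mu, s):
    """Normalised Gaussian density with mean mu and width s."""
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

sigma, sigma_n = 1.0, 0.4                     # original and narrow widths
s_env = np.sqrt(sigma**2 - sigma_n**2)        # width of the envelope
centres = np.linspace(-4 * s_env, 4 * s_env, 21)
weights = gauss(centres, 0.0, s_env)
weights /= weights.sum()                      # normalise the discrete mixture

# Sum of narrower, shifted, scaled Gaussians vs. the original one:
x = np.linspace(-5, 5, 1001)
mixture = sum(w * gauss(x, c, sigma_n) for w, c in zip(weights, centres))
print(np.max(np.abs(mixture - gauss(x, 0.0, sigma))))  # small residual
```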