
    Extreme-scaling Applications 24/7 on JUQUEEN Blue Gene/Q

    Jülich Supercomputing Centre has offered Extreme Scaling Workshops since 2009, with the latest edition in February 2015 giving seven international code teams an opportunity to (im)prove the scaling of their applications to all 458752 cores of the JUQUEEN IBM Blue Gene/Q. Each team successfully adapted its application codes and datasets to the restricted compute-node memory and exploited the massive parallelism with up to 1.8 million processes or threads. The teams thereby qualified for membership of the High-Q Club, which now has over 24 codes demonstrating extreme scalability. Achievements in both strong and weak scaling are compared and complemented with a review of programming languages and parallelisation paradigms, exploitation of hardware threads, and file I/O requirements.
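As a rough illustration of the strong and weak scaling comparison mentioned above, the following sketch uses the standard textbook definitions of parallel efficiency; these formulas are not taken from the abstract, and the function names are hypothetical. The thread count assumes four hardware threads per Blue Gene/Q core, which is background knowledge about the architecture rather than a figure stated in the abstract.

```python
def strong_scaling_efficiency(t_base, p_base, t_p, p):
    """Efficiency for a fixed total problem size.

    t_base: runtime on p_base processes; t_p: runtime on p processes.
    A value of 1.0 means runtime shrank in exact proportion to p.
    """
    return (t_base * p_base) / (t_p * p)


def weak_scaling_efficiency(t_base, t_p):
    """Efficiency for a fixed problem size per process.

    A value of 1.0 means runtime stayed constant as the machine grew.
    """
    return t_base / t_p


# Core count from the abstract; threads per core is an assumption
# based on the Blue Gene/Q architecture (4-way SMT).
CORES = 458752
HW_THREADS_PER_CORE = 4
MAX_THREADS = CORES * HW_THREADS_PER_CORE  # about 1.8 million

if __name__ == "__main__":
    print(MAX_THREADS)
```

With these assumptions, the product of cores and hardware threads is consistent with the "up to 1.8 million processes or threads" quoted in the abstract.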

    Holistic Hardware Counter Performance Analysis of Parallel Programs

    The KOJAK toolkit has been augmented with refined hardware performance counter support, including more convenient measurement specification, additional metric derivations and hierarchical structuring, and an extended algebra for integrating multiple experiments. Comprehensive automated analysis of a hybrid OpenMP/MPI parallel program, the ASC Purple sPPM benchmark, is demonstrated with performance experiments on equisized POWER4-II-based IBM Regatta p690+ cluster, Opteron-based Cray XD1 cluster and UltraSPARC-IV-based Sun Fire E25000 systems. Automatically assessed communication and synchronisation performance properties, combined with a rich set of measured and derived counter metrics, provide a holistic analysis context and facilitate multi-platform comparison.

    Think Eternally: Improved Algorithms for the Temp Secretary Problem and Extensions

    The Temp Secretary Problem was recently introduced by [Fiat et al., ESA 2015]. It is a generalization of the Secretary Problem in which commitments are temporary for a fixed duration. We present a simple online algorithm with improved performance guarantees for cases already considered by [Fiat et al., ESA 2015] and give competitive ratios for new generalizations of the problem. In the classical setting, where candidates have identical contract durations $\gamma \ll 1$ and we are allowed to hire up to $B$ candidates simultaneously, our algorithm is $(\frac{1}{2} - O(\sqrt{\gamma}))$-competitive. For large $B$, the bound improves to $1 - O(1/\sqrt{B}) - O(\sqrt{\gamma})$. Furthermore, we generalize the problem from cardinality constraints towards general packing constraints. We achieve a competitive ratio of $1 - O(\sqrt{(1 + \log d + \log B)/B}) - O(\sqrt{\gamma})$, where $d$ is the sparsity of the constraint matrix and $B$ is generalized to the capacity ratio of linear constraints. Additionally, we extend the problem towards arbitrary hiring durations. Our algorithmic approach is a relaxation that aggregates all temporal constraints into a non-temporal constraint. Then we apply a linear scaling algorithm that, on every arrival, computes a tentative solution on the input that is known up to this point. This tentative solution uses the non-temporal, relaxed constraints scaled down linearly by the amount of time that has already passed.
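The relaxation and linear scaling idea described above can be sketched for the simplest cardinality case. This is a simplified illustration and not the authors' algorithm: it assumes arrival times in [0, 1], takes the aggregated non-temporal budget to be B / gamma hires over the whole horizon, and hires a candidate only if they belong to the tentative solution and a slot is currently free. All names are hypothetical.

```python
def temp_secretary_sketch(arrivals, gamma, B):
    """Simplified linear-scaling sketch for the cardinality case.

    arrivals: list of (time, value) pairs sorted by time, times in [0, 1].
    gamma:    contract duration (assumed identical for all candidates).
    B:        maximum number of simultaneously hired candidates.
    Returns the list of (time, value) pairs that were hired.
    """
    relaxed_budget = B / gamma   # assumed aggregate, non-temporal constraint
    seen = []                    # values of all candidates seen so far
    contract_ends = []           # end times of currently running contracts
    hired = []
    for t, value in arrivals:
        seen.append(value)
        # Drop contracts that have already expired at time t.
        contract_ends = [end for end in contract_ends if end > t]
        # Scale the relaxed budget down linearly by the elapsed time.
        k = int(relaxed_budget * t)
        if k == 0:
            continue
        # Tentative solution on the known input: the k best values so far.
        threshold = sorted(seen, reverse=True)[:k][-1]
        if value >= threshold and len(contract_ends) < B:
            hired.append((t, value))
            contract_ends.append(t + gamma)
    return hired
```

Re-sorting on every arrival keeps the sketch short; a real implementation would maintain the tentative solution incrementally.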

    Routing Brain Traffic Through the Von Neumann Bottleneck: Parallel Sorting and Refactoring

    Generic simulation code for spiking neuronal networks spends most of its time in the phase where spikes have arrived at a compute node and need to be delivered to their target neurons. These spikes were emitted over the last interval between communication steps by source neurons distributed across many compute nodes and are inherently irregular and unsorted with respect to their targets. To find those targets, the spikes need to be dispatched to a three-dimensional data structure, with decisions on target thread and synapse type to be made on the way. With growing network size, a compute node receives spikes from an increasing number of different source neurons until, in the limit, each synapse on the compute node has a unique source. Here, we show analytically how this sparsity emerges over the practically relevant range of network sizes from a hundred thousand to a billion neurons. By profiling a production code we investigate opportunities for algorithmic changes to avoid indirections and branching. Every thread hosts an equal share of the neurons on a compute node. In the original algorithm, all threads search through all spikes to pick out the relevant ones. With increasing network size, the fraction of hits remains invariant but the absolute number of rejections grows. Our new alternative algorithm equally divides the spikes among the threads and immediately sorts them in parallel according to target thread and synapse type. After this, every thread completes delivery solely of the section of spikes for its own neurons. Independent of the number of threads, all spikes are looked at only twice. The new algorithm halves the number of instructions in spike delivery, which leads to a reduction of simulation time of up to 40%. Thus, spike delivery is a fully parallelizable process with a single synchronization point and thereby well suited for many-core systems.
    Our analysis indicates that further progress requires a reduction of the latency that the instructions experience in accessing memory. The study provides the foundation for the exploration of methods of latency hiding like software pipelining and software-induced prefetching.
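The two-phase delivery scheme described in this abstract can be sketched as follows. This is a sequential emulation for illustration only, not the production simulator code: the spike tuple layout, the `register` structure, and the function name are assumptions, and real threads would run each loop body concurrently with one barrier between the phases.

```python
from collections import defaultdict


def deliver_spikes(spikes, num_threads):
    """Emulate sort-then-deliver spike delivery.

    spikes: list of (target_thread, synapse_type, spike_id) tuples
            (a hypothetical layout chosen for this sketch).
    Returns (delivered, touches), where delivered[thread][synapse_type]
    is the list of spike ids for that thread, and touches counts how
    often a spike was looked at.
    """
    touches = 0
    chunk = -(-len(spikes) // num_threads)  # ceiling division
    # register[sorter][target_thread][synapse_type] -> spike ids
    register = [[defaultdict(list) for _ in range(num_threads)]
                for _ in range(num_threads)]

    # Phase 1: each thread sorts an equal share of the incoming spikes
    # by target thread and synapse type.
    for sorter in range(num_threads):
        share = spikes[sorter * chunk:(sorter + 1) * chunk]
        for target, syn_type, spike_id in share:
            register[sorter][target][syn_type].append(spike_id)
            touches += 1

    # (The single synchronization point would sit here.)

    # Phase 2: each thread delivers only the sections meant for it.
    delivered = [defaultdict(list) for _ in range(num_threads)]
    for thread in range(num_threads):
        for sorter in range(num_threads):
            for syn_type, ids in register[sorter][thread].items():
                delivered[thread][syn_type].extend(ids)
                touches += len(ids)
    return delivered, touches
```

In this emulation every spike is touched once while sorting and once while delivering, regardless of `num_threads`, which mirrors the property stated in the abstract.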

    Routing brain traffic through the von Neumann bottleneck: Efficient cache usage in spiking neural network simulation code on general purpose computers

    Simulation is a third pillar next to experiment and theory in the study of complex dynamic systems such as biological neural networks. Contemporary brain-scale networks correspond to directed graphs of a few million nodes, each with an in-degree and out-degree of several thousands of edges, where nodes and edges correspond to the fundamental biological units, neurons and synapses, respectively. When considering a random graph, each node's edges are distributed across thousands of parallel processes. The activity in neuronal networks is also sparse. Each neuron occasionally transmits a brief signal, called a spike, via its outgoing synapses to the corresponding target neurons. This spatial and temporal sparsity represents an inherent bottleneck for simulations on conventional computers: fundamentally irregular memory-access patterns cause poor cache utilization. Using an established neuronal network simulation code as a reference implementation, we investigate how common techniques to recover cache performance, such as software-induced prefetching and software pipelining, can benefit a real-world application. The algorithmic changes reduce simulation time by up to 50%. The study exemplifies that many-core systems tasked with an intrinsically parallel computational problem can overcome the von Neumann bottleneck of conventional computer architectures.

    The calibration and evaluation of speed-dependent automatic zooming interfaces

    Speed-Dependent Automatic Zooming (SDAZ) is an exciting new navigation technique that couples the user's rate of motion through an information space with the zoom level. The faster a user scrolls in the document, the 'higher' they fly above the work surface. At present, there are few guidelines for the calibration of SDAZ. Previous work by Igarashi & Hinckley (2000) and Cockburn & Savage (2003) fails to give values for predefined constants governing their automatic zooming behaviour. The absence of formal guidelines means that SDAZ implementers are forced to adjust the properties of the automatic zooming by trial and error. This thesis aids calibration by identifying the low-level components of SDAZ. Base calibration settings for these components are then established using a formal evaluation recording participants' comfortable scrolling rates at different magnification levels. To ease our experiments with SDAZ calibration, we implemented a new system that provides a comprehensive graphical user interface for customising SDAZ behaviour. The system was designed to simplify future extensions: for example, new components such as interaction techniques and methods to render information can be added with little modification to existing code. This system was used to configure three SDAZ interfaces: a text document browser, a flat map browser and a multi-scale globe browser. The three calibrated SDAZ interfaces were evaluated against three equivalent interfaces with rate-based scrolling and manual zooming. The evaluation showed that SDAZ is 10% faster for acquiring targets in a map than rate-based scrolling with manual zooming, and SDAZ is 4% faster for acquiring targets in a text document. Participants also preferred using automatic zooming over manual zooming. No difference was found for the globe browser for acquisition time or preference.
    However, in all interfaces participants commented that automatic zooming was less physically and mentally draining than manual zooming.
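The coupling between scroll speed and zoom level that SDAZ relies on can be illustrated with a minimal sketch. The linear mapping and every constant below are hypothetical placeholders chosen for illustration; they are not the calibration values established in the thesis, which the abstract does not state.

```python
def sdaz_scale(scroll_speed, speed_threshold=100.0, max_speed=1000.0,
               min_scale=0.2):
    """Map a scroll speed to a display scale factor.

    Returns 1.0 (full magnification) at or below speed_threshold and
    falls linearly to min_scale at max_speed. Units and constants are
    assumptions made for this sketch.
    """
    speed = abs(scroll_speed)  # direction does not affect the zoom level
    if speed <= speed_threshold:
        return 1.0
    fraction = min((speed - speed_threshold) / (max_speed - speed_threshold),
                   1.0)
    return 1.0 - fraction * (1.0 - min_scale)
```

A calibration tool of the kind described above would expose parameters such as the threshold, the maximum speed and the minimum scale so that they can be tuned per interface instead of by trial and error in code.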

    Transcriptomic and metabolomic profiling of Zymomonas mobilis during aerobic and anaerobic fermentations

    Background: Zymomonas mobilis ZM4 (ZM4) produces near theoretical yields of ethanol with high specific productivity, and recombinant strains are able to ferment both C-5 and C-6 sugars. Z. mobilis performs best under anaerobic conditions, but is an aerotolerant organism. However, the genetic and physiological basis of ZM4's response to various stresses is poorly understood.
    Results: In this study, transcriptomic and metabolomic profiles for ZM4 aerobic and anaerobic fermentations were elucidated by microarray analysis and by high-performance liquid chromatography (HPLC), gas chromatography (GC) and gas chromatography-mass spectrometry (GC-MS) analyses. In the absence of oxygen, ZM4 consumed glucose more rapidly, had a higher growth rate, and ethanol was the major end-product. Greater amounts of other end-products such as acetate, lactate, and acetoin were detected under aerobic conditions, and at 26 h the amount of ethanol present under aerobic conditions was only 1.7% of that present under anaerobic conditions. In the early exponential growth phase, significant differences in gene expression were not observed between aerobic and anaerobic conditions via microarray analysis. HPLC and GC analyses revealed minor differences in extracellular metabolite profiles at the corresponding early exponential phase time point.
    Differences in extracellular metabolite profiles between conditions became greater as the fermentations progressed. GC-MS analysis of stationary phase intracellular metabolites indicated that ZM4 contained lower levels of amino acids such as alanine, valine and lysine, and other metabolites like lactate, ribitol, and 4-hydroxybutanoate under anaerobic conditions relative to aerobic conditions. Stationary phase microarray analysis revealed that 166 genes were significantly differentially expressed by more than two-fold. Transcripts for Entner-Doudoroff (ED) pathway genes (glk, zwf, pgl, pgk, and eno) and gene pdc, encoding a key enzyme leading to ethanol production, were at least 30-fold more abundant under anaerobic conditions in the stationary phase based on quantitative-PCR results. We also identified differentially expressed ZM4 genes predicted by The Institute for Genomic Research (TIGR) that were not predicted in the primary annotation.
    Conclusion: High oxygen concentrations present during Z. mobilis fermentations negatively influence fermentation performance. The maximum specific growth rates were not dramatically different between aerobic and anaerobic conditions, yet oxygen did affect the physiology of the cells, leading to the buildup of metabolic byproducts that ultimately led to greater differences in transcriptomic profiles in stationary phase.

    Genome modeling system: A knowledge management platform for genomics

    In this work, we present the Genome Modeling System (GMS), an analysis information management system capable of executing automated genome analysis pipelines at a massive scale. The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines. The GMS also serves as a platform for bioinformatics development, allowing a large team to collaborate on data analysis, or an individual researcher to leverage the work of others effectively within its data management system. Rather than separating ad-hoc analysis from rigorous, reproducible pipelines, the GMS promotes systematic integration between the two. As a demonstration of the GMS, we performed an integrated analysis of whole genome, exome and transcriptome sequencing data from a breast cancer cell line (HCC1395) and matched lymphoblastoid line (HCC1395BL). These data are available for users to test the software, complete tutorials and develop novel GMS pipeline configurations. The GMS is available at https://github.com/genome/gms