177 research outputs found
On the conditions for efficient interoperability with threads: An experience with PGAS languages using Cray communication domains
Today's high performance systems are typically built from shared memory nodes connected by a high speed network. That architecture, combined with the trend towards less memory per core, encourages programmers to use a mixture of message passing and multithreaded programming. Unfortunately, the advantages of using threads for in-node programming are hindered by their inability to efficiently communicate between nodes. In this work, we identify some of the performance problems that arise in such hybrid programming environments and characterize conditions needed to achieve high communication performance for multiple threads: addressability of targets, separability of communication paths, and full direct reachability to targets. Using the GASNet communication layer on the Cray XC30 as our experimental platform, we show how to satisfy these conditions. We also discuss how satisfying these conditions is influenced by the communication abstraction, implementation constraints, and the interconnect messaging capabilities. To evaluate these ideas, we compare the communication performance of a thread-based node runtime to a process-based runtime. Without our GASNet extensions, thread communication is significantly slower than process communication, by up to 21x. Once the implementation is modified to address each of our conditions, the two runtimes have comparable communication performance. This allows programmers to more easily mix models like OpenMP, CILK, or pthreads with a GASNet-based model like UPC, with the associated performance, convenience and interoperability advantages that come from using threads within a node. © 2014 ACM
DiBELLA: Distributed long read to long read alignment
We present a parallel algorithm and scalable implementation for genome analysis, specifically the problem of finding overlaps and alignments for data from "third generation" long read sequencers [29]. While long sequences of DNA offer enormous advantages for biological analysis and insight, current long read sequencing instruments have high error rates and therefore require different approaches to analysis than their short read counterparts. Our work focuses on an efficient distributed-memory parallelization of an accurate single-node algorithm for overlapping and aligning long reads. We achieve scalability of this irregular algorithm by addressing the competing issues of increasing parallelism, minimizing communication, constraining the memory footprint, and ensuring good load balance. The resulting application, diBELLA, is the first distributed memory overlapper and aligner specifically designed for long reads and parallel scalability. We describe and present analyses for high level design trade-offs and conduct an extensive empirical analysis that compares performance characteristics across state-of-the-art HPC systems as well as commercial cloud architectures, highlighting the advantages of state-of-the-art network technologies
RDMA vs. RPC for implementing distributed data structures
Distributed data structures are key to implementing scalable applications for scientific simulations and data analysis. In this paper we look at two implementation styles for distributed data structures: remote direct memory access (RDMA) and remote procedure call (RPC). We focus on operations that require individual accesses to remote portions of a distributed data structure, e.g., accessing a hash table bucket or distributed queue, rather than global operations in which all processors collectively exchange information. We look at the trade-offs between the two styles through microbenchmarks and a performance model that approximates the cost of each. The RDMA operations have direct hardware support in the network and therefore lower latency and overhead, while the RPC operations are more expressive but higher cost and can suffer from lack of attentiveness from the remote side. We also run experiments to compare the real-world performance of RDMA- and RPC-based data structure operations with the predicted performance to evaluate the accuracy of our model, and show that while the model does not always precisely predict running time, it allows us to choose the best implementation in the examples shown. We believe this analysis will assist developers in designing data structures that will perform well on current network architectures, as well as network architects in providing better support for this class of distributed data structures
An Evaluation of One-Sided and Two-Sided Communication Paradigms on Relaxed-Ordering Interconnect
The Cray Gemini interconnect hardware provides multiple transfer mechanisms and out-of-order message delivery to improve communication throughput. In this paper we quantify the performance of one-sided and two-sided communication paradigms with respect to: 1) the optimal available hardware transfer mechanism, 2) message ordering constraints, 3) per node and per core message concurrency. In addition to using Cray native communication APIs, we use UPC and MPI micro-benchmarks to capture one- and two-sided semantics respectively. Our results indicate that relaxing the message delivery order can improve performance up to 4.6x when compared with strict ordering. When hardware allows it, high-level one-sided programming models can already take advantage of message reordering. Enforcing the ordering semantics of two-sided communication comes with a performance penalty. Furthermore, we argue that exposing out-of-order delivery at the application level is required for the next-generation programming models. Any ordering constraints in the language specifications reduce communication performance for small messages and increase the number of active cores required for peak throughput. © 2014 IEEE
HSPG-Deficient Zebrafish Uncovers Dental Aspect of Multiple Osteochondromas
Multiple Osteochondromas (MO; previously known as multiple hereditary exostosis) is an autosomal dominant genetic condition that is characterized by the formation of cartilaginous bone tumours (osteochondromas) at multiple sites in the skeleton, secondary bursa formation and impingement of nerves, tendons and vessels, bone curving, and short stature. MO is also known to be associated with arthritis, general pain, scarring and occasional malignant transformation of osteochondroma into secondary peripheral chondrosarcoma. MO patients present additional complaints, but the relevance of those in relation to the syndromal background needs validation. Mutations in two enzymes that are required during heparan sulphate synthesis (EXT1 or EXT2) are known to cause MO. Previously, we have used zebrafish which harbour mutations in ext2 as a model for MO and shown that ext2−/− fish have skeletal defects that resemble those seen in osteochondromas. Here we analyse dental defects present in ext2−/− fish. Histological analysis reveals that ext2−/− fish have very severe defects associated with the formation and the morphology of teeth. At 5 days post fertilization 100% of ext2−/− fish have a single tooth at the end of the 5th pharyngeal arch, whereas wild-type fish develop three teeth, located in the middle of the pharyngeal arch. ext2−/− teeth have abnormal morphology (they were shorter and thicker than in the WT) and patchy ossification at the tooth base. Deformities such as split crowns and enamel lesions were found in 20% of ext2+/− adults. The tooth morphology in ext2−/− was partially rescued by FGF8 administered locally (bead implants). Our findings from the zebrafish model were validated in a dental survey that was conducted with the assistance of the MHE Research Foundation. The presence of malformed and/or displaced teeth with abnormal enamel was declared by half of the respondents, indicating that MO might indeed also be associated with dental problems
Molecular pedomorphism underlies craniofacial skeletal evolution in Antarctic notothenioid fishes
Background
Pedomorphism is the retention of ancestrally juvenile traits by adults in a descendant taxon. Despite its importance for evolutionary change, there are few examples of a molecular basis for this phenomenon. Notothenioids represent one of the best described species flocks among marine fishes, but their diversity is currently threatened by the rapidly changing Antarctic climate. Notothenioid evolutionary history is characterized by parallel radiations from a benthic ancestor to pelagic predators, which was accompanied by the appearance of several pedomorphic traits, including the reduction of skeletal mineralization that resulted in increased buoyancy.
Results
We compared craniofacial skeletal development in two pelagic notothenioids, Chaenocephalus aceratus and Pleuragramma antarcticum, to that in a benthic species, Notothenia coriiceps, and two outgroups, the threespine stickleback and the zebrafish. Relative to these other species, pelagic notothenioids exhibited a delay in pharyngeal bone development, which was associated with discrete heterochronic shifts in skeletal gene expression that were consistent with persistence of the chondrogenic program and a delay in the osteogenic program during larval development. Morphological analysis also revealed a bias toward the development of anterior and ventral elements of the notothenioid pharyngeal skeleton relative to dorsal and posterior elements.
Conclusions
Our data support the hypothesis that early shifts in the relative timing of craniofacial skeletal gene expression may have had a significant impact on the adaptive radiation of Antarctic notothenioids into pelagic habitats
A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome
Citation: Chapman, J. A., Mascher, M., Buluç, A., Barry, K., Georganas, E., Session, A., . . . Rokhsar, D. S. (2015). A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome. Genome Biology, 16(1). doi:10.1186/s13059-015-0582-8
Polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gb of this assembly to chromosomal locations. The genome representation and accuracy of our assembly are comparable to or even exceed those of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short read sequencing technology and is applicable to any species where it is possible to construct a mapping population. © 2015 Chapman et al. licensee BioMed Central.
Additional Authors: Muehlbauer, G. J.; Stein, N.; Rokhsar, D. S.
Efficient and Correct Stencil Computation via Pattern Matching and Static Typing
Stencil computations, involving operations over the elements of an array, are a common programming pattern in scientific computing, games, and image processing. As a programming pattern, stencil computations are highly regular and amenable to optimisation and parallelisation. However, general-purpose languages obscure this regular pattern from the compiler, and even the programmer, preventing optimisation and obfuscating (in)correctness. This paper furthers our work on the Ypnos domain-specific language for stencil computations embedded in Haskell. Ypnos allows declarative, abstract specification of stencil computations, exposing the structure of a problem to the compiler and to the programmer via specialised syntax. In this paper we show the decidable safety guarantee that well-formed, well-typed Ypnos programs cannot index outside of array boundaries. Thus indexing in Ypnos is safe and run-time bounds checking can be eliminated. Program information is encoded as types, using the advanced type-system features of the Glasgow Haskell Compiler, with the safe-indexing invariant enforced at compile time via type checking