
    Lattice QCD Calculations on Commodity Clusters at DESY

    Lattice Gauge Theory is an integral part of particle physics that requires high performance computing in the multi-Tflops regime. These requirements are motivated by the rich research program and the physics milestones to be reached by the lattice community. Over the last years the enormous gains in processor performance, memory bandwidth, and external I/O bandwidth for parallel applications have made commodity clusters built from PCs or workstations suitable for large Lattice Gauge Theory applications as well. For more than one year two clusters have been operated at the two DESY sites in Hamburg and Zeuthen, consisting of 32 and 16 dual-CPU PCs, respectively, equipped with Intel Pentium 4 Xeon processors. The nodes are interconnected via Myrinet, and Linux was chosen as the operating system. In the course of the projects, benchmark programs for architectural studies were developed. The performance of the Wilson-Dirac operator (also in an even-odd preconditioned version), as the inner loop of Lattice QCD (LQCD) algorithms, plays the most important role in assessing the suitability of the hardware. Using the Streaming SIMD Extensions (SSE/SSE2) of Intel's Pentium 4 Xeon CPUs gives promising results for both the single-CPU and the parallel version. The parallel performance, in addition to depending on CPU power and memory throughput, is strongly influenced by the behavior of hardware components such as the PC chipset and the communication interfaces. The paper covers the physics motivation for using PC clusters as well as a system description, operating experiences, and benchmark results for various hardware.
    Comment: Talks from Computing in High Energy and Nuclear Physics (CHEP03), PSN TUIT001-003, 13 pages, 10 figures, gzipped tar file
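
    As a rough illustration of the even-odd preconditioning idea mentioned above (not the DESY SSE-optimized code), here is a minimal numpy sketch on a toy 2D lattice, with a scalar field standing in for a spinor; the lattice size and mass parameter are invented:

        import numpy as np

        L = 8                                    # toy 2D lattice extent
        rng = np.random.default_rng(0)
        phi = rng.standard_normal((L, L))        # scalar stand-in for a spinor field

        # checkerboard parity: a site (x, y) is "even" when x + y is even
        x, y = np.indices((L, L))
        even = (x + y) % 2 == 0

        def hop(f):
            """Nearest-neighbor hopping with periodic boundaries; it couples
            only sites of opposite parity, which is what makes the even-odd
            decomposition possible."""
            return (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                    np.roll(f, 1, 1) + np.roll(f, -1, 1))

        m = 4.5                                  # diagonal (mass-like) term
        # full operator: D phi = m * phi - hop(phi)

        # even-odd preconditioning: eliminate the odd sites via the Schur
        # complement, leaving a smaller, better-conditioned system on the
        # even sites only: (m - H_eo H_oe / m) phi_e
        phi_e = np.where(even, phi, 0.0)
        odd_part = np.where(even, 0.0, hop(phi_e))   # H_oe phi_e, lives on odd sites
        prec = m * phi_e - hop(odd_part) / m
        print(prec[even][:4])

    A tight inner loop of this shape is exactly what the SSE/SSE2 intrinsics accelerate, and its ratio of floating-point work to memory traffic is why chipset and memory bandwidth are so visible in the benchmarks.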

    Noise-based deterministic logic and computing: a brief survey

    A short survey is provided of our recent explorations of the young topic of noise-based logic. After outlining the motivation behind noise-based computation schemes, we present a short summary of our ongoing efforts in the introduction, development, and design of several noise-based deterministic multivalued logic schemes and elements. In particular, we describe classical, instantaneous, continuum, spike, and random-telegraph-signal based schemes, with applications such as circuits that emulate the brain's functioning and string verification via a slow communication channel.
    Comment: Invited paper
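
    To make the idea concrete, here is a minimal numpy sketch of one ingredient common to these schemes: representing logic values by independent (hence nearly orthogonal) random telegraph signals and identifying a value by correlation. The signal length and flip probability are invented, and this is a toy, not any of the authors' circuits:

        import numpy as np

        rng = np.random.default_rng(1)
        N = 200_000                              # time steps

        def rts(p_flip):
            """Random telegraph signal in {-1, +1}, flipping with
            probability p_flip per step."""
            flips = rng.random(N) < p_flip
            return np.where(np.cumsum(flips) % 2 == 0, 1.0, -1.0)

        # two statistically independent reference noises serve as carriers
        # for the bit values 0 and 1 (orthogonal in the time-average sense)
        ref0, ref1 = rts(0.05), rts(0.05)

        def transmit(bit):
            return ref1 if bit else ref0         # sender reuses the shared references

        def receive(signal):
            """Identify the bit by correlating against both references; the
            cross-correlation of independent RTS carriers averages to ~0."""
            c0 = np.mean(signal * ref0)
            c1 = np.mean(signal * ref1)
            return int(c1 > c0)

        print(receive(transmit(0)), receive(transmit(1)))   # -> 0 1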

    Information Processing Capability of Soft Continuum Arms

    Soft continuum arms, such as trunk and tentacle robots, can be considered the "dual" of traditional rigid-bodied robots in terms of manipulability, degrees of freedom, and compliance. Introduced two decades ago, continuum arms have not yet realized their full potential and largely remain laboratory curiosities. This lag stems from inherent physical features, such as high compliance, that give rise to complex control problems no research has yet managed to surmount. Recently, reservoir computing has been suggested as a way to employ the body dynamics as a computational resource toward implementing compliant body control. In this paper, as a first step, we investigate the information processing capability of soft continuum arms. We apply input signals of varying amplitude and bandwidth to a soft continuum arm and record the dynamic response over a large number of trials. These data are aggregated and used to train the readout weights of a reservoir computing scheme. Results demonstrate that the information processing capability varies with input signal bandwidth and amplitude. These preliminary results indicate that soft continuum arms have an optimal bandwidth and amplitude range within which reservoir computing can be implemented.
    Comment: Submitted to 2019 IEEE International Conference on Soft Robotics (RoboSoft 2019)
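
    The readout training step is standard across reservoir computing, so a minimal sketch may help; the sensor count, trial length, and ridge parameter below are invented stand-ins for the arm's recorded dynamic responses:

        import numpy as np

        rng = np.random.default_rng(2)
        T, n_sensors = 1000, 20                  # trial length, hypothetical arm sensors

        # stand-ins for recorded data: X holds the arm's dynamic response over
        # time, y the target signal the readout should reproduce
        X = rng.standard_normal((T, n_sensors))
        y = rng.standard_normal(T)

        # reservoir computing trains ONLY a linear readout on the body
        # dynamics; ridge regression is the usual closed-form choice:
        #   W = (X^T X + lambda I)^{-1} X^T y
        lam = 1e-2
        W = np.linalg.solve(X.T @ X + lam * np.eye(n_sensors), X.T @ y)

        y_hat = X @ W
        nmse = np.mean((y - y_hat) ** 2) / np.var(y)
        print(f"training NMSE: {nmse:.3f}")

    Sweeping the input amplitude and bandwidth and re-measuring such an error metric is what maps out the capability region the abstract describes.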

    Performance Comparison on Parallel CPU and GPU Algorithms for Unified Gas-Kinetic Scheme

    Parallel CPU and GPU algorithms are implemented for the Unified Gas-Kinetic Scheme (UGKS), and their performance is investigated and compared on a two-dimensional channel flow case. The parallel CPU algorithm uses a one-dimensional block partition that parallelizes only the spatial domain. Due to the intrinsic features of the UGKS, a compromise two-level parallelization is adopted for the GPU algorithm. A series of meshes of different sizes is tested to reveal how the performance of the algorithms evolves with problem size. Special attention is then paid to UGKS applications where the molecular velocity space range is large. The comparison confirms that the GPU delivers relatively high acceleration, with the latest device achieving a speedup of 118.38x. The parallel CPU algorithm, by contrast, may provide better performance when the number of grid points in velocity space is large.
    Comment: UGKS, GPU acceleration, parallel algorithm, performance comparison
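
    As an illustration of the CPU-side decomposition (a sketch under assumed sizes, not the paper's implementation), the one-dimensional block partition amounts to handing each rank a contiguous slab of spatial cells:

        # 1D block partition of the spatial domain across CPU ranks; cell and
        # worker counts are invented. On the GPU the paper instead adopts a
        # two-level scheme that also spans the velocity-space points.
        def block_partition(n, p):
            base, rem = divmod(n, p)
            bounds, start = [], 0
            for r in range(p):
                size = base + (1 if r < rem else 0)   # spread the remainder
                bounds.append((start, start + size))
                start += size
            return bounds

        for rank, (lo, hi) in enumerate(block_partition(1003, 8)):
            print(f"rank {rank}: cells [{lo}, {hi})")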

    FogStore: Toward a Distributed Data Store for Fog Computing

    Stateful applications and virtualized network functions (VNFs) can benefit from state externalization to increase their reliability, scalability, and interoperability. To keep and share the externalized state, distributed data stores (DDSs) are a powerful tool allowing for the management of classical trade-offs between consistency, availability, and partition tolerance. With the advent of Fog and Edge Computing, stateful applications and VNFs are pushed from the data centers toward the network edge. This poses new challenges for DDSs that are tailored to deployment in Cloud data centers. In this paper, we propose two novel design goals for DDSs tailored to Fog Computing: (1) fog-aware replica placement, and (2) context-sensitive differential consistency. To realize these design goals on top of existing DDSs, we propose the FogStore system. FogStore manages the needed adaptations in replica placement and consistency management transparently, so that existing DDSs can be plugged into the system. To show the benefits of FogStore, we perform a set of evaluations using the Yahoo Cloud Serving Benchmark.
    Comment: To appear in Proceedings of 2017 IEEE Fog World Congress (FWC '17)
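
    A minimal sketch of the two design goals may help; the node names, zones, and latencies are invented, and the real FogStore placement logic is certainly more involved:

        # fog node -> (zone, latency in ms to the client); all values invented
        nodes = {
            "edge-a": ("zone-1", 2),
            "edge-b": ("zone-1", 4),
            "edge-c": ("zone-2", 9),
            "cloud":  ("dc",     40),
        }

        def place_replicas(k=3):
            """Fog-aware placement: prefer the closest nodes for latency, but
            force at least one replica outside the client's zone so a local
            failure (e.g. a whole edge site) cannot lose every copy."""
            by_latency = sorted(nodes, key=lambda n: nodes[n][1])
            chosen = by_latency[:k]
            if len({nodes[n][0] for n in chosen}) == 1:      # all in one zone?
                chosen[-1] = next(n for n in by_latency
                                  if nodes[n][0] != nodes[chosen[0]][0])
            return chosen

        def consistency_level(client_zone, replica):
            """Context-sensitive differential consistency: clients near the
            data get strong reads, distant ones settle for eventual."""
            return "strong" if nodes[replica][0] == client_zone else "eventual"

        replicas = place_replicas()
        print(replicas, [consistency_level("zone-1", r) for r in replicas])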

    A 3D radiative transfer framework: XIII. OpenCL implementation

    We discuss an implementation of our 3D radiative transfer (3DRT) framework with the OpenCL paradigm for general GPU computing. We implement the kernel for solving the 3DRT problem in Cartesian coordinates with periodic boundary conditions in the horizontal (x,y) plane, including the construction of the nearest-neighbor Λ* operator and the operator splitting step. We present the results of a small and a large test case and compare the timing of the 3DRT calculations for serial CPUs and various GPUs. The latest available GPUs can lead to significant speedups for both small and large grids compared to serial (single-core) computations.
    Comment: A&A, in press
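
    The operator splitting step referred to above is the classic accelerated lambda iteration; a minimal numpy sketch with a diagonal Λ* on an invented toy operator (not the paper's OpenCL kernel) looks like this:

        import numpy as np

        rng = np.random.default_rng(3)
        n = 64                                   # grid points (toy 1D stand-in)

        # toy full lambda operator with rows summing to < 1, standing in for
        # the expensive formal solution J = Lambda[S]
        Lam = np.abs(rng.standard_normal((n, n)))
        Lam = 0.5 * Lam / Lam.sum(axis=1, keepdims=True)
        np.fill_diagonal(Lam, np.diag(Lam) + 0.2)

        Lstar = np.diag(Lam)                     # diagonal approximate operator
        eps, B = 1e-3, np.ones(n)                # thermal coupling, Planck function
        S = B.copy()

        # operator splitting / ALI update:
        #   S_new = [(1-eps)(Lambda S - Lstar S) + eps B] / (1 - (1-eps) Lstar)
        for it in range(200):
            J_fs = Lam @ S                       # formal solution (the GPU's job)
            S_new = ((1 - eps) * (J_fs - Lstar * S) + eps * B) \
                    / (1 - (1 - eps) * Lstar)
            if np.max(np.abs(S_new - S)) < 1e-10:
                break
            S = S_new
        print(f"converged after {it} iterations")

    The formal solution dominates the cost and parallelizes naturally across grid points, consistent with the speedups reported for both small and large grids.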

    A Probabilistic Design Method for Fatigue Life of Metallic Component

    In the present study, a general probabilistic design framework is developed for cyclic fatigue life prediction of metallic hardware, using methods that address uncertainty in both the experimental data and the computational model. The methodology involves (i) fatigue tests conducted on coupons of Ti6Al4V material; (ii) continuum damage mechanics based constitutive models to simulate the cyclic fatigue behavior of the material; (iii) variance-based global sensitivity analysis; (iv) a Bayesian framework for model calibration and uncertainty quantification; and (v) computational life prediction and probabilistic design decision-making under uncertainty. The outcomes of the computational analyses using the experimental data demonstrate the feasibility of the probabilistic design methods for model calibration in the presence of incomplete and noisy data. Moreover, using probabilistic design methods results in an assessment of the reliability of the fatigue life predicted by the computational models.
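
    Step (iv) is typically realized with Markov chain Monte Carlo; below is a minimal random-walk Metropolis sketch, with an invented Basquin-type life model and synthetic coupon data standing in for the Ti6Al4V tests:

        import numpy as np

        rng = np.random.default_rng(4)

        # synthetic stand-ins for coupon data: strain amplitudes and noisy
        # log cycles-to-failure (true parameters a = 5, b = 3)
        strain = np.array([0.6, 0.7, 0.8, 0.9, 1.0])
        logN_obs = 5.0 - 3.0 * np.log(strain) + rng.normal(0, 0.15, strain.size)

        def log_posterior(theta, sigma=0.15):
            """Gaussian likelihood for the toy life model
            logN = a - b * log(strain), with flat priors on (a, b)."""
            a, b = theta
            resid = logN_obs - (a - b * np.log(strain))
            return -0.5 * np.sum(resid ** 2) / sigma ** 2

        # random-walk Metropolis, the workhorse of Bayesian calibration
        theta, samples = np.array([4.0, 2.0]), []
        lp = log_posterior(theta)
        for _ in range(20_000):
            prop = theta + rng.normal(0, 0.05, 2)
            lp_prop = log_posterior(prop)
            if np.log(rng.random()) < lp_prop - lp:
                theta, lp = prop, lp_prop
            samples.append(theta)
        samples = np.array(samples[5_000:])      # discard burn-in

        print("posterior mean a, b:", samples.mean(axis=0))
        print("posterior std  a, b:", samples.std(axis=0))

    The posterior spread is what feeds the reliability assessment: predicting fatigue life for each posterior sample yields a life distribution rather than a point estimate.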

    Digital Shearlet Transform

    Over the past years, various representation systems which sparsely approximate functions governed by anisotropic features, such as edges in images, have been proposed. Examples include the systems of contourlets, curvelets, and shearlets. Alongside the theoretical development of these systems, algorithmic realizations of the associated transforms were provided. However, one of the most common shortcomings of these frameworks is the lack of a unified treatment of the continuum and digital world, i.e., of a digital theory that is a natural digitization of the continuum theory. In fact, shearlet systems are so far the only systems which satisfy this property, yet still deliver optimally sparse approximations of cartoon-like images. In this chapter, we provide an introduction to digital shearlet theory with a particular focus on a unified treatment of the continuum and digital realms. In our survey we present the implementations of two shearlet transforms, one based on band-limited shearlets and the other on compactly supported shearlets. We moreover discuss various quantitative measures, which allow an objective comparison with other directional transforms and an objective tuning of parameters. The code for both presented transforms, as well as the framework for quantifying performance, is provided in the Matlab toolbox ShearLab.
    Comment: arXiv admin note: substantial text overlap with arXiv:1106.205
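
    To hint at what the digital shearlet indexing looks like in practice, here is a crude frequency-domain sketch: a radial band at scale 2^j combined with an angular wedge around slope k·2^(-j/2), i.e. parabolic scaling plus shearing. This is a toy stand-in, not the ShearLab band-limited or compactly supported constructions:

        import numpy as np

        n = 128
        f = np.zeros((n, n))
        f[np.arange(n), np.arange(n)] = 1.0      # toy image: a diagonal edge

        # integer frequency grid (unshifted FFT layout)
        freqs = np.fft.fftfreq(n) * n
        xi1, xi2 = np.meshgrid(freqs, freqs, indexing="ij")

        def shearlet_window(j, k):
            """Band-limited directional window: Gaussian radial band at
            scale 2^j, Gaussian angular wedge around slope k * 2^(-j/2)."""
            r = np.hypot(xi1, xi2) + 1e-9
            radial = np.exp(-0.5 * ((np.log2(r) - j) / 0.5) ** 2)
            slope = xi2 / np.where(np.abs(xi1) > 1e-9, xi1, 1e-9)
            width = 2.0 ** (-j / 2)
            angular = np.exp(-0.5 * ((slope - k * width) / width) ** 2)
            return radial * angular

        # coefficients = inverse FFT of the windowed spectrum; the shear whose
        # direction matches the edge orientation responds most strongly
        F = np.fft.fft2(f)
        for k in (-4, 0, 4):
            coef = np.fft.ifft2(F * shearlet_window(4, k))
            print(f"shear {k:+d}: energy {np.linalg.norm(coef):.2f}")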

    Accelerating High-Strain Continuum-Scale Brittle Fracture Simulations with Machine Learning

    Failure in brittle materials under dynamic loading conditions is a result of the propagation and coalescence of microcracks. Simulating this mechanism at the continuum level is computationally expensive or, in some cases, intractable. The computational cost is due to the highly resolved computational meshes required to capture complex crack growth behavior, such as branching and turning. Typically, continuum-scale models that account for brittle damage evolution homogenize the crack network in some way, which reduces the overall computational cost but can also neglect key physics of the subgrid crack growth behavior, sacrificing accuracy for efficiency. We have developed an approach using machine learning that overcomes the current inability to represent micro-scale physics at the macro-scale. Our approach leverages damage and stress data from a high-fidelity model that explicitly resolves microcrack behavior to build an inexpensive machine learning emulator, which runs in seconds as opposed to the hours the high-fidelity model takes. Once trained, the machine learning emulator is used to predict the evolution of crack length statistics. A continuum-scale constitutive model is then informed with these crack statistics, speeding up the workflow by four orders of magnitude. Both the machine learning model and the continuum-scale model are validated against a high-fidelity model and experimental data, respectively, showing excellent agreement. There are two key findings. The first is that we can reduce the dimensionality of the problem, establishing that the machine learning emulator needs only the length of the longest crack and one of the maximum stress components to capture the necessary physics. The second is that the emulator can be trained in one experimental setting and transferred successfully to predict behavior in a different setting.
    Comment: Keywords: Computational Materials Science, Machine Learning. 27 pages, 13 figures, in review at COMMAT (Elsevier journal)
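
    The reduced two-feature emulator lends itself to a very small sketch; the growth law, feature ranges, and linear model below are invented stand-ins for the high-fidelity training data and the actual emulator:

        import numpy as np

        rng = np.random.default_rng(5)
        n_samples = 500

        # stand-ins for high-fidelity output: per step, the two reduced
        # features (longest crack length, one max stress component) and the
        # next-step crack length to predict
        crack_len = rng.uniform(0.1, 1.0, n_samples)
        max_stress = rng.uniform(50, 300, n_samples)
        next_len = crack_len * (1 + 1e-3 * max_stress) \
                   + rng.normal(0, 0.005, n_samples)

        # cheap emulator: linear regression with an interaction term
        X = np.column_stack([np.ones(n_samples), crack_len, max_stress,
                             crack_len * max_stress])
        w, *_ = np.linalg.lstsq(X, next_len, rcond=None)

        def emulate(length, stress, steps):
            """Roll the emulator forward; the predicted crack statistics would
            then inform the continuum-scale constitutive model (not shown)."""
            for _ in range(steps):
                length = w @ np.array([1.0, length, stress, length * stress])
            return length

        print(f"longest crack after 20 steps: {emulate(0.2, 150.0, 20):.3f}")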

    Near-optimal Smooth Path Planning for Multisection Continuum Arms

    We study the path planning problem for continuum-arm robots, in which we are given a start and an end point and must compute a path for the tip of the continuum arm between the two. We consider both the case where obstacles are present and the case where they are not. We demonstrate how to leverage continuum-arm features to introduce a new model that enables a path planning approach based on a configurations graph, for a continuum arm consisting of three sections, each with three muscle actuators. The algorithm we apply to the configurations graph lets us exploit parallelism in the computation to obtain an efficient implementation. We conducted extensive tests, and the results show the completeness of the proposed algorithm under the considered discretizations, both with and without obstacles. We compared our approach to the standard inverse kinematics approach. While the inverse kinematics approach is much faster when successful, our algorithm always succeeds in finding a path or reporting that no path exists, compared to a roughly 70% success rate for the inverse kinematics approach (when a path exists).
    Comment: Submitted to 2019 IEEE International Conference on Soft Robotics (RoboSoft 2019)
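
    A small sketch of the configurations-graph idea follows; the actuator count is shrunk from the paper's nine (3 sections x 3 muscles) to four to keep the demo tiny, and obstacles would enter as configurations filtered out of the graph:

        import itertools, collections

        levels, n_act = range(3), 4              # discretization levels, actuators

        def neighbors(c):
            """Edge rule: two configurations are adjacent when exactly one
            actuator changes by one level (a smooth, feasible transition)."""
            for i, v in enumerate(c):
                for d in (-1, 1):
                    if 0 <= v + d < len(levels):
                        yield c[:i] + (v + d,) + c[i + 1:]

        def plan(start, goal):
            """BFS over the configurations graph: complete under the chosen
            discretization, i.e. it finds a path or proves none exists."""
            prev, queue = {start: None}, collections.deque([start])
            while queue:
                c = queue.popleft()
                if c == goal:
                    path = []
                    while c is not None:
                        path.append(c)
                        c = prev[c]
                    return path[::-1]
                for nb in neighbors(c):
                    if nb not in prev:
                        prev[nb] = c
                        queue.append(nb)
            return None                          # no path under this discretization

        path = plan((0,) * n_act, (2,) * n_act)
        print(f"path length: {len(path)} configurations")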