Lattice QCD Calculations on Commodity Clusters at DESY
Lattice Gauge Theory is an integral part of particle physics that requires
high performance computing in the multi-Tflops regime. These requirements are
motivated by the rich research program and the physics milestones to be reached
by the lattice community. Over recent years, enormous gains in processor
performance, memory bandwidth, and external I/O bandwidth for parallel
applications have also made commodity clusters built from PCs or workstations
suitable for large Lattice Gauge Theory applications. For more than one year,
two clusters have been operated at the two DESY sites in Hamburg and Zeuthen,
consisting of 32 and 16 dual-CPU PCs, respectively, equipped with Intel Pentium
4 Xeon processors. The nodes are interconnected via Myrinet, and Linux was
chosen as the operating system. In the course of these projects, benchmark
programs for architectural studies were developed. The performance of the
Wilson-Dirac Operator (also in an even-odd preconditioned version) as the inner
loop of the Lattice QCD (LQCD) algorithms plays the most important role in
classifying the hardware basis to be used. Using the Streaming SIMD Extensions
(SSE/SSE2) of Intel's Pentium 4 Xeon CPUs gives promising results for both the
single CPU and the parallel version. The parallel performance, in addition to
the CPU power and the memory throughput, is nevertheless strongly influenced by
the behavior of hardware components like the PC chip-set and the communication
interfaces. The paper covers the physics motivation for using PC clusters as
well as a system description, operating experiences, and benchmark results for
various hardware.
Comment: Talks from Computing in High Energy and Nuclear Physics (CHEP03), PSN TUIT001-003, 13 pages, 10 figures, gzipped tar file
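The even-odd preconditioning mentioned above can be illustrated with a toy example. The sketch below applies a checkerboard-preconditioned hopping operator to a free scalar field on a small 2D lattice; the gauge links, spinor structure, and SSE-level optimizations of the actual Wilson-Dirac benchmark are omitted, and the lattice size and hopping parameter are assumed toy values.

```python
# Toy sketch of even-odd (checkerboard) preconditioning in NumPy.
# Illustrative only: a free scalar hopping term stands in for the
# Wilson-Dirac operator; gauge links and spin/color are omitted.
import numpy as np

L, kappa = 8, 0.12   # assumed toy lattice extent and hopping parameter

x, y = np.meshgrid(np.arange(L), np.arange(L), indexing="ij")
even = (x + y) % 2 == 0   # checkerboard parity of each site

def hop(phi):
    """Sum over nearest neighbors with periodic boundary conditions."""
    return (np.roll(phi, 1, 0) + np.roll(phi, -1, 0) +
            np.roll(phi, 1, 1) + np.roll(phi, -1, 1))

def dirac_full(phi):
    """Toy 'Wilson' operator D = 1 - kappa * H on the full lattice."""
    return phi - kappa * hop(phi)

def dirac_eo_prec(phi_e):
    """Even-odd preconditioned operator M = 1 - kappa^2 * H_eo * H_oe.

    For phi_e supported on even sites, hop maps even -> odd and back,
    so two applications realize H_eo * H_oe on the even sublattice.
    """
    return phi_e - kappa**2 * hop(hop(phi_e))

phi_e = np.zeros((L, L))
phi_e[0, 0] = 1.0   # unit source on an even site
# Value at the source: 1 - 4*kappa^2 (four two-step returning paths).
print(dirac_eo_prec(phi_e)[0, 0])
```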
Noise-based deterministic logic and computing: a brief survey
We provide a short survey of our recent explorations of the young topic
of noise-based logic. After outlining the motivation behind noise-based
computation schemes, we present a short summary of our ongoing efforts in the
introduction, development and design of several noise-based deterministic
multivalued logic schemes and elements. In particular, we describe classical,
instantaneous, continuum, spike and random-telegraph-signal based schemes with
applications such as circuits that emulate the brain's functioning and string
verification via a slow communication channel.
Comment: Invited paper
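To make the random-telegraph-signal scheme concrete, here is a minimal sketch of the correlator idea underlying noise-based logic: logic values are carried by independent reference noises, and a receiver identifies a value by correlating the incoming signal against those references. The signal length and switching probability are illustrative assumptions, not values from the surveyed papers.

```python
# Minimal sketch of decoding a noise-based logic value by correlation.
# Logic values 0 and 1 are represented by independent random-telegraph
# signals (RTS); the receiver correlates the incoming signal with the
# shared reference noises and picks the best match.
import numpy as np

rng = np.random.default_rng(0)
N = 10_000   # number of time steps (assumed)

def rts(n, p_flip=0.05):
    """Random telegraph signal: +/-1, flipping with probability p_flip."""
    flips = rng.random(n) < p_flip
    return np.where(np.cumsum(flips) % 2 == 0, 1.0, -1.0)

ref = [rts(N), rts(N)]   # independent reference noises for logic 0 and 1
signal = ref[1].copy()   # transmitter sends logic value 1

# Receiver: correlate against each reference, choose the maximum.
corr = [np.mean(signal * r) for r in ref]
print("decoded value:", int(np.argmax(corr)), "correlations:", corr)
```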
Information Processing Capability of Soft Continuum Arms
Soft continuum arms, such as trunk and tentacle robots, can be considered
the "dual" of traditional rigid-bodied robots in terms of manipulability,
degrees of freedom, and compliance. Introduced two decades ago, continuum arms
have not yet realized their full potential and largely remain laboratory
curiosities. This lag stems from inherent physical features, such as high
compliance, that give rise to complex control problems which no research has
yet managed to surmount. Recently, reservoir
computing has been suggested as a way to employ the body dynamics as a
computational resource toward implementing compliant body control. In this
paper, as a first step, we investigate the information processing capability of
soft continuum arms. We apply input signals of varying amplitude and bandwidth
to a soft continuum arm and generate the dynamic response for a large number of
trials. These data are aggregated and used to train the readout weights to
implement a reservoir computing scheme. Results demonstrate that the
information processing capability varies with input signal bandwidth and
amplitude. These preliminary results indicate that soft continuum arms have an
optimal bandwidth and amplitude range in which reservoir computing can be implemented.
Comment: Submitted to 2019 IEEE International Conference on Soft Robotics (RoboSoft 2019)
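The readout-training step described above can be sketched as follows: state trajectories from many trials are stacked into one matrix, and linear readout weights are fit by ridge regression. The dimensions, regularization strength, and random surrogate data are assumptions standing in for the measured soft-arm responses.

```python
# Minimal reservoir-computing readout sketch: fit linear readout weights
# on aggregated state trajectories via ridge regression. The "reservoir
# states" here are random surrogates for the measured arm dynamics.
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_steps, n_sensors = 50, 200, 12   # assumed dimensions

# Surrogate data: stacked states X and a target signal y.
X = rng.standard_normal((n_trials * n_steps, n_sensors))
y = X @ rng.standard_normal(n_sensors) \
    + 0.1 * rng.standard_normal(n_trials * n_steps)

# Ridge-regression readout: W = (X^T X + lambda I)^{-1} X^T y
lam = 1e-2
W = np.linalg.solve(X.T @ X + lam * np.eye(n_sensors), X.T @ y)

nmse = np.mean((y - X @ W) ** 2) / np.var(y)
print(f"training NMSE: {nmse:.4f}")
```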
Performance Comparison on Parallel CPU and GPU Algorithms for Unified Gas-Kinetic Scheme
Parallel algorithms on CPU and GPU are implemented for the Unified
Gas-Kinetic Scheme (UGKS), and their performance is investigated and compared
for a two-dimensional channel flow case. The parallel CPU algorithm uses a
one-dimensional block partition that parallelizes only the spatial domain. Due
to the intrinsic features of the UGKS, a compromise two-level parallelization
is adopted for the GPU algorithm. A series of meshes of different sizes is
tested to reveal how the performance of the algorithms evolves with problem
size. Special attention is then paid to UGKS applications where the molecular
velocity space range is large. The comparison confirms that the GPU achieves
comparatively high accelerations, with the latest device reaching a speedup of
118.38x. The parallel CPU algorithm, in contrast, may perform better when the
number of grid points in velocity space is large.
Comment: UGKS, GPU acceleration, parallel algorithm, performance comparison
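The one-dimensional block partition used by the parallel CPU algorithm can be sketched generically: each worker owns a contiguous slab of spatial cells and keeps the full velocity-space grid. The helper below is an illustrative sketch with assumed sizes, not the paper's implementation.

```python
# Sketch of a 1D block partition of the spatial grid: each of
# `n_workers` ranks owns a contiguous slab of spatial cells while the
# velocity-space grid is replicated on every rank.
def block_range(n_cells, n_workers, rank):
    """Return the [start, end) spatial-cell range owned by `rank`."""
    base, extra = divmod(n_cells, n_workers)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end

n_cells, n_workers = 1000, 8   # assumed sizes
for rank in range(n_workers):
    s, e = block_range(n_cells, n_workers, rank)
    print(f"rank {rank}: spatial cells [{s}, {e}), all velocity points")
```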
FogStore: Toward a Distributed Data Store for Fog Computing
Stateful applications and virtualized network functions (VNFs) can benefit
from state externalization to increase their reliability, scalability, and
interoperability. To keep and share the externalized state, distributed data
stores (DDSs) are a powerful tool, allowing classical trade-offs between
consistency, availability, and partition tolerance to be managed. With the
advent of Fog and Edge Computing, stateful applications and VNFs are pushed
from the data centers toward the network edge. This poses new challenges for
DDSs that are tailored to deployment in Cloud data centers. In this paper, we
propose two novel design goals for DDSs that are tailored to Fog Computing: (1)
Fog-aware replica placement, and (2) context-sensitive differential
consistency. To realize those design goals on top of existing DDSs, we propose
the FogStore system. FogStore manages the needed adaptations in replica
placement and consistency management transparently, so that existing DDSs can
be plugged into the system. To show the benefits of FogStore, we perform a set
of evaluations using the Yahoo Cloud Serving Benchmark.
Comment: To appear in Proceedings of 2017 IEEE Fog World Congress (FWC '17)
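A minimal sketch may help make the two design goals concrete: fog-aware placement picks the replicas nearest to the data's fog locality, and context-sensitive differential consistency relaxes the guarantee demanded by clients far from that locality. All node names, distances, and thresholds below are invented for illustration and are not FogStore's actual policies.

```python
# Sketch of the two FogStore design goals: (1) fog-aware replica
# placement chooses nodes nearest to the data's fog locality, and
# (2) context-sensitive differential consistency relaxes the required
# consistency level for clients far from that locality.

# Invented nodes with distances (e.g., network latency) to the locality.
NODES = {"edge-a": 1.0, "edge-b": 2.0, "fog-gw": 5.0, "cloud-dc": 50.0}

def place_replicas(distances, k=2):
    """Choose the k nodes closest to the data's fog locality."""
    return sorted(distances, key=distances.get)[:k]

def consistency_level(client_distance, near_threshold=10.0):
    """Strong consistency near the data's context, eventual farther away."""
    return "strong" if client_distance <= near_threshold else "eventual"

print(place_replicas(NODES))       # ['edge-a', 'edge-b']
print(consistency_level(3.0))      # 'strong'
print(consistency_level(40.0))     # 'eventual'
```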
A 3D radiative transfer framework: XIII. OpenCL implementation
We discuss an implementation of our 3D radiative transfer (3DRT) framework
with the OpenCL paradigm for general GPU computing. We implement the kernel for
solving the 3DRT problem in Cartesian coordinates with periodic boundary
conditions in the horizontal plane, including the construction of the
nearest-neighbor $\Lambda^*$ operator and the operator splitting step. We present the
results of a small and a large test case and compare the timing of the 3DRT
calculations for serial CPUs and various GPUs. The latest available GPUs can
lead to significant speedups for both small and large grids compared to serial
(single core) computations.
Comment: A&A, in press
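In the spirit of the OpenCL port described above, the following pyopencl sketch runs a simple element-wise relaxation kernel on whatever OpenCL device is available. It is a generic kernel, not the paper's 3DRT formal solution or $\Lambda^*$ construction; the grid size and mixing parameter are assumptions.

```python
# Minimal pyopencl sketch of the GPU-computing paradigm: a kernel
# applies a toy operator-splitting-style update to a gridded field.
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

src = """
__kernel void relax(__global const float *j_old,
                    __global const float *source,
                    __global float *j_new,
                    const float eps)
{
    int i = get_global_id(0);
    // Toy relaxation update of a mean-intensity-like field.
    j_new[i] = (1.0f - eps) * j_old[i] + eps * source[i];
}
"""
prg = cl.Program(ctx, src).build()

n = 64 * 64 * 64   # assumed grid size
j_old = np.random.rand(n).astype(np.float32)
source = np.random.rand(n).astype(np.float32)
j_new = np.empty_like(j_old)

mf = cl.mem_flags
buf_old = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=j_old)
buf_src = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=source)
buf_new = cl.Buffer(ctx, mf.WRITE_ONLY, j_new.nbytes)

prg.relax(queue, (n,), None, buf_old, buf_src, buf_new, np.float32(0.5))
cl.enqueue_copy(queue, j_new, buf_new)
print(j_new[:4])
```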
A Probabilistic Design Method for Fatigue Life of Metallic Component
In the present study, a general probabilistic design framework is developed
for cyclic fatigue life prediction of metallic hardware using methods that
address uncertainty in experimental data and computational models. The
methodology involves (i) fatigue test data from coupons of Ti6Al4V material;
(ii) continuum-damage-mechanics-based material constitutive models to simulate
the cyclic fatigue behavior of the material; (iii) variance-based global
sensitivity analysis; (iv) a Bayesian framework for model calibration and
uncertainty quantification; and (v) computational life prediction and
probabilistic design decision making under uncertainty. The outcomes of
computational analyses using the experimental data demonstrate the feasibility
of the probabilistic design methods for model calibration in the presence of
incomplete and noisy data. Moreover, the probabilistic design methods enable an
assessment of the reliability of fatigue life predictions made by computational models.
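The Bayesian calibration step (item iv) can be sketched with a toy model: a random-walk Metropolis sampler calibrates the parameters of a Basquin-type S-N relation, log10(N) = a - b*log10(S), against noisy synthetic fatigue lives. The model form, data, priors, and proposal scale are stand-ins for the paper's continuum damage model and Ti6Al4V coupon data.

```python
# Sketch of Bayesian model calibration under noisy data: random-walk
# Metropolis on a toy Basquin-type S-N model, log10(N) = a - b*log10(S).
import numpy as np

rng = np.random.default_rng(2)
S = np.array([400.0, 500.0, 600.0, 700.0])   # stress amplitudes (assumed)
true_a, true_b, sigma = 14.0, 3.5, 0.2
logN = true_a - true_b * np.log10(S) + sigma * rng.standard_normal(S.size)

def log_post(theta):
    """Flat box prior times Gaussian likelihood, in log space."""
    a, b = theta
    if not (0 < a < 30 and 0 < b < 10):
        return -np.inf
    resid = logN - (a - b * np.log10(S))
    return -0.5 * np.sum(resid**2) / sigma**2

theta, samples = np.array([10.0, 2.0]), []
for _ in range(20_000):
    prop = theta + 0.1 * rng.standard_normal(2)   # random-walk proposal
    if np.log(rng.random()) < log_post(prop) - log_post(theta):
        theta = prop
    samples.append(theta)

samples = np.array(samples[5_000:])   # discard burn-in
print("posterior mean of (a, b):", samples.mean(axis=0))
```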
Digital Shearlet Transform
Over the past years, various representation systems which sparsely
approximate functions governed by anisotropic features such as edges in images
have been proposed. Examples include contourlets, curvelets, and shearlets.
Alongside the theoretical development of these systems, algorithmic
realizations of the associated transforms have been provided. However, a
common shortcoming of these frameworks is that they do not provide a unified
treatment of the continuum and digital worlds, i.e., a digital theory that is
a natural digitization of the continuum theory. In fact, shearlet systems are
so far the only systems that satisfy this property while still delivering
optimally sparse approximations of cartoon-like
images. In this chapter, we provide an introduction to digital shearlet theory
with a particular focus on a unified treatment of the continuum and digital
realm. In our survey we will present the implementations of two shearlet
transforms, one based on band-limited shearlets and the other based on
compactly supported shearlets. We will moreover discuss various quantitative
measures, which allow an objective comparison with other directional transforms
and an objective tuning of parameters. The codes for both presented transforms
as well as the framework for quantifying performance are provided in the Matlab
toolbox ShearLab.
Comment: arXiv admin note: substantial text overlap with arXiv:1106.205
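The shearing idea behind these transforms can be sketched in the frequency domain: a band-limited directional filter combines a radial band-pass in xi1 with a window in the slope variable xi2/xi1 shifted by the shear parameter s. The sketch below is purely conceptual, with ad hoc window shapes; it is not the ShearLab implementation or a true shearlet frame.

```python
# Conceptual sketch of a band-limited, shear-parametrized directional
# filter: radial band-pass around |xi1| = a, angular window around
# slope xi2/xi1 = s with width ~ sqrt(a) (parabolic scaling flavor).
import numpy as np

n = 256
xi1, xi2 = np.meshgrid(np.fft.fftfreq(n), np.fft.fftfreq(n), indexing="ij")

def shear_window(a, s):
    """Illustrative frequency window at scale a and shear s."""
    radial = np.exp(-(((np.abs(xi1) - a) / (0.5 * a)) ** 2))
    slope = np.divide(xi2, xi1, out=np.zeros_like(xi2), where=xi1 != 0)
    angular = np.exp(-(((slope - s) / np.sqrt(a)) ** 2))
    return radial * angular

img = np.random.rand(n, n)   # stand-in image
coeffs = np.fft.ifft2(np.fft.fft2(img) * shear_window(a=0.25, s=0.5))
print(coeffs.shape, np.abs(coeffs).mean())
```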
Accelerating High-Strain Continuum-Scale Brittle Fracture Simulations with Machine Learning
Failure in brittle materials under dynamic loading conditions is a result of
the propagation and coalescence of microcracks. Simulating this mechanism at
the continuum level is computationally expensive or, in some cases,
intractable. The computational cost is due to the need for highly resolved
computational meshes required to capture complex crack growth behavior, such as
branching, turning, etc. Typically, continuum-scale models that account for
brittle damage evolution homogenize the crack network in some way, which
reduces the overall computational cost, but can also neglect key physics of the
subgrid crack growth behavior, sacrificing accuracy for efficiency. We have
developed an approach using machine learning that overcomes the current
inability to represent micro-scale physics at the macro-scale. Our approach
leverages damage and stress data from a high-fidelity model that explicitly
resolves microcrack behavior to build an inexpensive machine learning emulator,
which runs in seconds as opposed to the high-fidelity model, which takes hours.
Once trained, the machine learning emulator is used to predict the evolution of
crack length statistics. A continuum-scale constitutive model is then informed
with these crack statistics, speeding up the workflow by four orders of
magnitude. Both the machine learning model and the continuum-scale model are
validated against a high-fidelity model and experimental data, respectively,
showing excellent agreement. There are two key findings. The first is that we
can reduce the dimensionality of the problem, establishing that the machine
learning emulator only needs the length of the longest crack and one of the
maximum stress components to capture the necessary physics. The second is that
the emulator can be trained in one experimental setting and transferred
successfully to predict behavior in a different setting.
Comment: Keywords: Computational Material Science, Machine Learning. 27 pages, 13 figures, in review at COMMAT (Elsevier journal)
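The reduced-feature emulator finding can be sketched as follows: a regressor maps the two retained inputs, the longest crack length and one maximum stress component, to the next-step crack statistic. The growth law, feature ranges, and data below are synthetic stand-ins for the high-fidelity simulation outputs, and the random-forest choice is illustrative.

```python
# Sketch of the reduced-feature emulator: predict the next-step longest
# crack length from only two inputs, the current longest crack and one
# maximum stress component. Data are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n = 2000
crack_len = rng.uniform(0.1, 5.0, n)       # longest crack (assumed units)
max_stress = rng.uniform(50.0, 300.0, n)   # max stress component (assumed)

# Synthetic growth law standing in for the micro-scale physics.
next_len = crack_len * (1.0 + 1e-3 * max_stress) \
    + 0.05 * rng.standard_normal(n)

X = np.column_stack([crack_len, max_stress])
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:1500], next_len[:1500])
print("held-out R^2:", model.score(X[1500:], next_len[1500:]))
```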
Near-optimal Smooth Path Planning for Multisection Continuum Arms
We study the path planning problem for continuum-arm robots, in which we are
given a start point and an end point and must compute a path for the tip of
the continuum arm between the two points. We consider both cases where
obstacles are present and where they are not. We demonstrate how to leverage
the continuum arm features to introduce a new model that enables a path
planning approach based on the configurations graph, for a continuum arm
consisting of three sections, each consisting of three muscle actuators. The
algorithm we apply to the configurations graph allows us to exploit parallelism
in the computation to obtain an efficient implementation. We conducted extensive
tests, and the obtained results show the completeness of the proposed algorithm
under the considered discretizations, in both cases where obstacles are present
and where they are not. We compared our approach to the standard inverse
kinematics approach. While the inverse kinematics approach is much faster when
successful, our algorithm always succeeds in finding a path or reporting that
no path exists, compared to a roughly 70% success rate of the inverse
kinematics approach (when a path exists).
Comment: Submitted to 2019 IEEE International Conference on Soft Robotics (RoboSoft 2019)
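The completeness property claimed above follows from searching a finite configurations graph exhaustively. The sketch below discretizes the nine actuators (three sections, three muscles each), connects configurations differing by one step in one actuator, and runs breadth-first search, which returns a shortest path or reports that none exists. The discretization levels and the collision predicate are simplified placeholders, not the paper's model.

```python
# Sketch of complete path planning on a discretized configurations
# graph: nodes are tuples of discretized actuator values, edges change
# one actuator by one step, and BFS finds a path or proves none exists.
from collections import deque

LEVELS, ACTUATORS = 3, 9   # assumed discretization; 3 sections x 3 muscles

def in_collision(cfg):
    """Placeholder for the arm's obstacle test on a configuration."""
    return cfg[0] == 1 and cfg[1] == 1   # arbitrary blocked region

def neighbors(cfg):
    """Configurations that differ by one step in one actuator."""
    for i in range(ACTUATORS):
        for d in (-1, 1):
            v = cfg[i] + d
            if 0 <= v < LEVELS:
                yield cfg[:i] + (v,) + cfg[i + 1:]

def bfs(start, goal):
    """Shortest path on the configurations graph, or None if none exists."""
    parent, queue = {start: None}, deque([start])
    while queue:
        cfg = queue.popleft()
        if cfg == goal:
            path = []
            while cfg is not None:
                path.append(cfg)
                cfg = parent[cfg]
            return path[::-1]
        for nxt in neighbors(cfg):
            if nxt not in parent and not in_collision(nxt):
                parent[nxt] = cfg
                queue.append(nxt)
    return None

start, goal = (0,) * ACTUATORS, (LEVELS - 1,) * ACTUATORS
path = bfs(start, goal)
print("no path" if path is None else f"path found with {len(path) - 1} steps")
```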