An Alternating Trust Region Algorithm for Distributed Linearly Constrained Nonlinear Programs, Application to the AC Optimal Power Flow
A novel trust region method for solving linearly constrained nonlinear
programs is presented. The proposed technique is amenable to a distributed
implementation, as its salient ingredient is an alternating projected gradient
sweep in place of the Cauchy point computation. It is proven that the algorithm
yields a sequence that globally converges to a critical point. As a result of
some changes to the standard trust region method, namely a proximal
regularisation of the trust region subproblem, it is shown that the local
convergence rate is linear with an arbitrarily small ratio. Thus, convergence
is locally almost superlinear, under standard regularity assumptions. The
proposed method is successfully applied to compute local solutions to
alternating current optimal power flow problems in transmission and
distribution networks. Moreover, the new mechanism for computing a Cauchy point
compares favourably with the standard projected search in terms of its activity
detection properties.
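The idea of a block-wise projected gradient sweep can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes simple bound constraints rather than general linear constraints, an infinity-norm trust region, and a fixed step size, and the names `alternating_projected_gradient_sweep` and `active_set` are hypothetical.

```python
import numpy as np

def alternating_projected_gradient_sweep(x, grad, blocks, lower, upper, radius, step=1.0):
    """One sweep of block-wise projected gradient steps (illustrative only).

    Each block of variables is updated in turn and projected onto the
    intersection of the box [lower, upper] and an infinity-norm trust
    region of the given radius centred at the starting point x.
    """
    x = np.asarray(x, dtype=float)
    lo = np.maximum(lower, x - radius)   # box intersected with trust region
    hi = np.minimum(upper, x + radius)
    y = x.copy()
    for idx in blocks:
        g = grad(y)                      # gradient at the partially updated point
        y[idx] = np.clip(y[idx] - step * g[idx], lo[idx], hi[idx])
    return y

def active_set(y, lower, upper, tol=1e-12):
    """Indices of variables sitting on one of their bounds."""
    return np.flatnonzero((y - lower <= tol) | (upper - y <= tol))
```

In a distributed setting each block would belong to one agent; here the blocks are simply visited one after another in a single process.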
GAMER: a GPU-Accelerated Adaptive Mesh Refinement Code for Astrophysics
We present the newly developed code, GAMER (GPU-accelerated Adaptive MEsh
Refinement code), which has adopted a novel approach to improve the performance
of adaptive mesh refinement (AMR) astrophysical simulations by a large factor
with the use of the graphics processing unit (GPU). The AMR implementation is
based on a hierarchy of grid patches with an oct-tree data structure. We adopt
a three-dimensional relaxing TVD scheme for the hydrodynamic solver, and a
multi-level relaxation scheme for the Poisson solver. Both solvers have been
implemented on the GPU, so that hundreds of patches can be advanced in parallel.
The computational overhead associated with the data transfer between CPU and
GPU is carefully reduced by utilizing the capability of asynchronous memory
copies in GPU, and the computing time of the ghost-zone values for each patch
is reduced by overlapping it with the GPU computations. We demonstrate
the accuracy of the code by performing several standard test problems in
astrophysics. GAMER is a parallel code that can be run in a multi-GPU cluster
system. We measure the performance of the code by performing purely-baryonic
cosmological simulations in different hardware implementations, in which
detailed timing analyses provide comparison between the computations with and
without GPU(s) acceleration. Maximum speed-up factors of 12.19 and 10.47 are
demonstrated using 1 GPU with 4096^3 effective resolution and 16 GPUs with
8192^3 effective resolution, respectively.
Comment: 60 pages, 22 figures, 3 tables. More accuracy tests are included.
Accepted for publication in ApJ.
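A relaxation scheme for the Poisson equation can be sketched in a few lines. The example below is a single-level Jacobi relaxation on a uniform 2D grid with zero boundary values, written with NumPy on the CPU. It is an assumption-laden illustration of the general technique, not GAMER's multi-level GPU solver, and the function name `jacobi_poisson` is hypothetical.

```python
import numpy as np

def jacobi_poisson(rho, h, iterations):
    """Relax laplacian(phi) = rho on a square grid with phi = 0 on the boundary.

    rho: 2D array of source values, h: grid spacing.
    Each iteration replaces every interior value by the average of its
    four neighbours minus the scaled source term (Jacobi update).
    """
    phi = np.zeros_like(rho, dtype=float)
    for _ in range(iterations):
        new = phi.copy()
        new[1:-1, 1:-1] = 0.25 * (
            phi[2:, 1:-1] + phi[:-2, 1:-1] + phi[1:-1, 2:] + phi[1:-1, :-2]
            - h * h * rho[1:-1, 1:-1]
        )
        phi = new
    return phi
```

Every interior point is updated independently from the previous iterate, which is what makes this kind of scheme easy to run in parallel across many grid patches.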
A scalable parallel finite element framework for growing geometries. Application to metal additive manufacturing
This work introduces an innovative parallel, fully-distributed finite element
framework for growing geometries and its application to metal additive
manufacturing. It is well known that virtual part design and qualification in
additive manufacturing require highly accurate multiscale and multiphysics
analyses. Only high performance computing tools are able to handle such
complexity in time frames compatible with time-to-market. However, efficiency,
without loss of accuracy, has rarely held the centre stage in the numerical
community. Here, in contrast, the framework is designed to adequately exploit
the resources of high-end distributed-memory machines. It is grounded on three
building blocks: (1) Hierarchical adaptive mesh refinement with octree-based
meshes; (2) a parallel strategy to model the growth of the geometry; (3)
state-of-the-art parallel iterative linear solvers. Computational experiments
consider the heat transfer analysis at the part scale of the printing process
by powder-bed technologies. After verification against a 3D benchmark, a
strong-scaling analysis assesses performance and identifies major sources of
parallel overhead. A third numerical example examines the efficiency and
robustness of (2) in a curved 3D shape. Unprecedented parallelism and
scalability were achieved in this work. Hence, this framework helps to
take on higher complexity and/or accuracy, not only in part-scale simulations
of metal or polymer additive manufacturing, but also in welding, sedimentation,
atherosclerosis, or any other physical problem where the physical domain of
interest grows in time.
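Octree-based hierarchical refinement, the first building block named above, can be sketched as follows. This is a serial toy example under stated assumptions: the `Cell` class, `refine_where`, and the refinement predicate are hypothetical and do not reflect the framework's actual data structures or its parallel distribution.

```python
from dataclasses import dataclass, field

@dataclass
class Cell:
    """A cubic octree cell given by its lower corner and edge length."""
    origin: tuple
    size: float
    level: int = 0
    children: list = field(default_factory=list)

    def refine(self):
        """Split this cell into eight equal children."""
        half = self.size / 2
        x, y, z = self.origin
        self.children = [
            Cell((x + i * half, y + j * half, z + k * half), half, self.level + 1)
            for i in (0, 1) for j in (0, 1) for k in (0, 1)
        ]

    def leaves(self):
        """Return all cells without children below (and including) this one."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

def refine_where(cell, needs_refinement, max_level):
    """Recursively refine cells flagged by the predicate, up to max_level."""
    if cell.level < max_level and needs_refinement(cell):
        cell.refine()
        for child in cell.children:
            refine_where(child, needs_refinement, max_level)
```

A predicate that flags cells near a region of interest (for instance, a heat source) concentrates small cells there while the rest of the domain stays coarse.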
Value-Flow-Based Demand-Driven Pointer Analysis for C and C++
We present SUPA, a value-flow-based demand-driven flow- and context-sensitive pointer analysis with strong updates for C and C++ programs. SUPA enables computing points-to information via value-flow refinement, in environments with small time and memory budgets. We formulate SUPA by solving a graph-reachability problem on an inter-procedural value-flow graph representing a program's def-use chains, which are pre-computed efficiently but over-approximately. To answer a client query (a request for a variable's points-to set), SUPA reasons about the flow of values along the pre-computed def-use chains sparsely (rather than across all program points), by performing only the work necessary for the query (rather than analyzing the whole program). In particular, strong updates are performed to filter out spurious def-use chains through value-flow refinement as long as the total budget is not exhausted.
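The demand-driven, budgeted flavour of such a query can be illustrated with a small sketch. It is a heavily simplified assumption, not SUPA itself: it performs a plain backward reachability search over a pre-computed value-flow graph, with no flow sensitivity, context sensitivity, or strong updates. The names `points_to`, `preds`, and `allocs` are hypothetical.

```python
from collections import deque

def points_to(query, preds, allocs, budget):
    """Answer one points-to query by backward reachability on a value-flow graph.

    preds:  maps a node to the nodes whose values flow into it (def-use edges).
    allocs: maps an allocation node to the abstract object it creates.
    budget: maximum number of nodes to visit; None is returned when the
            budget runs out, so the caller can fall back to a coarser answer.
    """
    seen = {query}
    work = deque([query])
    result = set()
    steps = 0
    while work:
        if steps >= budget:
            return None
        node = work.popleft()
        steps += 1
        if node in allocs:
            result.add(allocs[node])
        for pred in preds.get(node, ()):
            if pred not in seen:
                seen.add(pred)
                work.append(pred)
    return result
```

Only nodes that can reach the queried variable are visited, so unrelated parts of the program cost nothing for this query.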
Estimating Local Function Complexity via Mixture of Gaussian Processes
Real world data often exhibit inhomogeneity, e.g., the noise level, the
sampling distribution or the complexity of the target function may change over
the input space. In this paper, we try to isolate local function complexity in
a practical, robust way. This is achieved by first estimating the locally
optimal kernel bandwidth as a functional relationship. Specifically, we propose
Spatially Adaptive Bandwidth Estimation in Regression (SABER), which employs
the mixture of experts consisting of multinomial kernel logistic regression as
a gate and Gaussian process regression models as experts. Using the locally
optimal kernel bandwidths, we deduce an estimate of the local function
complexity by drawing parallels to the theory of locally linear smoothing. We
demonstrate the usefulness of local function complexity for model
interpretation and active learning in quantum chemistry experiments and fluid
dynamics simulations.
Comment: 19 pages, 16 figures
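The role of the kernel bandwidth in a Gaussian process expert can be seen in a minimal sketch of GP regression with a squared-exponential kernel. This covers a single expert with a fixed, global bandwidth. The gating network and the locally adaptive bandwidth of SABER are not implemented, and the function names and the default noise level are assumptions.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth):
    """Squared-exponential kernel matrix between 1D inputs a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / bandwidth ** 2)

def gp_posterior_mean(x_train, y_train, x_test, bandwidth, noise=1e-2):
    """Posterior mean of a zero-mean GP with Gaussian observation noise."""
    K = rbf_kernel(x_train, x_train, bandwidth) + noise * np.eye(len(x_train))
    alpha = np.linalg.solve(K, y_train)           # K^{-1} y
    return rbf_kernel(x_test, x_train, bandwidth) @ alpha
```

A small bandwidth lets the fit follow rapid local variation, while a large one smooths it out, which is why a spatially varying bandwidth carries information about local function complexity.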
Graphical Models for Optimal Power Flow
Optimal power flow (OPF) is the central optimization problem in electric
power grids. Although solved routinely in the course of power grid operations,
it is known to be strongly NP-hard in general, and weakly NP-hard over tree
networks. In this paper, we formulate the optimal power flow problem over tree
networks as an inference problem over a tree-structured graphical model where
the nodal variables are low-dimensional vectors. We adapt the standard dynamic
programming algorithm for inference over a tree-structured graphical model to
the OPF problem. Combining this with an interval discretization of the nodal
variables, we develop an approximation algorithm for the OPF problem. Further,
we use techniques from constraint programming (CP) to perform interval
computations and adaptive bound propagation to obtain practically efficient
algorithms. Compared to previous algorithms that solve OPF with optimality
guarantees using convex relaxations, our approach is able to work for arbitrary
distribution networks and handle mixed-integer optimization problems. Further,
it can be implemented in a distributed message-passing fashion that is scalable
and is suitable for "smart grid" applications like control of distributed
energy resources. We evaluate our technique numerically on several benchmark
networks and show that practical OPF problems can be solved effectively using
this approach.
Comment: To appear in Proceedings of the 22nd International Conference on
Principles and Practice of Constraint Programming (CP 2016)
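Dynamic programming over a tree-structured model with discretized nodal variables can be sketched as a min-sum recursion from the leaves to the root. The example is a generic illustration under stated assumptions: the cost functions, the finite state set, and the name `tree_min_sum` are hypothetical, and no power-flow physics, interval arithmetic, or bound propagation is included.

```python
def tree_min_sum(children, root, states, node_cost, edge_cost):
    """Minimum total cost of a tree-structured model by dynamic programming.

    children:  maps a node to the list of its child nodes.
    states:    finite set of discretized values each node may take.
    node_cost(node, s): cost of assigning state s to node.
    edge_cost(parent, s, child, t): coupling cost between parent and child.
    """
    def message(node):
        # table[s] = cheapest cost of the subtree rooted at node, given state s
        table = {s: node_cost(node, s) for s in states}
        for child in children.get(node, ()):
            child_table = message(child)
            for s in states:
                table[s] += min(
                    edge_cost(node, s, child, t) + child_table[t] for t in states
                )
        return table

    return min(message(root).values())
```

Each node only needs the tables sent by its children, so the same recursion can be run as message passing between neighbouring nodes.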