5,360 research outputs found
An analysis of the practical DPG method
In this work we give a complete error analysis of the Discontinuous Petrov
Galerkin (DPG) method, accounting for all the approximations made in its
practical implementation. Specifically, we consider the DPG method that uses a
trial space consisting of polynomials of degree p on each mesh element.
Earlier works showed that there is a "trial-to-test" operator T, which, when
applied to the trial space, defines a test space that guarantees stability. In
DPG formulations, this operator is local: it can be applied
element by element. However, an infinite-dimensional problem on each mesh
element must be solved to apply T. In practical computations, T is
approximated using polynomials of some degree r on each mesh element. We
show that this approximation maintains optimal convergence rates for the
Laplace equation, provided that r >= p + d, where d is the space dimension
(two or more). We also prove a similar result for the DPG method for linear
elasticity. Remarks on the conditioning of the stiffness matrix in DPG methods
are also included.
Comment: Mathematics of Computation, 201
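For orientation, the trial-to-test map described in the abstract can be sketched as follows; the notation (T, V, V(K), the bilinear form b, and the inner product) is assumed for this illustration and is not fixed by the abstract itself:

```latex
% Sketch of the ideal trial-to-test operator (notation assumed).
% Given the variational form b(u, v) and a test-space inner
% product (., .)_V, the operator T maps each trial function u to
% the test function T u solving, on each mesh element K,
\[
  (T u,\, v)_V \;=\; b(u,\, v) \qquad \text{for all } v \in V(K).
\]
% The practical method replaces the infinite-dimensional local
% test space V(K) with polynomials of degree r, giving a
% computable, element-local approximation of T.
```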
Partial expansion of a Lipschitz domain and some applications
We show that a Lipschitz domain can be expanded solely near a part of its
boundary, assuming that the part is enclosed by a piecewise C^1 curve. The
expanded domain as well as the extended part are both Lipschitz. We apply this
result to prove a regular decomposition of standard vector Sobolev spaces with
vanishing traces only on part of the boundary. Another application in the
construction of low-regularity projectors into finite element spaces with
partial boundary conditions is also indicated.
Borrowing Treasures from the Wealthy: Deep Transfer Learning through Selective Joint Fine-tuning
Deep neural networks require a large amount of labeled training data during
supervised learning. However, collecting and labeling so much data might be
infeasible in many cases. In this paper, we introduce a source-target selective
joint fine-tuning scheme for improving the performance of deep learning tasks
with insufficient training data. In this scheme, a target learning task with
insufficient training data is carried out simultaneously with another source
learning task with abundant training data. However, the source learning task
does not use all existing training data. Our core idea is to identify and use a
subset of training images from the original source learning task whose
low-level characteristics are similar to those from the target learning task,
and jointly fine-tune shared convolutional layers for both tasks. Specifically,
we compute descriptors from linear or nonlinear filter bank responses on
training images from both tasks, and use such descriptors to search for a
desired subset of training samples for the source learning task.
Experiments demonstrate that our selective joint fine-tuning scheme achieves
state-of-the-art performance on multiple visual classification tasks with
insufficient training data for deep learning. Such tasks include Caltech 256,
MIT Indoor 67, Oxford Flowers 102 and Stanford Dogs 120. In comparison to
fine-tuning without a source domain, the proposed method can improve the
classification accuracy by 2%-10% using a single model.
Comment: To appear in 2017 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2017).
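As a rough illustration of the sample-selection step described above, the sketch below builds filter-bank histogram descriptors and keeps, for each target image, its nearest source images. The filters, bin count, and Euclidean distance here are assumptions for illustration, not the paper's exact choices:

```python
import numpy as np

def filter_bank_descriptor(image, filters):
    """Histogram of filter responses; a stand-in for the paper's
    low-level descriptors (filters and bin count are assumptions)."""
    responses = [np.abs(np.convolve(image.ravel(), f.ravel(), mode="same"))
                 for f in filters]
    hists = [np.histogram(r, bins=16, range=(0.0, r.max() + 1e-9))[0]
             for r in responses]
    d = np.concatenate(hists).astype(float)
    return d / (np.linalg.norm(d) + 1e-12)      # L2-normalize

def select_source_subset(target_descs, source_descs, k=5):
    """For each target descriptor, keep the k nearest source images
    (Euclidean distance); the union of picks is the source subset
    used for joint fine-tuning."""
    selected = set()
    for t in target_descs:
        dists = np.linalg.norm(source_descs - t, axis=1)
        selected.update(np.argsort(dists)[:k].tolist())
    return sorted(selected)
```

The selected subset would then be mixed with the target data while the shared convolutional layers are fine-tuned for both tasks.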
A first order system least squares method for the Helmholtz equation
We present a first order system least squares (FOSLS) method for the
Helmholtz equation at high wave number k, which always yields a Hermitian
positive definite algebraic system. By utilizing a non-trivial solution
decomposition of the dual FOSLS problem, which is quite different from that of
the standard finite element method, we give an error analysis of the
hp-version of the FOSLS method in which the dependence on the mesh size h, the
approximation order p, and the wave number k is given explicitly. In
particular, under some assumptions on the boundary of the domain, the L2 norm
error estimate of the scalar solution from the FOSLS method is shown to be
quasi-optimal under the condition that kh/p is sufficiently small and the
polynomial degree p is at least O(log k). Numerical experiments are given to
verify the theoretical results.
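Schematically, a first-order-system least-squares formulation of the Helmholtz problem minimizes a residual functional of the following kind; the exact scaling, sign conventions, and boundary terms used in the paper may differ:

```latex
% Introduce a flux variable sigma (here sigma = -(1/(ik)) grad u)
% to rewrite the second-order Helmholtz equation as a first-order
% system, then minimize the least-squares functional
\[
  J(u, \sigma) \;=\;
  \big\| \, ik\,\sigma + \nabla u \, \big\|_{0}^{2}
  \;+\;
  \big\| \, ik\,u + \nabla \!\cdot \sigma - f \, \big\|_{0}^{2}.
\]
% The normal equations of this minimization yield a Hermitian
% positive definite algebraic system, as claimed in the abstract.
```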
Speculative Segmented Sum for Sparse Matrix-Vector Multiplication on Heterogeneous Processors
Sparse matrix-vector multiplication (SpMV) is a central building block for
scientific software and graph applications. Recently, heterogeneous processors
composed of different types of cores have attracted much attention because of their
flexible core configuration and high energy efficiency. In this paper, we
propose a compressed sparse row (CSR) format based SpMV algorithm utilizing
both types of cores in a CPU-GPU heterogeneous processor. We first
speculatively execute segmented sum operations on the GPU part of a
heterogeneous processor and generate possibly incorrect results. Then the CPU
part of the same chip is triggered to re-arrange the predicted partial sums for
a correct resulting vector. On three heterogeneous processors from Intel, AMD
and nVidia, using 20 sparse matrices as a benchmark suite, the experimental
results show that our method obtains significant performance improvement over
the best existing CSR-based SpMV algorithms. The source code of this work is
downloadable at https://github.com/bhSPARSE/Benchmark_SpMV_using_CSR
Comment: 22 pages, 8 figures, Published at Parallel Computing (PARCO).
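The segmented-sum primitive at the heart of the method can be seen in a serial sketch: CSR SpMV is elementwise products followed by a segmented sum over row boundaries. This is an illustration of the primitive only, not the paper's speculative GPU/CPU implementation:

```python
import numpy as np

def csr_spmv_segmented_sum(row_ptr, col_idx, vals, x):
    """CSR SpMV as (1) one product per nonzero and (2) a segmented
    sum that reduces each row's segment of the product array.  The
    paper runs the segmented sum speculatively on the GPU and lets
    the CPU repair misplaced partial sums; here both steps are
    serial for clarity."""
    products = vals * x[col_idx]          # step 1: elementwise products
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows)
    for r in range(n_rows):               # step 2: segmented sum,
        y[r] = products[row_ptr[r]:row_ptr[r + 1]].sum()  # one row per segment
    return y
```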
Knowledge management, absorptive capacity and organisational culture: A case study from Chinese SMEs
Copyright © 2008 Inderscience Enterprises Ltd. This is the author's accepted manuscript.
Based on the analysis of an innovative medium-sized enterprise from mainland
China, this paper investigates Knowledge Management (KM) issues by focusing on
KM enablers and processes. It examines how Chinese enterprises absorb knowledge
from external sources, how they develop a culture that facilitates Knowledge
Management Processes (KMPs), and what major challenges they face in the future,
through the case study of a Chinese Small and Medium-sized Enterprise (SME).
The case study indicates that Chinese enterprises regard knowledge acquisition
and the capacities for knowledge absorption, application, creation, sharing and
integration as vital to sustaining competitive advantage. Corporate
organisational culture also has a significant impact on KM in these enterprises.
CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication
Sparse matrix-vector multiplication (SpMV) is a fundamental building block
for numerous applications. In this paper, we propose CSR5 (Compressed Sparse
Row 5), a new storage format, which offers high-throughput SpMV on various
platforms including CPUs, GPUs and Xeon Phi. First, the CSR5 format is
insensitive to the sparsity structure of the input matrix. Thus the single
format can support an SpMV algorithm that is efficient both for regular
matrices and for irregular matrices. Furthermore, we show that the overhead of
the format conversion from the CSR to the CSR5 can be as low as the cost of a
few SpMV operations. We compare the CSR5-based SpMV algorithm with 11
state-of-the-art formats and algorithms on four mainstream processors using 14
regular and 10 irregular matrices as a benchmark suite. For the 14 regular
matrices in the suite, we achieve comparable or better performance over the
previous work. For the 10 irregular matrices, the CSR5 obtains an average
performance improvement of 17.6%, 28.5%, 173.0% and 293.3% (up to 213.3%,
153.6%, 405.1% and 943.3%) over the best existing work on dual-socket Intel
CPUs, an nVidia GPU, an AMD GPU and an Intel Xeon Phi, respectively. For
real-world applications such as a solver with only tens of iterations, the CSR5
format can be more practical because of its low overhead for format conversion.
The source code of this work is downloadable at
https://github.com/bhSPARSE/Benchmark_SpMV_using_CSR5
Comment: 12 pages, 10 figures, In Proceedings of the 29th ACM International
Conference on Supercomputing (ICS '15).
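For reference, the baseline CSR layout that CSR5 extends stores a matrix in three arrays (row pointers, column indices, values), and the low-cost conversion path starts from exactly this layout. A minimal sketch of building CSR from row-sorted COO triplets follows; CSR5's additional per-tile descriptors are beyond this illustration:

```python
import numpy as np

def coo_to_csr(rows, cols, vals, n_rows):
    """Convert COO triplets to the three CSR arrays
    (row_ptr, col_idx, vals).  CSR5 layers tile metadata on top of
    this layout, which is why CSR-to-CSR5 conversion can cost only
    a few SpMV operations."""
    order = np.argsort(rows, kind="stable")      # ensure row-major order
    rows, cols, vals = rows[order], cols[order], vals[order]
    row_ptr = np.zeros(n_rows + 1, dtype=np.int64)
    np.add.at(row_ptr, rows + 1, 1)              # count nonzeros per row
    row_ptr = np.cumsum(row_ptr)                 # prefix sum -> row offsets
    return row_ptr, cols, vals
```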