5,360 research outputs found

    An analysis of the practical DPG method

    In this work we give a complete error analysis of the Discontinuous Petrov Galerkin (DPG) method, accounting for all the approximations made in its practical implementation. Specifically, we consider the DPG method that uses a trial space consisting of polynomials of degree p on each mesh element. Earlier works showed that there is a "trial-to-test" operator T, which when applied to the trial space, defines a test space that guarantees stability. In DPG formulations, this operator T is local: it can be applied element-by-element. However, applying T exactly requires solving an infinite-dimensional problem on each mesh element. In practical computations, T is approximated using polynomials of some degree r > p on each mesh element. We show that this approximation maintains optimal convergence rates, provided that r ≥ p + N, where N is the space dimension (two or more), for the Laplace equation. We also prove a similar result for the DPG method for linear elasticity. Remarks on the conditioning of the stiffness matrix in DPG methods are also included.
    Comment: Mathematics of Computation, 201
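    The trial-to-test construction described above can be sketched in generic DPG notation (a sketch of the standard construction; the symbols V(K) and b(·,·) are assumed, not quoted from the abstract):

    ```latex
    % On each mesh element K, the optimal test function Tw for a trial
    % function w solves a local variational problem in the test space V(K):
    \begin{aligned}
      (Tw,\, v)_{V(K)} &= b(w,\, v) \qquad \forall\, v \in V(K).
    \end{aligned}
    % The practical DPG method replaces the infinite-dimensional V(K) by
    % polynomials of degree r; the abstract's result says optimal rates
    % survive this approximation whenever r \ge p + N.
    ```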

    Partial expansion of a Lipschitz domain and some applications

    We show that a Lipschitz domain can be expanded solely near a part of its boundary, assuming that the part is enclosed by a piecewise C^1 curve. The expanded domain as well as the extended part are both Lipschitz. We apply this result to prove a regular decomposition of standard vector Sobolev spaces with vanishing traces only on part of the boundary. Another application, to the construction of low-regularity projectors into finite element spaces with partial boundary conditions, is also indicated.

    Borrowing Treasures from the Wealthy: Deep Transfer Learning through Selective Joint Fine-tuning

    Deep neural networks require a large amount of labeled training data during supervised learning. However, collecting and labeling so much data might be infeasible in many cases. In this paper, we introduce a source-target selective joint fine-tuning scheme for improving the performance of deep learning tasks with insufficient training data. In this scheme, a target learning task with insufficient training data is carried out simultaneously with another source learning task with abundant training data. However, the source learning task does not use all existing training data. Our core idea is to identify and use a subset of training images from the original source learning task whose low-level characteristics are similar to those from the target learning task, and jointly fine-tune shared convolutional layers for both tasks. Specifically, we compute descriptors from linear or nonlinear filter bank responses on training images from both tasks, and use such descriptors to search for a desired subset of training samples for the source learning task. Experiments demonstrate that our selective joint fine-tuning scheme achieves state-of-the-art performance on multiple visual classification tasks with insufficient training data for deep learning. Such tasks include Caltech 256, MIT Indoor 67, Oxford Flowers 102 and Stanford Dogs 120. In comparison to fine-tuning without a source domain, the proposed method can improve the classification accuracy by 2% to 10% using a single model.
    Comment: To appear in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017)
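    The descriptor-based selection step can be illustrated with a minimal nearest-neighbor sketch (the function name, the Euclidean metric, and the toy data are assumptions for illustration; the paper's actual descriptors come from filter bank responses):

    ```python
    import numpy as np

    def select_source_subset(target_desc, source_desc, k=5):
        """For each target descriptor, pick the k nearest source descriptors
        (Euclidean distance) and return the union of selected source indices."""
        selected = set()
        for t in target_desc:
            dist = np.linalg.norm(source_desc - t, axis=1)  # distance to every source image
            selected.update(np.argsort(dist)[:k].tolist())
        return sorted(selected)

    # Toy data: 4 target and 10 source descriptors in an 8-dim feature space.
    rng = np.random.default_rng(0)
    target = rng.normal(size=(4, 8))
    source = rng.normal(size=(10, 8))
    subset = select_source_subset(target, source, k=3)  # indices into the source set
    ```

    The selected subset, rather than the full source dataset, would then feed the joint fine-tuning of the shared convolutional layers.
    
    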

    A first order system least squares method for the Helmholtz equation

    We present a first order system least squares (FOSLS) method for the Helmholtz equation at high wave number k, which always yields a Hermitian positive definite algebraic system. By utilizing a non-trivial solution decomposition of the dual FOSLS problem, which is quite different from that of the standard finite element method, we give an error analysis of the hp-version of the FOSLS method in which the dependence on the mesh size h, the approximation order p, and the wave number k is given explicitly. In particular, under some assumptions on the boundary of the domain, the L2 norm error estimate of the scalar solution from the FOSLS method is shown to be quasi-optimal under the condition that kh/p is sufficiently small and the polynomial degree p is at least O(\log k). Numerical experiments are given to verify the theoretical results.
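    As background, one common first-order reformulation of the Helmholtz equation Δu + k²u = f reads as follows (a sketch; the paper's exact scaling and boundary terms may differ):

    ```latex
    % Introduce the scaled flux \sigma = \nabla u / (ik):
    \begin{aligned}
      ik\,\boldsymbol{\sigma} - \nabla u &= 0, \\
      ik\,u - \nabla\cdot\boldsymbol{\sigma} &= g, \qquad g := -\frac{f}{ik},
    \end{aligned}
    % Eliminating \sigma recovers \Delta u + k^2 u = f.  The FOSLS method
    % minimizes the residual functional
    \min_{(u,\boldsymbol{\sigma})}\;
      \|ik\,\boldsymbol{\sigma} - \nabla u\|_{L^2}^2
      + \|ik\,u - \nabla\cdot\boldsymbol{\sigma} - g\|_{L^2}^2,
    % whose normal equations give the Hermitian positive definite system
    % mentioned in the abstract.
    ```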

    Speculative Segmented Sum for Sparse Matrix-Vector Multiplication on Heterogeneous Processors

    Sparse matrix-vector multiplication (SpMV) is a central building block for scientific software and graph applications. Recently, heterogeneous processors composed of different types of cores have attracted much attention because of their flexible core configuration and high energy efficiency. In this paper, we propose a compressed sparse row (CSR) format based SpMV algorithm utilizing both types of cores in a CPU-GPU heterogeneous processor. We first speculatively execute segmented sum operations on the GPU part of a heterogeneous processor and generate possibly incorrect results. Then the CPU part of the same chip is triggered to re-arrange the predicted partial sums into a correct resulting vector. On three heterogeneous processors from Intel, AMD and nVidia, using 20 sparse matrices as a benchmark suite, the experimental results show that our method obtains significant performance improvement over the best existing CSR-based SpMV algorithms. The source code of this work is downloadable at https://github.com/bhSPARSE/Benchmark_SpMV_using_CSR
    Comment: 22 pages, 8 figures, Published at Parallel Computing (PARCO)
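    The segmented-sum view of CSR SpMV can be sketched sequentially as follows (a minimal reference version; the paper's contribution is doing the segmented sum speculatively on the GPU and letting the CPU fix up the partial sums, which is not reproduced here):

    ```python
    import numpy as np

    def csr_spmv_segmented(row_ptr, col_idx, vals, x):
        """CSR SpMV as a segmented sum: multiply every stored nonzero by the
        matching entry of x, then sum each row's segment of the product array."""
        products = vals * x[col_idx]               # one product per nonzero
        y = np.zeros(len(row_ptr) - 1)
        for r in range(len(row_ptr) - 1):          # segmented sum, one row per segment
            y[r] = products[row_ptr[r]:row_ptr[r + 1]].sum()
        return y

    # 3x3 example matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form:
    row_ptr = np.array([0, 2, 3, 5])
    col_idx = np.array([0, 2, 1, 0, 2])
    vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    x = np.array([1.0, 1.0, 1.0])
    y = csr_spmv_segmented(row_ptr, col_idx, vals, x)   # -> [3., 3., 9.]
    ```
    
    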

    Knowledge management, absorptive capacity and organisational culture: A case study from Chinese SMEs

    Copyright © 2008 Inderscience Enterprises Ltd. This is the author's accepted manuscript. Based on the analysis of an innovative medium-sized enterprise from mainland China, this paper investigates Knowledge Management (KM) issues by focusing on KM enablers and processes. The paper examines how Chinese enterprises absorb knowledge from external sources, how they develop a culture that facilitates Knowledge Management Processes (KMPs), and what major challenges they face for the future, through the case study of a Chinese Small and Medium-sized Enterprise (SME). The case study indicates that Chinese enterprises regard knowledge acquisition and the capacities of knowledge absorption, application, creation, sharing and integration as vital to sustaining competitive advantage. Cooperative organisational culture also has a significant impact on KM in these enterprises.

    CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication

    Sparse matrix-vector multiplication (SpMV) is a fundamental building block for numerous applications. In this paper, we propose CSR5 (Compressed Sparse Row 5), a new storage format, which offers high-throughput SpMV on various platforms including CPUs, GPUs and Xeon Phi. First, the CSR5 format is insensitive to the sparsity structure of the input matrix. Thus the single format can support an SpMV algorithm that is efficient both for regular matrices and for irregular matrices. Furthermore, we show that the overhead of the format conversion from CSR to CSR5 can be as low as the cost of a few SpMV operations. We compare the CSR5-based SpMV algorithm with 11 state-of-the-art formats and algorithms on four mainstream processors using 14 regular and 10 irregular matrices as a benchmark suite. For the 14 regular matrices in the suite, we achieve comparable or better performance over the previous work. For the 10 irregular matrices, the CSR5 obtains average performance improvement of 17.6\%, 28.5\%, 173.0\% and 293.3\% (up to 213.3\%, 153.6\%, 405.1\% and 943.3\%) over the best existing work on dual-socket Intel CPUs, an nVidia GPU, an AMD GPU and an Intel Xeon Phi, respectively. For real-world applications such as a solver with only tens of iterations, the CSR5 format can be more practical because of its low overhead for format conversion. The source code of this work is downloadable at https://github.com/bhSPARSE/Benchmark_SpMV_using_CSR5
    Comment: 12 pages, 10 figures, In Proceedings of the 29th ACM International Conference on Supercomputing (ICS '15)
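    CSR5 extends the classic CSR layout with a tile structure; as background, a minimal construction of the three base CSR arrays that CSR5 starts from (the tiling itself is more involved and not reproduced here):

    ```python
    import numpy as np

    def dense_to_csr(A):
        """Build the three classic CSR arrays from a dense matrix:
        row_ptr (row boundaries), col_idx (column of each nonzero), vals."""
        row_ptr, col_idx, vals = [0], [], []
        for row in A:
            for j, v in enumerate(row):
                if v != 0:
                    col_idx.append(j)
                    vals.append(v)
            row_ptr.append(len(vals))   # running nonzero count closes each row
        return row_ptr, col_idx, vals

    A = np.array([[5, 0, 0],
                  [0, 0, 7],
                  [1, 2, 0]])
    row_ptr, col_idx, vals = dense_to_csr(A)
    # row_ptr == [0, 1, 2, 4], col_idx == [0, 2, 0, 1], vals == [5, 7, 1, 2]
    ```

    The abstract's point about low conversion overhead refers to rearranging these CSR arrays into CSR5's tiles, which costs only a few SpMV-equivalents.
    
    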