
    Beyond spatial scalability limitations with a massively parallel method for linear oscillatory problems

    This is the author accepted manuscript. The final version is available from SAGE Publications via the DOI in this record.

    This paper presents, discusses and analyses a massively parallel-in-time solver for linear oscillatory PDEs, which is a key numerical component of evolving weather, ocean, climate and seismic models. The time parallelization in this solver allows us to significantly exceed the computing resources used by parallelization-in-space methods, resulting in a correspondingly significant reduction in wall-clock time. One of the major difficulties in achieving Exascale performance for weather prediction is that the strong scaling limit – the parallel performance for a fixed problem size with an increasing number of processors – saturates. A main avenue to circumvent this problem is to introduce new numerical techniques that take advantage of time parallelism. In this paper we use a time-parallel approximation that retains the frequency information of oscillatory problems. This approximation is based on (a) reformulating the original problem into a large set of independent terms and (b) solving each of these terms independently of the others, which can then be accomplished on a large number of HPC resources. Our experiments are conducted on up to 3586 cores, for problem sizes whose parallelization-in-space scalability is already limited on a single node. With the parallelization-in-time approach we gain significant reductions in time-to-solution: factors of 118.3 for spectral methods and 1503.0 for finite-difference methods. A developed and calibrated performance model gives the scalability limitations of this new approach a priori and allows us to extrapolate the performance of the method towards large-scale systems. This work has the potential to contribute a basic building block of parallelization-in-time approaches, with possible major implications for applied areas modelling oscillation-dominated problems.

    The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) for funding this project by providing computing time on the GCS Supercomputer SuperMUC at the Leibniz Supercomputing Centre (LRZ, www.lrz.de). We also acknowledge the use of Hartree Centre resources in this work, on which the early evaluation of the parallelization concepts was done.
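
    The abstract does not spell out the decomposition, but the structure of steps (a) and (b) can be sketched. A common realization of this kind of time-parallel approximation writes the oscillatory solution u(t) = exp(tL)u0 as a sum of mutually independent shifted linear solves, each of which can be assigned to its own group of processors. The Python sketch below is purely illustrative and is not the authors' solver; the coefficients alphas and betas are hypothetical placeholders that would, in practice, come from a rational approximation of the exponential.

        # Illustrative sketch only (not the authors' solver): approximate
        # u(t) = exp(t*L) u0 by a sum of independent shifted solves,
        #   u(t) ~ sum_n beta_n * (t*L + alpha_n*I)^{-1} u0,
        # so each term can run on a separate set of HPC resources.
        import numpy as np
        from concurrent.futures import ProcessPoolExecutor

        def solve_term(args):
            L, u0, t, alpha, beta = args
            n = L.shape[0]
            # One independent complex shifted linear solve per term.
            return beta * np.linalg.solve(t * L + alpha * np.eye(n),
                                          u0.astype(complex))

        def parallel_exponential(L, u0, t, alphas, betas, workers=4):
            # alphas/betas: hypothetical placeholder coefficients of a
            # rational approximation of exp(x).
            jobs = [(L, u0, t, a, b) for a, b in zip(alphas, betas)]
            with ProcessPoolExecutor(max_workers=workers) as pool:
                terms = pool.map(solve_term, jobs)
            return np.real(sum(terms))  # recombine the independent terms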

    Meniscal tissue explants response depends on level of dynamic compressive strain

    Summary

    Objective: Following partial meniscectomy, the remaining meniscus is exposed to an altered loading environment. In vitro, 20% dynamic compressive strain on meniscal tissue explants has been shown to increase the release of glycosaminoglycans from the tissue and the expression of interleukin-1α (IL-1α). The goal of this study was to determine whether compressive loading that induces endogenously expressed IL-1 results in downstream changes in the gene expression of anabolic and catabolic molecules in meniscal tissue, such as MMP expression.

    Method: Relative changes in the gene expression of MMP-1, MMP-3, MMP-9, MMP-13, A Disintegrin and Metalloproteinase with ThromboSpondin motifs 4 (ADAMTS4), ADAMTS5, TNFα, TGFβ, COX-2, type I collagen (COL-1) and aggrecan, and subsequent changes in the concentration of prostaglandin E2 released by meniscal tissue in response to varying levels of dynamic compression (0%, 10% and 20%), were measured. Porcine meniscal explants were dynamically compressed for 2 h at 1 Hz.

    Results: 20% dynamic compressive strain upregulated MMP-1, MMP-3, MMP-13 and ADAMTS4 compared to no dynamic loading. Aggrecan, COX-2 and ADAMTS5 gene expression were upregulated under 10% strain compared to no dynamic loading, while COL-1, TIMP-1 and TGFβ gene expression were not dependent on the magnitude of loading.

    Conclusion: These data suggest that changes in mechanical loading of the knee joint meniscus from 10% to 20% dynamic strain can increase the catabolic activity of the meniscus.

    Inference in supervised spectral classifiers for on-board hyperspectral imaging: An overview

    Machine learning techniques are widely used for pixel-wise classification of hyperspectral images. These methods can achieve high accuracy, but most of them are computationally intensive models. This poses a problem for their implementation in low-power and embedded systems intended for on-board processing, in which energy consumption and model size are as important as accuracy. With a focus on embedded and on-board systems (in which only the inference step is performed after an off-line training process), in this paper we provide a comprehensive overview of the inference properties of the most relevant techniques for hyperspectral image classification. For this purpose, we compare the size of the trained models and the operations required during the inference step (which are directly related to the hardware and energy requirements). Our goal is to search for appropriate trade-offs between on-board implementation (such as model size and energy consumption) and classification accuracy.
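
    To make the two quantities being compared concrete, the short sketch below counts trained parameters (model size) and multiply-accumulate operations (MACs, a proxy for energy cost) for a per-pixel fully connected classifier. The layer sizes are hypothetical and not taken from the paper.

        # Hypothetical example: parameters and MACs of a per-pixel MLP.
        def mlp_inference_cost(layer_sizes):
            params, macs = 0, 0
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
                params += n_in * n_out + n_out  # weights + biases
                macs += n_in * n_out            # one MAC per weight per pixel
            return params, macs

        # e.g. 200 spectral bands -> 64 -> 32 hidden units -> 16 classes
        params, macs = mlp_inference_cost([200, 64, 32, 16])
        print(f"parameters: {params}, MACs per pixel: {macs}")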

    GPU Parallel Implementation of Dual-Depth Sparse Probabilistic Latent Semantic Analysis for Hyperspectral Unmixing

    Hyperspectral unmixing (HU) is an important task for remotely sensed hyperspectral (HS) data exploitation. It comprises the identification of pure spectral signatures (endmembers) and their corresponding fractional abundances in each pixel of the HS data cube. Several methods have been developed for (semi-)supervised and automatic identification of endmembers and abundances. Recently, the statistical dual-depth sparse probabilistic latent semantic analysis (DEpLSA) method has been developed to tackle the HU problem as a latent topic-based approach, in which both endmembers and abundances can be simultaneously estimated according to the semantics encapsulated by the latent topic space. However, statistical models usually lead to computationally demanding algorithms, and the computational time of DEpLSA is often too high for practical use, in particular when the dimensionality of the HS data cube is large. In order to mitigate this limitation, this article resorts to graphics processing units (GPUs) to provide a new parallel version of DEpLSA, developed using the NVIDIA Compute Unified Device Architecture (CUDA). Our experimental results, conducted using four well-known HS datasets and two different GPU architectures (GTX 1080 and Tesla P100), show that our parallel versions of DEpLSA and the traditional pLSA approach can provide accurate HU results fast enough for practical use, accelerating the corresponding serial versions by at least 30x on the GTX 1080 and up to 147x on the Tesla P100. These are quite significant acceleration factors that increase with the image size, thus allowing for the possibility of fast processing of massive HS data repositories.
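
    The article's CUDA kernels are not reproduced in the abstract; as a rough illustration of why this kind of model maps well onto GPUs, the sketch below runs one EM iteration of plain pLSA (not the dual-depth DEpLSA) as dense matrix products using the CuPy library. All sizes are hypothetical.

        import cupy as cp  # NumPy-compatible arrays that live on the GPU

        def plsa_em_step(N, Pwz, Pzd, eps=1e-12):
            # N: (W, D) counts; Pwz: (W, Z) P(w|z); Pzd: (Z, D) P(z|d).
            R = N / (Pwz @ Pzd + eps)        # E-step responsibilities, fused
            new_Pwz = Pwz * (R @ Pzd.T)      # M-step numerators
            new_Pzd = Pzd * (Pwz.T @ R)
            new_Pwz /= new_Pwz.sum(axis=0, keepdims=True) + eps
            new_Pzd /= new_Pzd.sum(axis=0, keepdims=True) + eps
            return new_Pwz, new_Pzd

        # Hypothetical sizes: W spectral "words", D pixels, Z latent topics.
        W, D, Z = 128, 10000, 8
        N = cp.random.rand(W, D)
        Pwz = cp.random.rand(W, Z); Pwz /= Pwz.sum(axis=0, keepdims=True)
        Pzd = cp.random.rand(Z, D); Pzd /= Pzd.sum(axis=0, keepdims=True)
        Pwz, Pzd = plsa_em_step(N, Pwz, Pzd)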

    GPU-friendly neural networks for remote sensing scene classification

    Convolutional neural networks (CNNs) have proven to be very efficient for the analysis of remote sensing (RS) images. Due to the inherent complexity of extracting features from these images, along with the increasing amount of data to be processed (and the diversity of applications), there is a clear tendency to develop and employ increasingly deep and complex CNNs. In this regard, graphics processing units (GPUs) are frequently used to accelerate their execution, both for the training and inference stages, exploiting the many-core architecture of these devices to optimize the performance of neural models. Hence, the efficient use of GPU resources should be at the core of such optimizations. This letter analyzes the possibilities of using a new family of CNNs, denoted as TResNets, to provide an efficient solution to the RS scene classification problem. Moreover, the considered models have been combined with mixed precision to enhance their training performance. Our experimental results, conducted over three publicly available RS data sets, show that the proposed networks achieve better accuracy and more efficient use of GPU resources than other state-of-the-art networks. Source code is available at https://github.com/mhaut/GPUfriendlyRS
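
    The mixed-precision training mentioned above can be sketched with standard PyTorch automatic mixed precision. This is an illustration of the technique, not the authors' training code, and it assumes the timm library (which ships TResNet variants); the class count and hyperparameters are placeholders.

        import torch
        import timm  # assumed dependency; provides TResNet variants

        model = timm.create_model("tresnet_m", num_classes=45).cuda()
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
        criterion = torch.nn.CrossEntropyLoss()
        scaler = torch.cuda.amp.GradScaler()  # keeps FP16 grads from underflow

        def train_step(images, labels):
            optimizer.zero_grad(set_to_none=True)
            with torch.cuda.amp.autocast():   # forward pass in mixed precision
                loss = criterion(model(images.cuda()), labels.cuda())
            scaler.scale(loss).backward()     # backward on the scaled loss
            scaler.step(optimizer)
            scaler.update()
            return loss.item()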

    Low-High-Power Consumption Architectures for Deep-Learning Models Applied to Hyperspectral Image Classification

    Convolutional neural networks have emerged as an excellent tool for remotely sensed hyperspectral image (HSI) classification. Nonetheless, the high computational complexity and energy requirements of these models typically limit their application in on-board remote sensing scenarios. In this context, low-power consumption architectures are promising platforms that may provide acceptable on-board computing capabilities, achieving satisfactory classification results with reduced energy demand. For instance, the new NVIDIA Jetson Tegra TX2 device is an efficient solution for on-board processing applications using deep-learning (DL) approaches. So far, very few efforts have been devoted to exploiting this or other similar computing platforms in on-board remote sensing procedures. This letter explores the use of low-power consumption architectures and DL algorithms for HSI classification. The conducted experimental study reveals that the NVIDIA Jetson Tegra TX2 device is a good choice, in terms of performance, cost and energy consumption, for on-board HSI classification tasks.
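
    As an illustration of the kind of on-board benchmark such a study relies on, the sketch below measures the average inference latency of a trained model on the device; on a Jetson, power draw is typically read separately (e.g. with the tegrastats utility). The model and the HSI patch shape are placeholders, not taken from the letter.

        import time
        import torch

        @torch.no_grad()
        def measure_latency(model, input_shape=(1, 200, 9, 9),
                            runs=100, warmup=10):
            # input_shape: hypothetical HSI patch (batch x bands x h x w).
            device = "cuda" if torch.cuda.is_available() else "cpu"
            model = model.to(device).eval()
            x = torch.randn(*input_shape, device=device)
            for _ in range(warmup):       # warm up kernels and caches
                model(x)
            if device == "cuda":
                torch.cuda.synchronize()  # GPU execution is asynchronous
            start = time.perf_counter()
            for _ in range(runs):
                model(x)
            if device == "cuda":
                torch.cuda.synchronize()
            return (time.perf_counter() - start) / runs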

    Deep Pyramidal Residual Networks for Spectral-Spatial Hyperspectral Image Classification

    Convolutional neural networks (CNNs) exhibit good performance in image processing tasks, establishing themselves as the current state of the art among deep learning methods. However, the intrinsic complexity of remotely sensed hyperspectral images (HSIs) still limits the performance of many CNN models. The high dimensionality of the HSI data, together with the underlying redundancy and noise, often makes standard CNN approaches unable to generalize discriminative spectral-spatial features. Moreover, deeper CNN architectures face challenges as additional layers are added, which hampers network convergence and produces low classification accuracies. In order to mitigate these issues, this paper presents a new deep CNN architecture specially designed for HSI data. Our new model aims to improve the spectral-spatial features uncovered by the convolutional filters of the network. Specifically, the proposed residual-based approach gradually increases the feature map dimension at all convolutional layers, grouped in pyramidal bottleneck residual blocks, in order to involve more locations as the network depth increases, while balancing the workload among all units and preserving the time complexity per layer. The network can be seen as a pyramid: the deeper the block, the more feature maps it extracts. Therefore, the diversity of high-level spectral-spatial attributes can be gradually increased across layers to enhance the performance of the proposed network on HSI data. Our experiments, conducted using four well-known HSI data sets and 10 different classification techniques, reveal that our newly developed HSI pyramidal residual model is able to provide competitive advantages (in terms of both classification accuracy and computational time) over state-of-the-art HSI classification methods.
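
    The pyramidal idea described above (widening the feature maps by a small step in every residual block rather than doubling them at stage boundaries) can be sketched in a few lines of PyTorch. This is an illustrative block with assumed widths, not the authors' exact architecture.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class PyramidalBottleneck(nn.Module):
            # Bottleneck residual block whose output is slightly wider
            # than its input, so widths grow gradually with depth.
            def __init__(self, c_in, c_out):
                super().__init__()
                mid = c_out // 4  # bottleneck width
                self.body = nn.Sequential(
                    nn.BatchNorm2d(c_in),
                    nn.Conv2d(c_in, mid, 1, bias=False),
                    nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
                    nn.Conv2d(mid, mid, 3, padding=1, bias=False),
                    nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
                    nn.Conv2d(mid, c_out, 1, bias=False),
                    nn.BatchNorm2d(c_out),
                )

            def forward(self, x):
                out = self.body(x)
                # Zero-pad the identity shortcut up to the wider output.
                pad = out.shape[1] - x.shape[1]
                return out + F.pad(x, (0, 0, 0, 0, 0, pad))

        # Widths grow linearly across blocks, e.g. 64 -> 70 -> 76 -> 82.
        widths = [64 + 6 * k for k in range(4)]
        blocks = nn.Sequential(*[PyramidalBottleneck(a, b)
                                 for a, b in zip(widths[:-1], widths[1:])])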