7 research outputs found

    Accelerated Synchronous Model Parallelism Using Cooperative Process for Training Compute-Intensive Models

    As deep learning is applied to an ever-wider range of fields, there is growing demand for models that can handle large inputs, including high-resolution images. Model parallelism was proposed to train models whose size exceeds the memory capacity of a single accelerator, but it trains very slowly because of pipeline bubbles. GPipe reduced these bubbles by introducing the micro-batch concept, in which a mini-batch is divided into smaller units. However, the speedup is limited beyond a certain micro-batch size, because smaller micro-batches lower computation and input/output (I/O) efficiency. To overcome these limitations, we propose acceleration through prediction and synchronization steps based on process cooperation for training compute-intensive models. In the prediction step, the forward-pass inputs of all processes are computed concurrently in advance using the weights shared in the synchronization step, and the results are gathered into each corresponding process via an all-to-all collective operation. This increases computational efficiency and reduces bubbles by minimizing device idle time. The proposed method also requires minimal memory because it does not have to store activations. Compared with GPipe on four devices, it achieved performance improvements of 15.3%, 34.5%, and 25.8% with the VGG16bn, ResNet50, and InceptionV3 models, respectively, and reduced training memory by up to 75.0%.
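
    As an illustration of the micro-batch idea this abstract builds on, the following minimal single-process PyTorch sketch splits a mini-batch into micro-batches and feeds them through two model "stages". It is not the authors' cooperative-process method; the layer sizes and chunk count are arbitrary choices for the example.

        import torch
        import torch.nn as nn

        # Two pipeline "stages" of a model. Both run in one process here for
        # illustration; in practice each stage lives on its own accelerator.
        stage1 = nn.Sequential(nn.Linear(64, 128), nn.ReLU())
        stage2 = nn.Sequential(nn.Linear(128, 10))

        mini_batch = torch.randn(32, 64)

        # GPipe-style micro-batching: split the mini-batch into smaller units.
        # In a real pipeline, stage2 starts on micro-batch 0 while stage1 works
        # on micro-batch 1, shrinking the bubble at the cost of smaller, less
        # efficient kernels; here the calls run sequentially to stay one-process.
        micro_batches = mini_batch.chunk(4, dim=0)

        outputs = []
        for mb in micro_batches:
            hidden = stage1(mb)             # would run on device 0
            outputs.append(stage2(hidden))  # would run on device 1

        logits = torch.cat(outputs, dim=0)
        print(logits.shape)  # torch.Size([32, 10])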

    Outlier-Aware Demand Prediction Using Recurrent Neural Network-Based Models and Statistical Approach

    The paint industry comprises an elaborate supply chain involving activities such as raw-material procurement, manufacturing, and distribution, and the accuracy of demand prediction significantly impacts supply chain management. A recurrent neural network (RNN) learns intricate patterns from large amounts of historical data and has demonstrated excellent performance in demand prediction. However, standard RNN-based demand prediction is limited by the many outliers that arise from the characteristics of the paint industry: unexpected events such as a factory fire cause rapid fluctuations in paint demand and are difficult to anticipate. To overcome these limitations, we propose a novel approach that uses clustering to identify demand time series with similar characteristics and applies statistical outlier adjustment so that the prediction model can learn the complex patterns of actual demand. The prediction target is the sum of sales over the next 15 days, and sales data for four paint products are used to evaluate the approach. Experimental results demonstrate prediction accuracy improvements ranging from 100.9% to 152.4%, and, unlike the RNN-based models used for comparison, the proposed method is more robust against actual demand fluctuations.
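
    The abstract does not spell out the exact statistical adjustment rule, so the sketch below uses a common stand-in, Tukey's IQR fences, to show how an outlier such as a fire-driven demand spike could be clipped before model training. The synthetic series and the threshold k=1.5 are illustrative assumptions, not values from the paper.

        import numpy as np

        def iqr_adjust(series, k=1.5):
            """Clip points outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].

            A common statistical outlier adjustment, used here as an
            illustrative stand-in for the paper's unspecified rule.
            """
            q1, q3 = np.percentile(series, [25, 75])
            iqr = q3 - q1
            return np.clip(series, q1 - k * iqr, q3 + k * iqr)

        # Synthetic daily demand with a fire-like spike a model should not chase.
        rng = np.random.default_rng(0)
        demand = rng.normal(100, 10, size=365)
        demand[120] = 400  # sudden, unforeseeable event

        adjusted = iqr_adjust(demand)
        print(demand[120], adjusted[120])  # spike is pulled back to the fence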

    TA-DARTS: Temperature Annealing of Discrete Operator Distribution for Effective Differential Architecture Search

    In machine learning, optimizing hyperparameters and designing neural architectures are laborious and time-intensive tasks. To address these inefficiencies, considerable research effort has been directed toward Automated Machine Learning (AutoML). A pivotal part of this effort is Neural Architecture Search (NAS), which automates the design of neural network architectures. Because network architecture strongly affects neural network performance, NAS techniques aim to identify architectures that perform optimally. A prominent algorithm in this area is Differentiable Architecture Search (DARTS), which relaxes the discrete search space into a continuous one so that gradient-based methods can be applied, surpassing earlier NAS methods. Despite DARTS' achievements, a discrepancy persists between the continuously encoded architecture and the discrete architecture ultimately derived from it. To narrow this gap, we propose TA-DARTS, a temperature-annealing technique applied to the Softmax function used to encode the continuous search space. By adjusting the temperature, the architectural weights can be kept less biased early in the search or pushed closer to discrete values as the search proceeds. Our method improves on the original DARTS by 0.07%p in validation accuracy and 0.16%p in test accuracy on the CIFAR-100 dataset. Through systematic experiments on benchmark datasets, we establish the superiority of TA-DARTS over the original mixed operator, underscoring its efficacy in automating neural architecture design.
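
    The core mechanism, a temperature in the Softmax over candidate-operator weights, can be sketched in a few lines. The logits and temperature values below are illustrative assumptions, not the schedule used in the paper.

        import torch

        def soft_op_weights(alpha, temperature):
            """Softmax over operator logits with a temperature knob.

            High T -> near-uniform mixture (less bias early in the search);
            T -> 0 -> near one-hot, closing the gap to the discrete architecture.
            Illustrative sketch; TA-DARTS' actual annealing schedule differs.
            """
            return torch.softmax(alpha / temperature, dim=-1)

        alpha = torch.tensor([1.2, 0.3, -0.5])  # logits for 3 candidate ops on one edge
        for t in (5.0, 1.0, 0.1):
            print(t, soft_op_weights(alpha, t))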

    Accelerating a cross-correlation score function to search modifications using a single GPU

    Background: A cross-correlation (XCorr) score function is one of the most popular score functions for searching peptide identifications in databases, and many programs, such as SEQUEST, Comet, and Tide, use it. Recently, the HiXCorr algorithm was developed to speed up this score function for high-resolution spectra by improving the preprocessing of tandem mass spectra. Even with HiXCorr, however, the score function remains slow because the number of candidate peptides grows when post-translational modifications (PTMs) are considered in the search. Results: We used a graphics processing unit (GPU) to develop an accelerated score function that combines Tide's XCorr score function with the HiXCorr algorithm. Our method is 2.7 and 5.8 times faster than the original Tide and Tide-Hi, respectively, at 50 Da precursor tolerance, and it produces scores identical to those of the CPU-based Tide and Tide-Hi. Conclusion: We propose this accelerated score function for searching modifications with a single GPU. The software is available at https://github.com/Tide-for-PTM-search/Tide-for-PTM-search.
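
    For intuition, the sketch below computes a simplified SEQUEST-style XCorr: the zero-lag dot product between the observed and theoretical spectra minus the mean dot product over +/-75 bin offsets. The spectra are synthetic, and the implementation deliberately omits the preprocessing and GPU batching that the paper actually accelerates.

        import numpy as np

        def xcorr_score(observed, theoretical, max_shift=75):
            """Simplified SEQUEST-style XCorr: correlation at zero lag minus
            the mean correlation over +/-max_shift background offsets.

            Illustrative only; fast implementations such as Tide's precompute
            the background subtraction once per spectrum, so scoring each
            candidate peptide costs a single dot product.
            """
            zero_lag = np.dot(observed, theoretical)
            shifts = [s for s in range(-max_shift, max_shift + 1) if s != 0]
            background = np.mean(
                [np.dot(np.roll(observed, s), theoretical) for s in shifts]
            )
            return zero_lag - background

        rng = np.random.default_rng(1)
        obs = rng.random(2000)                            # binned observed spectrum
        theo = (rng.random(2000) > 0.99).astype(float)    # sparse fragment-ion bins
        print(xcorr_score(obs, theo))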

    Cancelled: A New Efficient Resource Management Framework for Iterative MapReduce Processing in Large-Scale Data Analysis
