296 research outputs found

    DeepEvolution: A Search-Based Testing Approach for Deep Neural Networks

    Full text link
    The increasing inclusion of Deep Learning (DL) models in safety-critical systems such as autonomous vehicles has led to the development of multiple model-based DL testing techniques. One common denominator of these techniques is the automated generation of test cases, e.g., new inputs transformed from the original training data with the aim of optimizing some test adequacy criterion. So far, the effectiveness of these approaches has been hindered by their reliance on random fuzzing or on transformations that do not always produce diverse test cases. To overcome these limitations, we propose DeepEvolution, a novel search-based approach for testing DL models that relies on metaheuristics to maximize the diversity of the generated test cases. We assessed the effectiveness of DeepEvolution in testing computer-vision DL models and found that it significantly increases the neuron coverage of the generated test cases. Moreover, using DeepEvolution, we successfully found several corner-case behaviors. Finally, DeepEvolution outperformed TensorFuzz (a coverage-guided fuzzing tool developed at Google Brain) in detecting latent defects introduced during model quantization. These results suggest that search-based approaches can help build effective testing tools for DL systems.
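    The abstract does not include implementation details, but the core idea (a metaheuristic searching over input transformations, guided by a coverage-based fitness) can be illustrated with a minimal, hypothetical Python/NumPy sketch. The toy model, the two-parameter transformation space, and the coverage threshold below are placeholder assumptions, not DeepEvolution's actual design.

```python
import numpy as np

# Toy stand-ins: a "model" exposing hidden activations, and a seed input batch.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))                      # hypothetical hidden layer
model_acts = lambda x: np.maximum(0, x @ W)        # ReLU activations

def neuron_coverage(acts, threshold=0.5):
    """Fraction of neurons activated above a threshold at least once."""
    return np.mean((acts > threshold).any(axis=0))

def transform(x, params):
    """Parameterized input transformation (brightness scaling + noise here)."""
    brightness, noise_scale = params
    return x * brightness + rng.normal(scale=noise_scale, size=x.shape)

def evolve_tests(seed_batch, pop_size=20, generations=30):
    """Evolve transformation parameters that maximize neuron coverage."""
    pop = rng.uniform([0.5, 0.0], [1.5, 0.3], size=(pop_size, 2))
    for _ in range(generations):
        fitness = np.array([neuron_coverage(model_acts(transform(seed_batch, p)))
                            for p in pop])
        parents = pop[np.argsort(fitness)[-pop_size // 2:]]             # keep fittest half
        children = parents + rng.normal(scale=0.05, size=parents.shape)  # Gaussian mutation
        pop = np.vstack([parents, children])
    best = max(pop, key=lambda p: neuron_coverage(model_acts(transform(seed_batch, p))))
    return transform(seed_batch, best), best

tests, best_params = evolve_tests(rng.normal(size=(16, 64)))
print("best transformation params:", best_params)
```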

    Traffic signal settings optimization using gradient descent

    Get PDF
    We investigate the performance of gradient descent (GD) optimization applied to the traffic signal setting problem and compare it to genetic algorithms. We use neural networks as metamodels to evaluate the quality of signal settings, and we find that both optimization methods produce similar results; in both cases, the accuracy of the neural networks close to local optima depends on the activation function (e.g., TANH activation makes the optimization process converge to different minima than ReLU activation).
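    As a rough illustration of the approach described above (gradient descent applied directly to the signal settings, with a neural network serving as the metamodel), here is a minimal sketch. The metamodel weights are random placeholders standing in for a network trained on simulation data; the dimensions, learning rate, and feasibility clipping are assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical trained metamodel: signal settings (green splits, offsets) -> total delay.
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

def metamodel_delay(s):
    h = np.tanh(s @ W1 + b1)          # TANH activation, as discussed in the abstract
    return (h @ W2 + b2).item()

def delay_gradient(s):
    h = np.tanh(s @ W1 + b1)
    dh = (1 - h ** 2) * W2.ravel()    # backprop through the tanh layer
    return dh @ W1.T

# Plain gradient descent on the signal settings themselves.
settings = rng.uniform(0.2, 0.8, size=8)
lr = 0.05
for step in range(200):
    settings -= lr * delay_gradient(settings)
    settings = np.clip(settings, 0.0, 1.0)   # keep settings in a feasible range

print("optimized settings:", settings.round(3))
print("predicted delay:", round(metamodel_delay(settings), 3))
```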

    DiverGet: A Search-Based Software Testing Approach for Deep Neural Network Quantization Assessment

    Full text link
    Quantization is one of the most widely applied Deep Neural Network (DNN) compression strategies when deploying a trained DNN model on an embedded system or a cell phone. This is owing to its simplicity and adaptability to a wide range of applications and circumstances, as opposed to specific Artificial Intelligence (AI) accelerators and compilers that are often designed only for certain specific hardware (e.g., Google Coral Edge TPU). With the growing demand for quantization, ensuring the reliability of this strategy is becoming a critical challenge. Traditional testing methods, which gather more and more genuine data for better assessment, are often not practical because of the large size of the input space and the high similarity between the original DNN and its quantized counterpart. As a result, advanced assessment strategies have become of paramount importance. In this paper, we present DiverGet, a search-based testing framework for quantization assessment. DiverGet defines a space of metamorphic relations that simulate naturally-occurring distortions on the inputs. It then optimally explores these relations to reveal disagreements among DNNs of different arithmetic precision. We evaluate the performance of DiverGet on state-of-the-art DNNs applied to hyperspectral remote sensing images. We chose remote sensing DNNs because they are increasingly deployed at the edge (e.g., on high-lift drones) in critical domains such as climate change research and astronomy. Our results show that DiverGet successfully challenges the robustness of established quantization techniques against naturally-occurring shifted data, and outperforms its most recent competitor, DiffChaser, with a success rate that is on average four times higher. Comment: Accepted for publication in the Empirical Software Engineering Journal (EMSE).
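    A minimal sketch of the differential-testing idea behind DiverGet follows: apply metamorphic, label-preserving distortions to inputs and search for cases where the full-precision and quantized models disagree. The toy models, the naive weight quantization, the two example relations, and the random search (standing in for DiverGet's metaheuristic exploration) are all assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

# Placeholder "models": an original float model and a crudely quantized copy.
W = rng.normal(size=(100, 10))
def original(x):  return x @ W
def quantized(x):
    Wq = np.round(W / W.std() * 127) / 127 * W.std()   # naive weight quantization
    return x @ Wq

# Metamorphic relations: label-preserving distortions of a hyperspectral-like input.
relations = [
    lambda x, p: x * (1 + p),                        # illumination / gain shift
    lambda x, p: x + rng.normal(0, p, x.shape),      # sensor noise
]

def disagreement(x):
    """Inputs where the full-precision and quantized predictions differ."""
    return original(x).argmax(-1) != quantized(x).argmax(-1)

def search_disagreements(x, steps=50):
    """Random search over relation parameters for disagreement-inducing inputs."""
    found = []
    for _ in range(steps):
        rel = relations[rng.integers(len(relations))]
        xt = rel(x, rng.uniform(0.0, 0.2))
        found.extend(xt[disagreement(xt)])
    return found

inputs = rng.normal(size=(32, 100))
print("disagreement-inducing inputs found:", len(search_disagreements(inputs)))
```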

    An Improved Bees Algorithm for Training Deep Recurrent Networks for Sentiment Classification

    Get PDF
    Recurrent neural networks (RNNs) are powerful tools for learning information from temporal sequences. Designing an optimal deep RNN is difficult due to configuration and training issues, such as vanishing and exploding gradients. In this paper, a novel metaheuristic optimisation approach is proposed for training deep RNNs for the sentiment classification task. The approach employs an enhanced Ternary Bees Algorithm (BA-3+), which handles large-dataset classification problems by considering only three candidate solutions in each iteration. BA-3+ combines the collaborative search of three bees to find the optimal set of trainable parameters of the proposed deep recurrent learning architecture. Local learning with exploitative search uses a greedy selection strategy. Stochastic gradient descent (SGD) learning with singular value decomposition (SVD) addresses the vanishing and exploding gradients of the decision parameters through the stabilisation provided by SVD. Global learning with explorative search achieves faster convergence without getting trapped in local optima. BA-3+ has been tested on the sentiment classification task, classifying datasets with symmetric and asymmetric distributions from different domains, including Twitter, product reviews, and movie reviews. Comparative results have been obtained against advanced deep language models and the Differential Evolution (DE) and Particle Swarm Optimization (PSO) algorithms. BA-3+ converged to the global minimum faster than the DE and PSO algorithms, and it outperformed the SGD, DE, and PSO algorithms on the Turkish and English datasets. Accuracy and F1 scores improved by at least 30–40% over the standard SGD algorithm on all classification datasets. Accuracy rates of the RNN model trained with BA-3+ ranged from 80% to 90%, while the RNN trained with SGD achieved between 50% and 60% on most datasets. The performance of the RNN model trained with BA-3+ was as good as that of Tree-LSTM and Recursive Neural Tensor Network (RNTN) language models, which achieved accuracies of up to 90% on some datasets. The improved accuracy and convergence results show that BA-3+ is an efficient, stable algorithm for this complex classification task and can handle the vanishing and exploding gradients problem of deep RNNs.
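    The three-solution structure described above can be sketched as follows for a generic parameter vector. The toy loss, the finite-difference gradient (standing in for backpropagation through the RNN), and the omission of the SVD stabilisation step are simplifying assumptions; this is not the published BA-3+ implementation.

```python
import numpy as np

rng = np.random.default_rng(3)

def loss(theta):
    """Toy stand-in for the RNN's classification loss over a dataset."""
    return np.sum((theta - 1.5) ** 2) + 0.1 * np.sum(np.sin(5 * theta) ** 2)

def grad(theta, eps=1e-5):
    """Finite-difference gradient, standing in for backprop through the RNN."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        d = np.zeros_like(theta); d[i] = eps
        g[i] = (loss(theta + d) - loss(theta - d)) / (2 * eps)
    return g

def ternary_bees(dim=10, iters=200, lr=0.05):
    best = rng.normal(size=dim)
    for _ in range(iters):
        # 1) Exploitative bee: greedy local perturbation around the current best.
        exploit = best + rng.normal(scale=0.05, size=dim)
        # 2) Gradient bee: an SGD-style step (the paper stabilises this with SVD).
        sgd = best - lr * grad(best)
        # 3) Explorative bee: global random scout to escape local optima.
        scout = rng.normal(scale=2.0, size=dim)
        # Greedy selection among the three candidates and the incumbent.
        best = min([best, exploit, sgd, scout], key=loss)
    return best

theta = ternary_bees()
print("final loss:", round(float(loss(theta)), 4))
```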

    Towards Debugging and Testing Deep Learning Systems

    Get PDF
    Over the past few years, Deep Learning (DL) has made tremendous progress, achieving or surpassing human-level performance on tasks such as image classification and speech recognition. Thanks to these advances, we are witnessing a wide adoption of DL in safety-critical applications such as autonomous driving, crime prevention and detection, and medical treatment. However, despite their spectacular progress, DL systems, just like traditional software systems, often exhibit erroneous corner-case behaviors due to latent defects or inefficiencies, which can lead to catastrophic accidents. Thus, software quality assurance (SQA), including reliability and robustness, for DL systems becomes a major concern.
Traditional testing for DL models consists of measuring their performance on manually collected data, so it heavily depends on the quality of the test data, which often fails to include rare inputs, as evidenced by recent autonomous-driving car accidents (e.g., Tesla/Uber). Advanced testing techniques are in high demand to improve the trustworthiness of DL systems. Nevertheless, DL testing poses significant challenges stemming from the non-deterministic nature of DL systems (since they follow a data-driven paradigm: the target task is learned statistically) and their lack of an oracle (since they are designed principally to provide the answer). Recently, software researchers have started adapting concepts from the software testing domain, such as test coverage and pseudo-oracles, to tackle these difficulties. Despite some promising results obtained from adapting existing software testing methods, current software testing techniques for DL systems are still quite immature. In this thesis, we examine existing testing techniques for DL systems and propose some new techniques. We achieve this by following a systematic approach consisting of: (1) investigating DL software issues and testing challenges; (2) outlining the strengths and weaknesses of the software testing techniques adapted for DL systems; and (3) proposing novel testing solutions to fill some of the identified gaps in the literature and potentially help improve the SQA of DL systems.
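    One of the concepts the thesis builds on, the pseudo-oracle, can be illustrated with a short sketch: in the absence of a true oracle, a metamorphic relation (here, a small brightness change assumed to be label-preserving) serves as the oracle, and any change in the predicted label counts as a violation. The placeholder classifier and the specific relation are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)

# Placeholder classifier standing in for a trained DL model.
W = rng.normal(size=(784, 10))
predict = lambda x: (x @ W).argmax(-1)

def metamorphic_oracle(x, relation, n_trials=20):
    """Pseudo-oracle: predicted labels should be invariant under a
    label-preserving transformation of the input."""
    base = predict(x)
    violations = 0
    for _ in range(n_trials):
        if not np.array_equal(predict(relation(x)), base):
            violations += 1
    return violations

brightness_shift = lambda x: x * rng.uniform(0.95, 1.05)
batch = rng.normal(size=(8, 784))
print("oracle violations:", metamorphic_oracle(batch, brightness_shift))
```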

    CUDA-bigPSF: An optimized version of bigPSF accelerated with Graphics Processing Unit

    Get PDF
    Accurate and fast short-term load forecasting is crucial for efficiently managing energy production and distribution. As such, many different algorithms have been proposed to address this topic, including hybrid models that combine clustering with other forecasting techniques. One of these algorithms is bigPSF, an algorithm that combines K-means clustering and a similarity search, optimized for use in distributed environments. The work presented in this paper aims to reduce the time required to execute the algorithm, with two main contributions. First, some issues of the original proposal that limited the number of cores used simultaneously are studied and highlighted. Second, a version of the algorithm optimized for the Graphics Processing Unit (GPU) is proposed, solving the previously mentioned issues while taking into account the GPU architecture and memory structure. Experimentation was done with seven years of real-world electricity demand data from Uruguay. Results show that the proposed algorithm executed consistently faster than the original version, running up to 500 times faster during the training phase. Funding for open access charge: Universidad de Granada / CBUA. Grant PID2020-112495RB-C21 funded by MCIN/AEI/10.13039/501100011033; I+D+i FEDER 2020 project B-TIC-42-UGR2.
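    The GPU-specific optimizations are the paper's contribution and are not reproduced here, but the underlying PSF-style pipeline (cluster daily load profiles with K-means, label the recent days by cluster, and forecast by averaging the days that historically followed the same label sequence) can be sketched on the CPU. The synthetic data, cluster count, and window length below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def kmeans(X, k, iters=50):
    """Plain Lloyd's K-means; the paper's contribution is mapping this kind of
    work onto GPU threads, which is not reproduced in this sketch."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(0) if (labels == j).any() else centers[j]
                            for j in range(k)])
    return labels, centers

def psf_forecast(daily_loads, k=4, window=3):
    """Pattern-Sequence-based Forecasting: label days by cluster, find past
    occurrences of the most recent label sequence, average the following days."""
    labels, _ = kmeans(daily_loads, k)
    pattern = tuple(labels[-window:])
    followers = [daily_loads[i + window]
                 for i in range(len(labels) - window)
                 if tuple(labels[i:i + window]) == pattern]
    return np.mean(followers, axis=0) if followers else daily_loads[-1]

# Synthetic stand-in for hourly demand reshaped into daily profiles.
days = (np.sin(np.linspace(0, 60 * np.pi, 24 * 500)).reshape(500, 24)
        + rng.normal(0, 0.1, (500, 24)))
print("next-day forecast (first 6 hours):", psf_forecast(days)[:6].round(2))
```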

    Enhanced Deep Network Designs Using Mitochondrial DNA Based Genetic Algorithm And Importance Sampling

    Get PDF
    Machine learning (ML) is playing an increasingly important role in our lives. It has already made a huge impact in areas such as cancer diagnosis, precision medicine, self-driving cars, natural disaster prediction, speech recognition, etc. The painstakingly handcrafted feature extractors used in traditional learning, classification, and pattern recognition systems are not scalable to large datasets or adaptable to different classes of problems or domains. Machine learning's resurgence in the form of Deep Learning (DL) in the last decade, after multiple AI (artificial intelligence) winters and hype cycles, is a result of the convergence of advancements in training algorithms, the availability of massive data (big data), and innovation in compute resources (GPUs and the cloud). If we want to solve more complex problems with machine learning, we need to optimize all three of these areas: algorithms, datasets, and compute. Our dissertation research presents an original application of the nature-inspired idea of mitochondrial DNA (mtDNA) to improve deep learning network design. Additional fine-tuning is provided with a Monte Carlo-based method called importance sampling (IS). The primary performance indicators for machine learning are model accuracy, loss, and training time. The goal of our dissertation is to provide a framework that addresses all of these areas by optimizing network designs (in the form of hyperparameter optimization) and the dataset using an enhanced Genetic Algorithm (GA) and importance sampling. Algorithms are by far the most important aspect of machine learning. We demonstrate the application of mitochondrial DNA to complement the standard genetic algorithm for architecture optimization of a deep Convolutional Neural Network (CNN). We use importance sampling to reduce the dataset variance and to sample more often from the instances that add greater value from the training-outcome perspective. Finally, we leverage the massively parallel and distributed processing of GPUs in the cloud to speed up training. Thus, our multi-approach method for enhancing deep learning combines architecture optimization, dataset optimization, and the power of the cloud to drive better model accuracy and reduce training time.
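    The dissertation's mtDNA idea can be interpreted, loosely, as a per-lineage vector inherited from a single parent that modulates mutation during hyperparameter search; the sketch below encodes that interpretation and uses a cheap analytic surrogate in place of actually training a CNN. The hyperparameter encoding, surrogate fitness, and inheritance rule are assumptions, and the importance-sampling component is not shown.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical hyperparameter encoding: [log learning rate, num filters, dropout].
LOW, HIGH = np.array([-5.0, 16.0, 0.0]), np.array([-1.0, 256.0, 0.6])

def fitness(genes):
    """Placeholder for validation accuracy after training a CNN with these
    hyperparameters; a cheap analytic surrogate is used here instead."""
    lr, filters, dropout = genes
    return -(lr + 3.0) ** 2 - ((filters - 128) / 64) ** 2 - (dropout - 0.3) ** 2

def evolve(pop_size=12, generations=20):
    genes = rng.uniform(LOW, HIGH, size=(pop_size, 3))
    # "mtDNA": a per-lineage vector, inherited from one parent only, that
    # modulates the mutation scale of each gene (one interpretation of the idea).
    mtdna = rng.uniform(0.01, 0.2, size=(pop_size, 3))
    for _ in range(generations):
        scores = np.array([fitness(g) for g in genes])
        order = np.argsort(scores)[::-1]
        genes, mtdna = genes[order], mtdna[order]
        elite = pop_size // 2
        for i in range(elite, pop_size):                 # replace the weaker half
            mom, dad = rng.choice(elite, 2, replace=False)
            cross = rng.random(3) < 0.5
            genes[i] = np.where(cross, genes[mom], genes[dad])   # uniform crossover
            mtdna[i] = mtdna[mom]                                # maternal inheritance
            genes[i] += rng.normal(scale=mtdna[i] * (HIGH - LOW))  # mtDNA-scaled mutation
            genes[i] = np.clip(genes[i], LOW, HIGH)
    return genes[0]

print("best hyperparameters found:", evolve().round(3))
```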
