
    Meta-learning algorithms and applications

    Meta-learning, in the broader context, concerns how an agent learns about its own learning, allowing it to improve its learning process. Learning how to learn is not only beneficial for humans; it has also shown vast benefits for improving how machines learn. In the context of machine learning, meta-learning enables models to improve their learning process by selecting suitable meta-parameters that influence learning. For deep learning specifically, the meta-parameters typically describe details of the training of the model, but they can also include a description of the model itself: the architecture. Meta-learning is usually done with specific goals in mind, for example improving the ability to generalize or to learn new concepts from only a few examples. Meta-learning can be powerful, but it comes with a key downside: it is often computationally costly. If these costs were alleviated, meta-learning would be more accessible to developers of new artificial intelligence models, allowing them to achieve greater goals or save resources. As a result, one key focus of our research is on significantly improving the efficiency of meta-learning. We develop two approaches, EvoGrad and PASHA, both of which significantly improve meta-learning efficiency in two common scenarios. EvoGrad allows us to efficiently optimize the values of a large number of differentiable meta-parameters, while PASHA enables us to efficiently optimize any type of meta-parameter, but fewer in number.
    Meta-learning is a tool that can be applied to solve various problems. Most commonly it is applied to learning new concepts from only a small number of examples (few-shot learning), but other applications exist too. To showcase the practical impact that meta-learning can make in the context of neural networks, we use meta-learning as a novel solution for two selected problems: more accurate uncertainty quantification (calibration) and general-purpose few-shot learning. Both are practically important problems, and using meta-learning approaches we can obtain better solutions than those obtained with existing approaches. Calibration is important for safety-critical applications of neural networks, while general-purpose few-shot learning tests a model's ability to generalize few-shot learning across diverse tasks such as recognition, segmentation and keypoint estimation. More efficient algorithms as well as novel applications enable the field of meta-learning to make a more significant impact on the broader area of deep learning and potentially to solve problems that were too challenging before. Ultimately, both allow us to better utilize the opportunities that artificial intelligence presents.
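    The differentiable meta-parameter setting mentioned above can be illustrated with a short, hypothetical sketch. The following is a generic one-step hypergradient-style loop (it is not the EvoGrad or PASHA algorithm from the thesis): a regularization strength log_lam acts as the meta-parameter and is updated from validation loss through a differentiable inner update. All names, shapes and hyperparameters are illustrative assumptions.

        # Minimal sketch of optimizing a differentiable meta-parameter (hypothetical example,
        # not the thesis's method): the regularization strength log_lam is updated from the
        # validation loss through one differentiable inner update of the model weights.
        import torch

        torch.manual_seed(0)
        X_tr, y_tr = torch.randn(64, 5), torch.randn(64, 1)
        X_va, y_va = torch.randn(32, 5), torch.randn(32, 1)

        w = torch.zeros(5, 1, requires_grad=True)          # model parameters
        log_lam = torch.tensor(-2.0, requires_grad=True)   # meta-parameter (illustrative)
        meta_opt = torch.optim.Adam([log_lam], lr=1e-2)
        inner_lr = 0.1

        for step in range(200):
            lam = log_lam.exp()
            train_loss = ((X_tr @ w - y_tr) ** 2).mean() + lam * (w ** 2).sum()
            # One differentiable inner update; gradients flow back to log_lam.
            g = torch.autograd.grad(train_loss, w, create_graph=True)[0]
            w_new = w - inner_lr * g
            val_loss = ((X_va @ w_new - y_va) ** 2).mean()
            meta_opt.zero_grad()
            val_loss.backward()
            meta_opt.step()
            # Commit the inner update outside the autograd graph.
            with torch.no_grad():
                w.copy_(w_new)
                w.grad = None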

    Application of Saliency Maps for Optimizing Camera Positioning in Deep Learning Applications

    In the fields of process control engineering and robotics, especially in automatic control, optimization challenges frequently manifest as complex problems with expensive evaluations. This thesis focuses on one such problem: the optimization of camera positions for Convolutional Neural Networks (CNNs). CNNs attend to specific points in images that are often not intuitive to human perception, making camera placement critical for performance. The research is guided by two primary questions. The first investigates the role of Explainable Artificial Intelligence (XAI), specifically GradCAM++ visual explanations, in computer vision for aiding the evaluation of different camera positions. Building on this, the second question assesses a novel algorithm that leverages these XAI features against traditional black-box optimization methods. To answer these questions, the study employs a robotic auto-positioning system for data collection, CNN model training, and performance evaluation. The proposed method combines XAI features, in particular GradCAM++ saliency information, with a Bayesian optimization algorithm. A case study on classifying flow regimes in industrial-grade bioreactors validates the method, which shows improvements over established techniques such as Grid Search, Random Search, Bayesian optimization, and Simulated Annealing. Future work will focus on gathering more data, including noisy data, and consulting domain experts for a more cost-effective implementation, in order to reach generalized conclusions.
    Contents:
    1 Introduction: 1.1 Motivation; 1.2 Problem Analysis; 1.3 Research Question; 1.4 Structure of the Thesis
    2 State of the Art: 2.1 Literature Research Methodology (2.1.1 Search Strategy; 2.1.2 Inclusion and Exclusion Criteria); 2.2 Blackbox Optimization; 2.3 Mathematical Notation; 2.4 Bayesian Optimization; 2.5 Simulated Annealing; 2.6 Random Search; 2.7 Grid Search; 2.8 Explainable A.I. and Saliency Maps; 2.9 Flow Regime Classification in Stirred Vessels; 2.10 Performance Metrics (2.10.1 R2 Score and Polynomial Regression for Experiment Data Analysis; 2.10.2 Blackbox Optimization Performance Metrics; 2.10.3 CNN Performance Metrics)
    3 Methodology: 3.1 Requirement Analysis and Research Hypothesis; 3.2 Research Approach: Case Study; 3.3 Data Collection; 3.4 Evaluation and Justification
    4 Concept: 4.1 System Overview; 4.2 Data Flow; 4.3 Experimental Setup; 4.4 Optimization Challenges and Approaches
    5 Data Collection and Experimental Setup: 5.1 Hardware Components; 5.2 Data Recording and Design of Experiments; 5.3 Data Collection; 5.4 Post-Experiment
    6 Implementation: 6.1 Simulation Unit; 6.2 Recommendation Scalar from Saliency Maps; 6.3 Saliency Map Features as Guidance Mechanism; 6.4 GradCAM++ Enhanced Bayesian Optimization; 6.5 Benchmarking Unit; 6.6 Benchmarking
    7 Results and Evaluation: 7.1 Experiment Data Analysis; 7.2 Recommendation Scalar; 7.3 Benchmarking Results and Quantitative Analysis (7.3.1 Accuracy Results from the Benchmarking Process; 7.3.2 Cumulative Results Interpretation; 7.3.3 Analysis of Variability); 7.4 Answering the Research Questions; 7.5 Summary
    8 Discussion: 8.1 Critical Examination of Limitations; 8.2 Discussion of Solutions to Limitations; 8.3 Practice-Oriented Discussion of Findings
    9 Summary and Outlook
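    To make the idea of saliency-guided search concrete, the following is a hypothetical sketch of adding a saliency-derived "recommendation scalar" to a Bayesian optimization acquisition function. It is not the thesis's implementation: the objective (CNN accuracy per camera angle), the saliency summary, and the acquisition weights are all stand-in assumptions.

        # Hypothetical sketch: Bayesian optimization of a 1-D camera angle where a scalar
        # summarising a GradCAM++-style saliency map is added to the acquisition function.
        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF

        rng = np.random.default_rng(0)

        def cnn_accuracy(angle):               # stand-in for the expensive evaluation
            return np.exp(-((angle - 0.6) ** 2) / 0.05) + 0.01 * rng.normal()

        def saliency_scalar(angle):            # stand-in for a GradCAM++ summary,
            return np.exp(-((angle - 0.55) ** 2) / 0.1)   # e.g. saliency mass on the target

        candidates = np.linspace(0.0, 1.0, 201).reshape(-1, 1)
        X = list(rng.uniform(0, 1, size=3))    # a few initial camera angles
        y = [cnn_accuracy(a) for a in X]

        for _ in range(10):
            gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-3)
            gp.fit(np.array(X).reshape(-1, 1), np.array(y))
            mu, sigma = gp.predict(candidates, return_std=True)
            bonus = np.array([saliency_scalar(a) for a in candidates.ravel()])
            acq = mu + 1.5 * sigma + 0.3 * bonus    # UCB plus saliency bonus (weights assumed)
            a_next = float(candidates[np.argmax(acq)])
            X.append(a_next)
            y.append(cnn_accuracy(a_next))

        print("best angle:", X[int(np.argmax(y))], "accuracy:", max(y))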

    SmartChoices: Augmenting Software with Learned Implementations

    We are living in a golden age of machine learning. Powerful models are being trained to perform many tasks far better than is possible using traditional software engineering approaches alone. However, developing and deploying those models in existing software systems remains difficult. In this paper we present SmartChoices, a novel approach to incorporating machine learning into mature software stacks easily, safely, and effectively. We explain the overall design philosophy and present case studies using SmartChoices within large-scale industrial systems.
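    The general pattern the abstract describes can be illustrated with a purely hypothetical sketch; this is not the SmartChoices API, and all class and function names are invented for illustration: a call site delegates a decision to a learned policy, keeps the existing hand-written heuristic as a safe fallback, and feeds outcomes back as training signal.

        # Hypothetical sketch of a "learned implementation" behind a stable interface
        # (not the SmartChoices API): epsilon-greedy choice with a heuristic fallback.
        import random

        class LearnedChoice:
            def __init__(self, options, fallback, epsilon=0.1):
                self.options = list(options)
                self.fallback = fallback                  # existing heuristic to fall back on
                self.epsilon = epsilon
                self.stats = {o: [0.0, 0] for o in self.options}   # reward sum, count

            def choose(self, context=None):
                untried = [o for o in self.options if self.stats[o][1] == 0]
                if untried:
                    return untried[0]                     # try every option at least once
                if random.random() < self.epsilon:
                    return self.fallback(context)         # occasionally defer to the heuristic
                return max(self.options, key=lambda o: self.stats[o][0] / self.stats[o][1])

            def feedback(self, option, reward):
                s = self.stats[option]
                s[0] += reward
                s[1] += 1

        # Usage: choosing a cache eviction policy per request (illustrative only).
        chooser = LearnedChoice(["lru", "lfu", "fifo"], fallback=lambda ctx: "lru")
        for _ in range(100):
            policy = chooser.choose()
            hit = random.random() < {"lru": 0.6, "lfu": 0.7, "fifo": 0.5}[policy]
            chooser.feedback(policy, 1.0 if hit else 0.0)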

    Academic writing for IT students

    This textbook is intended for Master's and PhD Information Technology students (B1-C1 level of English proficiency). It provides instructions on how to write a research paper in English, together with relevant exercises. The peculiarities of each section of a paper are presented. The exercises are based on real scientific materials taken from peer-reviewed journals. The subject area covers a wide range of Information Technology domains.

    Security and Authenticity of AI-generated code

    The intersection of security and plagiarism in the context of AI-generated code is a critical theme throughout this study. While our research primarily focuses on evaluating the security aspects of AI-generated code, it is imperative to recognize the interconnectedness of security and plagiarism concerns. On the one hand, we conduct an extensive analysis of the security flaws that might be present in AI-generated code, with a focus on code produced by ChatGPT and Bard. This analysis emphasizes the dangers that can arise when such code is incorporated into software programs, especially if it has security weaknesses, and directly advises developers to use caution when considering the integration of AI-generated code, so as to protect the security of their applications. On the other hand, our research also covers code plagiarism. In the context of AI-generated code, plagiarism, defined as the reuse of code without proper attribution or in violation of license and copyright restrictions, becomes a significant concern. As open-source software and AI language models proliferate, the risk of plagiarism in AI-generated code increases. Our research applies code attribution techniques to identify the authors of AI-generated insecure code and to determine where the code originated. By addressing security and plagiarism issues at the same time, our research emphasizes the multidimensional nature of AI-generated code and its wide-ranging repercussions. This comprehensive approach contributes to a more profound understanding of the problems and ethical implications associated with the use of AI in code generation, embracing both security and authorship-related concerns.
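    As a rough illustration of what code attribution can look like in practice, the following is a hypothetical baseline, not necessarily the technique used in the study: character n-gram TF-IDF vectors compared by cosine similarity to find the most likely source of a generated snippet. The corpus paths and snippets are placeholders.

        # Hypothetical code-attribution baseline: match a generated snippet against known
        # sources using character n-gram TF-IDF and cosine similarity.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        known_sources = {
            "repo_a/auth.py": "def check_password(user, pw):\n    return pw == user.password",
            "repo_b/db.py": "def query(db, name):\n    return db.execute('SELECT * FROM users WHERE name = ' + name)",
        }
        generated_snippet = "def lookup(conn, name):\n    return conn.execute('SELECT * FROM users WHERE name = ' + name)"

        vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(3, 5))
        matrix = vectorizer.fit_transform(list(known_sources.values()) + [generated_snippet])
        scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

        for (path, _), score in zip(known_sources.items(), scores):
            print(f"{path}: similarity {score:.2f}")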

    Modular lifelong machine learning

    Deep learning has drastically improved the state of the art in many important fields, including computer vision and natural language processing (LeCun et al., 2015). However, it is expensive to train a deep neural network on a machine learning problem, and the overall training cost further increases when one wants to solve additional problems. Lifelong machine learning (LML) develops algorithms that aim to efficiently learn to solve a sequence of problems which become available one at a time. New problems are solved with fewer resources by transferring previously learned knowledge. At the same time, an LML algorithm needs to retain good performance on all encountered problems, thus avoiding catastrophic forgetting. Current approaches do not possess all the desired properties of an LML algorithm. First, they primarily focus on preventing catastrophic forgetting (Diaz-Rodriguez et al., 2018; Delange et al., 2021) and, as a result, neglect some knowledge-transfer properties. Furthermore, they assume that all problems in a sequence share the same input space. Finally, scaling these methods to a long sequence of problems remains a challenge.
    Modular approaches to deep learning decompose a deep neural network into sub-networks, referred to as modules. Each module can then be trained to perform an atomic transformation, specialised in processing a distinct subset of inputs. This modular approach to storing knowledge makes it easy to reuse only the subset of modules that is useful for the task at hand. This thesis introduces a line of research which demonstrates the merits of a modular approach to lifelong machine learning and its ability to address the aforementioned shortcomings of other methods. Compared to previous work, we show that a modular approach can be used to achieve more LML properties than previously demonstrated. Furthermore, we develop tools which allow modular LML algorithms to scale in order to retain said properties on longer sequences of problems.
    First, we introduce HOUDINI, a neurosymbolic framework for modular LML. HOUDINI represents modular deep neural networks as functional programs and accumulates a library of pre-trained modules over a sequence of problems. Given a new problem, we use program synthesis to select a suitable neural architecture, as well as a high-performing combination of pre-trained and new modules. We show that our approach has most of the properties desired from an LML algorithm. Notably, it can perform forward transfer, avoid negative transfer and prevent catastrophic forgetting, even across problems with disparate input domains and problems which require different neural architectures.
    Second, we produce a modular LML algorithm which retains the properties of HOUDINI but can also scale to longer sequences of problems. To this end, we fix the choice of neural architecture and introduce a probabilistic search framework, PICLE, for searching through different module combinations. To apply PICLE, we introduce two probabilistic models over neural modules which allow us to efficiently identify promising module combinations.
    Third, we phrase the search over module combinations in modular LML as black-box optimisation, which allows one to make use of methods from the setting of hyperparameter optimisation (HPO). We then develop a new HPO method which marries a multi-fidelity approach with model-based optimisation. We demonstrate that this leads to improved anytime performance in the HPO setting and discuss how it can in turn be used to augment modular LML methods. Overall, this thesis identifies a number of important LML properties, which have not all been attained by past methods, and presents an LML algorithm which can achieve all of them apart from backward transfer.
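    The core modular-reuse idea can be sketched in a few lines; the following is a minimal, hypothetical example (it is not HOUDINI's program-synthesis search or PICLE): a feature-extractor module learned on an earlier problem is frozen and composed with a new head for the next problem, so only the new module is trained and the library grows over time. Shapes and module names are illustrative assumptions.

        # Minimal sketch of modular reuse in lifelong learning: freeze a library module,
        # train only the newly added module, then add it to the library.
        import torch
        import torch.nn as nn

        library = {  # modules accumulated over previous problems
            "features_mnist": nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU()),
        }

        def compose_for_new_task(feature_key, num_classes):
            features = library[feature_key]
            for p in features.parameters():        # freeze the reused module: no forgetting,
                p.requires_grad = False            # and its knowledge transfers forward
            head = nn.Linear(128, num_classes)     # new module for the new problem
            return nn.Sequential(features, head), head

        model, new_head = compose_for_new_task("features_mnist", num_classes=5)
        optimizer = torch.optim.Adam(new_head.parameters(), lr=1e-3)

        x, y = torch.randn(16, 1, 28, 28), torch.randint(0, 5, (16,))
        loss = nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        library["head_task2"] = new_head           # grow the library for future problems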

    Graph-Structured Kernel Design for Power Flow Learning using Gaussian Processes

    This paper presents a physics-inspired graph-structured kernel designed for power flow learning using Gaussian Processes (GP). The kernel, named the vertex-degree kernel (VDK), relies on a latent decomposition of the voltage-injection relationship based on the network graph, or topology. Notably, the VDK design avoids the need to solve optimization problems for kernel search. To enhance efficiency, we also explore a graph-reduction approach to obtain a VDK representation with fewer terms. Additionally, we propose a novel network-swipe active learning scheme, which intelligently selects sequential training inputs to accelerate the learning of VDK. Leveraging the additive structure of VDK, the active learning algorithm performs a block-descent type procedure on the GP's predictive variance, which serves as a proxy for information gain. Simulations demonstrate that the proposed VDK-GP achieves a more than twofold reduction in sample complexity compared to a full GP on the medium-scale 500-Bus and large-scale 1354-Bus power systems. The network-swipe algorithm outperforms the mean performance of 500 random trials on test predictions by a factor of two for the medium-sized 500-Bus system, and the best performance of 25 random trials for the large-scale 1354-Bus system by 10%. Moreover, we demonstrate the proposed method's performance for uncertainty quantification applications with distributionally shifted testing data sets.
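    The additive, graph-structured flavour of such a kernel can be sketched as follows; this is a small illustration in the spirit of the approach, not the paper's exact VDK construction: the kernel is a sum of RBF sub-kernels, each restricted to the injections of one bus and its graph neighbours, used for closed-form GP regression. The 4-bus toy network, data, and length scale are assumptions.

        # Sketch of an additive, neighbourhood-structured kernel for GP regression
        # (illustrative only; not the paper's exact vertex-degree kernel).
        import numpy as np

        edges = [(0, 1), (1, 2), (2, 3)]              # toy network topology
        n_bus = 4
        neighbourhoods = [sorted({i} | {b for a, b in edges if a == i}
                                 | {a for a, b in edges if b == i}) for i in range(n_bus)]

        def additive_kernel(X1, X2, ls=1.0):
            """Sum of RBF sub-kernels, one per bus neighbourhood."""
            K = np.zeros((X1.shape[0], X2.shape[0]))
            for nb in neighbourhoods:
                d = X1[:, nb][:, None, :] - X2[:, nb][None, :, :]
                K += np.exp(-0.5 * np.sum(d ** 2, axis=-1) / ls ** 2)
            return K

        rng = np.random.default_rng(0)
        X_train = rng.normal(size=(40, n_bus))               # stand-in power injections
        y_train = np.sin(X_train @ rng.normal(size=n_bus))   # stand-in voltage at one bus
        X_test = rng.normal(size=(5, n_bus))

        K = additive_kernel(X_train, X_train) + 1e-4 * np.eye(len(X_train))
        alpha = np.linalg.solve(K, y_train)
        y_pred = additive_kernel(X_test, X_train) @ alpha    # GP posterior mean
        print(y_pred)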

    Solving Continuous Control via Q-learning

    While there has been substantial success in solving continuous control with actor-critic methods, simpler critic-only methods such as Q-learning find limited application in the associated high-dimensional action spaces. However, most actor-critic methods come at the cost of added complexity: heuristics for stabilisation, compute requirements and wider hyperparameter search spaces. We show that a simple modification of deep Q-learning largely alleviates these issues. By combining bang-bang action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches the performance of state-of-the-art continuous actor-critic methods when learning from features or pixels. We extend classical bandit examples from cooperative MARL to provide intuition for how decoupled critics leverage state information to coordinate joint optimization, and demonstrate surprisingly strong performance across a variety of continuous control tasks.
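    The combination of bang-bang discretization and value decomposition can be sketched briefly; the following is a minimal illustration of that core idea, not the paper's full agent or training loop: each action dimension gets its own pair of Q-values over {-1, +1}, greedy action selection is an independent argmax per dimension, and the joint value is the mean over dimensions. Network sizes and dimensionalities are illustrative assumptions.

        # Sketch of a decoupled critic with bang-bang actions and value decomposition.
        import torch
        import torch.nn as nn

        class DecoupledQNet(nn.Module):
            def __init__(self, obs_dim, act_dim, hidden=64):
                super().__init__()
                self.act_dim = act_dim
                self.net = nn.Sequential(
                    nn.Linear(obs_dim, hidden), nn.ReLU(),
                    nn.Linear(hidden, act_dim * 2),     # 2 bins (-1, +1) per action dimension
                )

            def forward(self, obs):
                return self.net(obs).view(-1, self.act_dim, 2)   # (batch, act_dim, 2)

        obs_dim, act_dim = 17, 6
        qnet = DecoupledQNet(obs_dim, act_dim)
        obs = torch.randn(4, obs_dim)

        q = qnet(obs)                                   # per-dimension Q-values
        bins = torch.tensor([-1.0, 1.0])
        greedy_action = bins[q.argmax(dim=-1)]          # bang-bang action in {-1, +1}^act_dim
        joint_q = q.max(dim=-1).values.mean(dim=-1)     # value decomposition: mean over dims
        print(greedy_action.shape, joint_q.shape)       # (4, 6) and (4,)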

    Machine Learning Meets Advanced Robotic Manipulation

    Automated industries lead to high-quality production, lower manufacturing costs and better utilization of human resources. Robotic manipulator arms play a major role in the automation process. However, for complex manipulation tasks, hard-coding efficient and safe trajectories is challenging and time-consuming. Machine learning methods have the potential to learn such controllers based on expert demonstrations. Despite promising advances, better approaches must be developed to improve the safety, reliability, and efficiency of ML methods in both the training and deployment phases. This survey reviews cutting-edge technologies and recent trends in ML methods applied to real-world manipulation tasks. After reviewing the relevant background on ML, the rest of the paper is devoted to ML applications in different domains such as industry, healthcare, agriculture, space, military, and search and rescue. The paper closes with important research directions for future work.