    Monte-Carlo tree search enhancements for one-player and two-player domains

    Structured machine learning models for robustness against different factors of variability in robot control

    An important feature of human sensorimotor skill is our ability to learn skills and reuse them across different environmental contexts, in part due to our understanding of attributes of variability in these environments. This thesis explores how the structure of models used within learning for robot control could similarly help autonomous robots cope with variability, hence achieving skill generalisation. The overarching approach is to develop modular architectures that judiciously combine different forms of inductive bias for learning. In particular, we consider how models and policies should be structured in order to achieve robust behaviour in the face of different factors of variation (in the environment, in objects and in other internal parameters of a policy), with the end goal of more robust, accurate and data-efficient skill acquisition and adaptation. At a high level, variability in skill is determined by variations in the constraints presented by the external environment, and by task-specific perturbations that affect the specification of optimal actions. A typical example of an environmental perturbation would be variation in lighting and illumination, which affects the noise characteristics of perception. Examples of task perturbations would be variation in object geometry, mass or friction, and in the specification of costs associated with the speed or smoothness of execution. We counteract these factors of variation by exploring three forms of structuring: utilising separate data sets curated according to the relevant factor of variation, building neural network models that incorporate this factorisation into the very structure of the networks, and learning structured loss functions. The thesis comprises four projects exploring this theme within robotics planning and prediction tasks. Firstly, in the setting of trajectory prediction in crowded scenes, we explore a modular architecture for learning static and dynamic environmental structure.
We show that factorising the prediction problem from the individual representations allows for robust and label-efficient forward modelling, and relaxes the need for full model re-training in new environments. This modularity explicitly allows trajectory prediction models to be adapted more flexibly and interpretably by reusing pre-trained state-of-the-art models. We show that this results in more efficient motion prediction and allows for performance comparable to state-of-the-art supervised 2D trajectory prediction. Next, in the domain of contact-rich robotic manipulation, we consider a modular architecture that combines model-free learning from demonstration, in particular dynamic movement primitives (DMPs), with modern model-free reinforcement learning (RL), using both on-policy and off-policy approaches. We show that factorising the skill learning problem into skill acquisition and error correction, through policy adaptation strategies such as residual learning, can help improve the overall performance of policies in the context of contact-rich manipulation. Our empirical evaluation demonstrates how best to do this with DMPs, and we propose "residual Learning from Demonstration" (rLfD), a framework that combines DMPs with RL to learn a residual correction policy. Our evaluations, performed both in simulation and on a physical system, suggest that applying residual learning directly in task space and operating on the full pose of the robot can significantly improve the overall performance of DMPs. We show that rLfD offers a solution that is gentle on the joints and improves the task success and generalisation of DMPs. Last but not least, our study shows that the extracted correction policies can be transferred to different geometries and frictions through few-shot task adaptation.
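To make the residual idea concrete, here is a minimal one-dimensional sketch, assuming the standard discrete DMP formulation (a critically damped spring pulling toward the goal plus a phase-gated forcing term) integrated with explicit Euler steps. The residual correction is simply added to each commanded position. The function `rollout_dmp_with_residual`, its parameters and the `residual_policy(y, x)` signature are hypothetical; in rLfD the residual would be learned with RL and, per the abstract, applied in task space on the full robot pose rather than to a single scalar.

```python
from typing import Callable, List, Optional


def rollout_dmp_with_residual(
    y0: float,
    goal: float,
    tau: float = 1.0,
    dt: float = 0.01,
    duration: float = 2.0,
    alpha_z: float = 25.0,
    beta_z: float = 6.25,
    alpha_x: float = 1.0,
    forcing: Optional[Callable[[float], float]] = None,
    residual_policy: Optional[Callable[[float, float], float]] = None,
) -> List[float]:
    """Roll out a 1-D discrete DMP and add a residual correction to each command.

    Hypothetical illustration: `forcing` stands in for a learned basis-function
    forcing term, `residual_policy` for an RL-trained correction policy.
    """
    forcing = forcing or (lambda x: 0.0)
    residual_policy = residual_policy or (lambda y, x: 0.0)
    y, z, x = y0, 0.0, 1.0  # position, scaled velocity (tau * dy/dt), phase
    commands: List[float] = []
    for _ in range(int(round(duration / dt))):
        # Forcing term is gated by the phase x and scaled by the movement amplitude.
        f = forcing(x) * x * (goal - y0)
        # Transformation system: tau*z' = alpha_z*(beta_z*(g - y) - z) + f, tau*y' = z.
        dz = (alpha_z * (beta_z * (goal - y) - z) + f) / tau
        dy = z / tau
        # Canonical system: tau*x' = -alpha_x*x (phase decays from 1 toward 0).
        dx = -alpha_x * x / tau
        z += dz * dt
        y += dy * dt
        x += dx * dt
        # Residual learning: the command is the DMP output plus a correction.
        commands.append(y + residual_policy(y, x))
    return commands
```

For example, `rollout_dmp_with_residual(0.0, 1.0)` converges to the goal, while passing `residual_policy=lambda y, x: 0.05` shifts every command by the correction without altering the underlying DMP, which is what makes the residual a separable, independently learnable component.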
Third, we employ meta-learning to learn time-invariant reward functions, wherein both the objectives of a task (i.e., the reward functions) and the policy for performing that task optimally are learnt simultaneously. We propose a novel inverse reinforcement learning (IRL) formulation that allows us to 1) vary the length of execution by learning time-invariant costs, and 2) relax the temporal alignment requirements for learning from demonstration. We apply our method to two different types of cost formulations and evaluate their performance in the context of learning reward functions for simulated placement and peg-in-hole tasks executed on a 7-DoF Kuka IIWA arm. Our results show that our approach enables learning temporally invariant rewards from misaligned demonstrations that can also generalise spatially to out-of-distribution tasks. Finally, we employ our observations to evaluate adversarial robustness in the context of transfer learning from a source network trained on CIFAR-100 to a target network trained on CIFAR-10. Specifically, we study the effects of using robust optimisation in the source and target networks. This allows us to identify transfer learning strategies under which adversarial defences are successfully retained, in addition to revealing potential vulnerabilities. We study the extent to which adversarially robust features can preserve their defence properties against black-box and white-box attacks under three different transfer learning strategies. Our empirical evaluations give insight into how well adversarial robustness can generalise under transfer learning.

    An evaluation of the challenges of Multilingualism in Data Warehouse development

    In this paper we discuss Business Intelligence and define what is meant by support for multilingualism in a Business Intelligence reporting context. We identify support for multilingualism as a challenging issue with implications for data warehouse design and reporting performance. Data warehouses are a core component of most Business Intelligence systems, and the star schema is the approach most widely used to develop data warehouses and dimensional data marts. We discuss the ways in which multilingualism can be supported in the star schema and identify serious limitations in current approaches, including data redundancy and issues with data manipulation, performance and maintenance. We propose a new approach to enable the optimal application of multilingualism in Business Intelligence. The proposed approach was found to produce satisfactory results when used in a proof-of-concept environment. Future work will include testing the approach in an enterprise environment.
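One common way to support several languages in a star schema, shown here only as a hedged illustration and not as the approach proposed in the paper (which the abstract does not detail), is to keep the dimension table language-neutral and move translatable attributes into a separate table keyed by (surrogate key, language code). The table and column names (`dim_product`, `dim_product_text`, `fact_sales`, `lang`) are hypothetical; the sketch uses Python's built-in `sqlite3` module with an in-memory database.

```python
import sqlite3
from typing import List, Tuple


def build_demo_warehouse() -> sqlite3.Connection:
    """Create a tiny star schema with a separate per-language text table (hypothetical names)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            sku TEXT NOT NULL
        );
        -- Translatable attributes live here, one row per (product, language).
        CREATE TABLE dim_product_text (
            product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
            lang TEXT NOT NULL,
            product_name TEXT NOT NULL,
            PRIMARY KEY (product_key, lang)
        );
        -- Facts are stored once, independent of the number of languages.
        CREATE TABLE fact_sales (
            product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
            amount REAL NOT NULL
        );
        """
    )
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "A-100"), (2, "B-200")])
    conn.executemany(
        "INSERT INTO dim_product_text VALUES (?, ?, ?)",
        [(1, "en", "Chair"), (1, "de", "Stuhl"), (2, "en", "Table"), (2, "de", "Tisch")],
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", [(1, 10.0), (1, 5.0), (2, 7.5)])
    return conn


def sales_by_product(conn: sqlite3.Connection, lang: str) -> List[Tuple[str, float]]:
    """Aggregate sales per product, labelled in the requested language."""
    return conn.execute(
        """
        SELECT t.product_name, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product_text AS t
          ON t.product_key = f.product_key AND t.lang = ?
        GROUP BY t.product_key, t.product_name
        ORDER BY t.product_name
        """,
        (lang,),
    ).fetchall()
```

Here the same report can be produced in German or English by changing only the `lang` parameter. The design avoids duplicating fact or dimension rows per language, at the cost of an extra join on every report query, which is the kind of redundancy-versus-performance trade-off the abstract alludes to.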

    Integrated machine learning and optimization approaches

    This dissertation focuses on the integration of machine learning and optimization. Specifically, novel machine learning-based frameworks are proposed to reduce the solution times of a broad range of well-known operations research problems. The first study presents a bidirectional Long Short-Term Memory (LSTM) framework to learn optimal solutions to sequential decision-making problems. Computational results show that the framework significantly reduces the solution time of benchmark capacitated lot-sizing problems without much loss in feasibility and optimality. In addition, models trained using shorter planning horizons can successfully predict the optimal solutions of instances with longer planning horizons. For the hardest data set, the predictions at the 25% level reduce the solution time from 70 CPU hours to less than 2 CPU minutes, with an optimality gap of 0.8% and without infeasibility. In the second study, an extendable prediction-optimization framework is presented for multi-stage decision-making problems to address the key issues of sequential dependence, infeasibility, and generalization. Specifically, an attention-based encoder-decoder neural network architecture is integrated with an infeasibility-elimination and generalization framework to learn high-quality feasible solutions. The proposed framework is demonstrated on two well-known dynamic NP-hard optimization problems: multi-item capacitated lot-sizing and multi-dimensional knapsack. The results show that, with the presented item-wise expansion algorithm, models trained on shorter and smaller-dimension instances can be successfully used to predict longer and larger-dimension problems. The solution time can be reduced by three orders of magnitude with an average optimality gap below 0.1%. The proposed framework can be advantageous for dynamic mixed-integer programming problems that need to be solved instantly and repetitively.
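For readers unfamiliar with the benchmark problem, a textbook multi-item capacitated lot-sizing formulation is sketched below. The exact variant used in the dissertation (for example, whether setup times are included) is not given in the abstract, so this is an assumption. Here $d_{it}$ is the demand for item $i$ in period $t$, $x_{it}$ the production quantity, $I_{it}$ the end-of-period inventory, $y_{it}$ the binary setup decision, $s_{it}$, $h_{it}$ and $c_{it}$ the setup, holding and unit production costs, $a_i$ the capacity consumed per unit of item $i$, $C_t$ the capacity available in period $t$, and $M_{it}$ a sufficiently large constant.

```latex
\begin{aligned}
\min \quad & \sum_{i}\sum_{t} \left( s_{it}\, y_{it} + h_{it}\, I_{it} + c_{it}\, x_{it} \right) \\
\text{s.t.} \quad & I_{i,t-1} + x_{it} - I_{it} = d_{it} && \forall i, t \\
& \sum_{i} a_i\, x_{it} \le C_t && \forall t \\
& x_{it} \le M_{it}\, y_{it} && \forall i, t \\
& x_{it} \ge 0, \quad I_{it} \ge 0, \quad y_{it} \in \{0, 1\} && \forall i, t
\end{aligned}
```

Once the binary setups $y_{it}$ are fixed, the remaining problem in $x_{it}$ and $I_{it}$ is a linear program, which is one plausible reason why predicting all or part of the setup pattern can shrink solve times so sharply; the abstract itself does not state which variables the learned models predict.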
In the third study, a deep reinforcement learning-based framework is presented for solving scenario-based two-stage stochastic programming problems, which are computationally challenging. A general two-stage deep reinforcement learning framework is proposed in which two learning agents sequentially learn to solve each stage of a general two-stage stochastic multi-dimensional knapsack problem. The results show that the solution time can be reduced significantly with a relatively small gap. Additionally, the decision-making agents can be trained on a few scenarios and then solve problems with a large number of scenarios. In the fourth study, a learning-based prediction-optimization framework is proposed for solving scenario-based multi-stage stochastic programs. The issue of non-anticipativity is addressed with a novel neural network architecture based on a neural machine translation system. Furthermore, training the models on deterministic problems is suggested instead of solving hard and time-consuming stochastic programs. In this framework, the level of variables used for the solution is iteratively reduced to eliminate infeasibility, and a heuristic based on a linear relaxation is applied to reduce the solution time. An improved item-wise expansion strategy is introduced to generalize the algorithm to instances of different sizes. Results are presented for stochastic multi-item capacitated lot-sizing and stochastic multi-stage multi-dimensional knapsack problems. They show that the solution time can be reduced by a factor of 599 with an optimality gap of only 0.08%. Moreover, the models can be used to predict similarly structured stochastic programming problems with a varying number of periods, items, and scenarios.
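The scenario-based programs mentioned above can be written in a generic form. The following is a standard textbook statement in minimisation form (a knapsack instance would be the maximisation analogue), not the dissertation's specific model. With $S$ scenarios $\xi_s$ of probability $p_s$, first-stage decisions $x$ and second-stage (recourse) decisions $y_s$:

```latex
\min_{x \in X} \; c^\top x + \sum_{s=1}^{S} p_s \, Q(x, \xi_s),
\qquad
Q(x, \xi_s) = \min_{y_s \in Y} \left\{ q_s^\top y_s \;:\; W_s\, y_s \ge h_s - T_s\, x \right\}
```

In the two-agent framework described in the abstract, one agent would presumably propose $x$ and the other $y_s$ for each scenario. In the multi-stage case, decisions $x_t^{s}$ are indexed by stage $t$ and scenario $s$, and non-anticipativity requires that a decision not depend on information that has not yet been revealed:

```latex
x_t^{s} = x_t^{s'} \quad \text{whenever scenarios } s \text{ and } s' \text{ have identical realizations up to stage } t
```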
The frameworks presented in this dissertation can be used to obtain high-quality, fast solutions to repeatedly solved problems in various industrial and business settings, such as production and inventory management, capacity planning, scheduling, airline logistics, dynamic pricing, and emergency management.

    Foundations of Human-Aware Planning -- A Tale of Three Models

    A critical challenge in the design of AI systems that operate with humans in the loop is to be able to model the intentions and capabilities of the humans, as well as their beliefs and expectations of the AI system itself. This allows the AI system to be "human-aware": the human task model enables it to envisage desired roles of the human in joint action, while the human mental model allows it to anticipate how its own actions are perceived from the point of view of the human. In my research, I explore how these concepts of human-awareness manifest themselves in the scope of planning or sequential decision making with humans in the loop. To this end, I will show (1) how the AI agent can leverage the human task model to generate symbiotic behavior; and (2) how the introduction of the human mental model into the deliberative process of the AI agent allows it to generate explanations for a plan or to resort to explicable plans when explanations are not desired. The latter goes beyond traditional notions of human-aware planning, which typically use the human task model alone, and thus enables a new suite of capabilities for a human-aware AI agent. Finally, I will explore how the AI agent can leverage emerging mixed-reality interfaces to realize effective channels of communication with the human in the loop.
    Dissertation/Thesis: Doctoral Dissertation, Computer Science, 201

    Modeling and Simulating Causal Dependencies on Process-aware Information Systems from a Cost Perspective

    Providing effective IT support for business processes has become crucial for enterprises to stay competitive in their market. Business processes must be defined, implemented, enacted, monitored, and continuously adapted to changing situations. Process life cycle support and continuous process improvement have become critical success factors in contemporary and future enterprise computing. In this context, process-aware information systems (PAISs) play a key role. Here, a distinction is made between organization-specific and generic process support systems. In the former case, the PAIS is built "from scratch" and incorporates organization-specific information about the structure and processes to be supported. In the latter case, the PAIS does not contain any information about the structure and processes of a particular organization; instead, an organization needs to configure the PAIS by specifying processes, organizational entities, and business objects. To enable the realization of PAISs, numerous process support paradigms, process modeling standards, and business process management tools have been introduced. The application of these approaches in PAIS engineering projects is influenced not only by technological factors but also by organizational and project-specific ones. Numerous causal dependencies exist between these factors, which in turn often lead to complex and unexpected effects in PAIS engineering projects. In particular, the costs of PAIS engineering projects are significantly influenced by these causal dependencies. What is therefore needed is a comprehensive approach enabling PAIS engineers to systematically investigate these causal dependencies and their impact on the costs of PAIS engineering projects. Existing economic-driven IT evaluation and software cost estimation approaches, however, are unable to take into account causal dependencies and their resulting effects. In response, this thesis introduces the EcoPOST framework.
This framework uses evaluation models to describe the interplay of technological, organizational, and project-specific evaluation factors, and simulation concepts to unfold the dynamic behavior of PAIS engineering projects. The EcoPOST framework also supports the reuse of evaluation models based on a library of generic, predefined evaluation patterns, and provides governing guidelines (e.g., model design guidelines) that ease the transfer of the EcoPOST framework into practice. Tool support is available as well. Finally, we present the results of two online surveys, three case studies, and one controlled software experiment. Based on these empirical and experimental research activities, we are able to validate the evaluation concepts underlying the EcoPOST framework and demonstrate its practical applicability.