
    Volumetric Techniques for Product Routing and Loading Optimisation in Industry 4.0: A Review

    Industry 4.0 has become a crucial part of the majority of processes, components, and related modelling, as well as predictive tools that allow a more efficient, automated and sustainable approach to industry. The availability of large quantities of data, and the advances in IoT, AI, and data-driven frameworks, have led to enhanced data gathering, assessment, and extraction of actionable information, resulting in a better decision-making process. Product picking and its subsequent packing is an important area, and has drawn increasing attention from the research community. However, depending on the context, some of the related approaches tend to be either highly mathematical, or applied to a specific context. This article aims to provide a survey of the main methods, techniques, and frameworks relevant to product packing and to highlight the main properties and features that should be further investigated to ensure a more efficient and optimised approach.

    Computational techniques to interpret the neural code underlying complex cognitive processes

    Advances in large-scale neural recording technology have significantly improved the capacity to further elucidate the neural code underlying complex cognitive processes. This thesis aimed to investigate two research questions in rodent models. First, what is the role of the hippocampus in memory, and specifically what is the underlying neural code that contributes to spatial memory and navigational decision-making? Second, how is social cognition represented in the medial prefrontal cortex at the level of individual neurons? The thesis begins by investigating memory and social cognition in the context of healthy and diseased states using non-invasive methods (i.e. fMRI and animal behavioural studies). The main body of the thesis then shifts to developing our fundamental understanding of the neural mechanisms underpinning these cognitive processes by applying computational techniques to analyse stable large-scale neural recordings. To achieve this, tailored calcium imaging and behaviour preprocessing computational pipelines were developed and optimised for use in social interaction and spatial navigation experimental analysis. In parallel, a review was conducted on methods for multivariate/neural population analysis. A comparison of multiple neural manifold learning (NML) algorithms identified that nonlinear algorithms such as UMAP are more adaptable across datasets of varying noise and behavioural complexity. Furthermore, the review visualises how NML can be applied to disease states in the brain and introduces the secondary analyses that can be used to enhance or characterise a neural manifold. Lastly, the preprocessing and analytical pipelines were combined to investigate the neural mechanisms involved in social cognition and spatial memory. The social cognition study explored how neural firing in the medial prefrontal cortex changed as a function of the social dominance paradigm, the "Tube Test". The univariate analysis identified an ensemble of behaviourally tuned neurons that fire preferentially during specific behaviours such as "pushing" or "retreating" for the animal’s own behaviour and/or the competitor’s behaviour. Furthermore, in dominant animals, the neural population exhibited greater average firing than that of subordinate animals. Next, to investigate spatial memory, a spatial recency task was used, where rats learnt to navigate towards one of three reward locations and then recall the rewarded location of the session. During the task, over 1000 neurons were recorded from the hippocampal CA1 region for five rats over multiple sessions. Multivariate analysis revealed that the sequence of neurons encoding an animal’s spatial position leading up to a rewarded location was also active in the decision period before the animal navigates to the rewarded location. The result suggests that prospective replay of neural sequences in the hippocampal CA1 region could provide a mechanism by which decision-making is supported.

    Learning Interpretable Models of Aircraft Handling Behaviour by Reinforcement Learning from Human Feedback

    We propose a method to capture the handling abilities of fast jet pilots in a software model via reinforcement learning (RL) from human preference feedback. We use pairwise preferences over simulated flight trajectories to learn an interpretable rule-based model called a reward tree, which enables the automated scoring of trajectories alongside an explanatory rationale. We train an RL agent to execute high-quality handling behaviour by using the reward tree as the objective, and thereby generate data for iterative preference collection and further refinement of both tree and agent. Experiments with synthetic preferences show reward trees to be competitive with uninterpretable neural network reward models on quantitative and qualitative evaluations.

    Advances in machine learning algorithms for financial risk management

    In this thesis, three novel machine learning techniques are introduced to address distinct yet interrelated challenges involved in financial risk management tasks. These approaches collectively offer a comprehensive strategy, beginning with the precise classification of credit risks, advancing through the nuanced forecasting of financial asset volatility, and ending with the strategic optimisation of financial asset portfolios. Firstly, a Hybrid Dual-Resampling and Cost-Sensitive technique is proposed to combat the prevalent issue of class imbalance in financial datasets, particularly in credit risk assessment. The key process involves the creation of heuristically balanced datasets to effectively address the problem. It uses a resampling technique based on Gaussian mixture modelling to generate a synthetic minority class from the minority class data and concurrently uses k-means clustering on the majority class. Feature selection is then performed using the Extra Tree Ensemble technique. Subsequently, a cost-sensitive logistic regression model is applied to predict the probability of default using the heuristically balanced datasets. The results underscore the effectiveness of our proposed technique, with superior performance observed in comparison to other imbalanced preprocessing approaches. This advancement in credit risk classification lays a solid foundation for understanding individual financial behaviours, a crucial first step in the broader context of financial risk management. Building on this foundation, the thesis then explores the forecasting of financial asset volatility, a critical aspect of understanding market dynamics. A novel model that combines a Triple Discriminator Generative Adversarial Network with a continuous wavelet transform is proposed. The proposed model has the ability to decompose volatility time series into signal-like and noise-like frequency components, to allow the separate detection and monitoring of non-stationary volatility data. The network comprises a wavelet transform component consisting of continuous wavelet transform and inverse wavelet transform components, an auto-encoder component made up of encoder and decoder networks, and a Generative Adversarial Network consisting of triple Discriminator and Generator networks. The proposed Generative Adversarial Network employs an ensemble of losses as part of its framework: an unsupervised loss derived from the Generative Adversarial Network component during training, a supervised loss, and a reconstruction loss. Data from nine financial assets are employed to demonstrate the effectiveness of the proposed model. This approach not only enhances our understanding of market fluctuations but also bridges the gap between individual credit risk assessment and macro-level market analysis. Finally, the thesis proposes a novel technique for portfolio optimisation. This involves the use of a model-free reinforcement learning strategy for portfolio optimisation using historical Low, High, and Close prices of assets as input with weights of assets as output. A deep Capsules Network is employed to simulate the investment strategy, which involves the reallocation of the different assets to maximise the expected return on investment based on deep reinforcement learning. To provide more learning stability in an online training process, a Markov Differential Sharpe Ratio reward function has been proposed as the reinforcement learning objective function. Additionally, a Multi-Memory Weight Reservoir has also been introduced to facilitate the learning process and optimisation of computed asset weights, helping to sequentially re-balance the portfolio throughout a specified trading period. Incorporating the insights gained from volatility forecasting into this strategy shows the interconnected nature of the financial markets. Comparative experiments with other models demonstrated that our proposed technique is capable of achieving superior results based on risk-adjusted reward performance measures. In a nutshell, this thesis not only addresses individual challenges in financial risk management but also incorporates them into a comprehensive framework: from enhancing the accuracy of credit risk classification, through the improvement and understanding of market volatility, to the optimisation of investment strategies. These methodologies collectively show the potential of machine learning to improve financial risk management.
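
The abstract above names a Markov Differential Sharpe Ratio reward but does not define it. As a point of reference only, the following is a minimal sketch of the standard differential Sharpe ratio (exponential moving estimates of the first and second moments of returns), which is an assumption about the starting point and not the thesis's Markov variant. The function name, the `eta` adaptation rate, and the `eps` guard are illustrative choices.

```python
def differential_sharpe_ratios(returns, eta=0.01, eps=1e-12):
    """Per-step differential Sharpe ratio for a sequence of returns.

    a and b are exponential moving estimates of the first and second
    moments of the returns. This is the standard formulation, used here
    as an assumed stand-in for the (unspecified) Markov variant.
    """
    a, b = 0.0, 0.0
    rewards = []
    for r in returns:
        delta_a = r - a
        delta_b = r * r - b
        variance = b - a * a
        if variance > eps:
            d = (b * delta_a - 0.5 * a * delta_b) / variance ** 1.5
        else:
            # Moments are not yet informative (e.g. the first step).
            d = 0.0
        rewards.append(d)
        # Update the moving moments only after computing the reward.
        a += eta * delta_a
        b += eta * delta_b
    return rewards
```

Because each reward depends only on the previous moment estimates and the newest return, it can be computed online at every rebalancing step, which is why this family of rewards suits online reinforcement learning.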

    A Trust Management Framework for Vehicular Ad Hoc Networks

    The inception of Vehicular Ad Hoc Networks (VANETs) provides an opportunity for road users and public infrastructure to share information that improves the operation of roads and the driver experience. However, such systems can be vulnerable to malicious external entities and legitimate users. Trust management is used to address attacks from legitimate users in accordance with a user’s trust score. Trust models evaluate messages to assign rewards or punishments. This can be used to influence a driver’s future behaviour or, in extremis, block the driver. With receiver-side schemes, various methods are used to evaluate trust, including reputation computation, neighbour recommendations, and storing historical information. However, they incur overhead and add a delay when deciding whether to accept or reject messages. In this thesis, we propose a novel Tamper-Proof Device (TPD) based trust framework for managing the trust of multiple drivers at the sender-side vehicle; the TPD updates trust, and stores and protects information from malicious tampering. The TPD also regulates, rewards, and punishes each specific driver, as required. Furthermore, the trust score determines the classes of message that a driver can access. Dissemination of feedback is only required when there is an attack (conflicting information). A Road-Side Unit (RSU) rules on a dispute, using either the sum of products of trust and feedback or official vehicle data if available. These “untrue attacks” are resolved by an RSU using collaboration, which then provides a fixed amount of reward and punishment, as appropriate. Repeated attacks are addressed by incremental punishments and potentially driver access-blocking when conditions are met. The lack of sophistication in this fixed RSU assessment scheme is then addressed by a novel fuzzy logic-based RSU approach. This determines a fairer level of reward and punishment based on the severity of the incident, the driver’s past behaviour, and RSU confidence. The fuzzy RSU controller assesses judgements in such a way as to encourage drivers to improve their behaviour. Although any driver can lie in any situation, we believe that trustworthy drivers are more likely to remain so, and vice versa. We capture this behaviour in a Markov chain model for the sender and reporter driver behaviours where a driver’s truthfulness is influenced by their trust score and trust state. For each trust state, the driver’s likelihood of lying or being honest is set by a state-specific probability distribution. This framework is analysed in Veins using various classes of vehicles under different traffic conditions. Results confirm that the framework operates effectively in the presence of untrue and inconsistent attacks. The correct functioning is confirmed with the system appropriately classifying incidents when clarifier vehicles send truthful feedback. The framework is also evaluated against a centralized reputation scheme and the results demonstrate that it outperforms the reputation approach in terms of reduced communication overhead and shorter response time. Next, we perform a set of experiments to evaluate the performance of the fuzzy assessment in Veins. The fuzzy and fixed RSU assessment schemes are compared, and the results show that the fuzzy scheme provides better overall driver behaviour. The Markov chain driver behaviour model is also examined when changing the initial trust score of all drivers.
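
The dispute rule mentioned in the abstract above (an RSU using the sum of products of trust and feedback, followed by a fixed reward or punishment) can be illustrated with a minimal sketch. The function names, the +1/-1 feedback encoding, the zero decision threshold, the [0, 1] trust range, and the reward and penalty amounts are all assumptions made for illustration; the abstract does not specify them.

```python
def rsu_rule_on_dispute(reports):
    """Decide whether a disputed message is judged truthful.

    reports: list of (trust_score, feedback) pairs, where feedback is
    +1 if the reporter confirms the event and -1 if it denies it
    (hypothetical encoding). Returns True when the trust-weighted sum
    favours the message.
    """
    weighted_sum = sum(trust * feedback for trust, feedback in reports)
    return weighted_sum > 0


def apply_fixed_assessment(trust, truthful, reward=0.1, penalty=0.2):
    """Apply a fixed reward or punishment and clamp trust to [0, 1].

    The reward and penalty sizes are illustrative assumptions.
    """
    updated = trust + reward if truthful else trust - penalty
    return min(1.0, max(0.0, updated))


# Two highly trusted reporters outweigh three poorly trusted ones.
reports = [(0.9, +1), (0.8, +1), (0.3, -1), (0.2, -1), (0.1, -1)]
verdict = rsu_rule_on_dispute(reports)
```

Weighting each report by trust means a simple majority of low-trust reporters cannot overturn a smaller group of high-trust reporters, which is the property a trust-weighted sum provides over plain voting.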

    Learning recommender systems from biased user interactions

    Recommender systems have been widely deployed to help users quickly find what they need from a collection of items. Predominant recommendation methods rely on supervised learning models to predict user ratings on items or the probabilities of users interacting with items. In addition, reinforcement learning models are crucial in improving long-term user engagement within recommender systems. In practice, both of these recommendation methods are commonly trained on logged user interactions and are therefore subject to the bias present in those interactions. This thesis concerns complex forms of bias in real-world user behaviors and aims to mitigate the effect of bias on reinforcement learning-based recommendation methods. The first part of the thesis consists of two research chapters, each dedicated to tackling a specific form of bias: dynamic selection bias and multifactorial bias. To mitigate the effect of dynamic selection bias and multifactorial bias, we propose a bias propensity estimation method for each. By incorporating the results from the bias propensity estimation methods, the widely used inverse propensity scoring-based debiasing method can be extended to correct for the corresponding bias. The second part of the thesis consists of two chapters that concern the effect of bias on reinforcement learning-based recommendation methods. Its first chapter focuses on mitigating the effect of bias on simulators, which enable the learning and evaluation of reinforcement learning-based recommendation methods. Its second chapter further explores different state encoders for reinforcement learning-based recommendation methods when learning and evaluating with the proposed debiased simulator.
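
The inverse propensity scoring (IPS) idea referred to in the abstract above can be illustrated with a small sketch: each observed interaction is re-weighted by the inverse of its probability of being observed. The propensities here are given constants chosen for illustration; the thesis's own propensity estimation methods for dynamic selection bias and multifactorial bias are not reproduced, and the function names are hypothetical.

```python
def naive_mean(observed):
    """Average of observed values, ignoring how they were selected."""
    return sum(value for value, _ in observed) / len(observed)


def ips_mean(observed, n_pairs):
    """IPS estimate of the mean value over all user-item pairs.

    observed: list of (value, propensity) for observed pairs only,
    where propensity is the (assumed known) probability that the pair
    was observed. n_pairs: total number of user-item pairs.
    """
    return sum(value / propensity for value, propensity in observed) / n_pairs


# Toy setting: four user-item pairs with true ratings 5, 5, 1, 1.
# High ratings are always observed (propensity 1.0); low ratings are
# observed half the time (propensity 0.5), and one of them was logged.
observed = [(5, 1.0), (5, 1.0), (1, 0.5)]

biased = naive_mean(observed)       # over-represents high ratings
debiased = ips_mean(observed, 4)    # re-weights the under-observed rating
```

In this toy setting the naive average is pulled upwards by the over-observed high ratings, while the IPS estimate recovers the true mean of 3.0 because the single observed low rating is counted twice.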

    Simulation-based test case generation for unmanned aerial vehicles in the neighborhood of real flights

    Unmanned aerial vehicles (UAVs), also known as drones, are acquiring increasing autonomy. With their commercial adoption, the problem of testing their functional and non-functional requirements, and in particular their safety requirements, has become a critical concern. Simulation-based testing represents a fundamental practice, but the testing scenarios considered in software-in-the-loop testing may not be representative of the actual scenarios experienced in the field. In this paper, we propose SURREAL (teSting Uavs in the neighboRhood of REAl fLights), a novel search-based approach that analyses logs of real UAV flights and automatically generates simulation-based tests in the neighborhood of such real flights, thereby improving the realism and representativeness of the simulation-based tests. This is done in two steps: first, SURREAL faithfully replicates the given UAV flight in the simulation environment, generating a simulation-based test that mirrors a pre-logged real-world behavior. Then, it smoothly manipulates the replicated flight conditions to discover slightly modified flight scenarios that are challenging or trigger misbehaviors of the UAV under test in simulation. In our experiments, we were able to replicate a real flight accurately in the simulation environment and to expose unstable and potentially unsafe behavior in the neighborhood of a flight, which even led to crashes.

    What is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL?

    Offline goal-conditioned RL (GCRL) offers a way to train general-purpose agents from fully offline datasets. In addition to being conservative within the dataset, the generalization ability to achieve unseen goals is another fundamental challenge for offline GCRL. However, to the best of our knowledge, this problem has not been well studied yet. In this paper, we study out-of-distribution (OOD) generalization of offline GCRL both theoretically and empirically to identify factors that are important. In a number of experiments, we observe that weighted imitation learning enjoys better generalization than pessimism-based offline RL methods. Based on this insight, we derive a theory for OOD generalization, which characterizes several important design choices. We then propose a new offline GCRL method, Generalizable Offline goAl-condiTioned RL (GOAT), by combining the findings from our theoretical and empirical studies. On a new benchmark containing 9 independent identically distributed (IID) tasks and 17 OOD tasks, GOAT outperforms current state-of-the-art methods by a large margin. Comment: Accepted by Proceedings of the 40th International Conference on Machine Learning, 202