197 research outputs found

    Affinity-Based Reinforcement Learning : A New Paradigm for Agent Interpretability

    The steady increase in the complexity of reinforcement learning (RL) algorithms is accompanied by a corresponding increase in opacity that obscures insight into their devised strategies. Methods in explainable artificial intelligence seek to mitigate this opacity by either creating transparent algorithms or extracting explanations post hoc. A third category exists that allows the developer to affect what agents learn: constrained RL has been used in safety-critical applications and prohibits agents from visiting certain states; preference-based RL agents have been used in robotics applications and learn state-action preferences instead of traditional reward functions. We propose a new affinity-based RL paradigm in which agents learn strategies that are partially decoupled from reward functions. Unlike entropy regularisation, we regularise the objective function with a distinct action distribution that represents a desired behaviour; we encourage the agent to act according to a prior while learning to maximise rewards. The result is an inherently interpretable agent that solves problems with an intrinsic affinity for certain actions. We demonstrate the utility of our method in a financial application: we learn continuous time-variant compositions of prototypical policies, each interpretable by its action affinities, that are globally interpretable according to customers’ financial personalities. Our method combines advantages from both constrained RL and preference-based RL: it retains the reward function but generalises the policy to match a defined behaviour, thus avoiding problems such as reward shaping and hacking. Unlike Boolean task composition, our method is a fuzzy superposition of different prototypical strategies to arrive at a more complex, yet interpretable, strategy.
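
    The idea of regularising the objective towards a prior action distribution can be illustrated with a small sketch. The abstract does not state the exact form of the penalty, so the KL divergence used here is an assumption, and the names `kl_divergence`, `affinity_regularised_objective` and `weight` are hypothetical rather than taken from the paper.

```python
import math


def kl_divergence(policy, prior):
    """KL(policy || prior) for two discrete action distributions."""
    total = 0.0
    for p, q in zip(policy, prior):
        if p > 0.0:
            if q <= 0.0:
                raise ValueError("prior must be positive wherever policy is")
            total += p * math.log(p / q)
    return total


def affinity_regularised_objective(expected_reward, policy, prior, weight):
    """Expected reward minus a penalty for straying from the prior.

    Assumption: a KL penalty stands in for the paper's regulariser,
    which the abstract does not specify.
    """
    return expected_reward - weight * kl_divergence(policy, prior)


if __name__ == "__main__":
    prior = [0.6, 0.3, 0.1]    # desired behaviour (action affinities)
    policy = [0.7, 0.2, 0.1]   # current action distribution of the agent
    print(affinity_regularised_objective(1.0, policy, prior, weight=0.5))
```

    With `weight` set to zero the objective reduces to the plain expected reward; larger values pull the learned action distribution closer to the prior.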

    A systematic review of machine learning models for management, prediction and classification of ARDS

    Aim: Acute respiratory distress syndrome (ARDS) is an acute, severe form of respiratory failure characterised by poor oxygenation and bilateral pulmonary infiltrates. Advancements in signal processing and machine learning have led to promising solutions for classification, event detection and predictive modelling in the management of ARDS. Method: In this review, we provide a systematic description of studies applying machine learning (ML) and artificial intelligence to the management, prediction, and classification of ARDS. We searched the following databases: Google Scholar, PubMed, and EBSCO from 2009 to 2023. A total of 243 studies were screened, of which 52 were included for review and analysis. We integrated knowledge from previous work, providing the state of the art and an overview of explainable decision models in machine learning, and identified areas for future research. Results: Gradient boosting is the most common and successful method, utilised in 12 (23.1%) of the studies. Owing to the limited size of the available data, neural networks and their variants were used by only 8 (15.4%) studies. Whilst all studies used cross-validation or a separate database for validation, only 1 study validated the model with clinician input. Explainability methods were presented in 15 (28.8%) of the studies; the most common was feature importance, which was used 14 times. Conclusion: For databases of 5000 or fewer samples, extreme gradient boosting has the highest probability of success. A large, multi-region, multi-centre database is required to reduce bias and take advantage of neural network methods. A framework for validating ML models with, and explaining them to, clinicians involved in the management of ARDS would be very helpful for the development and deployment of such models.

    Integrated Machine Learning and Optimization Frameworks with Applications in Operations Management

    Incorporation of contextual inference in the optimality analysis of operational problems is a canonical characteristic of data-informed decision making that requires interdisciplinary research. In an attempt to achieve individualization in operations management, we design rigorous and yet practical mechanisms that boost efficiency, restrain uncertainty and elevate real-time decision making through integration of ideas from the machine learning and operations research literature. In our first study, we investigate the decision of whether to admit a patient to a critical care unit, a crucial operational problem that has significant influence on both hospital performance and patient outcomes. Hospitals currently lack a methodology to selectively admit patients to these units in a way that incorporates each patient’s individual health metrics while considering the hospital’s operational constraints. We model the problem as a complex loss queueing network with a stochastic model of how long risk-stratified patients spend in particular units and how they transition between units. A data-driven optimization methodology then approximates an optimal admission control policy for the network of units. While enforcing low levels of patient blocking, we optimize a monotonic dual-threshold admission policy. Our methodology captures utilization and accessibility in a network model of care pathways while supporting the personalized allocation of scarce care resources to the neediest patients. The benefits of admission thresholds that vary by day of week are also examined. In the second study, we analyze the efficiency of surgical unit operations in the era of big data. The accuracy of surgical case duration predictions is a crucial element in hospital operational performance. We propose a comprehensive methodology that incorporates both structured and unstructured data to generate individualized predictions regarding the overall distribution of surgery durations. Consequently, we investigate methods to incorporate such individualized predictions into operational decision-making. We introduce novel prescriptive models to address optimization under uncertainty in the fundamental surgery appointment scheduling problem by utilizing the multi-dimensional data features available prior to the surgery. Electronic medical records systems provide detailed patient features that enable the prediction of individualized case time distributions; however, existing approaches in this context usually employ only limited, aggregate information, and do not take advantage of these detailed features. We show how the quantile regression forest can be integrated into three common optimization formulations that capture the stochasticity in addressing this problem: stochastic optimization, robust optimization and distributionally robust optimization. In the last part of this dissertation, we provide the first study on online learning problems under stochastic constraints that are "soft", i.e., need to be satisfied with high likelihood. Under a Bayesian framework, we propose and analyze a scheme that provides statistical feasibility guarantees throughout the learning horizon, by using posterior Monte Carlo samples to form sampled constraints that generalize the scenario generation approach commonly used in chance-constrained programming. We demonstrate how our scheme can be integrated into Thompson sampling and illustrate it with an application in online advertisement.
    PhD, Industrial & Operations Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145936/1/meisami_1.pd
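
    The last part of the abstract, Thompson sampling with constraints built from posterior Monte Carlo samples, can be sketched roughly as follows. This is an illustration under stated assumptions and not the dissertation's scheme: arms are assumed to have Beta posteriors over both reward and constraint-violation rate, an arm counts as feasible only if every sampled scenario respects a hypothetical `max_violation` threshold, and the function name `choose_arm` is invented for the example.

```python
import random


def choose_arm(reward_post, violation_post, max_violation, n_scenarios, rng):
    """Pick an arm by Thompson sampling over the scenario-feasible arms.

    reward_post, violation_post: per-arm (alpha, beta) Beta parameters
    (an assumed Bayesian model, chosen only for illustration).
    Returns the chosen arm index, or None if no arm passes the check.
    """
    feasible = []
    for arm, (a, b) in enumerate(violation_post):
        # Posterior Monte Carlo samples act as sampled constraints.
        scenarios = [rng.betavariate(a, b) for _ in range(n_scenarios)]
        if all(s <= max_violation for s in scenarios):
            feasible.append(arm)
    if not feasible:
        return None
    # Standard Thompson step: one posterior reward draw per feasible arm.
    draws = {arm: rng.betavariate(*reward_post[arm]) for arm in feasible}
    return max(draws, key=draws.get)


if __name__ == "__main__":
    rng = random.Random(0)
    reward_post = [(50, 1), (1, 1)]       # arm 0 looks more rewarding
    violation_post = [(100, 1), (1, 100)]  # but arm 0 almost surely violates
    print(choose_arm(reward_post, violation_post, 0.2, 20, rng))
```

    Raising `n_scenarios` makes the feasibility check stricter, which mirrors how more scenarios tighten a chance constraint in scenario-based programming.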

    WiFi-Based Human Activity Recognition Using Attention-Based BiLSTM

    Recently, significant efforts have been made to explore human activity recognition (HAR) techniques that use information gathered by existing indoor wireless infrastructures through WiFi signals without requiring the monitored subject to carry a dedicated device. The key intuition is that different activities introduce different multi-paths in WiFi signals and generate different patterns in the time series of channel state information (CSI). In this paper, we propose and evaluate a full pipeline for a CSI-based human activity recognition framework for 12 activities in three different spatial environments using two deep learning models: ABiLSTM and CNN-ABiLSTM. Evaluation experiments demonstrate that the proposed models outperform state-of-the-art models. The experiments also show that the proposed models can be applied to other environments with different configurations, albeit with some caveats. The proposed ABiLSTM model achieves an overall accuracy of 94.03%, 91.96%, and 92.59% across the three target environments, while the proposed CNN-ABiLSTM model reaches an accuracy of 98.54%, 94.25%, and 95.09% across those same environments.