143,675 research outputs found

    Data exploration with learning metrics

    Get PDF
    A crucial problem in exploratory analysis of data is that it is difficult for computational methods to focus on interesting aspects of data. Traditional methods of unsupervised learning cannot differentiate between interesting and noninteresting variation, and hence may model, visualize, or cluster parts of data that are not interesting to the analyst. This wastes the computational power of the methods and may mislead the analyst. In this thesis, a principle called "learning metrics" is used to develop visualization and clustering methods that automatically focus on the interesting aspects, based on auxiliary labels supplied with the data samples. The principle yields non-Euclidean (Riemannian) metrics that are data-driven, widely applicable, versatile, invariant to many transformations, and in part invariant to noise. Learning metric methods are introduced for five tasks: nonlinear visualization by Self-Organizing Maps and Multidimensional Scaling, linear projection, and clustering of discrete data and multinomial distributions. The resulting methods either explicitly estimate distances in the Riemannian metric, or optimize a tailored cost function which is implicitly related to such a metric. The methods have rigorous theoretical relationships to information geometry and probabilistic modeling, and are empirically shown to yield good practical results in exploratory and information retrieval tasks.reviewe

    Flexible Group Fairness Metrics for Survival Analysis

    Full text link
    Algorithmic fairness is an increasingly important field concerned with detecting and mitigating biases in machine learning models. There has been a wealth of literature for algorithmic fairness in regression and classification however there has been little exploration of the field for survival analysis. Survival analysis is the prediction task in which one attempts to predict the probability of an event occurring over time. Survival predictions are particularly important in sensitive settings such as when utilising machine learning for diagnosis and prognosis of patients. In this paper we explore how to utilise existing survival metrics to measure bias with group fairness metrics. We explore this in an empirical experiment with 29 survival datasets and 8 measures. We find that measures of discrimination are able to capture bias well whereas there is less clarity with measures of calibration and scoring rules. We suggest further areas for research including prediction-based fairness metrics for distribution predictions.Comment: Accepted in DSHealth 2022 (Workshop on Applied Data Science for Healthcare

    Comparing Active Learning Performance Driven by Gaussian Processes or Bayesian Neural Networks for Constrained Trajectory Exploration

    Full text link
    Robots with increasing autonomy progress our space exploration capabilities, particularly for in-situ exploration and sampling to stand in for human explorers. Currently, humans drive robots to meet scientific objectives, but depending on the robot's location, the exchange of information and driving commands between the human operator and robot may cause undue delays in mission fulfillment. An autonomous robot encoded with a scientific objective and an exploration strategy incurs no communication delays and can fulfill missions more quickly. Active learning algorithms offer this capability of intelligent exploration, but the underlying model structure varies the performance of the active learning algorithm in accurately forming an understanding of the environment. In this paper, we investigate the performance differences between active learning algorithms driven by Gaussian processes or Bayesian neural networks for exploration strategies encoded on agents that are constrained in their trajectories, like planetary surface rovers. These two active learning strategies were tested in a simulation environment against science-blind strategies to predict the spatial distribution of a variable of interest along multiple datasets. The performance metrics of interest are model accuracy in root mean squared (RMS) error, training time, model convergence, total distance traveled until convergence, and total samples until convergence. Active learning strategies encoded with Gaussian processes require less computation to train, converge to an accurate model more quickly, and propose trajectories of shorter distance, except in a few complex environments in which Bayesian neural networks achieve a more accurate model in the large data regime due to their more expressive functional bases. The paper concludes with advice on when and how to implement either exploration strategy for future space missions.Comment: AIAA ASCEND 2023, 15 page

    Auto-tuning Distributed Stream Processing Systems using Reinforcement Learning

    Get PDF
    Fine tuning distributed systems is considered to be a craftsmanship, relying on intuition and experience. This becomes even more challenging when the systems need to react in near real time, as streaming engines have to do to maintain pre-agreed service quality metrics. In this article, we present an automated approach that builds on a combination of supervised and reinforcement learning methods to recommend the most appropriate lever configurations based on previous load. With this, streaming engines can be automatically tuned without requiring a human to determine the right way and proper time to deploy them. This opens the door to new configurations that are not being applied today since the complexity of managing these systems has surpassed the abilities of human experts. We show how reinforcement learning systems can find substantially better configurations in less time than their human counterparts and adapt to changing workloads
    • …
    corecore