Data exploration with learning metrics
A crucial problem in exploratory analysis of data is that it is difficult for computational methods to focus on interesting aspects of data. Traditional methods of unsupervised learning cannot differentiate between interesting and noninteresting variation, and hence may model, visualize, or cluster parts of data that are not interesting to the analyst. This wastes the computational power of the methods and may mislead the analyst.
In this thesis, a principle called "learning metrics" is used to develop visualization and clustering methods that automatically focus on the interesting aspects, based on auxiliary labels supplied with the data samples. The principle yields non-Euclidean (Riemannian) metrics that are data-driven, widely applicable, versatile, invariant to many transformations, and in part invariant to noise.
Learning metric methods are introduced for five tasks: nonlinear visualization by Self-Organizing Maps and Multidimensional Scaling, linear projection, and clustering of discrete data and multinomial distributions. The resulting methods either explicitly estimate distances in the Riemannian metric, or optimize a tailored cost function which is implicitly related to such a metric. The methods have rigorous theoretical relationships to information geometry and probabilistic modeling, and are empirically shown to yield good practical results in exploratory and information retrieval tasks.
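As an illustration of a label-driven Riemannian metric, the sketch below measures a small displacement dx at a point x by the quadratic form dx^T J(x) dx, where J(x) is the Fisher information of the class posterior p(c | x) with respect to x. The abstract gives no formulas, so this formulation, the two-class logistic posterior, the weight vector `w`, and all function names are assumptions made for illustration, not the thesis's implementation.

```python
import math

def class_posterior(x, w):
    """Hypothetical two-class model: p(c=1 | x) = sigmoid(w . x)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    p1 = 1.0 / (1.0 + math.exp(-z))
    return [1.0 - p1, p1]

def log_posterior_gradients(x, w):
    """Gradients of log p(c | x) with respect to x, one per class."""
    p0, p1 = class_posterior(x, w)
    # For the logistic model: grad log p(0|x) = -p1 * w, grad log p(1|x) = p0 * w.
    return [[-p1 * wi for wi in w], [p0 * wi for wi in w]]

def local_sq_distance(x, dx, w):
    """Squared local distance dx^T J(x) dx, with
    J(x) = sum_c p(c|x) * grad log p(c|x) * grad log p(c|x)^T."""
    probs = class_posterior(x, w)
    grads = log_posterior_gradients(x, w)
    total = 0.0
    for p, g in zip(probs, grads):
        directional = sum(gi * di for gi, di in zip(g, dx))
        total += p * directional ** 2
    return total

if __name__ == "__main__":
    w = [2.0, 0.0]   # assumed: only the first feature affects the labels
    x = [0.0, 0.0]
    print(local_sq_distance(x, [0.1, 0.0], w))  # displacement that changes p(c|x)
    print(local_sq_distance(x, [0.0, 0.1], w))  # displacement that does not
```

Under these assumptions a displacement along a feature that leaves the label distribution unchanged has zero length, which is one way to read "focusing on the interesting aspects" of the data.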
Flexible Group Fairness Metrics for Survival Analysis
Algorithmic fairness is an increasingly important field concerned with
detecting and mitigating biases in machine learning models. There has been a
wealth of literature for algorithmic fairness in regression and classification;
however, there has been little exploration of the field for survival analysis.
Survival analysis is the prediction task in which one attempts to predict the
probability of an event occurring over time. Survival predictions are
particularly important in sensitive settings such as when utilising machine
learning for diagnosis and prognosis of patients. In this paper we explore how
to utilise existing survival metrics to measure bias with group fairness
metrics. We explore this in an empirical experiment with 29 survival datasets
and 8 measures. We find that measures of discrimination are able to capture
bias well whereas there is less clarity with measures of calibration and
scoring rules. We suggest further areas for research including prediction-based
fairness metrics for distribution predictions.
Comment: Accepted in DSHealth 2022 (Workshop on Applied Data Science for Healthcare)
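One way to turn a survival discrimination measure into a group fairness metric can be sketched as follows: compute the measure separately within each protected group and report the spread between groups. The abstract does not say how the paper combines per-group scores, so the max-minus-min gap, the choice of Harrell's concordance index as the measure, and the function names are assumptions for illustration only.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: the share of comparable pairs in which the subject
    with the earlier observed event also has the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risks count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in this sample")
    return concordant / comparable

def group_fairness_gap(times, events, risks, groups, measure=concordance_index):
    """Evaluate `measure` within each group and return (gap, per-group scores).
    The gap is max minus min, an assumed aggregation for this sketch."""
    scores = {}
    for g in sorted(set(groups)):
        idx = [k for k, v in enumerate(groups) if v == g]
        scores[g] = measure(
            [times[k] for k in idx],
            [events[k] for k in idx],
            [risks[k] for k in idx],
        )
    return max(scores.values()) - min(scores.values()), scores

if __name__ == "__main__":
    # Toy data: risks are perfectly ordered for group "a", reversed for group "b".
    times = [1, 2, 3, 1, 2, 3]
    events = [1, 1, 1, 1, 1, 1]
    risks = [3, 2, 1, 1, 2, 3]
    groups = ["a", "a", "a", "b", "b", "b"]
    gap, per_group = group_fairness_gap(times, events, risks, groups)
    print(gap, per_group)
```

A gap near zero would suggest the model ranks subjects about equally well in every group. Any other survival measure with the same signature can be passed as `measure`.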
Comparing Active Learning Performance Driven by Gaussian Processes or Bayesian Neural Networks for Constrained Trajectory Exploration
Robots with increasing autonomy progress our space exploration capabilities,
particularly for in-situ exploration and sampling to stand in for human
explorers. Currently, humans drive robots to meet scientific objectives, but
depending on the robot's location, the exchange of information and driving
commands between the human operator and robot may cause undue delays in mission
fulfillment. An autonomous robot encoded with a scientific objective and an
exploration strategy incurs no communication delays and can fulfill missions
more quickly. Active learning algorithms offer this capability of intelligent
exploration, but the underlying model structure varies the performance of the
active learning algorithm in accurately forming an understanding of the
environment. In this paper, we investigate the performance differences between
active learning algorithms driven by Gaussian processes or Bayesian neural
networks for exploration strategies encoded on agents that are constrained in
their trajectories, like planetary surface rovers. These two active learning
strategies were tested in a simulation environment against science-blind
strategies to predict the spatial distribution of a variable of interest along
multiple datasets. The performance metrics of interest are model accuracy in
root mean squared (RMS) error, training time, model convergence, total distance
traveled until convergence, and total samples until convergence. Active
learning strategies encoded with Gaussian processes require less computation to
train, converge to an accurate model more quickly, and propose trajectories of
shorter distance, except in a few complex environments in which Bayesian neural
networks achieve a more accurate model in the large data regime due to their
more expressive functional bases. The paper concludes with advice on when and
how to implement either exploration strategy for future space missions.
Comment: AIAA ASCEND 2023, 15 pages
Auto-tuning Distributed Stream Processing Systems using Reinforcement Learning
Fine-tuning distributed systems is considered a craft, relying
on intuition and experience. This becomes even more challenging when the
systems need to react in near real time, as streaming engines have to do to
maintain pre-agreed service quality metrics. In this article, we present an
automated approach that builds on a combination of supervised and reinforcement
learning methods to recommend the most appropriate lever configurations based
on previous load. With this, streaming engines can be automatically tuned
without requiring a human to determine the right way and proper time to deploy
them. This opens the door to new configurations that are not being applied
today since the complexity of managing these systems has surpassed the
abilities of human experts. We show how reinforcement learning systems can find
substantially better configurations in less time than their human counterparts
and adapt to changing workloads.