LIPIcs, Volume 251, ITCS 2023, Complete Volume
AI: Limits and Prospects of Artificial Intelligence
The emergence of artificial intelligence has triggered enthusiasm and promise of boundless opportunities as much as uncertainty about its limits. The contributions to this volume explore the limits of AI, describe the necessary conditions for its functionality, reveal its attendant technical and social problems, and present some existing and potential solutions. At the same time, the contributors highlight the societal and attendant economic hopes and fears, utopias and dystopias that are associated with the current and future development of artificial intelligence.
Advances and Applications of DSmT for Information Fusion. Collected Works, Volume 5
This fifth volume on Advances and Applications of DSmT for Information Fusion collects theoretical and applied contributions of researchers working in different fields of application and in mathematics, and is available in open access. The contributions collected in this volume were either published or presented in international conferences, seminars, workshops, and journals after the fourth volume was disseminated in 2015, or they are new. The contributions in each part of this volume are chronologically ordered.
The first part of this book presents some theoretical advances on DSmT, dealing mainly with modified Proportional Conflict Redistribution (PCR) rules of combination with degree of intersection, coarsening techniques, interval calculus for PCR thanks to set inversion via interval analysis (SIVIA), rough set classifiers, canonical decomposition of dichotomous belief functions, fast PCR fusion, fast inter-criteria analysis with PCR, and improved PCR5 and PCR6 rules preserving the (quasi-)neutrality of the (quasi-)vacuous belief assignment in the fusion of sources of evidence, with their Matlab codes.
Because more applications of DSmT have emerged since the publication of the fourth DSmT book in 2015, the second part of this volume covers selected applications of DSmT, mainly in building change detection, object recognition, quality of data association in tracking, perception in robotics, risk assessment for torrent protection and multi-criteria decision-making, multi-modal image fusion, coarsening techniques, recommender systems, levee characterization and assessment, human heading perception, trust assessment, robotics, biometrics, failure detection, GPS systems, inter-criteria analysis, group decision, human activity recognition, storm prediction, data association for autonomous vehicles, identification of maritime vessels, fusion of support vector machines (SVM), the Silx-Furtif RUST code library for information fusion including PCR rules, and network for ship classification.
Finally, the third part presents interesting contributions related to belief functions in general, published or presented over the years since 2015. These contributions relate to decision-making under uncertainty, belief approximations, probability transformations, new distances between belief functions, non-classical multi-criteria decision-making problems with belief functions, generalization of Bayes' theorem, image processing, data association, entropy and cross-entropy measures, fuzzy evidence numbers, the negator of a belief mass, human activity recognition, information fusion for breast cancer therapy, imbalanced data classification, and hybrid techniques mixing deep learning with belief functions.
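Several contributions above revolve around the PCR5 combination rule. As a minimal illustrative sketch (not code from the volume; the two-element frame {A, B}, the mass encoding, and the function name are our own), PCR5 combines two sources conjunctively and redistributes each partial conflict back to the two focal elements involved, proportionally to the masses that produced it:

```python
def pcr5_two_sources(m1, m2):
    """PCR5 fusion of two basic belief assignments on the frame {A, B}.

    Each input maps 'A', 'B', and 'AB' (= A ∪ B) to masses summing to 1.
    Illustrative sketch only; real DSmT frames are generally larger.
    """
    keys = ('A', 'B', 'AB')
    # Intersection table on {A, B, A∪B}; None marks an empty intersection.
    inter = {'A':  {'A': 'A',  'B': None, 'AB': 'A'},
             'B':  {'A': None, 'B': 'B',  'AB': 'B'},
             'AB': {'A': 'A',  'B': 'B',  'AB': 'AB'}}
    out = {k: 0.0 for k in keys}
    for x in keys:
        for y in keys:
            prod = m1[x] * m2[y]
            z = inter[x][y]
            if z is not None:
                out[z] += prod          # conjunctive (non-conflicting) part
            else:
                # Partial conflict m1(x)·m2(y): redistribute to x and y
                # proportionally to the masses involved (PCR5 principle).
                d = m1[x] + m2[y]
                if d > 0:
                    out[x] += m1[x] ** 2 * m2[y] / d
                    out[y] += m2[y] ** 2 * m1[x] / d
    return out
```

Note that fusing with the vacuous assignment (all mass on A ∪ B) returns the other source unchanged, which is exactly the neutrality property the improved PCR rules above are designed to preserve.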
Supervised Learning in Time-dependent Environments with Performance Guarantees
In practical scenarios, it is common to learn from a sequence of related problems (tasks). Such tasks are usually time-dependent in the sense that consecutive tasks are often significantly more similar than tasks that are far apart in the sequence. Time-dependency is common in multiple applications such as load forecasting, spam mail filtering, and face emotion recognition. For instance, in the problem of load forecasting, the consumption patterns in consecutive time periods are significantly more similar since human habits and weather factors change gradually over time. Learning from a sequence of tasks holds promise to enable accurate performance even with few samples per task by leveraging information from different tasks. However, harnessing the benefits of learning from a sequence of tasks is challenging since tasks are characterized by different underlying distributions.
Most existing techniques are designed for situations where the tasks’ similarities do not depend on their order in the sequence. Existing techniques designed for time-dependent tasks adapt to changes between consecutive tasks by accounting for a scalar rate of change through a carefully chosen parameter such as a learning rate or a weight factor. However, the tasks’ changes are commonly multidimensional, i.e., the time-dependency often varies across the different statistical characteristics describing the tasks. For instance, in the problem of load forecasting, the statistical characteristics related to weather factors often change differently from those related to generation.
In this dissertation, we establish methodologies for supervised learning from a sequence of time-dependent tasks that effectively exploit information from all tasks, provide multidimensional adaptation to tasks’ changes, and provide computable tight performance guarantees. We develop methods for supervised learning settings where tasks arrive over time, including techniques for supervised classification under concept drift (SCD) and techniques for continual learning (CL). In addition, we present techniques for load forecasting that can adapt to time changes in consumption patterns and assess intrinsic uncertainties in load demand. The numerical results show that the proposed methodologies can significantly improve the performance of existing methods on multiple benchmark datasets. This dissertation makes theoretical contributions leading to efficient algorithms for multiple machine learning scenarios that provide computable performance guarantees and performance superior to state-of-the-art techniques.
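As a hedged illustration of the multidimensional-adaptation idea in this abstract (a toy sketch of our own, not the dissertation's actual method or guarantees), one can track a task's statistical characteristics with a separate forgetting factor per dimension, instead of the single scalar rate used by conventional time-adaptive techniques:

```python
import numpy as np

def track_task_statistics(task_means, lam):
    """Per-dimension exponential forgetting across a sequence of tasks.

    task_means: list of d-dimensional statistic vectors, one per task,
                ordered in time.
    lam:        length-d array of forgetting factors in [0, 1) -- one rate
                per statistical characteristic (e.g., a slow rate for
                weather-related features, a fast one for generation),
                rather than a single scalar learning rate.
    Returns the final tracked estimate after the last task.
    """
    lam = np.asarray(lam, dtype=float)
    est = np.array(task_means[0], dtype=float)
    for m in task_means[1:]:
        # Dimensions with large lam change slowly; small lam adapt quickly.
        est = lam * est + (1.0 - lam) * np.asarray(m, dtype=float)
    return est
```

With `lam = [0.9, 0.1]`, the first characteristic is assumed nearly static and the second fast-changing; the estimate then blends old and new tasks accordingly.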
Reinforcement Learning Curricula as Interpolations between Task Distributions
In the last decade, the increased availability of powerful computing machinery has led to an increasingly widespread application of machine learning methods. Machine learning has been particularly successful when large models, typically neural networks with an ever-increasing number of parameters, can leverage vast data to make predictions.
While reinforcement learning (RL) has been no exception to this development, a distinguishing feature of RL is its well-known exploration-exploitation trade-off, whose optimal solution – while possible to model as a partially observable Markov decision process – evades computation in all but the simplest problems. Consequently, it seems unsurprising that notable demonstrations of reinforcement learning, such as an RL-based Go agent (AlphaGo) by DeepMind beating the professional Go player Lee Sedol, relied both on the availability of massive computing capabilities and specific forms of regularization that facilitate learning. In the case of AlphaGo, this regularization came in the form of self-play, enabling learning by interacting with gradually more proficient opponents.
In this thesis, we develop techniques that, similarly to the concept of self-play of AlphaGo, improve the learning performance of RL agents by training on sequences of increasingly complex tasks. These task sequences are typically called curricula and are known to side-step problems such as slow learning or convergence to poor behavior that may occur when directly learning in complicated tasks. The algorithms we develop in this thesis create curricula by minimizing distances or divergences between probability distributions of learning tasks, generating interpolations between an initial distribution of easy learning tasks and a target task distribution. Apart from improving the learning performance of RL agents in experiments, developing methods that realize curricula as interpolations between task distributions results in a nuanced picture of key aspects of successful reinforcement learning curricula.
In Chapter 1, we start this thesis by introducing required reinforcement learning notation and then motivating curriculum reinforcement learning from the perspective of continuation methods for non-linear optimization. Similar to curricula for reinforcement learning agents, continuation methods have been used in non-linear optimization to solve challenging optimization problems. This similarity provides an intuition about the effect of the curricula we aim to generate and their limits.
In Chapter 2, we transfer the concept of self-paced learning, initially proposed in the supervised learning community, to the problem of RL, showing that an automated curriculum generation for RL agents can be motivated by a regularized RL objective. This regularized RL objective implies generating a curriculum as a sequence of task distributions that trade off the expected agent performance against similarity to a specified distribution of target tasks. This view on curriculum RL contrasts existing approaches, as it motivates curricula via a regularized RL objective instead of generating them from a set of assumptions about an optimal curriculum. In experiments, we show that an approximate implementation of the aforementioned curriculum – that restricts the interpolating task distribution to a Gaussian – results in improved learning performance compared to regular reinforcement learning, matching or surpassing the performance of existing curriculum-based methods.
Subsequently, Chapter 3 builds on the intuition of curricula as sequences of interpolating task distributions established in Chapter 2. Motivated by using more flexible task distribution representations, we show how parametric assumptions play a crucial role in the empirical success of the previous approach and subsequently uncover key ingredients that enable the generation of meaningful curricula without assuming a parametric model of the task distributions. One major ingredient is an explicit notion of task similarity via a distance function of two Markov Decision Processes. We turn towards optimal transport theory, allowing for flexible particle-based representations of the task distributions while properly considering the newly introduced metric structure of the task space. Combined with other improvements to our first method, such as a more aggressive restriction of the curriculum to tasks that are not too hard for the agent, the resulting approach delivers consistently high learning performance in multiple experiments.
In the final Chapter 4, we apply the refined method of Chapter 3 to a trajectory-tracking task, in which we task an RL agent to follow a three-dimensional reference trajectory with the tip of an inverted pendulum mounted on a Barrett Whole Arm Manipulator. The access to only positional information results in a partially observable system that, paired with its inherent instability, underactuation, and non-trivial kinematic structure, presents a challenge for modern reinforcement learning algorithms, which we tackle via curricula. The technically infinite-dimensional task space of target trajectories allows us to probe the developed curriculum learning method for flaws that have not surfaced in the rather low-dimensional experiments of the previous chapters. Through an improved optimization scheme that better respects the non-Euclidean structure of target trajectories, we reliably generate curricula of trajectories to be tracked, resulting in faster and more robust learning compared to an RL baseline that does not exploit this form of structured learning. The learned policy matches the performance of an optimal control baseline on the real system, demonstrating the potential of curriculum RL to learn state estimation and control for non-linear tracking tasks jointly.
In summary, this thesis introduces a perspective on reinforcement learning curricula as interpolations between task distributions. The methods developed under this perspective enjoy a precise formulation as optimization problems and deliver empirical benefits throughout experiments. Building upon this precise formulation may allow future work to advance the formal understanding of reinforcement learning curricula and, with that, enable the solution of challenging decision-making and control problems with reinforcement learning.
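To illustrate the interpolation idea in this abstract (a deliberately simplified sketch: the function names, the linear-interpolation rule, and the performance gate are our own assumptions, not the thesis's actual regularized objective), a Gaussian task distribution can be moved step by step toward a target distribution whenever the agent performs well enough on the current tasks:

```python
import random

def next_curriculum_gaussian(mu, sigma, mu_target, sigma_target,
                             perf, perf_threshold, step=0.1):
    """One curriculum step for a 1-D Gaussian task distribution.

    Moves (mu, sigma) a small step toward the target distribution, but
    only while the agent's performance on the current distribution exceeds
    a threshold (hypothetical gating rule for illustration).
    """
    if perf < perf_threshold:
        return mu, sigma  # agent not ready: keep sampling the same tasks
    mu_new = (1 - step) * mu + step * mu_target
    sigma_new = (1 - step) * sigma + step * sigma_target
    return mu_new, sigma_new

def sample_task(mu, sigma):
    """Draw one task parameter (e.g., a context variable) to train on."""
    return random.gauss(mu, sigma)
```

Starting from easy tasks (e.g., small `mu`) and repeating this step yields a sequence of task distributions interpolating between the initial and target distributions, which is the basic shape of the curricula studied in the thesis; the actual methods replace this ad hoc rule with an optimization over distributions.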
Improved Online Conformal Prediction via Strongly Adaptive Online Learning
We study the problem of uncertainty quantification via prediction sets, in an online setting where the data distribution may vary arbitrarily over time. Recent work develops online conformal prediction techniques that leverage regret minimization algorithms from the online learning literature to learn prediction sets with approximately valid coverage and small regret. However, standard regret minimization could be insufficient for handling changing environments, where performance guarantees may be desired not only over the full time horizon but also in all (sub-)intervals of time. We develop new online conformal prediction methods that minimize the strongly adaptive regret, which measures the worst-case regret over all intervals of a fixed length. We prove that our methods achieve near-optimal strongly adaptive regret for all interval lengths simultaneously, and approximately valid coverage. Experiments show that our methods consistently obtain better coverage and smaller prediction sets than existing methods on real-world tasks, such as time series forecasting and image classification under distribution shift.
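For intuition, a single online conformal prediction step in the spirit of adaptive conformal inference can be sketched as follows. This is an illustrative simplification, not the strongly adaptive method of the paper, which additionally runs such learners over many time scales and aggregates them to control regret on all sub-intervals:

```python
def adaptive_conformal_update(q, score, alpha, gamma=0.01):
    """One online update of a conformal score threshold q.

    score: the nonconformity score of the newly revealed example;
           the prediction set covered the truth iff score <= q.
    alpha: target miscoverage level (aim for 1 - alpha coverage).
    gamma: step size controlling how fast q adapts to distribution shift.
    After a miss the threshold grows (wider sets); after a cover it
    shrinks slightly, so long-run miscoverage tracks alpha.
    """
    err = 1.0 if score > q else 0.0  # 1 = the set missed the true label
    return q + gamma * (err - alpha)
```

Under distribution shift the scores drift, and the threshold follows them; the fixed step size `gamma` is exactly the kind of single-scale choice that strongly adaptive methods avoid committing to.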
LIPIcs, Volume 261, ICALP 2023, Complete Volume
Supervised learning in time-dependent environments with performance guarantees
151 p. In this thesis, we establish methodologies for supervised learning from a sequence of time-dependent tasks that effectively exploit information from all tasks, provide multidimensional adaptation to task changes, and offer tight, computable performance guarantees. We develop methods for supervised learning settings in which tasks arrive over time, including techniques for supervised classification under concept drift and techniques for continual learning. In addition, we present energy load forecasting techniques that can adapt to time changes in consumption patterns and assess the intrinsic uncertainties in load demand. The numerical results show that the proposed methodologies can significantly improve the performance of existing methods using multiple benchmark datasets. This thesis makes theoretical contributions leading to efficient algorithms for multiple machine learning scenarios that provide computable performance guarantees and performance superior to state-of-the-art techniques.
Leveraging Value-awareness for Online and Offline Model-based Reinforcement Learning
Model-based Reinforcement Learning (RL) lies at the intersection of planning and learning for sequential decision making. Value-awareness in model learning has recently emerged as a means to imbue task or reward information into the objective of model learning, in order for the model to leverage specificity of a task. While finding success in theory as being superior to maximum likelihood estimation in the context of (online) model-based RL, value-awareness has remained impractical for most non-trivial tasks.
This thesis aims to bridge the gap in theory and practice by applying the principle of value-awareness to two settings – the online RL setting and offline RL setting. First, within online RL, this thesis revisits value-aware model learning from the perspective of minimizing performance difference, obtaining a novel value-aware model learning objective as a direct upper bound of it. Then, this thesis investigates and remedies the issue of stale value estimates that has so far been holding back the practicality of value-aware model learning. Using the proposed remedy, performance improvements are presented over maximum-likelihood based baselines and existing value-aware objectives, in several continuous control tasks, while also enabling existing value-aware objectives to become performant.
In the offline RL context, this thesis takes a step back from model learning and applies value-awareness towards better data augmentation. Such data augmentation, when applied to model-based offline RL algorithms, allows for leveraging unseen states with low epistemic uncertainty that have previously not been reachable within the assumptions and limitations of model-based offline RL. Value-aware state augmentations are found to enable better performance on offline RL benchmarks compared to existing baselines and non-value-aware alternatives.
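To make the notion of value-awareness concrete (an illustrative sketch under our own naming; the thesis derives an upper bound on the performance difference rather than this exact loss), a value-aware model objective penalizes prediction errors only insofar as they matter through the value function, in contrast to a maximum-likelihood-style raw state error:

```python
import numpy as np

def value_aware_model_loss(pred_next, true_next, value_fn):
    """Illustrative value-aware loss for a learned dynamics model.

    pred_next: batch of next states predicted by the model.
    true_next: batch of observed next states.
    value_fn:  current value estimate V(s).
    The model is penalized for mismatches in *value*, not in raw state
    coordinates, so errors irrelevant to the task are ignored.
    """
    v_pred = np.array([value_fn(s) for s in pred_next])
    v_true = np.array([value_fn(s) for s in true_next])
    return float(np.mean((v_pred - v_true) ** 2))
```

Note the contrast with maximum likelihood: if the value function is constant over a region, any model error within it incurs zero value-aware loss. The staleness issue discussed above arises because `value_fn` is itself a moving estimate during training.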
Dr. Neurosymbolic, or: How I Learned to Stop Worrying and Accept Statistics
The symbolic AI community is increasingly trying to embrace machine learning
in neuro-symbolic architectures, yet is still struggling due to cultural
barriers. To break this barrier, this rather opinionated personal memo attempts
to explain and rectify the conventions in Statistics, Machine Learning, and
Deep Learning from the viewpoint of outsiders. It provides a step-by-step
protocol for designing a machine learning system that satisfies a minimum
theoretical guarantee necessary for being taken seriously by the symbolic AI
community, i.e., it discusses "in what condition we can stop worrying and
accept statistical machine learning." Unlike most textbooks, which are written
for students trying to specialize in Stat/ML/DL and willing to accept jargon,
this memo is written for experienced symbolic researchers who hear a lot of
buzz but are still uncertain and skeptical. Information on Stat/ML/DL is
currently too scattered or too noisy to invest in. This memo prioritizes
compactness, citations to old papers (many in early 20th century), and concepts
that resonate well with symbolic paradigms in order to offer time savings. It
prioritizes general mathematical modeling and does not discuss any specific
function approximator, such as neural networks (NNs), SVMs, decision trees,
etc. Finally, it is open to corrections. Consider this memo as something
similar to a blog post taking the form of a paper on arXiv.