502 research outputs found
Robustness - a challenge also for the 21st century: A review of robustness phenomena in technical, biological and social systems as well as robust approaches in engineering, computer science, operations research and decision aiding
Notions on robustness exist in many facets. They come from different disciplines and reflect different worldviews. Consequently, they contradict each other very often, which makes the term less applicable in a general context. Robustness approaches are often limited to specific problems for which they have been developed. This means, notions and definitions might reveal to be wrong if put into another domain of validity, i.e. context. A definition might be correct in a specific context but need not hold in another. Therefore, in order to be able to speak of robustness we need to specify the domain of validity, i.e. system, property and uncertainty of interest. As proofed by Ho et al. in an optimization context with finite and discrete domains, without prior knowledge about the problem there exists no solution what so ever which is more robust than any other. Similar to the results of the No Free Lunch Theorems of Optimization (NLFTs) we have to exploit the problem structure in order to make a solution more robust. This optimization problem is directly linked to a robustness/fragility tradeoff which has been observed in many contexts, e.g. 'robust, yet fragile' property of HOT (Highly Optimized Tolerance) systems. Another issue is that robustness is tightly bounded to other phenomena like complexity for which themselves exist no clear definition or theoretical framework. Consequently, this review rather tries to find common aspects within many different approaches and phenomena than to build a general theorem for robustness, which anyhow might not exist because complex phenomena often need to be described from a pluralistic view to address as many aspects of a phenomenon as possible. First, many different robustness problems have been reviewed from many different disciplines. Second, different common aspects will be discussed, in particular the relationship of functional and structural properties. This paper argues that robustness phenomena are also a challenge for the 21st century. It is a useful quality of a model or system in terms of the 'maintenance of some desired system characteristics despite fluctuations in the behaviour of its component parts or its environment' (s. [Carlson and Doyle, 2002], p. 2). We define robustness phenomena as solution with balanced tradeoffs and robust design principles and robustness measures as means to balance tradeoffs. --
Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback
In this work, we study the low-rank MDPs with adversarially changed losses in
the full-information feedback setting. In particular, the unknown transition
probability kernel admits a low-rank matrix decomposition \citep{REPUCB22}, and
the loss functions may change adversarially but are revealed to the learner at
the end of each episode. We propose a policy optimization-based algorithm POLO,
and we prove that it attains the
regret
guarantee, where is rank of the transition kernel (and hence the dimension
of the unknown representations), is the cardinality of the action space,
is the cardinality of the model class, and is the discounted
factor. Notably, our algorithm is oracle-efficient and has a regret guarantee
with no dependence on the size of potentially arbitrarily large state space.
Furthermore, we also prove an
regret lower bound for this problem, showing that low-rank MDPs are
statistically more difficult to learn than linear MDPs in the regret
minimization setting. To the best of our knowledge, we present the first
algorithm that interleaves representation learning, exploration, and
exploitation to achieve the sublinear regret guarantee for RL with nonlinear
function approximation and adversarial losses
Universal Algorithmic Intelligence: A mathematical top->down approach
Sequential decision theory formally solves the problem of rational agents in
uncertain worlds if the true environmental prior probability distribution is
known. Solomonoff's theory of universal induction formally solves the problem
of sequence prediction for unknown prior distribution. We combine both ideas
and get a parameter-free theory of universal Artificial Intelligence. We give
strong arguments that the resulting AIXI model is the most intelligent unbiased
agent possible. We outline how the AIXI model can formally solve a number of
problem classes, including sequence prediction, strategic games, function
minimization, reinforcement and supervised learning. The major drawback of the
AIXI model is that it is uncomputable. To overcome this problem, we construct a
modified algorithm AIXItl that is still effectively more intelligent than any
other time t and length l bounded agent. The computation time of AIXItl is of
the order t x 2^l. The discussion includes formal definitions of intelligence
order relations, the horizon problem and relations of the AIXI theory to other
AI approaches.Comment: 70 page
A survey on online active learning
Online active learning is a paradigm in machine learning that aims to select
the most informative data points to label from a data stream. The problem of
minimizing the cost associated with collecting labeled observations has gained
a lot of attention in recent years, particularly in real-world applications
where data is only available in an unlabeled form. Annotating each observation
can be time-consuming and costly, making it difficult to obtain large amounts
of labeled data. To overcome this issue, many active learning strategies have
been proposed in the last decades, aiming to select the most informative
observations for labeling in order to improve the performance of machine
learning models. These approaches can be broadly divided into two categories:
static pool-based and stream-based active learning. Pool-based active learning
involves selecting a subset of observations from a closed pool of unlabeled
data, and it has been the focus of many surveys and literature reviews.
However, the growing availability of data streams has led to an increase in the
number of approaches that focus on online active learning, which involves
continuously selecting and labeling observations as they arrive in a stream.
This work aims to provide an overview of the most recently proposed approaches
for selecting the most informative observations from data streams in the
context of online active learning. We review the various techniques that have
been proposed and discuss their strengths and limitations, as well as the
challenges and opportunities that exist in this area of research. Our review
aims to provide a comprehensive and up-to-date overview of the field and to
highlight directions for future work
Gamification of Pure Exploration for Linear Bandits
We investigate an active pure-exploration setting, that includes best-arm
identification, in the context of linear stochastic bandits. While
asymptotically optimal algorithms exist for standard multi-arm bandits, the
existence of such algorithms for the best-arm identification in linear bandits
has been elusive despite several attempts to address it. First, we provide a
thorough comparison and new insight over different notions of optimality in the
linear case, including G-optimality, transductive optimality from optimal
experimental design and asymptotic optimality. Second, we design the first
asymptotically optimal algorithm for fixed-confidence pure exploration in
linear bandits. As a consequence, our algorithm naturally bypasses the pitfall
caused by a simple but difficult instance, that most prior algorithms had to be
engineered to deal with explicitly. Finally, we avoid the need to fully solve
an optimal design problem by providing an approach that entails an efficient
implementation.Comment: 11+25 pages. To be published in the proceedings of ICML 202
What Happens in the Field Stays in the Field: Exploring Whether Professionals Play Minimax in Laboratory Experiments
The minimax argument represents game theory in its most elegant form: simple but with stark predictions. Although some of these predictions have been met with reasonable success in the field, experimental data have generally not provided results close to the theoretical predictions. In a striking study, Palacios-Huerta and Volij (2007) present evidence that potentially resolves this puzzle: both amateur and professional soccer players play nearly exact minimax strategies in laboratory experiments. In this paper, we establish important bounds on these results by examining the behavior of four distinct subject pools: college students, bridge professionals, world-class poker players, who have vast experience with high-stakes randomization in card games, and American professional soccer players. In contrast to Palacios-Huerta and Volijâs results, we find little evidence that real-world experience transfers to the lab in these gamesâindeed, similar to previous experimental results, all four subject pools provide choices that are generally not close to minimax predictions. We use two additional pieces of evidence to explore why professionals do not perform well in the lab: (1) complementary experimental treatments that pit professionals against preprogrammed computers, and (2) post-experiment questionnaires. The most likely explanation is that these professionals are unable to transfer their skills at randomization from the familiar context of the field to the unfamiliar context of the lab.
- âŠ