67,143 research outputs found
A Direct Reputation Model for VO Formation
We show that reputation is a basic ingredient in the Virtual Organisation (VO) formation process. Agents can use their experiences gained in direct past interactions to model other’s reputation and deciding on either join a VO or determining who is the most suitable set of partners. Reputation values are computed using a reinforcement learning algorithm, so agents can learn and adapt their reputation models of their partners according to their recent behaviour. Our approach is especially powerful if the agent participates in a VO in which the members can change their behaviour to exploit their partners. The reputation model presented in this paper deals with the questions of deception and fraud that have been ignored in current models of VO formation
The Influence of Regret Proneness, Evidence Strengthening, and Perceived Responsibility on Verdict Preference
In the present study, we investigated perceived responsibility, evidence strengthening, and defendant gender in the context of a criminal trial involving DNA. Evidence was introduced post-trial and varied as strengthening the defendant’s guilt v. innocence. We also examined perceptions of perceived responsibility for verdict in order to more closely evaluate the role of regret in decision-making. Results indicated that DNA evidence is perceived as reliable, regardless of whether it strengthened guilt or innocence. In addition, greater confidence in verdict was observed when evidence strengthened the guilt of a female defendant vs. a male defendant. Finally, jurors experiencing high levels of regret perceived DNA evidence more selectively compared to jurors with low levels of regret, supporting the importance of identifying individual difference factors prior to trial
An Efficient Bandit Algorithm for Realtime Multivariate Optimization
Optimization is commonly employed to determine the content of web pages, such
as to maximize conversions on landing pages or click-through rates on search
engine result pages. Often the layout of these pages can be decoupled into
several separate decisions. For example, the composition of a landing page may
involve deciding which image to show, which wording to use, what color
background to display, etc. Such optimization is a combinatorial problem over
an exponentially large decision space. Randomized experiments do not scale well
to this setting, and therefore, in practice, one is typically limited to
optimizing a single aspect of a web page at a time. This represents a missed
opportunity in both the speed of experimentation and the exploitation of
possible interactions between layout decisions.
Here we focus on multivariate optimization of interactive web pages. We
formulate an approach where the possible interactions between different
components of the page are modeled explicitly. We apply bandit methodology to
explore the layout space efficiently and use hill-climbing to select optimal
content in realtime. Our algorithm also extends to contextualization and
personalization of layout selection. Simulation results show the suitability of
our approach to large decision spaces with strong interactions between content.
We further apply our algorithm to optimize a message that promotes adoption of
an Amazon service. After only a single week of online optimization, we saw a
21% conversion increase compared to the median layout. Our technique is
currently being deployed to optimize content across several locations at
Amazon.com.Comment: KDD'17 Audience Appreciation Awar
Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms
Contextual bandit algorithms have become popular for online recommendation
systems such as Digg, Yahoo! Buzz, and news recommendation in general.
\emph{Offline} evaluation of the effectiveness of new algorithms in these
applications is critical for protecting online user experiences but very
challenging due to their "partial-label" nature. Common practice is to create a
simulator which simulates the online environment for the problem at hand and
then run an algorithm against this simulator. However, creating simulator
itself is often difficult and modeling bias is usually unavoidably introduced.
In this paper, we introduce a \emph{replay} methodology for contextual bandit
algorithm evaluation. Different from simulator-based approaches, our method is
completely data-driven and very easy to adapt to different applications. More
importantly, our method can provide provably unbiased evaluations. Our
empirical results on a large-scale news article recommendation dataset
collected from Yahoo! Front Page conform well with our theoretical results.
Furthermore, comparisons between our offline replay and online bucket
evaluation of several contextual bandit algorithms show accuracy and
effectiveness of our offline evaluation method.Comment: 10 pages, 7 figures, revised from the published version at the WSDM
2011 conferenc
Analysis of Different Types of Regret in Continuous Noisy Optimization
The performance measure of an algorithm is a crucial part of its analysis.
The performance can be determined by the study on the convergence rate of the
algorithm in question. It is necessary to study some (hopefully convergent)
sequence that will measure how "good" is the approximated optimum compared to
the real optimum. The concept of Regret is widely used in the bandit literature
for assessing the performance of an algorithm. The same concept is also used in
the framework of optimization algorithms, sometimes under other names or
without a specific name. And the numerical evaluation of convergence rate of
noisy algorithms often involves approximations of regrets. We discuss here two
types of approximations of Simple Regret used in practice for the evaluation of
algorithms for noisy optimization. We use specific algorithms of different
nature and the noisy sphere function to show the following results. The
approximation of Simple Regret, termed here Approximate Simple Regret, used in
some optimization testbeds, fails to estimate the Simple Regret convergence
rate. We also discuss a recent new approximation of Simple Regret, that we term
Robust Simple Regret, and show its advantages and disadvantages.Comment: Genetic and Evolutionary Computation Conference 2016, Jul 2016,
Denver, United States. 201
Learning what matters - Sampling interesting patterns
In the field of exploratory data mining, local structure in data can be
described by patterns and discovered by mining algorithms. Although many
solutions have been proposed to address the redundancy problems in pattern
mining, most of them either provide succinct pattern sets or take the interests
of the user into account-but not both. Consequently, the analyst has to invest
substantial effort in identifying those patterns that are relevant to her
specific interests and goals. To address this problem, we propose a novel
approach that combines pattern sampling with interactive data mining. In
particular, we introduce the LetSIP algorithm, which builds upon recent
advances in 1) weighted sampling in SAT and 2) learning to rank in interactive
pattern mining. Specifically, it exploits user feedback to directly learn the
parameters of the sampling distribution that represents the user's interests.
We compare the performance of the proposed algorithm to the state-of-the-art in
interactive pattern mining by emulating the interests of a user. The resulting
system allows efficient and interleaved learning and sampling, thus
user-specific anytime data exploration. Finally, LetSIP demonstrates favourable
trade-offs concerning both quality-diversity and exploitation-exploration when
compared to existing methods.Comment: PAKDD 2017, extended versio
- …