Self-Paced Multi-Task Learning
In this paper, we propose a novel multi-task learning (MTL) framework, called
Self-Paced Multi-Task Learning (SPMTL). Different from previous works treating
all tasks and instances equally when training, SPMTL attempts to jointly learn
the tasks by taking into consideration the complexities of both tasks and
instances. This is inspired by the cognitive process of the human brain, which often
learns from easy to hard. We construct a compact SPMTL formulation by
proposing a new task-oriented regularizer that can jointly prioritize the tasks
and the instances. Thus it can be interpreted as a self-paced learner for MTL.
A simple yet effective algorithm is designed for optimizing the proposed
objective function. An error bound for a simplified formulation is also
analyzed theoretically. Experimental results on toy and real-world datasets
demonstrate the effectiveness of the proposed approach, compared to the
state-of-the-art methods.
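The alternating scheme the abstract describes can be sketched in a few lines: fit each task's model on the instances currently deemed "easy" (loss below a pace parameter), then grow the pace so harder instances are admitted. This is a minimal illustration of self-paced instance selection, not the paper's full task-and-instance regularizer; the data, threshold schedule, and least-squares solver are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two related linear-regression tasks sharing a common
# weight vector plus small task-specific perturbations (hypothetical setup).
n, d = 50, 3
w_true = rng.normal(size=d)
tasks = []
for t in range(2):
    X = rng.normal(size=(n, d))
    y = X @ (w_true + 0.1 * rng.normal(size=d)) + 0.1 * rng.normal(size=n)
    tasks.append((X, y))

def self_paced_mtl(tasks, lam=1.0, rounds=5, growth=2.0):
    """Alternate between (a) fitting each task's weights on the instances
    whose current squared loss is below the pace parameter lam and
    (b) growing lam so harder instances are admitted -- easy to hard."""
    ws = [np.zeros(X.shape[1]) for X, _ in tasks]
    for _ in range(rounds):
        for t, (X, y) in enumerate(tasks):
            losses = (X @ ws[t] - y) ** 2
            sel = np.flatnonzero(losses < lam)     # self-paced instance selection
            if sel.size < X.shape[1]:              # keep enough points for a solve
                sel = np.argsort(losses)[:X.shape[1]]
            ws[t] = np.linalg.lstsq(X[sel], y[sel], rcond=None)[0]
        lam *= growth                              # anneal the pace
    return ws

ws = self_paced_mtl(tasks)
```

Because the toy data are genuinely linear, the easy subset already identifies the right model and later rounds simply refine it on the full data.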
A Hierarchical Bayesian Trust Model based on Reputation and Group Behaviour
In many systems, agents must rely on their peers to achieve their goals. However, when trusted to perform an action, an agent may betray that trust by not behaving as required. Agents must therefore estimate the behaviour of their peers, so that they may identify reliable interaction partners. To this end, we present a Bayesian trust model (HABIT) for assessing trust based on direct experience and (potentially unreliable) reputation. Although existing approaches claim to achieve this, most rely on heuristics with little theoretical foundation. In contrast, HABIT is based on principled statistical techniques; can be used with any representation of behaviour; and can assess trust based on observed similarities between groups of agents. In this paper, we describe the theoretical aspects of the model and present experimental results in which HABIT was shown to be up to twice as accurate at predicting trustee performance as an existing state-of-the-art trust model.
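The direct-experience component of a principled Bayesian trust estimate can be illustrated with a Beta-Bernoulli posterior: each interaction outcome updates pseudo-counts, and the posterior mean is the trust estimate. This is a minimal sketch of the general idea only; HABIT's hierarchical group/reputation machinery is omitted, and the class name and prior are assumptions.

```python
from dataclasses import dataclass

@dataclass
class BetaTrust:
    """Beta-Bernoulli trust estimate from direct experience: each
    successful interaction increments alpha, each failure increments beta."""
    alpha: float = 1.0   # prior pseudo-count of successes (uniform prior)
    beta: float = 1.0    # prior pseudo-count of failures

    def observe(self, success: bool) -> None:
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def expected_trust(self) -> float:
        # Posterior mean probability that the trustee behaves as required.
        return self.alpha / (self.alpha + self.beta)

t = BetaTrust()
for outcome in [True, True, False, True]:
    t.observe(outcome)
# Posterior mean after 3 successes and 1 failure: (1+3)/(2+4) = 2/3.
```

Unreliable reputation reports could be folded in as discounted pseudo-counts, though the weighting scheme would be a further modelling choice.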
The Sample Complexity of Auctions with Side Information
Traditionally, the Bayesian optimal auction design problem has been
considered either when the bidder values are i.i.d, or when each bidder is
individually identifiable via her value distribution. The latter is a
reasonable approach when the bidders can be classified into a few categories,
but there are many instances where the classification of bidders is a
continuum. For example, the classification of the bidders may be based on their
annual income, their propensity to buy an item based on past behavior, or in
the case of ad auctions, the click through rate of their ads. We introduce an
alternate model that captures this aspect, where bidders are a priori
identical, but can be distinguished based (only) on some side information the
auctioneer obtains at the time of the auction. We extend the sample complexity
approach of Dhangwatnotai et al. and Cole and Roughgarden to this model and
obtain almost matching upper and lower bounds. As an aside, we obtain a revenue
monotonicity lemma which may be of independent interest. We also show how to
use Empirical Risk Minimization techniques to improve the sample complexity
bound of Cole and Roughgarden for the non-identical but independent value
distribution case.
Comment: A version of this paper appeared in STOC 201
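The empirical-risk-minimization flavor of sample-based auction design can be seen in the simplest case: given i.i.d. value samples, pick the posted price that maximizes empirical revenue, price × (fraction of samples at or above it). This single-bidder monopoly-price sketch is an assumption-laden simplification of the multi-bidder setting the abstract studies.

```python
import numpy as np

rng = np.random.default_rng(1)

def empirical_reserve(samples: np.ndarray) -> float:
    """Return the sample point maximizing empirical revenue
    price * #{values >= price}; an optimal posted price can always
    be taken at one of the observed values."""
    samples = np.sort(samples)
    n = len(samples)
    # Price samples[i] sells to the n - i samples at or above it.
    revenues = samples * (n - np.arange(n))
    return float(samples[np.argmax(revenues)])

# Exponential(1) values: the true revenue-optimal monopoly price is 1
# (it maximizes p * exp(-p)), so the empirical reserve should land nearby.
vals = rng.exponential(size=5000)
r = empirical_reserve(vals)
```

Sample-complexity results of the kind the paper extends bound how fast such an empirical maximizer converges to the optimal price as the sample size grows.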
DTR Bandit: Learning to Make Response-Adaptive Decisions With Low Regret
Dynamic treatment regimes (DTRs) are personalized, adaptive, multi-stage
treatment plans that adapt treatment decisions both to an individual's initial
features and to intermediate outcomes and features at each subsequent stage,
which are affected by decisions in prior stages. Examples include personalized
first- and second-line treatments of chronic conditions like diabetes, cancer,
and depression, which adapt to patient response to first-line treatment,
disease progression, and individual characteristics. While existing literature
mostly focuses on estimating the optimal DTR from offline data such as from
sequentially randomized trials, we study the problem of developing the optimal
DTR in an online manner, where the interaction with each individual affects both
our cumulative reward and our data collection for future learning. We term this
the DTR bandit problem. We propose a novel algorithm that, by carefully
balancing exploration and exploitation, is guaranteed to achieve rate-optimal
regret when the transition and reward models are linear. We demonstrate our
algorithm and its benefits both in synthetic experiments and in a case study of
adaptive treatment of major depressive disorder using real-world data.
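The exploration/exploitation tension in online treatment assignment can be illustrated with a crude explore-then-commit scheme for a one-stage simplification: force exploration to estimate each treatment's linear reward model, then assign greedily. This is a hedged stand-in, not the paper's rate-optimal algorithm, and the feature dimension, noise level, and schedule are all assumptions; the real DTR bandit adds a second stage whose policy depends on intermediate outcomes.

```python
import numpy as np

rng = np.random.default_rng(2)

# Unknown linear reward models for two hypothetical treatments.
theta = {0: np.array([0.5, -0.2]), 1: np.array([-0.1, 0.6])}

def reward(arm: int, x: np.ndarray) -> float:
    return float(theta[arm] @ x + 0.1 * rng.normal())

def explore_then_commit(T=2000, n_explore=200, d=2):
    """Alternate treatments for n_explore rounds to estimate each arm's
    linear model by least squares, then commit to the greedy policy."""
    X = {0: [], 1: []}
    Y = {0: [], 1: []}
    hat = {0: np.zeros(d), 1: np.zeros(d)}
    total = 0.0
    for t in range(T):
        x = rng.normal(size=d)                     # individual's features
        if t < n_explore:
            a = t % 2                              # forced exploration
        else:
            a = int(hat[1] @ x > hat[0] @ x)       # exploit fitted models
        r = reward(a, x)
        total += r
        X[a].append(x)
        Y[a].append(r)
        if t == n_explore - 1:                     # fit once, then commit
            for arm in (0, 1):
                hat[arm] = np.linalg.lstsq(
                    np.array(X[arm]), np.array(Y[arm]), rcond=None)[0]
    return total, hat

total, hat = explore_then_commit()
```

Rate-optimal methods like the one in the paper interleave estimation and assignment more carefully instead of a fixed exploration budget, which is what removes the suboptimal regret of explore-then-commit.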