61,723 research outputs found
XL-NBT: A Cross-lingual Neural Belief Tracking Framework
Task-oriented dialog systems are becoming pervasive, and many companies
heavily rely on them to complement human agents for customer service in call
centers. With globalization, the need for providing cross-lingual customer
support becomes more urgent than ever. However, cross-lingual support poses
great challenges---it requires a large amount of additional annotated data from
native speakers. In order to bypass the expensive human annotation and achieve
the first step towards the ultimate goal of building a universal dialog system,
we set out to build a cross-lingual state tracking framework. Specifically, we
assume that there exists a source language with dialog belief tracking
annotations while the target languages have no annotated dialog data of any
form. Then, we pre-train a state tracker for the source language as a teacher,
which is able to exploit easy-to-access parallel data. We then distill and
transfer its own knowledge to the student state tracker in target languages. We
specifically discuss two types of common parallel resources: bilingual corpus
and bilingual dictionary, and design different transfer learning strategies
accordingly. Experimentally, we successfully use English state tracker as the
teacher to transfer its knowledge to both Italian and German trackers and
achieve promising results.Comment: 13 pages, 5 figures, 3 tables, accepted to EMNLP 2018 conferenc
Learning Economic Parameters from Revealed Preferences
A recent line of work, starting with Beigman and Vohra (2006) and
Zadimoghaddam and Roth (2012), has addressed the problem of {\em learning} a
utility function from revealed preference data. The goal here is to make use of
past data describing the purchases of a utility maximizing agent when faced
with certain prices and budget constraints in order to produce a hypothesis
function that can accurately forecast the {\em future} behavior of the agent.
In this work we advance this line of work by providing sample complexity
guarantees and efficient algorithms for a number of important classes. By
drawing a connection to recent advances in multi-class learning, we provide a
computationally efficient algorithm with tight sample complexity guarantees
( for the case of goods) for learning linear utility
functions under a linear price model. This solves an open question in
Zadimoghaddam and Roth (2012). Our technique yields numerous generalizations
including the ability to learn other well-studied classes of utility functions,
to deal with a misspecified model, and with non-linear prices
Efficient Image Gallery Representations at Scale Through Multi-Task Learning
Image galleries provide a rich source of diverse information about a product
which can be leveraged across many recommendation and retrieval applications.
We study the problem of building a universal image gallery encoder through
multi-task learning (MTL) approach and demonstrate that it is indeed a
practical way to achieve generalizability of learned representations to new
downstream tasks. Additionally, we analyze the relative predictive performance
of MTL-trained solutions against optimal and substantially more expensive
solutions, and find signals that MTL can be a useful mechanism to address
sparsity in low-resource binary tasks.Comment: Proceedings of the 43rd International ACM SIGIR Conference on
Research and Development in Information Retrieva
Practical recommendations for gradient-based training of deep architectures
Learning algorithms related to artificial neural networks and in particular
for Deep Learning may seem to involve many bells and whistles, called
hyper-parameters. This chapter is meant as a practical guide with
recommendations for some of the most commonly used hyper-parameters, in
particular in the context of learning algorithms based on back-propagated
gradient and gradient-based optimization. It also discusses how to deal with
the fact that more interesting results can be obtained when allowing one to
adjust many hyper-parameters. Overall, it describes elements of the practice
used to successfully and efficiently train and debug large-scale and often deep
multi-layer neural networks. It closes with open questions about the training
difficulties observed with deeper architectures
A WOA-based optimization approach for task scheduling in cloud Computing systems
Task scheduling in cloud computing can directly
affect the resource usage and operational cost of a system. To
improve the efficiency of task executions in a cloud, various
metaheuristic algorithms, as well as their variations, have been
proposed to optimize the scheduling. In this work, for the
first time, we apply the latest metaheuristics WOA (the whale
optimization algorithm) for cloud task scheduling with a multiobjective optimization model, aiming at improving the performance of a cloud system with given computing resources. On that
basis, we propose an advanced approach called IWC (Improved
WOA for Cloud task scheduling) to further improve the optimal
solution search capability of the WOA-based method. We present
the detailed implementation of IWC and our simulation-based
experiments show that the proposed IWC has better convergence
speed and accuracy in searching for the optimal task scheduling
plans, compared to the current metaheuristic algorithms. Moreover, it can also achieve better performance on system resource
utilization, in the presence of both small and large-scale tasks
Attentive Single-Tasking of Multiple Tasks
In this work we address task interference in universal networks by
considering that a network is trained on multiple tasks, but performs one task
at a time, an approach we refer to as "single-tasking multiple tasks". The
network thus modifies its behaviour through task-dependent feature adaptation,
or task attention. This gives the network the ability to accentuate the
features that are adapted to a task, while shunning irrelevant ones. We further
reduce task interference by forcing the task gradients to be statistically
indistinguishable through adversarial training, ensuring that the common
backbone architecture serving all tasks is not dominated by any of the
task-specific gradients. Results in three multi-task dense labelling problems
consistently show: (i) a large reduction in the number of parameters while
preserving, or even improving performance and (ii) a smooth trade-off between
computation and multi-task accuracy. We provide our system's code and
pre-trained models at http://vision.ee.ethz.ch/~kmaninis/astmt/.Comment: CVPR 2019 Camera Read
- …