25,074 research outputs found
Wireless Data Acquisition for Edge Learning: Data-Importance Aware Retransmission
By deploying machine-learning algorithms at the network edge, edge learning
can leverage the enormous real-time data generated by billions of mobile
devices to train AI models, which enable intelligent mobile applications. In
this emerging research area, one key direction is to efficiently utilize radio
resources for wireless data acquisition to minimize the latency of executing a
learning task at an edge server. Along this direction, we consider the specific
problem of retransmission decision in each communication round to ensure both
reliability and quantity of those training data for accelerating model
convergence. To solve the problem, a new retransmission protocol called
data-importance aware automatic-repeat-request (importance ARQ) is proposed.
Unlike the classic ARQ focusing merely on reliability, importance ARQ
selectively retransmits a data sample based on its uncertainty which helps
learning and can be measured using the model under training. Underpinning the
proposed protocol is a derived elegant communication-learning relation between
two corresponding metrics, i.e., signal-to-noise ratio (SNR) and data
uncertainty. This relation facilitates the design of a simple threshold based
policy for importance ARQ. The policy is first derived based on the classic
classifier model of support vector machine (SVM), where the uncertainty of a
data sample is measured by its distance to the decision boundary. The policy is
then extended to the more complex model of convolutional neural networks (CNN)
where data uncertainty is measured by entropy. Extensive experiments have been
conducted for both the SVM and CNN using real datasets with balanced and
imbalanced distributions. Experimental results demonstrate that importance ARQ
effectively copes with channel fading and noise in wireless data acquisition to
achieve faster model convergence than the conventional channel-aware ARQ.Comment: This is an updated version: 1) extension to general classifiers; 2)
consideration of imbalanced classification in the experiments. Submitted to
IEEE Journal for possible publicatio
Personalized Dialogue Generation with Diversified Traits
Endowing a dialogue system with particular personality traits is essential to
deliver more human-like conversations. However, due to the challenge of
embodying personality via language expression and the lack of large-scale
persona-labeled dialogue data, this research problem is still far from
well-studied. In this paper, we investigate the problem of incorporating
explicit personality traits in dialogue generation to deliver personalized
dialogues.
To this end, firstly, we construct PersonalDialog, a large-scale multi-turn
dialogue dataset containing various traits from a large number of speakers. The
dataset consists of 20.83M sessions and 56.25M utterances from 8.47M speakers.
Each utterance is associated with a speaker who is marked with traits like Age,
Gender, Location, Interest Tags, etc. Several anonymization schemes are
designed to protect the privacy of each speaker. This large-scale dataset will
facilitate not only the study of personalized dialogue generation, but also
other researches on sociolinguistics or social science.
Secondly, to study how personality traits can be captured and addressed in
dialogue generation, we propose persona-aware dialogue generation models within
the sequence to sequence learning framework. Explicit personality traits
(structured by key-value pairs) are embedded using a trait fusion module.
During the decoding process, two techniques, namely persona-aware attention and
persona-aware bias, are devised to capture and address trait-related
information. Experiments demonstrate that our model is able to address proper
traits in different contexts. Case studies also show interesting results for
this challenging research problem.Comment: Please contact [zhengyinhe1 at 163 dot com] for the PersonalDialog
datase
{iFair}: {L}earning Individually Fair Data Representations for Algorithmic Decision Making
People are rated and ranked, towards algorithmic decision making in an increasing number of applications, typically based on machine learning. Research on how to incorporate fairness into such tasks has prevalently pursued the paradigm of group fairness: ensuring that each ethnic or social group receives its fair share in the outcome of classifiers and rankings. In contrast, the alternative paradigm of individual fairness has received relatively little attention. This paper introduces a method for probabilistically clustering user records into a low-rank representation that captures individual fairness yet also achieves high accuracy in classification and regression models. Our notion of individual fairness requires that users who are similar in all task-relevant attributes such as job qualification, and disregarding all potentially discriminating attributes such as gender, should have similar outcomes. Since the case for fairness is ubiquitous across many tasks, we aim to learn general representations that can be applied to arbitrary downstream use-cases. We demonstrate the versatility of our method by applying it to classification and learning-to-rank tasks on two real-world datasets. Our experiments show substantial improvements over the best prior work for this setting
- …