Mining the Demographics of Political Sentiment from Twitter Using Learning from Label Proportions
Opinion mining and demographic attribute inference have many applications in
social science. In this paper, we propose models to infer daily joint
probabilities of multiple latent attributes from Twitter data, such as
political sentiment and demographic attributes. Since it is costly and
time-consuming to annotate data for traditional supervised classification, we
instead propose scalable Learning from Label Proportions (LLP) models for
demographic and opinion inference using U.S. Census, national and state
political polls, and Cook partisan voting index as population level data. In
LLP classification settings, the training data is divided into a set of
unlabeled bags, where only the label distribution of each bag is known,
removing the requirement of instance-level annotations. Our proposed LLP model,
Weighted Label Regularization (WLR), provides a scalable generalization of
prior work on label regularization to support weights for samples inside bags,
which is applicable in this setting where bags are arranged hierarchically
(e.g., county-level bags are nested inside of state-level bags). We apply our
model to Twitter data collected in the year leading up to the 2016 U.S.
presidential election, producing estimates of the relationships among political
sentiment and demographics over time and place. We find that our approach
closely tracks traditional polling data stratified by demographic category,
resulting in error reductions of 28-44% over baseline approaches. We also
provide descriptive evaluations showing how the model may be used to estimate
interactions among many variables and to identify linguistic temporal
variation, capabilities which are typically not feasible using traditional
polling methods.
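The bag-level training signal described in this abstract can be illustrated with a short sketch. It is not the authors' WLR model: it only shows a generic label-regularization loss for a single bag, where per-instance weights scale each sample's contribution to the bag's predicted label distribution. The function names, the use of KL divergence, and the `eps` smoothing constant are assumptions made for illustration.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def bag_proportion_loss(logits, weights, target_proportions, eps=1e-12):
    """KL(target || weighted mean of predicted posteriors) for one bag.

    logits: (n_instances, n_classes) model scores for the bag's instances.
    weights: (n_instances,) non-negative instance weights (hypothetical).
    target_proportions: (n_classes,) known label distribution of the bag.
    """
    probs = softmax(logits)
    w = weights / weights.sum()        # normalise weights within the bag
    predicted = w @ probs              # weighted average posterior, (n_classes,)
    log_ratio = np.log(target_proportions + eps) - np.log(predicted + eps)
    return float(np.sum(target_proportions * log_ratio))

# Usage: three unlabeled instances, two classes, uniform scores.
logits = np.zeros((3, 2))
weights = np.array([1.0, 2.0, 1.0])
matched = bag_proportion_loss(logits, weights, np.array([0.5, 0.5]))
mismatched = bag_proportion_loss(logits, weights, np.array([0.9, 0.1]))
```

No instance-level label appears anywhere in the loss; only the bag's known proportions supervise the model, which is the property the abstract relies on.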
A review of domain adaptation without target labels
Domain adaptation has become a prominent problem setting in machine learning
and related fields. This review asks the question: how can a classifier learn
from a source domain and generalize to a target domain? We present a
categorization of approaches, divided into what we refer to as sample-based,
feature-based and inference-based methods. Sample-based methods focus on
weighting individual observations during training based on their importance to
the target domain. Feature-based methods revolve around mapping, projecting
and representing features such that a source classifier performs well on the
target domain. Inference-based methods incorporate adaptation into the
parameter estimation procedure, for instance through constraints on the
optimization procedure. Additionally, we review a number of conditions that
allow for formulating bounds on the cross-domain generalization error. Our
categorization highlights recurring ideas and raises questions important to
further research.
Comment: 20 pages, 5 figures
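As a concrete illustration of the sample-based family mentioned in this abstract, the sketch below reweights source losses by an importance weight w(x) = p_target(x) / p_source(x). It assumes, purely for illustration, that both domains are one-dimensional Gaussians with known parameters; in practice the density ratio has to be estimated, and the review covers several ways to do that. All function names here are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def importance_weights(x, src_mean, src_std, tgt_mean, tgt_std):
    """Ratio of target to source density at each source sample."""
    return gaussian_pdf(x, tgt_mean, tgt_std) / gaussian_pdf(x, src_mean, src_std)

def weighted_risk(losses, weights):
    """Self-normalised importance-weighted average of per-sample losses."""
    return float(np.sum(weights * losses) / np.sum(weights))

# Usage: source centred at 0, target centred at 1 (assumed known).
x_source = np.array([0.0, 1.0])
losses = np.array([0.2, 0.8])
w = importance_weights(x_source, src_mean=0.0, src_std=1.0,
                       tgt_mean=1.0, tgt_std=1.0)
risk = weighted_risk(losses, w)
```

Source samples that look more like target data (here, the one at x = 1) receive larger weights, so the weighted risk moves toward the loss the classifier would incur on the target domain.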
Hard-aware Instance Adaptive Self-training for Unsupervised Cross-domain Semantic Segmentation
The divergence between labeled training data and unlabeled testing data is a
significant challenge for recent deep learning models. Unsupervised domain
adaptation (UDA) attempts to solve this problem. Recent works show that
self-training is a powerful approach to UDA. However, existing methods have
difficulty balancing scalability and performance. In this paper, we
propose a hard-aware instance adaptive self-training framework for UDA on the
task of semantic segmentation. To effectively improve the quality and diversity
of pseudo-labels, we develop a novel pseudo-label generation strategy with an
instance adaptive selector. We further enrich the hard class pseudo-labels with
inter-image information through a skillfully designed hard-aware pseudo-label
augmentation. In addition, we propose a region-adaptive regularization to smooth
the pseudo-label region and sharpen the non-pseudo-label region. For the
non-pseudo-label region, a consistency constraint is also constructed to
introduce stronger supervision signals during model optimization. Our method is
concise and efficient, so it is easy to generalize to other UDA
methods. Experiments on GTA5 to Cityscapes, SYNTHIA to Cityscapes, and
Cityscapes to Oxford RobotCar demonstrate the superior performance of our
approach compared with the state-of-the-art methods.
Comment: arXiv admin note: text overlap with arXiv:2008.1219
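The pseudo-label generation step in self-training can be sketched as follows. This is not the paper's instance adaptive selector: it is a simplified stand-in that picks, for each image and each predicted class, a confidence threshold from that image's own confidence distribution and marks the remaining pixels as ignored. The `keep_fraction` parameter and the ignore value 255 are assumptions chosen for illustration.

```python
import numpy as np

IGNORE = 255  # assumed label for pixels excluded from the training loss

def select_pseudo_labels(probs, keep_fraction=0.5):
    """Per-image, per-class confidence thresholding.

    probs: (H, W, C) softmax output for one unlabeled target image.
    Returns an (H, W) integer map of pseudo-labels, with IGNORE where the
    prediction falls below its class's threshold for this image.
    """
    labels = probs.argmax(axis=-1)
    conf = probs.max(axis=-1)
    pseudo = np.full(labels.shape, IGNORE, dtype=np.int64)
    for c in np.unique(labels):
        mask = labels == c
        # The threshold adapts to this image: keep the most confident pixels.
        thresh = np.quantile(conf[mask], 1.0 - keep_fraction)
        pseudo[mask & (conf >= thresh)] = c
    return pseudo

# Usage: a 1x4 "image" with two classes.
probs = np.array([[[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.45, 0.55]]])
pseudo = select_pseudo_labels(probs, keep_fraction=0.5)
```

Computing the threshold per class keeps rarely predicted (hard) classes from being filtered out entirely by a single global cutoff, which is the motivation the abstract gives for an adaptive selector.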
CALDA: Improving Multi-Source Time Series Domain Adaptation with Contrastive Adversarial Learning
Unsupervised domain adaptation (UDA) provides a strategy for improving
machine learning performance in data-rich (target) domains where ground truth
labels are inaccessible but can be found in related (source) domains. In cases
where meta-domain information such as label distributions is available, weak
supervision can further boost performance. We propose a novel framework, CALDA,
to tackle these two problems. CALDA synergistically combines the principles of
contrastive learning and adversarial learning to robustly support multi-source
UDA (MS-UDA) for time series data. Similar to prior methods, CALDA utilizes
adversarial learning to align source and target feature representations. Unlike
prior approaches, CALDA additionally leverages cross-source label information
across domains. CALDA pulls examples with the same label close to each other,
while pushing apart examples with different labels, reshaping the space through
contrastive learning. Unlike prior contrastive adaptation methods, CALDA
requires neither data augmentation nor pseudo labeling, which may be more
challenging for time series. We empirically validate our proposed approach.
Based on results from human activity recognition, electromyography, and
synthetic datasets, we find that utilizing cross-source information improves
performance over prior time series and contrastive methods. Weak supervision
further improves performance, even in the presence of noise, allowing CALDA to
offer generalizable strategies for MS-UDA. Code is available at:
https://github.com/floft/calda
Comment: Under review at IEEE Transactions on Pattern Analysis and Machine Intelligence
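The idea of pulling same-label examples together and pushing different-label examples apart can be sketched with a generic supervised contrastive loss. This is not CALDA's exact objective: it ignores domain membership and the adversarial component, and the `temperature` value and function name are assumptions for illustration.

```python
import numpy as np

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss over a batch (needs >= 2 examples).

    features: (n, d) embeddings; labels: (n,) integer class labels.
    For each anchor, examples sharing its label are positives and all other
    examples form the softmax denominator.
    """
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    positives = (labels[:, None] == labels[None, :]) & not_self
    # Exclude each anchor's similarity with itself from the softmax.
    sim = np.where(not_self, sim, -np.inf)
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(np.exp(sim - sim_max).sum(axis=1, keepdims=True))
    pos_count = positives.sum(axis=1)
    has_pos = pos_count > 0  # anchors without any positive are skipped
    summed = np.where(positives, log_prob, 0.0).sum(axis=1)
    per_anchor = -summed[has_pos] / pos_count[has_pos]
    return float(per_anchor.mean())

# Usage: the same embeddings scored against consistent and inconsistent labels.
features = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
aligned = supervised_contrastive_loss(features, np.array([0, 0, 1, 1]))
misaligned = supervised_contrastive_loss(features, np.array([0, 1, 0, 1]))
```

The loss is small when examples with the same label already sit close together and large when they do not. Because it uses labels directly, no data augmentation or pseudo-labeling is required, which matches the property the abstract highlights.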