Clustering Human Mobility with Multiple Spaces
Human mobility clustering is an important problem for understanding human
mobility behaviors (e.g., work and school commutes). Existing methods typically
consist of two steps: choosing or learning a mobility representation and applying
a clustering algorithm to the representation. However, these methods rely on
strict visiting orders in trajectories and cannot take advantage of multiple
types of mobility representations. This paper proposes a novel mobility
clustering method for mobility behavior detection. First, the proposed method
introduces a permutation-equivariant operation to handle sub-trajectories that
might have different visiting orders but similar impacts on mobility behaviors.
Second, the proposed method utilizes a variational autoencoder architecture to
simultaneously perform clustering in both the latent and original spaces. In addition, to mitigate the bias of a single latent space, the cluster assignment prediction considers multiple latent spaces learned at different epochs. This
way, the proposed method produces accurate results and can provide reliability
estimates of each trajectory's cluster assignment. Experiments show that the proposed method outperforms state-of-the-art methods in mobility behavior detection from trajectories, with better accuracy and more interpretability.
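To make the two ideas above concrete, here is a minimal Python sketch, not the authors' implementation: an order-insensitive pooling of visit embeddings within a sub-trajectory, and a cluster assignment obtained by voting across latent spaces taken from different training epochs, with the vote agreement used as a reliability estimate. All function names, shapes, and the mean-pooling and voting choices below are our own assumptions.

# Minimal sketch of permutation-insensitive pooling plus multi-epoch cluster voting.
# Illustrative only; names, shapes, and the voting rule are assumptions, not the paper's code.
import numpy as np
from collections import Counter

def pool_sub_trajectory(visit_embeddings: np.ndarray) -> np.ndarray:
    """Mean-pool visit embeddings so any visiting order yields the same vector."""
    return visit_embeddings.mean(axis=0)

def assign_with_reliability(latents_per_epoch, centroids_per_epoch):
    """Vote over per-epoch cluster assignments; the agreement ratio is the reliability."""
    votes = []
    for z, centroids in zip(latents_per_epoch, centroids_per_epoch):
        d = np.linalg.norm(centroids - z, axis=1)        # distance to each cluster centroid
        votes.append(int(d.argmin()))
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)

# Toy usage: one trajectory embedded in three hypothetical latent spaces (epochs).
rng = np.random.default_rng(0)
visits = rng.normal(size=(5, 8))                         # 5 visits, 8-dim embeddings
pooled = pool_sub_trajectory(visits)
latents = [pooled + rng.normal(scale=0.1, size=8) for _ in range(3)]
centroids = [rng.normal(size=(4, 8)) for _ in range(3)]  # 4 clusters per epoch
print(assign_with_reliability(latents, centroids))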
On the Effectiveness of Out-of-Distribution Data in Self-Supervised Long-Tail Learning
Though self-supervised learning (SSL) has been widely studied as a promising technique for representation learning, it does not generalize well on long-tailed datasets because the majority classes dominate the feature space.
Recent work shows that long-tailed learning performance can be boosted by sampling extra in-domain (ID) data for self-supervised training; however, large-scale ID data that can rebalance the minority classes are expensive to
collect. In this paper, we propose an alternative, easy-to-use, and effective solution, Contrastive with Out-of-distribution (OOD) data for Long-Tail learning (COLT), which exploits OOD data to dynamically
re-balance the feature space. We empirically identify the counter-intuitive
usefulness of OOD samples in SSL long-tailed learning and design a novel SSL method in a principled way. Concretely, we first localize the 'head' and 'tail' samples
by assigning a tailness score to each OOD sample based on its neighborhood in
the feature space. Then, we propose an online OOD sampling strategy to
dynamically re-balance the feature space. Finally, we enforce the model to distinguish ID and OOD samples via a distribution-level supervised contrastive loss. Extensive experiments are conducted on various datasets and
several state-of-the-art SSL frameworks to verify the effectiveness of the
proposed method. The results show that our method improves the performance of SSL on long-tailed datasets by a large margin and even outperforms previous work that uses external ID data. Our code is available at
https://github.com/JianhongBai/COLT
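As a rough illustration of the sampling step described above, the sketch below scores each OOD sample by the fraction of tail samples among its k nearest ID neighbors in the feature space and then draws OOD samples with probability proportional to that score. This is only a hedged reading of the abstract; the concrete scoring rule, loss, and training loop are defined in the repository linked above, and every name and parameter below is hypothetical.

# Minimal sketch of neighborhood-based tailness scoring and score-proportional OOD sampling.
# Illustrative assumptions only; see https://github.com/JianhongBai/COLT for the authors' method.
import numpy as np

def tailness_scores(ood_feats, id_feats, id_is_tail, k=5):
    """Score each OOD feature by the fraction of tail samples among its k nearest ID neighbors."""
    # pairwise squared distances between OOD and ID features
    d = ((ood_feats[:, None, :] - id_feats[None, :, :]) ** 2).sum(-1)
    knn = np.argsort(d, axis=1)[:, :k]                   # indices of the k nearest ID neighbors
    return id_is_tail[knn].mean(axis=1)                  # fraction of tail neighbors per OOD sample

def sample_ood(ood_feats, scores, n, rng):
    """Draw n OOD samples with probability proportional to their tailness score."""
    p = scores + 1e-8                                     # small constant avoids an all-zero distribution
    p = p / p.sum()
    idx = rng.choice(len(ood_feats), size=n, replace=False, p=p)
    return ood_feats[idx]

# Toy usage with random features standing in for encoder outputs.
rng = np.random.default_rng(0)
id_feats = rng.normal(size=(100, 16))
id_is_tail = rng.random(100) < 0.2                        # mark 20% of ID samples as tail
ood_feats = rng.normal(size=(50, 16))
scores = tailness_scores(ood_feats, id_feats, id_is_tail)
print(sample_ood(ood_feats, scores, n=8, rng=rng).shape)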