4,691 research outputs found
A Clustering Framework for Unsupervised and Semi-supervised New Intent Discovery
New intent discovery is of great value to natural language processing,
allowing for a better understanding of user needs and providing friendly
services. However, most existing methods struggle to capture the complicated
semantics of discrete text representations when limited or no prior knowledge
of labeled data is available. To tackle this problem, we propose a novel
clustering framework, USNID, for unsupervised and semi-supervised new intent
discovery, which has three key technologies. First, it fully utilizes
unsupervised or semi-supervised data to mine shallow semantic similarity
relations and provide well-initialized representations for clustering. Second,
it designs a centroid-guided clustering mechanism to address the issue of
cluster allocation inconsistency and provide high-quality self-supervised
targets for representation learning. Third, it captures high-level semantics in
unsupervised or semi-supervised data to discover fine-grained intent-wise
clusters by optimizing both cluster-level and instance-level objectives. We
also propose an effective method for estimating the cluster number in
open-world scenarios without knowing the number of new intents beforehand.
USNID performs exceptionally well on several benchmark intent datasets,
achieving new state-of-the-art results in unsupervised and semi-supervised new
intent discovery and demonstrating robust performance with different cluster
numbers.Comment: Accepted by IEEE TKD
Deep Feature Factorization For Concept Discovery
We propose Deep Feature Factorization (DFF), a method capable of localizing
similar semantic concepts within an image or a set of images. We use DFF to
gain insight into a deep convolutional neural network's learned features, where
we detect hierarchical cluster structures in feature space. This is visualized
as heat maps, which highlight semantically matching regions across a set of
images, revealing what the network `perceives' as similar. DFF can also be used
to perform co-segmentation and co-localization, and we report state-of-the-art
results on these tasks.Comment: The European Conference on Computer Vision (ECCV), 201
Automatic Synonym Discovery with Knowledge Bases
Recognizing entity synonyms from text has become a crucial task in many
entity-leveraging applications. However, discovering entity synonyms from
domain-specific text corpora (e.g., news articles, scientific papers) is rather
challenging. Current systems take an entity name string as input to find out
other names that are synonymous, ignoring the fact that often times a name
string can refer to multiple entities (e.g., "apple" could refer to both Apple
Inc and the fruit apple). Moreover, most existing methods require training data
manually created by domain experts to construct supervised-learning systems. In
this paper, we study the problem of automatic synonym discovery with knowledge
bases, that is, identifying synonyms for knowledge base entities in a given
domain-specific corpus. The manually-curated synonyms for each entity stored in
a knowledge base not only form a set of name strings to disambiguate the
meaning for each other, but also can serve as "distant" supervision to help
determine important features for the task. We propose a novel framework, called
DPE, to integrate two kinds of mutually-complementing signals for synonym
discovery, i.e., distributional features based on corpus-level statistics and
textual patterns based on local contexts. In particular, DPE jointly optimizes
the two kinds of signals in conjunction with distant supervision, so that they
can mutually enhance each other in the training stage. At the inference stage,
both signals will be utilized to discover synonyms for the given entities.
Experimental results prove the effectiveness of the proposed framework
- …