Effect of calculating Pointwise Mutual Information using a Fuzzy Sliding Window in Topic Modeling
Topic modeling is a popular method for analysing large amounts of unstructured text data and extracting meaningful insights. The coherence of the generated topics is a critical metric for determining the model quality and measuring the semantic relatedness of the words in a topic. The distributional hypothesis, a fundamental theory in linguistics, states that words occurring in the same contexts tend to have similar meanings. Based on this theory, word co-occurrence in a given context is often used to reflect word association in coherence scores. To this end, many coherence scores use Normalised Pointwise Mutual Information (NPMI), which uses a sliding window to describe the neighbourhood that defines the context. It is assumed that there is no other structure in the neighbourhood except for the presence of words. Inspired by the distributional hypothesis, we hypothesise the word distance to be relevant for determining the word association. Hence, we propose using a fuzzy sliding window to define a neighbourhood in which the association between words depends on the membership of the words in the fuzzy sliding window. To this end, we propose Fuzzy Normalized Pointwise Mutual Information (FNPMI) to calculate fuzzy coherence scores. We implement two different neighbourhood structures by the definition of the membership function of the sliding window. In the first implementation, the association between two words correlates positively with the distance, whereas the correlation is negative in the second. We compare the correlation of our proposed new coherence metrics with human judgment. We find that the use of a fuzzy sliding window correlates less with human judgment than a crisp sliding window. This finding indicates that word distance within a window is less important than defining the window size itself.
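The contrast between a crisp and a fuzzy sliding window can be illustrated with a minimal Python sketch. It is not the authors' FNPMI definition, which the abstract does not give. The function names (`npmi`, `closer_is_stronger`, `farther_is_stronger`), the linear membership functions, and the choice to weight each co-occurrence by the best membership value found in the window are all assumptions made for illustration.

```python
import math


def closer_is_stronger(window_size):
    # Hypothetical membership: association decreases with word distance.
    return lambda distance: 1.0 - distance / window_size


def farther_is_stronger(window_size):
    # Hypothetical membership: association increases with word distance.
    return lambda distance: distance / window_size


def npmi(tokens, word_a, word_b, window_size, membership=None):
    """NPMI of two words estimated from sliding windows over `tokens`.

    With membership=None every co-occurrence counts as 1 (crisp window).
    Otherwise each window's co-occurrence is weighted by
    membership(distance), an assumed stand-in for a fuzzy window.
    """
    if membership is None:
        membership = lambda distance: 1.0
    n_windows = max(1, len(tokens) - window_size + 1)
    count_a = count_b = joint = 0.0
    for start in range(n_windows):
        window = tokens[start:start + window_size]
        pos_a = [i for i, tok in enumerate(window) if tok == word_a]
        pos_b = [i for i, tok in enumerate(window) if tok == word_b]
        count_a += bool(pos_a)
        count_b += bool(pos_b)
        if pos_a and pos_b:
            # Use the strongest membership among all occurrence pairs.
            joint += max(membership(abs(i - j)) for i in pos_a for j in pos_b)
    if joint == 0:
        return -1.0  # convention: the words never co-occur
    p_a, p_b, p_ab = count_a / n_windows, count_b / n_windows, joint / n_windows
    if p_ab == 1.0:
        return 1.0  # avoid dividing by log(1) = 0
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


tokens = "cat dog x y cat dog".split()
crisp = npmi(tokens, "cat", "dog", window_size=2)
fuzzy = npmi(tokens, "cat", "dog", window_size=2,
             membership=closer_is_stronger(2))
print(crisp, fuzzy)
```

In this toy corpus the fuzzy score is lower than the crisp one because every co-occurrence is down-weighted; how the published metric normalises fuzzy counts may differ.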
The THUMOS Challenge on Action Recognition for Videos "in the Wild"
Automatically recognizing and localizing wide ranges of human actions has
crucial importance for video understanding. Towards this goal, the THUMOS
challenge was introduced in 2013 to serve as a benchmark for action
recognition. Until then, video action recognition, including the THUMOS challenge,
had focused primarily on the classification of pre-segmented (i.e., trimmed)
videos, which is an artificial task. In THUMOS 2014, we elevated action
recognition to a more practical level by introducing temporally untrimmed
videos. These also include 'background videos', which share similar scenes and
backgrounds with action videos but are devoid of the specific actions. The three
editions of the challenge organized in 2013-2015 have made THUMOS a common
benchmark for action classification and detection and the annual challenge is
widely attended by teams from around the world.
In this paper we describe the THUMOS benchmark in detail and give an overview
of data collection and annotation procedures. We present the evaluation
protocols used to quantify results in the two THUMOS tasks of action
classification and temporal detection. We also present results of submissions
to the THUMOS 2015 challenge and review the participating approaches.
Additionally, we include a comprehensive empirical study evaluating the
differences in action recognition between trimmed and untrimmed videos, and how
well methods trained on trimmed videos generalize to untrimmed videos. We
conclude by proposing several directions and improvements for future THUMOS
challenges.
Comment: Preprint submitted to Computer Vision and Image Understanding
A Dependency-Based Neural Network for Relation Classification
Previous research on relation classification has verified the effectiveness
of using dependency shortest paths or subtrees. In this paper, we further
explore how to make full use of the combination of this dependency
information. We first propose a new structure, termed augmented dependency path
(ADP), which is composed of the shortest dependency path between two entities
and the subtrees attached to the shortest path. To exploit the semantic
representation behind the ADP structure, we develop dependency-based neural
networks (DepNN): a recursive neural network designed to model the subtrees,
and a convolutional neural network to capture the most important features on
the shortest path. Experiments on the SemEval-2010 dataset show that our
proposed method achieves state-of-the-art results.
Comment: This preprint is the full version of a short paper accepted at the
annual meeting of the Association for Computational Linguistics (ACL) 2015
(Beijing, China)
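The augmented dependency path described in this abstract (the shortest dependency path between two entities plus the subtrees attached to it) can be sketched in a few lines of Python. This is an illustrative sketch only: the head-index representation, the function names, and the toy sentence with its parse are assumptions, and the DepNN networks themselves are not shown.

```python
def path_to_root(heads, node):
    # Follow head links from `node` up to the root (whose head is None).
    path = [node]
    while heads[path[-1]] is not None:
        path.append(heads[path[-1]])
    return path


def shortest_dependency_path(heads, a, b):
    # In a tree, the shortest path runs through the lowest common ancestor.
    up_a, up_b = path_to_root(heads, a), path_to_root(heads, b)
    ancestors_a = set(up_a)
    lca = next(n for n in up_b if n in ancestors_a)
    return up_a[:up_a.index(lca) + 1] + up_b[:up_b.index(lca)][::-1]


def attached_subtrees(heads, path):
    # For each path node, collect the subtrees rooted at its off-path children.
    children = {}
    for child, head in enumerate(heads):
        if head is not None:
            children.setdefault(head, []).append(child)

    def subtree(node):
        nodes = [node]
        for c in children.get(node, []):
            nodes.extend(subtree(c))
        return nodes

    on_path = set(path)
    return {p: [sorted(subtree(c)) for c in children.get(p, [])
                if c not in on_path]
            for p in path}


# Hypothetical toy parse: heads[i] is the index of token i's head.
tokens = ["The", "burst", "was", "caused", "by", "heavy", "pressure"]
heads = [1, 3, 3, None, 6, 6, 3]
path = shortest_dependency_path(heads, 1, 6)   # burst ... pressure
print([tokens[i] for i in path])
print(attached_subtrees(heads, path))
```

Here the path nodes would feed the path model and the attached subtrees the subtree model; how the two are combined is specific to the paper and not reproduced here.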
Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts
Instead of mining coherent topics from a given text corpus in a completely
unsupervised manner, seed-guided topic discovery methods leverage user-provided
seed words to extract distinctive and coherent topics so that the mined topics
can better cater to the user's interest. To model the semantic correlation
between words and seeds for discovering topic-indicative terms, existing
seed-guided approaches utilize different types of context signals, such as
document-level word co-occurrences, sliding window-based local contexts, and
generic linguistic knowledge brought by pre-trained language models. In this
work, we analyze and show empirically that each type of context information has
its value and limitation in modeling word semantics under seed guidance, but
combining three types of contexts (i.e., word embeddings learned from local
contexts, pre-trained language model representations obtained from
general-domain training, and topic-indicative sentences retrieved based on seed
information) allows them to complement each other for discovering quality
topics. We propose an iterative framework, SeedTopicMine, which jointly learns
from the three types of contexts and gradually fuses their context signals via
an ensemble ranking process. Under various sets of seeds and on multiple
datasets, SeedTopicMine consistently yields more coherent and accurate topics
than existing seed-guided topic discovery approaches.
Comment: 9 pages; Accepted to WSDM 202
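As a rough illustration of fusing several context signals through an ensemble ranking, the sketch below combines ranked term lists by mean reciprocal rank. The abstract does not specify the fusion rule, so the scoring scheme, the function name `ensemble_rank`, and the example terms are all assumptions rather than SeedTopicMine's actual procedure.

```python
def ensemble_rank(rankings, top_k=None):
    """Fuse several ranked term lists by mean reciprocal rank.

    A term missing from a list contributes 0 for that list. Ties are
    broken alphabetically so the output is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for position, term in enumerate(ranking, start=1):
            scores[term] = scores.get(term, 0.0) + 1.0 / position
    n_lists = len(rankings)
    fused = sorted(scores, key=lambda t: (-scores[t] / n_lists, t))
    return fused[:top_k] if top_k else fused


# Hypothetical rankings for a seed such as "soccer", one per context type.
from_local_embeddings = ["goal", "striker", "league"]
from_language_model = ["striker", "goal", "coach"]
from_retrieved_sentences = ["striker", "league", "goal"]

print(ensemble_rank([from_local_embeddings,
                     from_language_model,
                     from_retrieved_sentences]))
```

In an iterative setting, the top fused terms could be fed back as expanded seeds for the next round; that loop is omitted here.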