Search CORE

28,074 research outputs found

Effect of calculating Pointwise Mutual Information using a Fuzzy Sliding Window in Topic Modeling

Author: Kaymak Uzay
Rijcken Emil
Scheepers Floortje
Spruit Marco
Zervanou Kalliopi
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 09/11/2023
Field of study

Topic modeling is a popular method for analysing large amounts of unstructured text data and extracting meaningful insights. The coherence of the generated topics is a critical metric for determining the model quality and measuring the semantic relatedness of the words in a topic. The distributional hypothesis, a fundamental theory in linguistics, states that words occurring in the same contexts tend to have similar meanings. Based on this theory, word co-occurrence in a given context is often used to reflect word association in coherence scores. To this end, many coherence scores use Normalised Pointwise Mutual Information (NPMI), which uses a sliding window to describe the neighbourhood that defines the context. It is assumed that there is no other structure in the neighbourhood except for the presence of words. Inspired by the distributional hypothesis, we hypothesise the word distance to be relevant for determining the word association. Hence, we propose using a fuzzy sliding window to define a neighbourhood in which the association between words depends on the membership of the words in the fuzzy sliding window. To this end, we propose Fuzzy Normalized Pointwise Mutual Information (FNPMI) to calculate fuzzy coherence scores. We implement two different neighbourhood structures by the definition of the membership function of the sliding window.In the first implementation, the association between two words correlates positively with the distance, whereas the correlation is negative in the second. We compare the correlation of our proposed new coherence metrics with human judgment. We find that the use of a fuzzy sliding window correlates less with human judgment than a crisp sliding window. This finding indicates that word distance within a window is less important than defining the window size itself

Pure OAI Repository

Effect of Calculating Pointwise Mutual Information using a Fuzzy Sliding Window in Topic Modeling

Author: Kaymak Uzay
Rijcken Emil
Scheepers Floortje
Spruit Marco
Zervanou Kalliopi
Publication venue
Publication date: 01/01/2023
Field of study

Topic modeling is a popular method for analysing large amounts of unstructured text data and extracting meaningful insights. The coherence of the generated topics is a critical metric for determining the model quality and measuring the semantic relatedness of the words in a topic. The distributional hypothesis, a fundamental theory in linguistics, states that words occurring in the same contexts tend to have similar meanings. Based on this theory, word co-occurrence in a given context is often used to reflect word association in coherence scores. To this end, many coherence scores use Normalised Pointwise Mutual Information (NPMI), which uses a sliding window to describe the neighbourhood that defines the context. It is assumed that there is no other structure in the neighbourhood except for the presence of words. Inspired by the distributional hypothesis, we hypothesise the word distance to be relevant for determining the word association. Hence, we propose using a fuzzy sliding window to define a neighbourhood in which the association between words depends on the membership of the words in the fuzzy sliding window. To this end, we propose Fuzzy Normalized Pointwise Mutual Information (FNPMI) to calculate fuzzy coherence scores. We implement two different neighbourhood structures by the definition of the membership function of the sliding window. In the first implementation, the association between two words correlates positively with the distance, whereas the correlation is negative in the second. We compare the correlation of our proposed new coherence metrics with human judgment. We find that the use of a fuzzy sliding window correlates less with human judgment than a crisp sliding window. This finding indicates that word distance within a window is less important than defining the window size itself

Utrecht University Repository

The THUMOS Challenge on Action Recognition for Videos "in the Wild"

Author: Gorban Alex
Idrees Haroon
Jiang Yu-Gang
Laptev Ivan
Shah Mubarak
Sukthankar Rahul
Zamir Amir R.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016
Field of study

Automatically recognizing and localizing wide ranges of human actions has crucial importance for video understanding. Towards this goal, the THUMOS challenge was introduced in 2013 to serve as a benchmark for action recognition. Until then, video action recognition, including THUMOS challenge, had focused primarily on the classification of pre-segmented (i.e., trimmed) videos, which is an artificial task. In THUMOS 2014, we elevated action recognition to a more practical level by introducing temporally untrimmed videos. These also include `background videos' which share similar scenes and backgrounds as action videos, but are devoid of the specific actions. The three editions of the challenge organized in 2013--2015 have made THUMOS a common benchmark for action classification and detection and the annual challenge is widely attended by teams from around the world. In this paper we describe the THUMOS benchmark in detail and give an overview of data collection and annotation procedures. We present the evaluation protocols used to quantify results in the two THUMOS tasks of action classification and temporal detection. We also present results of submissions to the THUMOS 2015 challenge and review the participating approaches. Additionally, we include a comprehensive empirical study evaluating the differences in action recognition between trimmed and untrimmed videos, and how well methods trained on trimmed videos generalize to untrimmed videos. We conclude by proposing several directions and improvements for future THUMOS challenges.Comment: Preprint submitted to Computer Vision and Image Understandin

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

A Dependency-Based Neural Network for Relation Classification

Author: Ji Heng
Li Sujian
Liu Yang
Wang Houfeng
Wei Furu
Zhou Ming
Publication venue
Publication date: 16/07/2015
Field of study

Previous research on relation classification has verified the effectiveness of using dependency shortest paths or subtrees. In this paper, we further explore how to make full use of the combination of these dependency information. We first propose a new structure, termed augmented dependency path (ADP), which is composed of the shortest dependency path between two entities and the subtrees attached to the shortest path. To exploit the semantic representation behind the ADP structure, we develop dependency-based neural networks (DepNN): a recursive neural network designed to model the subtrees, and a convolutional neural network to capture the most important features on the shortest path. Experiments on the SemEval-2010 dataset show that our proposed method achieves state-of-art results.Comment: This preprint is the full version of a short paper accepted in the annual meeting of the Association for Computational Linguistics (ACL) 2015 (Beijing, China

arXiv.org e-Print Archive

Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts

Author: Han Jiawei
Jiang Yucheng
Meng Yu
Michalski Martin
Zhang Yu
Zhang Yunyi
Publication venue
Publication date: 12/12/2022
Field of study

Instead of mining coherent topics from a given text corpus in a completely unsupervised manner, seed-guided topic discovery methods leverage user-provided seed words to extract distinctive and coherent topics so that the mined topics can better cater to the user's interest. To model the semantic correlation between words and seeds for discovering topic-indicative terms, existing seed-guided approaches utilize different types of context signals, such as document-level word co-occurrences, sliding window-based local contexts, and generic linguistic knowledge brought by pre-trained language models. In this work, we analyze and show empirically that each type of context information has its value and limitation in modeling word semantics under seed guidance, but combining three types of contexts (i.e., word embeddings learned from local contexts, pre-trained language model representations obtained from general-domain training, and topic-indicative sentences retrieved based on seed information) allows them to complement each other for discovering quality topics. We propose an iterative framework, SeedTopicMine, which jointly learns from the three types of contexts and gradually fuses their context signals via an ensemble ranking process. Under various sets of seeds and on multiple datasets, SeedTopicMine consistently yields more coherent and accurate topics than existing seed-guided topic discovery approaches.Comment: 9 pages; Accepted to WSDM 202

arXiv.org e-Print Archive