Topic Identification for Speech without ASR
Modern topic identification (topic ID) systems for speech use automatic
speech recognition (ASR) to produce speech transcripts, and perform supervised
classification on such ASR outputs. However, under resource-limited conditions,
the manually transcribed speech required to develop standard ASR systems can be
severely limited or unavailable. In this paper, we investigate alternative
unsupervised solutions to obtaining tokenizations of speech in terms of a
vocabulary of automatically discovered word-like or phoneme-like units, without
depending on the supervised training of ASR systems. Moreover, using automatic
phoneme-like tokenizations, we demonstrate that a convolutional neural network
based framework for learning spoken document representations provides
competitive performance compared to a standard bag-of-words representation, as
evidenced by comprehensive topic ID evaluations on both single-label and
multi-label classification tasks. Comment: 5 pages, 2 figures; accepted for publication at Interspeech 201
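The bag-of-words baseline mentioned in this abstract can be illustrated with a minimal sketch. It assumes a spoken document has already been tokenized into discovered unit labels; the labels (`u3`, `u7`) and the function name `bag_of_units` are hypothetical, and the abstract does not specify the paper's actual feature pipeline.

```python
from collections import Counter


def bag_of_units(tokens, n=1):
    """Count n-grams over a sequence of discovered unit labels.

    `tokens` is a list of unit labels (hypothetical names such as "u3");
    each n-gram is stored as a tuple so that unigrams and higher-order
    n-grams share one representation.
    """
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical tokenization of one spoken document.
doc = ["u3", "u7", "u3"]
unigrams = bag_of_units(doc)       # counts of single units
bigrams = bag_of_units(doc, n=2)   # counts of adjacent unit pairs
```

Counts of this kind could then be fed to any supervised classifier; which classifier the authors used is not stated here.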
Cross Validation Of Neural Network Applications For Automatic New Topic Identification
Recent studies have addressed automatic topic-shift identification in Web search engine user sessions; however, most of this work applied topic-shift identification algorithms to data logs from a single search engine. The purpose of this study is to cross-validate an artificial neural network application that automatically identifies topic changes in a Web search engine user session, using data logs from different search engines to train and test the neural network. Sample data logs from the Norwegian search engine FAST (currently owned by Overture) and Excite are used in this study. The findings suggest that it could be possible to successfully identify topic shifts and continuations in user sessions of a particular search engine using neural networks trained on a data log from a different search engine.
Topic Identification System to Filter Twitter Feeds
Twitter is a micro-blogging service where users publish messages of up to 140 characters. This simple feature makes Twitter a source of concise, instant and interesting information ranging from friends' updates to breaking news. However, a problem emerges when a user follows many accounts but is interested in only a subset of their content: the user is overwhelmed by tweets he is not interested in receiving. We propose a solution to this problem that filters incoming tweets based on the user's interests, using a classifier. The proposed classifier system categorizes tweets into generic classes such as Entertainment, Health, Sport, News, Food and Technology. This paper describes the creation and evaluation of the classifier, up to the point at which 89% accuracy was obtained.
Topic identification challenge
Merit, Expertise and Measurement
How to identify customary international law? – On the final outcome of the work of the international law commission (2018)
How to identify customary international law is an important question of international law. In 2018, the International Law Commission adopted a set of sixteen conclusions, together with commentaries, on this topic. The paper consists of three parts. First, it discusses why the Commission came to work on the topic “Identification of customary international law”. Then, some of its conclusions are highlighted. Finally, the outcome of the Commission's work is placed in a general context, before concluding.
Improved topic identification for similar document search on mobile devices
This paper presents a novel two-level classifier ensemble designed to support document topic identification in mobile device environments. The proposed system aims to support mobile device users who search other mobile devices for documents with a topic similar to that of the documents on the user's own device. To suit the mobile environment, the algorithms are designed for slower processors and smaller memory capacity, and they keep data traffic between the devices small in order to keep the cost of communication low. We propose a keyword-list-based topic comparison, enhanced with a two-level classifier ensemble to accelerate the topic identification process. The new technique enables document topic comparison with little communication traffic and few calculations.
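A keyword-list-based topic comparison of the kind this abstract describes could look like the following minimal sketch. It swaps in plain Jaccard overlap as the similarity measure, which is an assumption: the abstract does not say how the keyword lists are compared, and the function name `keyword_overlap` is hypothetical.

```python
def keyword_overlap(keywords_a, keywords_b):
    """Jaccard similarity between two keyword lists, in [0.0, 1.0].

    Only the short keyword lists need to be exchanged between devices,
    not the documents themselves, which keeps traffic small.
    Jaccard overlap is an assumed measure, not the paper's stated one.
    """
    set_a, set_b = set(keywords_a), set(keywords_b)
    if not set_a and not set_b:
        return 0.0  # two empty lists carry no topic evidence
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical keyword lists extracted from two documents.
local_doc = ["speech", "topic", "asr"]
remote_doc = ["topic", "asr", "search"]
similarity = keyword_overlap(local_doc, remote_doc)
```

A set-based measure like this needs only a handful of operations per comparison, which fits the low-computation constraint the abstract emphasizes.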