2,197 research outputs found

    Topic Identification for Speech without ASR

    Modern topic identification (topic ID) systems for speech use automatic speech recognition (ASR) to produce speech transcripts, and perform supervised classification on such ASR outputs. However, under resource-limited conditions, the manually transcribed speech required to develop standard ASR systems can be severely limited or unavailable. In this paper, we investigate alternative unsupervised solutions for obtaining tokenizations of speech in terms of a vocabulary of automatically discovered word-like or phoneme-like units, without depending on the supervised training of ASR systems. Moreover, using automatic phoneme-like tokenizations, we demonstrate that a convolutional neural network based framework for learning spoken document representations provides competitive performance compared to a standard bag-of-words representation, as evidenced by comprehensive topic ID evaluations on both single-label and multi-label classification tasks. Comment: 5 pages, 2 figures; accepted for publication at Interspeech 201

    Cross Validation Of Neural Network Applications For Automatic New Topic Identification

    There are recent studies in the literature on automatic topic-shift identification in Web search engine user sessions; however, most of this work applied its topic-shift identification algorithms to data logs from a single search engine. The purpose of this study is to cross-validate an artificial neural network application that automatically identifies topic changes in a Web search engine user session, using data logs of different search engines for training and testing the neural network. Sample data logs from the Norwegian search engine FAST (currently owned by Overture) and Excite are used in this study. Findings of this study suggest that it could be possible to identify topic shifts and continuations successfully on a particular search engine user session using neural networks that are trained on a different search engine data log

    Topic Identification System to Filter Twitter Feeds

    Twitter is a micro-blogging service where users publish messages of 140 characters. This simple feature makes Twitter the source for concise, instant and interesting information ranging from friends' updates to breaking news. However, a problem emerges when a user follows many accounts while being interested in only a subset of their content, which leads to an overwhelming number of tweets the user is not interested in receiving. We propose a solution to this problem by filtering incoming tweets based on the user's interests, which is accomplished through a classifier. The proposed classifier system categorizes tweets into generic classes like Entertainment, Health, Sport, News, Food, Technology and Health. This paper describes the creation and evaluation of the classifier until 89% accuracy was obtained

    Topic identification challenge

    Merit, Expertise and Measuremen

    How to identify customary international law? – On the final outcome of the work of the international law commission (2018)

    How to identify customary international law is an important question of international law. In 2018, the International Law Commission adopted a set of sixteen conclusions, together with commentaries, on this topic. The paper consists of three parts: First, it discusses the reasons why the Commission came to work on the topic “Identification of customary international law”. Then, some of its conclusions are highlighted. Finally, the outcome of the work of the Commission is placed in a general context, before concluding

    Improved topic identification for similar document search on mobile devices

    This paper presents a novel two-level classifier ensemble designed to support document topic identification in mobile device environments. The proposed system aims at supporting mobile device users who search for documents located on other mobile devices that have a similar topic to the documents on the user's own device. Conforming to the environment of mobile devices, the algorithms are designed for slower processors and smaller memory capacity, and they maintain low data traffic between the devices in order to keep the cost of communication low. We propose a keyword-list-based topic comparison, enhanced with a two-level classifier ensemble to accelerate the topic identification process. The new technique enables document topic comparison using little communication traffic, and it requires few calculations