
    Integrating lexical and prosodic features for automatic paragraph segmentation

    Spoken documents, such as podcasts or lectures, are a growing presence in everyday life. Being able to automatically identify their discourse structure is an important step towards understanding what a spoken document is about. Moreover, finer-grained units, such as paragraphs, are highly desirable for presenting and analyzing spoken content. However, little work has been done on discourse-based speech segmentation below the level of broad topics. In order to examine how discourse transitions are cued in speech, we investigate automatic paragraph segmentation of TED talks using lexical and prosodic features. Experiments using Support Vector Machines, AdaBoost, and Neural Networks show that models using supra-sentential prosodic features and induced cue words perform better than those based on the type of lexical cohesion measures often used in broad topic segmentation. Moreover, combining a wide range of individually weak lexical and prosodic predictors improves performance, and modelling contextual information using recurrent neural networks outperforms other approaches by a large margin. Our best results come from using late fusion methods that integrate representations generated by separate lexical and prosodic models while allowing interactions between these feature streams, rather than treating them as independent information sources. Application to ASR outputs shows that adding prosodic features, particularly using late fusion, can significantly ameliorate decreases in performance due to transcription errors. The second author was funded by the EU’s Horizon 2020 Research and Innovation Programme under GA H2020-RIA-645012 and by the Spanish Ministry of Economy and Competitiveness Juan de la Cierva programme. The other authors were funded by the University of Edinburgh.

    Hierarchical Recurrent Neural Network for Story Segmentation


    Improving automated segmentation of radio shows with audio embeddings

    Audio features have been proven useful for increasing the performance of automated topic segmentation systems. This study explores the novel task of using audio embeddings for automated, topically coherent segmentation of radio shows. We created three different audio embedding generators using multi-class classification tasks on three datasets from different domains. We evaluate the topic segmentation performance of the audio embeddings and compare it against a text-only baseline. We find that a set-up including audio embeddings generated through a non-speech sound event classification task significantly outperforms our text-only baseline by 32.3% in F1-measure. In addition, we find that different classification tasks yield audio embeddings that vary in segmentation performance. Comment: 5 pages, 2 figures, submitted to ICASSP202

    Multimodal Content Analysis for Effective Advertisements on YouTube

    The rapid advances in e-commerce and Web 2.0 technologies have greatly increased the impact of commercial advertisements on the general public. As a key enabling technology, a multitude of recommender systems exist that analyze user features and browsing patterns to recommend appealing advertisements to users. In this work, we seek to identify the attributes that characterize an effective advertisement and to recommend a useful set of features to aid the design and production of commercial advertisements. We analyze the temporal patterns in the multimedia content of advertisement videos, including auditory, visual, and textual components, and study their individual roles and synergies in the success of an advertisement. The objective of this work is thus to measure the effectiveness of an advertisement and to recommend a useful set of features that advertisement designers can use to make it more successful and approachable to users. Our proposed framework employs the signal processing technique of cross-modality feature learning, where data streams from different components are used to train separate neural network models that are then fused together to learn a shared representation. Subsequently, a neural network model trained on this joint feature embedding is used as a classifier to predict advertisement effectiveness. We validate our approach using subjective ratings from a dedicated user study, the sentiment strength of online viewer comments, and a viewer opinion metric based on the ratio of Likes to Views received by each advertisement on an online platform. Comment: 11 pages, 5 figures, ICDM 201

    Multimodal Assessment of Cognitive Decline: Applications in Alzheimer’s Disease and Depression

    The initial diagnosis and assessment of cognitive decline are generally based on the judgement of clinicians and on commonly used semi-structured interviews, guided by pre-determined sets of topics, in a clinical setting. Publicly available multimodal datasets have provided an opportunity to explore a range of experiments in the automatic detection of cognitive decline. Drawing on the latest developments in representation learning, machine learning, and natural language processing, we seek to develop models capable of identifying cognitive decline, with an eye to discovering the differences and commonalities that should be considered in the computational treatment of mental health disorders. We present models that learn the indicators of cognitive decline from audio and visual modalities as well as from lexical, syntactic, disfluency, and pause information. Our study is carried out in two parts: moderation analysis and predictive modelling. We experiment with different fusion techniques. Our approaches are motivated by recent efforts in multimodal fusion for classifying cognitive states, aiming to capture the interaction between modalities and to maximise the use and combination of each modality. We create tools for detecting cognitive decline and use them to analyze three major datasets containing speech produced by people with and without cognitive decline. These findings are being used to develop multimodal models for the detection of depression and Alzheimer’s dementia.