28,971 research outputs found
On the Application of Generic Summarization Algorithms to Music
Several generic summarization algorithms were developed in the past and
successfully applied in fields such as text and speech summarization. In this
paper, we review and apply these algorithms to music. To evaluate this
summarization's performance, we adopt an extrinsic approach: we compare a Fado
Genre Classifier's performance using truncated contiguous clips against the
summaries extracted with those algorithms on 2 different datasets. We show that
Maximal Marginal Relevance (MMR), LexRank and Latent Semantic Analysis (LSA)
all improve classification performance in both datasets used for testing.Comment: 12 pages, 1 table; Submitted to IEEE Signal Processing Letter
QoE Modelling, Measurement and Prediction: A Review
In mobile computing systems, users can access network services anywhere and
anytime using mobile devices such as tablets and smart phones. These devices
connect to the Internet via network or telecommunications operators. Users
usually have some expectations about the services provided to them by different
operators. Users' expectations along with additional factors such as cognitive
and behavioural states, cost, and network quality of service (QoS) may
determine their quality of experience (QoE). If users are not satisfied with
their QoE, they may switch to different providers or may stop using a
particular application or service. Thus, QoE measurement and prediction
techniques may benefit users in availing personalized services from service
providers. On the other hand, it can help service providers to achieve lower
user-operator switchover. This paper presents a review of the state-the-art
research in the area of QoE modelling, measurement and prediction. In
particular, we investigate and discuss the strengths and shortcomings of
existing techniques. Finally, we present future research directions for
developing novel QoE measurement and prediction technique
The TREC2001 video track: information retrieval on digital video information
The development of techniques to support content-based access to archives of digital video information has recently started to receive much attention from the research community. During 2001, the annual TREC activity, which has been benchmarking the performance of information retrieval techniques on a range of media for 10 years, included a ”track“ or activity which allowed investigation into approaches to support searching through a video library. This paper is not intended to provide a comprehensive picture of the different approaches taken by the TREC2001 video track participants but instead we give an overview of the TREC video search task and a thumbnail sketch of the approaches taken by different groups. The reason for writing this paper is to highlight the message from the TREC video track that there are now a variety of approaches available for searching and browsing through digital video archives, that these approaches do work, are scalable to larger archives and can yield useful retrieval performance for users. This has important implications in making digital libraries of video information attainable
Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification
There are a number of studies about extraction of bottleneck (BN) features
from deep neural networks (DNNs)trained to discriminate speakers, pass-phrases
and triphone states for improving the performance of text-dependent speaker
verification (TD-SV). However, a moderate success has been achieved. A recent
study [1] presented a time contrastive learning (TCL) concept to explore the
non-stationarity of brain signals for classification of brain states. Speech
signals have similar non-stationarity property, and TCL further has the
advantage of having no need for labeled data. We therefore present a TCL based
BN feature extraction method. The method uniformly partitions each speech
utterance in a training dataset into a predefined number of multi-frame
segments. Each segment in an utterance corresponds to one class, and class
labels are shared across utterances. DNNs are then trained to discriminate all
speech frames among the classes to exploit the temporal structure of speech. In
addition, we propose a segment-based unsupervised clustering algorithm to
re-assign class labels to the segments. TD-SV experiments were conducted on the
RedDots challenge database. The TCL-DNNs were trained using speech data of
fixed pass-phrases that were excluded from the TD-SV evaluation set, so the
learned features can be considered phrase-independent. We compare the
performance of the proposed TCL bottleneck (BN) feature with those of
short-time cepstral features and BN features extracted from DNNs discriminating
speakers, pass-phrases, speaker+pass-phrase, as well as monophones whose labels
and boundaries are generated by three different automatic speech recognition
(ASR) systems. Experimental results show that the proposed TCL-BN outperforms
cepstral features and speaker+pass-phrase discriminant BN features, and its
performance is on par with those of ASR derived BN features. Moreover,....Comment: Copyright (c) 2019 IEEE. Personal use of this material is permitted.
Permission from IEEE must be obtained for all other uses, in any current or
future media, including reprinting/republishing this material for advertising
or promotional purposes, creating new collective works, for resale or
redistribution to servers or lists, or reuse of any copyrighted component of
this work in other work
- …