The extended marine underwater environment database and baseline evaluations

Authors: Cui Chaoran, Dong Junyu, Jian Muwei, Lam Kin-Man, Nie Xiushan, Qi Qiang, Yin Yilong, Yu Hui, Zhang Huaxiang
Publication venue: Elsevier BV
Publication date: 01/07/2019

Portsmouth University Research Portal (Pure)

Graph Signal Processing: Overview, Challenges and Applications

Authors: Frossard Pascal, Kovačević Jelena, Moura José M. F., Ortega Antonio, Vandergheynst Pierre
Publication date: 26/03/2018

Research in Graph Signal Processing (GSP) aims to develop tools for processing data defined on irregular graph domains. In this paper we first provide an overview of core ideas in GSP and their connection to conventional digital signal processing. We then summarize recent progress in developing basic GSP tools, including methods for sampling, filtering and graph learning. Next, we review progress in several application areas that use GSP, including the processing and analysis of sensor network data and biological data, as well as applications to image processing and machine learning. We finish with a brief historical perspective that highlights how concepts recently developed in GSP build on prior research in other areas.
Comment: To appear, Proceedings of the IEEE
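
Graph filtering is one of the basic GSP tools the overview covers. As a minimal sketch (not code from the paper), the block below applies the first-order polynomial filter h(L) = I - alpha*L to a signal on a small graph, where L = D - A is the combinatorial graph Laplacian. The function names, the list-of-lists matrix representation and the value of alpha are illustrative assumptions; practical GSP code would normally use sparse matrices.

```python
from typing import List

Matrix = List[List[float]]


def laplacian(adj: Matrix) -> Matrix:
    """Combinatorial graph Laplacian L = D - A of a symmetric adjacency matrix."""
    n = len(adj)
    return [
        [(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
        for i in range(n)
    ]


def graph_lowpass(adj: Matrix, x: List[float], alpha: float, order: int = 1) -> List[float]:
    """Apply the polynomial graph filter h(L) = (I - alpha * L) ** order to signal x.

    One application maps x_i to x_i - alpha * sum_j A_ij * (x_i - x_j), pulling
    each vertex value toward its neighbours. For 0 < alpha <= 1 / lambda_max(L)
    the frequency response 1 - alpha * lambda falls from 1 at lambda = 0, so high
    graph frequencies are attenuated (a low-pass filter).
    """
    lap = laplacian(adj)
    n = len(x)
    y = list(x)
    for _ in range(order):
        ly = [sum(lap[i][j] * y[j] for j in range(n)) for i in range(n)]
        y = [y[i] - alpha * ly[i] for i in range(n)]
    return y
```

For the 3-vertex path graph `[[0, 1, 0], [1, 0, 1], [0, 1, 0]]` with x = [0.0, 3.0, 0.0] and alpha = 0.25 (its largest Laplacian eigenvalue is 3, so alpha <= 1/3 holds), the filter returns [0.75, 1.5, 0.75]: the spike is spread to its neighbours while the total of 3.0 is preserved, because L is symmetric with zero row sums.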

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

MobiBits: Multimodal Mobile Biometric Database

Authors: Bartuzi Ewelina, Białobrzeski Radosław, Roszczewska Katarzyna, Trokielewicz Mateusz
Publication date: 31/08/2018

This paper presents a novel database comprising representations of five different biometric characteristics, collected in a mobile, unconstrained or semi-constrained setting with three different mobile devices. It includes characteristics previously unavailable in existing datasets, namely hand images, thermal hand images and thermal face images, all acquired with a mobile, off-the-shelf device. In addition to collecting the data, we perform an extensive set of experiments with existing commercial and academic biometric solutions, providing insight into the benchmark recognition performance that can be achieved with these data. To our knowledge, this is the first mobile biometric database to include samples of biometric traits such as thermal hand images and thermal face images. We hope that this contribution will be a valuable addition to existing databases and will enable new experiments and studies in the field of mobile authentication. The MobiBits database is publicly available to the research community at no cost for non-commercial purposes.
Comment: Submitted to the BIOSIG2018 conference on June 18, 2018. Accepted for publication on July 20, 2018.

arXiv.org e-Print Archive

Crossref

Self-supervised Speaker Recognition with Loss-gated Learning

Authors: Das Rohan Kumar, Hautamäki Ville, Lee Kong Aik, Li Haizhou, Tao Ruijie
Publication date: 14/07/2022

In self-supervised learning for speaker recognition, pseudo labels serve as supervision signals. However, a speaker recognition model does not always benefit from pseudo labels, because they can be unreliable. In this work, we observe that a speaker recognition network tends to model data with reliable labels faster than data with unreliable labels. This motivates us to study a loss-gated learning (LGL) strategy, which extracts the reliable labels through the fitting ability of the neural network during training. With the proposed LGL, our speaker recognition model obtains a 46.3% performance gain over the system without it. Further, the proposed self-supervised speaker recognition with LGL, trained on the VoxCeleb2 dataset without any labels, achieves an equal error rate of 1.66% on the VoxCeleb1 original test set. Code is available at https://github.com/TaoRuijie/Loss-Gated-Learning.
Comment: 5 pages, 3 figures
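
The gating idea described above (keep only pseudo-labelled samples whose loss is already low, since the network fits reliable labels first) can be sketched as a threshold on per-sample losses. This is a generic illustration under stated assumptions, not the authors' implementation, which is in the linked repository: the function names, the plain-list interface and the threshold value are hypothetical.

```python
from typing import List, Sequence, Tuple


def loss_gate(losses: Sequence[float], threshold: float) -> List[int]:
    """Indices of samples whose per-sample loss falls below `threshold`.

    Low-loss samples are ones the network already fits well, so their pseudo
    labels are treated as reliable; the others are left out of this update.
    """
    return [i for i, loss in enumerate(losses) if loss < threshold]


def gated_mean_loss(losses: Sequence[float], threshold: float) -> Tuple[float, int]:
    """Mean loss over the gated (reliable) subset and the number of samples kept.

    Returns (0.0, 0) when no sample passes the gate, so the caller can skip the step.
    """
    kept = loss_gate(losses, threshold)
    if not kept:
        return 0.0, 0
    return sum(losses[i] for i in kept) / len(kept), len(kept)
```

With per-sample losses [0.2, 3.1, 0.5, 4.0] and a hypothetical threshold of 1.0, only samples 0 and 2 pass the gate and the gated mean is about 0.35. In an actual training loop, that gated mean would be the quantity back-propagated, and the threshold would be a tuned hyperparameter.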

arXiv.org e-Print Archive

Hierarchical Attention Network for Evaluating Therapist Empathy in Counseling Session

Authors: Chui Harold, Lee Tan, Luk Sarah, Tao Dehua
Publication date: 31/03/2022

Counseling typically takes the form of a spoken conversation between a therapist and a client. The empathy level expressed by the therapist is considered an essential factor in the quality of the counseling outcome. This paper proposes a hierarchical recurrent network combined with two-level attention mechanisms to determine the therapist's empathy level solely from the acoustic features of conversational speech in a counseling session. Experimental results show that the proposed model achieves an accuracy of 72.1% in classifying the therapist's empathy level as "high" or "low". Speech from both the therapist and the client is found to contribute to predicting the empathy level, which is subjectively rated by an expert observer. By analyzing speaker turns assigned high attention weights, we observe that 2 to 6 consecutive turns should be considered together to provide useful clues for detecting empathy, and that the observer tends to take the whole session into consideration when rating therapist empathy, rather than relying on a few specific speaker turns.
Comment: Submitted to INTERSPEECH 202
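
The attention weights analysed above come from attention pooling, which a hierarchical model applies at each level. The sketch below shows generic softmax attention pooling and a two-level stack (frames pooled into one vector per speaker turn, then turns pooled into one session vector). It is an assumption-laden illustration, not the paper's model: here the inputs are plain feature vectors rather than recurrent hidden states, and `frame_query` and `turn_query` are hypothetical stand-ins for learned parameters.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def attention_pool(vectors: Sequence[Vector], query: Vector) -> Tuple[List[float], List[float]]:
    """Softmax attention pooling.

    Each vector is scored by its dot product with `query`; the scores are
    softmax-normalised into weights, and the pooled output is the weighted
    average of the vectors. Returns (pooled_vector, weights).
    """
    scores = [sum(v_k * q_k for v_k, q_k in zip(v, query)) for v in vectors]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def session_embedding(
    session: Sequence[Sequence[Vector]], frame_query: Vector, turn_query: Vector
) -> Tuple[List[float], List[float]]:
    """Two-level pooling: frames -> one vector per turn, then turns -> one session vector.

    The returned turn weights are the kind of per-turn attention scores one
    could inspect to see which speaker turns the model relies on.
    """
    turn_vectors = [attention_pool(turn, frame_query)[0] for turn in session]
    return attention_pool(turn_vectors, turn_query)
```

With an all-zero query every input gets the same weight, so pooling reduces to a plain average; a query aligned with one input's features shifts weight toward that input. A classifier for "high" versus "low" empathy would then operate on the session vector.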

arXiv.org e-Print Archive

Frame Interpolation for Cloud-Based Mobile Video Streaming

Authors: Bokhari SMM, Chen J, He X, Lam KM, Usman M, Xu M
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/05/2016

© 2016 IEEE. Cloud-based High Definition (HD) video streaming is becoming increasingly popular. On the one hand, it is important for both end users and large storage servers to store their huge amounts of data at different locations and servers. On the other hand, providing reliable connectivity to network users is becoming a big challenge for network service providers. There have been many studies of Quality of Experience (QoE) in cloud-based video streaming for services such as YouTube. Packet losses and bit errors are very common in transmission networks, and they affect user feedback on cloud-based media services. To conceal packet losses and bit errors, Error Concealment (EC) techniques are usually applied at the decoder/receiver side to estimate the lost information. This paper proposes a time-efficient and quality-oriented EC method. The proposed method considers H.265/HEVC intra-encoded videos and estimates an entirely lost intra frame. The main emphasis of the proposed approach is recovering the Motion Vectors (MVs) of a lost frame in real time. To speed up the search for the lost MVs, both a larger block size and parallel searching are used. Simulation results show that the proposed method outperforms the traditional Block Matching Algorithm (BMA) by approximately 2.5 dB and Frame Copy (FC) by up to 12 dB at packet loss rates of 1%, 3% and 5% with different Quantization Parameters (QPs). The proposed approach also reduces computational time relative to BMA by approximately 1788 seconds.
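
For reference, the traditional BMA baseline mentioned above is, in its simplest form, an exhaustive search that picks the displacement minimising the sum of absolute differences (SAD) between a block and candidate blocks in a reference frame. The sketch below illustrates that generic full search only; it does not reproduce the paper's method (larger blocks, parallel search, recovery of MVs for a lost frame). Frames as lists of integer luma rows, the block size and the search range are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # rows of luma samples


def sad(ref: Frame, cur: Frame, bx: int, by: int, dx: int, dy: int, size: int) -> int:
    """Sum of absolute differences between the size x size block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c])
        for r in range(size)
        for c in range(size)
    )


def full_search(
    ref: Frame, cur: Frame, bx: int, by: int, size: int, search_range: int
) -> Optional[Tuple[Tuple[int, int], int]]:
    """Exhaustive block matching: return ((dx, dy), cost) of the displacement
    within +/- search_range that minimises SAD, or None if no candidate fits."""
    height, width = len(ref), len(ref[0])
    best: Optional[Tuple[Tuple[int, int], int]] = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            top, left = by + dy, bx + dx
            # Skip candidates whose block would fall outside the reference frame.
            if top < 0 or left < 0 or top + size > height or left + size > width:
                continue
            cost = sad(ref, cur, bx, by, dx, dy, size)
            if best is None or cost < best[1]:  # keep the first minimum on ties
                best = ((dx, dy), cost)
    return best
```

If a bright 2x2 patch sits at row 2, column 2 of an 8x8 reference frame and at row 3, column 4 of the current frame, searching the current block at (bx=4, by=3) with a range of 2 returns the motion vector (-2, -1) with zero cost. The search cost grows with the square of the search range, which is why speeding up the MV search matters for real-time concealment.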

The Hong Kong Polytechnic University Pao Yue-kong Library

OPUS - University of Technology Sydney

PolyU Institutional Repository

Spoofing and Anti-Spoofing: A Shared View of Speaker Verification, Speech Synthesis and Voice Conversion

Authors: Evans Nicholas, Kinnunen Tomi, Wu Zhizheng, Yamagishi Junichi
Publication date: 01/01/2015

Edinburgh Research Explorer