398 research outputs found

    Graph Signal Processing: Overview, Challenges and Applications

    Research in Graph Signal Processing (GSP) aims to develop tools for processing data defined on irregular graph domains. In this paper we first provide an overview of core ideas in GSP and their connection to conventional digital signal processing. We then summarize recent developments in basic GSP tools, including methods for sampling, filtering, and graph learning. Next, we review progress in several application areas using GSP, including processing and analysis of sensor network data, biological data, and applications to image processing and machine learning. We finish by providing a brief historical perspective to highlight how concepts recently developed in GSP build on prior research in other areas.
    Comment: To appear, Proceedings of the IEE

    MobiBits: Multimodal Mobile Biometric Database

    This paper presents a novel database comprising representations of five different biometric characteristics, collected in a mobile, unconstrained or semi-constrained setting with three different mobile devices. It includes characteristics previously unavailable in existing datasets, namely hand images, thermal hand images, and thermal face images, all acquired with a mobile, off-the-shelf device. In addition to this collection of data, we perform an extensive set of experiments with existing commercial and academic biometric solutions, providing insight into the benchmark recognition performance that can be achieved with these data. To our knowledge, this is the first mobile biometric database to introduce samples of biometric traits such as thermal hand images and thermal face images. We hope that this contribution will be a valuable addition to the existing databases and enable new experiments and studies in the field of mobile authentication. The MobiBits database is made publicly available to the research community at no cost for non-commercial purposes.
    Comment: Submitted for the BIOSIG2018 conference on June 18, 2018. Accepted for publication on July 20, 201

    Self-supervised Speaker Recognition with Loss-gated Learning

    In self-supervised learning for speaker recognition, pseudo labels serve as the supervision signals. It is known, however, that a speaker recognition model does not always benefit from pseudo labels, because they are unreliable. In this work, we observe that a speaker recognition network tends to model the data with reliable labels faster than the data with unreliable labels. This motivates us to study a loss-gated learning (LGL) strategy, which extracts the reliable labels through the fitting ability of the neural network during training. With the proposed LGL, our speaker recognition model obtains a 46.3% performance gain over the system without it. Further, the proposed self-supervised speaker recognition with LGL, trained on the VoxCeleb2 dataset without any labels, achieves an equal error rate of 1.66% on the VoxCeleb1 original test set. Code has been made available at: https://github.com/TaoRuijie/Loss-Gated-Learning.
    Comment: 5 pages, 3 figure
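    The gating idea in this abstract can be illustrated with a minimal sketch. It assumes that "reliable" samples are those whose per-sample loss falls below a fixed threshold; the function names, the threshold value, and the use of cross-entropy are illustrative assumptions and are not taken from the authors' released code.

```python
import math


def cross_entropy(probs, label):
    """Per-sample cross-entropy for one predicted distribution and one pseudo label."""
    return -math.log(max(probs[label], 1e-12))


def loss_gate(batch_probs, pseudo_labels, threshold):
    """Return indices of samples whose loss is below the threshold.

    Assumption: a low loss means the network already fits the pseudo label,
    which is treated here as a proxy for the label being reliable.
    """
    return [
        i
        for i, (probs, label) in enumerate(zip(batch_probs, pseudo_labels))
        if cross_entropy(probs, label) < threshold
    ]


def gated_mean_loss(batch_probs, pseudo_labels, threshold):
    """Average the loss over gated (kept) samples only; 0.0 if none are kept."""
    kept = loss_gate(batch_probs, pseudo_labels, threshold)
    if not kept:
        return 0.0, kept
    total = sum(cross_entropy(batch_probs[i], pseudo_labels[i]) for i in kept)
    return total / len(kept), kept


# Hypothetical toy batch: three samples, two pseudo-speaker classes.
batch_probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
pseudo_labels = [0, 0, 1]
mean_loss, kept = gated_mean_loss(batch_probs, pseudo_labels, threshold=1.0)
```

    In this toy batch the second sample, whose pseudo label disagrees strongly with the prediction, is excluded from the update, while the other two contribute to the averaged loss.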

    Hierarchical Attention Network for Evaluating Therapist Empathy in Counseling Session

    Counseling typically takes the form of spoken conversation between a therapist and a client. The empathy level expressed by the therapist is considered an essential factor in counseling outcome. This paper proposes a hierarchical recurrent network combined with two-level attention mechanisms to determine the therapist's empathy level solely from the acoustic features of conversational speech in a counseling session. The experimental results show that the proposed model achieves an accuracy of 72.1% in classifying the therapist's empathy level as "high" or "low". Speech from both the therapist and the client is found to contribute to predicting the empathy level, which is subjectively rated by an expert observer. Analysis of the speaker turns assigned high attention weights shows that 2 to 6 consecutive turns should be considered together to provide useful clues for detecting empathy, and that the observer tends to take the whole session into consideration when rating therapist empathy instead of relying on a few specific speaker turns.
    Comment: Submitted to INTERSPEECH 202

    Frame Interpolation for Cloud-Based Mobile Video Streaming

    © 2016 IEEE. Cloud-based High Definition (HD) video streaming is becoming increasingly popular. On one hand, it is important for both end users and large storage servers to store large amounts of data at different locations and servers. On the other hand, it is becoming a big challenge for network service providers to provide reliable connectivity to network users. There have been many studies of Quality of Experience (QoE) in cloud-based video streaming for services like YouTube. Packet losses and bit errors are very common in transmission networks and affect user feedback on cloud-based media services. To conceal packet losses and bit errors, Error Concealment (EC) techniques are usually applied at the decoder/receiver side to estimate the lost information. This paper proposes a time-efficient and quality-oriented EC method. The proposed method considers H.265/HEVC based intra-encoded videos for the estimation of whole intra-frame loss. The main emphasis of the proposed approach is the real-time recovery of the Motion Vectors (MVs) of a lost frame. To speed up the search for the lost MVs, both a larger block size and parallel searching are used. The simulation results show that the proposed method outperforms the traditional Block Matching Algorithm (BMA) by approximately 2.5 dB and Frame Copy (FC) by up to 12 dB at packet loss rates of 1%, 3%, and 5% with different Quantization Parameters (QPs). The proposed approach also reduces computation time relative to BMA by approximately 1788 seconds.