
    Dual Associated Encoder for Face Restoration

    Restoring facial details from low-quality (LQ) images has remained a challenging problem due to its ill-posedness induced by various degradations in the wild. The existing codebook prior mitigates the ill-posedness by leveraging an autoencoder and a learned codebook of high-quality (HQ) features, achieving remarkable quality. However, existing approaches in this paradigm frequently depend on a single encoder pre-trained on HQ data for restoring HQ images, disregarding the domain gap between LQ and HQ images. As a result, the encoding of LQ inputs may be insufficient, leading to suboptimal performance. To tackle this problem, we propose a novel dual-branch framework named DAEFR. Our method introduces an auxiliary LQ branch that extracts crucial information from the LQ inputs. Additionally, we incorporate association training to promote effective synergy between the two branches, enhancing code prediction and output quality. We evaluate the effectiveness of DAEFR on both synthetic and real-world datasets, demonstrating its superior performance in restoring facial details. Comment: Technical Report.
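    For illustration, a minimal sketch of the dual-branch idea, assuming paired HQ/LQ training images and an InfoNCE-style contrastive term as the "association" objective; the codebook, code predictor, and decoder stages of DAEFR are omitted, and all module sizes are placeholders rather than the authors' implementation.

```python
# Dual-branch encoders with an assumed contrastive "association" loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleEncoder(nn.Module):
    """Tiny convolutional encoder producing a feature map."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 4, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

hq_encoder = SimpleEncoder()   # stands in for the HQ-pretrained branch
lq_encoder = SimpleEncoder()   # auxiliary branch fed with LQ inputs

def association_loss(f_hq, f_lq, temperature=0.07):
    """Symmetric InfoNCE-style loss pulling corresponding HQ/LQ features
    together (an assumed form of the association training)."""
    b = f_hq.size(0)
    z_hq = F.normalize(f_hq.flatten(1), dim=1)
    z_lq = F.normalize(f_lq.flatten(1), dim=1)
    logits = z_hq @ z_lq.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(b)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

hq_img = torch.randn(4, 3, 64, 64)   # paired HQ / synthetically degraded LQ faces
lq_img = torch.randn(4, 3, 64, 64)
loss = association_loss(hq_encoder(hq_img), lq_encoder(lq_img))
loss.backward()
```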

    Convolution channel separation and frequency sub-bands aggregation for music genre classification

    In music, short-term features such as pitch and tempo constitute long-term semantic features such as melody and narrative. A music genre classification (MGC) system should be able to analyze these features. In this research, we propose a novel framework that can extract and aggregate both short- and long-term features hierarchically. Our framework is based on ECAPA-TDNN, where all the layers that extract short-term features are affected by the layers that extract long-term features because of back-propagation training. To prevent the distortion of short-term features, we devised a convolution channel separation technique that separates short-term features from the long-term feature extraction paths. To extract more diverse features, we incorporated a frequency sub-bands aggregation method, which divides the input spectrogram along frequency bandwidths and processes each segment separately. We evaluated our framework on the Melon Playlist dataset, a large-scale dataset containing 600 times more data than GTZAN, the dataset most widely used in MGC studies. As a result, our framework achieved 70.4% accuracy, an improvement of 16.9% over a conventional framework.
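    A minimal sketch of the frequency sub-bands aggregation step, assuming a log-mel spectrogram input and a small per-band 1-D convolutional encoder; the band count, encoder, and fusion are illustrative stand-ins for the paper's ECAPA-TDNN-based configuration.

```python
# Split the frequency axis into equal bands, encode each band, concatenate.
import torch
import torch.nn as nn

class SubBandAggregator(nn.Module):
    def __init__(self, n_mels=80, n_bands=4, hidden=64):
        super().__init__()
        assert n_mels % n_bands == 0
        self.n_bands = n_bands
        band_bins = n_mels // n_bands
        # one 1-D conv encoder per frequency band (could also be shared)
        self.band_encoders = nn.ModuleList(
            nn.Conv1d(band_bins, hidden, kernel_size=3, padding=1)
            for _ in range(n_bands)
        )

    def forward(self, spec):                 # spec: (B, n_mels, T)
        bands = torch.chunk(spec, self.n_bands, dim=1)
        feats = [enc(band) for enc, band in zip(self.band_encoders, bands)]
        return torch.cat(feats, dim=1)       # (B, n_bands * hidden, T)

spec = torch.randn(2, 80, 300)               # batch of log-mel spectrograms
out = SubBandAggregator()(spec)
print(out.shape)                             # torch.Size([2, 256, 300])
```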

    Integrated Parameter-Efficient Tuning for General-Purpose Audio Models

    The advent of hyper-scale, general-purpose pre-trained models is shifting the paradigm of building task-specific models for target tasks. In the field of audio research, task-agnostic pre-trained models with high transferability and adaptability have achieved state-of-the-art performance through fine-tuning on downstream tasks. Nevertheless, re-training all the parameters of these massive models entails an enormous amount of time and cost, along with a huge carbon footprint. To overcome these limitations, the present study explores and applies efficient transfer learning methods in the audio domain. We also propose an integrated parameter-efficient tuning (IPET) framework that aggregates the embedding prompt (a prompt-based learning approach) and the adapter (an effective transfer learning method). We demonstrate the efficacy of the proposed framework using two backbone pre-trained audio models with different characteristics: the audio spectrogram transformer and wav2vec 2.0. The proposed IPET framework performs remarkably well compared to fine-tuning while using fewer trainable parameters, across four downstream tasks: sound event classification, music genre classification, keyword spotting, and speaker verification. Furthermore, the authors identify and analyze the shortcomings of the IPET framework, providing lessons and research directions for parameter-efficient tuning in the audio domain. Comment: 5 pages, 3 figures, submitted to ICASSP 202
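    A sketch of the two components that IPET aggregates, written against a generic frozen transformer layer: learnable prompt embeddings prepended to the token sequence and a bottleneck adapter applied after the layer. Dimensions, prompt count, and placement are assumptions for illustration, not the paper's exact configuration.

```python
# Frozen transformer layer wrapped with learnable prompts and an adapter.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project -> nonlinearity -> up-project, with a residual connection."""
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

class PromptedBlock(nn.Module):
    """Wraps a frozen encoder layer; only prompts and adapter are trained."""
    def __init__(self, frozen_layer, dim=768, n_prompts=8):
        super().__init__()
        self.layer = frozen_layer
        for p in self.layer.parameters():
            p.requires_grad = False
        self.prompts = nn.Parameter(torch.zeros(1, n_prompts, dim))
        self.adapter = BottleneckAdapter(dim)
    def forward(self, x):                    # x: (B, T, dim)
        b = x.size(0)
        x = torch.cat([self.prompts.expand(b, -1, -1), x], dim=1)
        return self.adapter(self.layer(x))

frozen = nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True)
block = PromptedBlock(frozen)
out = block(torch.randn(2, 100, 768))        # -> (2, 108, 768)
```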

    One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification

    The application of speech self-supervised learning (SSL) models has achieved remarkable performance in speaker verification (SV). However, their computational cost is a hurdle to employing them, which makes development and deployment difficult. Several studies have simply compressed SSL models through knowledge distillation (KD) without considering the target task; consequently, these methods could not extract SV-tailored features. This paper proposes One-Step Knowledge Distillation and Fine-Tuning (OS-KDFT), which incorporates KD and fine-tuning (FT). We optimize the student model for SV during KD training to avert the distillation of information inappropriate for SV. OS-KDFT reduces the size of a Wav2Vec 2.0-based ECAPA-TDNN by approximately 76.2% and the SSL model's inference time by 79%, while achieving an EER of 0.98%. The proposed OS-KDFT is validated on the VoxCeleb1 and VoxCeleb2 datasets with the W2V2 and HuBERT SSL models. Experiments are available on our GitHub.
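    A minimal sketch of the one-step idea, assuming placeholder teacher/student modules and an MSE feature-distillation term combined with a speaker classification term in a single loss; the actual OS-KDFT architectures and loss weighting are not reproduced here.

```python
# Joint distillation + speaker-verification fine-tuning in one objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(128, 256)                 # stands in for the frozen SSL teacher
student = nn.Linear(128, 256)                 # stands in for the smaller student
speaker_head = nn.Linear(256, 1000)           # 1000 training speakers (assumed)
for p in teacher.parameters():
    p.requires_grad = False

def os_kdft_loss(x, speaker_labels, alpha=0.5):
    with torch.no_grad():
        t_feat = teacher(x)
    s_feat = student(x)
    kd = F.mse_loss(s_feat, t_feat)                              # distillation term
    sv = F.cross_entropy(speaker_head(s_feat), speaker_labels)   # SV task term
    return alpha * kd + (1 - alpha) * sv

x = torch.randn(8, 128)
labels = torch.randint(0, 1000, (8,))
loss = os_kdft_loss(x, labels)
loss.backward()
```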

    PAS: Partial Additive Speech Data Augmentation Method for Noise Robust Speaker Verification

    Background noise reduces speech intelligibility and quality, making speaker verification (SV) in noisy environments a challenging task. To improve the noise robustness of SV systems, the additive noise data augmentation method has been commonly used. In this paper, we propose a new additive noise method, partial additive speech (PAS), which aims to train SV systems to be less affected by noisy environments. The experimental results demonstrate that PAS outperforms traditional additive noise in terms of equal error rate (EER), with relative improvements of 4.64% and 5.01% observed with SE-ResNet34 and ECAPA-TDNN, respectively. We also show the effectiveness of the proposed method by analyzing attention modules and visualizing speaker embeddings. Comment: 5 pages, 2 figures, 1 table, accepted to CKAIA 2023 as a conference paper.
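    A minimal sketch of partial additive noise on a raw waveform, assuming the noise is mixed at a target SNR into one randomly chosen segment of the utterance; the segment-length range and SNR handling are illustrative assumptions rather than the paper's exact recipe.

```python
# Mix noise into only part of the utterance instead of the whole signal.
import numpy as np

def partial_additive_noise(speech, noise, snr_db=10.0,
                           min_frac=0.2, max_frac=0.8, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    out = speech.copy()
    seg_len = int(len(speech) * rng.uniform(min_frac, max_frac))
    start = rng.integers(0, len(speech) - seg_len + 1)
    noise_seg = np.resize(noise, seg_len)              # tile/crop noise to segment
    # scale the noise to the target SNR within the chosen segment
    sp_pow = np.mean(out[start:start + seg_len] ** 2) + 1e-12
    n_pow = np.mean(noise_seg ** 2) + 1e-12
    gain = np.sqrt(sp_pow / (n_pow * 10 ** (snr_db / 10)))
    out[start:start + seg_len] += gain * noise_seg
    return out

speech = np.random.randn(16000).astype(np.float32)     # 1 s of audio at 16 kHz
noise = np.random.randn(4000).astype(np.float32)
augmented = partial_additive_noise(speech, noise, snr_db=5.0)
```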

    Deep Learning-based Fall Detection Algorithm Using Ensemble Model of Coarse-fine CNN and GRU Networks

    Falls are a major public health issue for the elderly worldwide, since fall-induced injuries are associated with substantial healthcare costs. Falls can cause serious injuries, even leading to death if an elderly person suffers a "long lie". Hence, a reliable fall detection (FD) system is required to provide an emergency alarm for first aid. Owing to advances in wearable device technology and artificial intelligence, fall detection systems have been developed that use machine learning and deep learning methods to analyze signals collected from accelerometers and gyroscopes. To achieve better fall detection performance, this study proposes an ensemble model that combines a coarse-fine convolutional neural network and a gated recurrent unit. The parallel structure of this model preserves spatial characteristics at different granularities and captures temporal dependencies for feature representation. This study uses the FallAllD public dataset to validate the reliability of the proposed model, which achieves a recall, precision, and F-score of 92.54%, 96.13%, and 94.26%, respectively. The results demonstrate the reliability of the proposed ensemble model in discriminating falls from activities of daily living and its superior performance compared to the state-of-the-art convolutional neural network long short-term memory (CNN-LSTM) for FD.
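    A minimal sketch of a parallel coarse-fine CNN plus GRU classifier for IMU windows, assuming six input channels (tri-axial accelerometer and gyroscope); kernel sizes, channel counts, and the fusion scheme are illustrative, not the paper's exact ensemble.

```python
# Two conv branches (fine/coarse kernels) + a GRU branch, fused for classification.
import torch
import torch.nn as nn

class CoarseFineCNNGRU(nn.Module):
    def __init__(self, in_ch=6, hidden=64, n_classes=2):
        super().__init__()
        self.fine = nn.Sequential(
            nn.Conv1d(in_ch, hidden, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))
        self.coarse = nn.Sequential(
            nn.Conv1d(in_ch, hidden, kernel_size=15, padding=7), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))
        self.gru = nn.GRU(in_ch, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden * 3, n_classes)

    def forward(self, x):                      # x: (B, T, in_ch), one IMU window
        c = x.transpose(1, 2)                  # (B, in_ch, T) for the conv branches
        fine = self.fine(c).squeeze(-1)
        coarse = self.coarse(c).squeeze(-1)
        _, h = self.gru(x)                     # h: (1, B, hidden)
        fused = torch.cat([fine, coarse, h.squeeze(0)], dim=1)
        return self.classifier(fused)

model = CoarseFineCNNGRU()
logits = model(torch.randn(4, 200, 6))         # 4 windows of 200 samples, 6 axes
```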

    Intrusion detection routers: Design, implementation and evaluation using an experimental testbed

    In this paper, we present the design, implementation details, and evaluation results of an intrusion detection and defense system for distributed denial-of-service (DDoS) attacks. The evaluation is conducted using an experimental testbed. The system, known as the intrusion detection router (IDR), is deployed on network routers to perform online detection of DDoS attack events and then react with defense mechanisms to mitigate the attacks. The testbed is built from a cluster of Linux machines large enough to mimic a portion of the Internet. Using the testbed, we conduct real experiments to evaluate the IDR system and demonstrate that IDR is effective in protecting the network from various DDoS attacks. © 2006 IEEE.
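    A generic sketch of the detect-then-react pattern described above: an online per-source packet-rate monitor that flags sources exceeding a threshold within a sliding window and blocks them. The window, threshold, and blocking action are placeholders; the IDR system's actual detection and defense mechanisms are not specified here.

```python
# Sliding-window per-source packet-rate detector with a simple blocking reaction.
import time
from collections import defaultdict, deque

class SimpleDDoSMonitor:
    def __init__(self, window_s=1.0, max_pkts_per_window=1000):
        self.window_s = window_s
        self.max_pkts = max_pkts_per_window
        self.history = defaultdict(deque)     # src_ip -> recent packet timestamps
        self.blocked = set()

    def observe(self, src_ip, now=None):
        now = time.time() if now is None else now
        q = self.history[src_ip]
        q.append(now)
        while q and now - q[0] > self.window_s:
            q.popleft()
        if len(q) > self.max_pkts:
            self.blocked.add(src_ip)          # "react": mark the source for blocking
        return src_ip not in self.blocked     # forward only unblocked traffic

monitor = SimpleDDoSMonitor(max_pkts_per_window=5)
for i in range(10):
    allowed = monitor.observe("10.0.0.1", now=100.0 + i * 0.01)
print(monitor.blocked)                         # {'10.0.0.1'}
```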

    A case report of chronic granulomatous disease presenting with Aspergillus pneumonia in a 2-month-old girl

    Chronic granulomatous disease (CGD) is an uncommon inherited disorder caused by mutations in any of the genes encoding subunits of the superoxide-generating phagocyte NADPH oxidase system, which is essential for killing catalase-producing bacteria and fungi such as Aspergillus species, Staphylococcus aureus, Serratia marcescens, Nocardia species, and Burkholderia cepacia. In the case of a history of recurrent or persistent infections, immune deficiency should be investigated; in particular, when uncommon infections such as aspergillosis occur in early life, CGD should be considered. We describe here a case of CGD that presented with invasive pulmonary aspergillosis in a 2-month-old girl. We confirmed pulmonary aspergillosis noninvasively, through a positive culture of bronchoalveolar lavage fluid, a positive serological test for Aspergillus antigen, and radiologic findings. She was initially treated successfully with amphotericin B and recombinant IFN-γ. Six weeks after discharge, she was readmitted for pneumonia. Since there were infiltrates in the right lower lung, which were considered residual lesions, voriconazole therapy was initiated. She showed a favorable response to the treatment, and follow-up CT showed regression of the pulmonary infiltrates.