2,323 research outputs found

    Backbone Can Not be Trained at Once: Rolling Back to Pre-trained Network for Person Re-Identification

    In the person re-identification (ReID) task, because of the shortage of training data, it is common to fine-tune a classification network pre-trained on a large dataset. However, it is relatively difficult to sufficiently fine-tune the low-level layers of the network due to the gradient vanishing problem. In this work, we propose a novel fine-tuning strategy that allows low-level layers to be sufficiently trained by rolling back the weights of high-level layers to their initial pre-trained values. Our strategy alleviates the vanishing-gradient problem in low-level layers and robustly trains them to fit the ReID dataset, thereby improving ReID performance. The improved performance of the proposed strategy is validated via several experiments. Furthermore, without any add-ons such as pose estimation or segmentation, our strategy exhibits state-of-the-art performance using only a vanilla deep convolutional neural network architecture.
    Comment: Accepted to AAAI 201
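The rollback idea above can be sketched in a few lines. This is a toy illustration, not the authors' implementation: layer names are hypothetical, each "weight" is a single float, and one epoch of fine-tuning is simulated by a fixed drift. The point is only the schedule: train all layers, then periodically reset the high-level layers to their pre-trained values so low-level layers keep receiving useful gradients.

```python
import copy

# Hypothetical pre-trained weights, one float per layer (illustrative only)
pretrained = {"conv1": 0.10, "conv2": 0.20, "conv3": 0.30, "fc": 0.40}
high_level = {"conv3", "fc"}  # layers to roll back after each phase
weights = copy.deepcopy(pretrained)

def train_epoch(weights, lr=0.05):
    # Stand-in for one epoch of fine-tuning: every layer's weight drifts
    for name in weights:
        weights[name] += lr

for epoch in range(4):
    train_epoch(weights)
    # Roll high-level layers back to their pre-trained values; low-level
    # layers keep their accumulated updates and continue to adapt
    for name in high_level:
        weights[name] = pretrained[name]
```

After the loop, the low-level layers have moved away from their pre-trained values while the rolled-back high-level layers end exactly where they started.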

    An Empirical Study Of Hospitality Management Student Attitudes Toward Group Projects: Instructional Factors And Team Problems

    The development of positive attitudes toward team-based work is important in management education. This study investigates hospitality students' attitudes toward group projects by examining instructional factors and team problems. Specifically, we examine how students' perceptions of project appropriateness, instructor support, and evaluation fairness influence their attitudes toward group projects. The effect of students' team problems on their attitudes toward group projects is also examined. This study highlights the criticality of the instructor's role in group project management for achieving a high level of positive attitudes toward group projects among hospitality management students.

    AutoLR: Layer-wise Pruning and Auto-tuning of Learning Rates in Fine-tuning of Deep Networks

    Existing fine-tuning methods use a single learning rate over all layers. In this paper, we first show that the layer-wise weight variations produced by fine-tuning with a single learning rate do not match the well-known notion that lower-level layers extract general features and higher-level layers extract specific features. Based on this observation, we propose an algorithm that improves fine-tuning performance and reduces network complexity through layer-wise pruning and auto-tuning of layer-wise learning rates. The effectiveness of the proposed algorithm is verified by state-of-the-art performance on the image retrieval benchmark datasets (CUB-200, Cars-196, Stanford Online Products, and In-shop). Code is available at https://github.com/youngminPIL/AutoLR.
    Comment: Accepted to AAAI 202
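A minimal sketch of the layer-wise learning-rate idea, under stated assumptions: the abstract's intuition is that low-level layers extract general features and so should change less than high-level layers. Here that is modeled with rates that grow geometrically with depth; the real AutoLR auto-tunes these rates during training, and the layer names, base rate, and growth factor below are all illustrative.

```python
# Hypothetical layers, ordered from lowest to highest
layer_names = ["conv1", "conv2", "conv3", "fc"]
base_lr, growth = 0.001, 2.0

# Lower layers (general features) get smaller rates than higher layers
lrs = {name: base_lr * growth ** i for i, name in enumerate(layer_names)}

weights = {name: 1.0 for name in layer_names}
grads = {name: 0.5 for name in layer_names}  # pretend gradients

# One SGD step with a per-layer learning rate instead of a single global one
for name in layer_names:
    weights[name] -= lrs[name] * grads[name]
```

With identical gradients, the lowest layer barely moves while the top layer takes the largest step, mirroring the general-to-specific ordering the paper argues for.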

    DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding

    Recent research has demonstrated impressive results in video-to-speech synthesis, which involves reconstructing speech solely from visual input. However, previous works have struggled to synthesize speech accurately due to a lack of sufficient guidance for the model to infer the correct content with the appropriate sound. To resolve this issue, they have adopted an extra speaker embedding, extracted from reference audio, as a speaking-style guide. Nevertheless, it is not always possible to obtain the audio information for the corresponding video input, especially at inference time. In this paper, we present a novel vision-guided speaker embedding extractor that uses a self-supervised pre-trained model and prompt tuning. In doing so, rich speaker embedding information can be produced solely from the input visual information, and no extra audio information is needed at inference time. Using the extracted vision-guided speaker embedding representations, we further develop a diffusion-based video-to-speech synthesis model, called DiffV2S, conditioned on those speaker embeddings and the visual representation extracted from the input video. The proposed DiffV2S not only maintains phoneme details contained in the input video frames but also creates a highly intelligible mel-spectrogram in which the identities of multiple speakers are all preserved. Our experimental results show that DiffV2S achieves state-of-the-art performance compared to previous video-to-speech synthesis techniques.
    Comment: ICCV 202

    Exploring Phonetic Context-Aware Lip-Sync For Talking Face Generation

    Talking face generation is the challenging task of synthesizing a natural and realistic face that is accurately synchronized with given audio. Due to co-articulation, where an isolated phone is influenced by the preceding or following phones, the articulation of a phone varies with its phonetic context. Therefore, modeling lip motion with phonetic context can generate more spatio-temporally aligned lip movement. In this respect, we investigate the phonetic context in generating lip motion for talking face generation. We propose the Context-Aware Lip-Sync framework (CALS), which explicitly leverages phonetic context to generate lip movement for the target face. CALS comprises an Audio-to-Lip module and a Lip-to-Face module. The former is pre-trained with masked learning to map each phone to a contextualized lip motion unit. The contextualized lip motion unit then guides the latter in synthesizing a target identity with context-aware lip motion. Extensive experiments verify that simply exploiting the phonetic context in the proposed CALS framework effectively enhances spatio-temporal alignment. We also demonstrate the extent to which the phonetic context assists lip synchronization and find the effective window size for lip generation to be approximately 1.2 seconds.
    Comment: Accepted at ICASSP 202

    Reprogramming Audio-driven Talking Face Synthesis into Text-driven

    In this paper, we propose a method to reprogram pre-trained audio-driven talking face synthesis models to operate with text inputs. As an audio-driven talking face synthesis model takes speech audio as input, generating a talking avatar with the desired speech content requires the speech to be recorded in advance. However, recording audio for every video to be generated is burdensome. To alleviate this problem, we propose a novel method that embeds input text into the learned audio latent space of the pre-trained audio-driven model. To this end, we design a Text-to-Audio Embedding Module (TAEM) that learns to map a given text input to the audio latent features. Moreover, to model the speaker characteristics contained in the audio features, we propose to inject a visual speaker embedding, obtained from a single face image, into the TAEM. After training, we can synthesize talking face videos with either text or speech audio.
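The core idea, learning a map from text features into a frozen model's audio latent space, can be illustrated with a toy regression. This is not the TAEM (which is a neural module trained on real paired data); it is a one-dimensional closed-form least-squares fit over made-up paired samples, shown only to make the "embed text into the audio latent space" step concrete.

```python
# Hypothetical paired training data: a text feature and the audio latent
# the frozen audio-driven model would have produced for the same utterance
text_feats = [0.0, 1.0, 2.0, 3.0]
audio_lats = [0.5, 1.5, 2.5, 3.5]

# Fit y ≈ w*x + b by ordinary least squares (closed form in 1-D)
n = len(text_feats)
mean_x = sum(text_feats) / n
mean_y = sum(audio_lats) / n
w = sum((x - mean_x) * (y - mean_y) for x, y in zip(text_feats, audio_lats)) \
    / sum((x - mean_x) ** 2 for x in text_feats)
b = mean_y - w * mean_x

def text_to_audio_latent(x):
    # Inference: a text input is projected into the audio latent space,
    # so the pre-trained audio-driven model can consume it unchanged
    return w * x + b
```

Once such a map is learned, the downstream synthesis model never needs to know whether its latent input originated from recorded speech or from text.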

    Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge

    This paper proposes a novel lip reading framework, especially for low-resource languages, which have not been well addressed in the previous literature. Since low-resource languages do not have enough video-text paired data to train a model powerful enough to capture lip movements and language, developing lip reading models for them is regarded as challenging. To mitigate this challenge, we first learn general speech knowledge, the ability to model lip movements, from a high-resource language through the prediction of speech units. Since different languages partially share common phonemes, general speech knowledge learned from one language can be extended to other languages. We then learn language-specific knowledge, the ability to model language, by proposing a Language-specific Memory-augmented Decoder (LMDecoder). LMDecoder saves language-specific audio features into memory banks and can be trained on audio-text paired data, which is more easily accessible than video-text paired data. Therefore, with LMDecoder, we can transform the input speech units into language-specific audio features and translate them into texts by utilizing the learned rich language knowledge. Finally, by combining general speech knowledge and language-specific knowledge, we can efficiently develop lip reading models even for low-resource languages. The effectiveness of the proposed method is evaluated through extensive experiments on five languages: English, Spanish, French, Italian, and Portuguese.
    Comment: Accepted at ICCV 202

    Distinct Roles of Outer Membrane Porins in Antibiotic Resistance and Membrane Integrity in Escherichia coli

    A defining characteristic of Gram-negative bacteria is the presence of an outer membrane, which functions as an additional barrier inhibiting the penetration of toxic chemicals such as antibiotics. Porins are outer membrane proteins associated with the modulation of cellular permeability and antibiotic resistance. Although there are numerous studies on porins, a systematic investigation of their roles in bacterial physiology and antibiotic resistance has been lacking. In this study, we constructed mutants of all porins in Escherichia coli and examined the effect of porins on antibiotic resistance and membrane integrity. The OmpF-defective mutant was resistant to several antibiotics, including β-lactams, suggesting that OmpF functions as the main route of outer-membrane penetration for many antibiotics. In contrast, OmpA was strongly associated with the maintenance of membrane integrity, which resulted in the increased susceptibility of the ompA mutant to many antibiotics. Notably, OmpC was involved in both roles. Additionally, our systematic analyses revealed that the other porins were not involved in the maintenance of membrane integrity, although several played a major or minor role in outer-membrane penetration for a few antibiotics. Collectively, these results show that each porin plays a distinct role in antibiotic resistance and membrane integrity, which could improve our understanding of the physiological function and clinical importance of porins.