
    Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition

    Visual Speech Recognition (VSR) aims to transcribe speech into text from lip movements alone. Because it relies solely on visual information to model speech, its performance is inherently sensitive to individual lip appearance and movement, so VSR models degrade when applied to unseen speakers. In this paper, to remedy this degradation on unseen speakers, we propose prompt tuning methods of Deep Neural Networks (DNNs) for speaker-adaptive VSR. Specifically, motivated by recent advances in Natural Language Processing (NLP), we finetune prompts on adaptation data of target speakers instead of modifying the pre-trained model parameters. Unlike previous prompt tuning methods, which are mostly limited to Transformer-variant architectures, we explore different types of prompts, namely addition, padding, and concatenation prompts, that can be applied to a VSR model generally composed of a CNN and a Transformer. With the proposed prompt tuning, we show that the performance of the pre-trained VSR model on unseen speakers can be largely improved using a small amount of adaptation data (e.g., less than 5 minutes), even when the pre-trained model was already trained with large speaker variation. Moreover, by analyzing the performance and parameter counts of the different prompt types, we investigate when prompt tuning is preferable to finetuning. The effectiveness of the proposed method is evaluated on both word- and sentence-level VSR databases, LRW-ID and GRID.
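
    The abstract names three prompt forms (addition, padding, and concatenation) tuned on top of a frozen CNN+Transformer backbone. The snippet below is a minimal sketch of that general idea, not the authors' implementation: all class, parameter, and argument names are hypothetical, and the padding prompt is simplified to filling padded frames in the feature sequence rather than convolutional padding regions.

```python
# Minimal sketch of the three prompt types, assuming a frozen VSR backbone
# that consumes a (batch, frames, dim) feature sequence. Illustration only;
# names and shapes are assumptions, not the paper's code.
import torch
import torch.nn as nn

class PromptedVSR(nn.Module):
    def __init__(self, frozen_backbone, feat_dim, prompt_len=5, mode="concat"):
        super().__init__()
        self.backbone = frozen_backbone
        for p in self.backbone.parameters():
            p.requires_grad = False                    # only prompts are tuned
        self.mode = mode
        if mode == "add":
            # addition prompt: a learnable offset added to every frame feature
            self.prompt = nn.Parameter(torch.zeros(1, 1, feat_dim))
        elif mode == "concat":
            # concatenation prompt: learnable frames prepended to the sequence
            self.prompt = nn.Parameter(torch.randn(1, prompt_len, feat_dim) * 0.02)
        elif mode == "pad":
            # padding prompt (simplified): one learnable frame filling padded positions
            self.prompt = nn.Parameter(torch.zeros(1, 1, feat_dim))

    def forward(self, feats, pad_mask=None):
        # feats: (B, T, D) visual features; pad_mask: (B, T), True where padded
        b = feats.size(0)
        if self.mode == "add":
            feats = feats + self.prompt
        elif self.mode == "concat":
            feats = torch.cat([self.prompt.expand(b, -1, -1), feats], dim=1)
        elif self.mode == "pad" and pad_mask is not None:
            feats = torch.where(pad_mask.unsqueeze(-1), self.prompt.expand_as(feats), feats)
        return self.backbone(feats)

# Example: only the prompt parameters would be optimized on the target speaker's data.
backbone = nn.Identity()                       # stand-in for a pretrained CNN+Transformer
model = PromptedVSR(backbone, feat_dim=256, mode="concat")
out = model(torch.randn(2, 75, 256))           # (batch, frames, dim)
```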

    In Vitro Chemosensitivity Using the Histoculture Drug Response Assay in Human Epithelial Ovarian Cancer

    The choice of chemotherapeutic drugs to treat patients with epithelial ovarian cancer has not depended on individual patient characteristics. We investigated the correlation between in vitro chemosensitivity, as determined by the histoculture drug response assay (HDRA), and clinical responses in epithelial ovarian cancer. Fresh tissue samples were obtained from 79 patients with epithelial ovarian cancer. The sensitivity of these samples to 11 chemotherapeutic agents was tested with the HDRA according to established methods, and the results were analyzed retrospectively. The HDRA showed that the samples were more chemosensitive to carboplatin, topotecan, and belotecan, with inhibition rates of 49.2%, 44.7%, and 39.7%, respectively, than to cisplatin, the traditional drug of choice in epithelial ovarian cancer. Among the 37 patients with FIGO stage III/IV serous adenocarcinoma who received carboplatin combined with paclitaxel, those with carboplatin-sensitive samples on HDRA had a significantly longer median disease-free interval than patients with carboplatin-resistant samples (23.2 vs. 13.8 months, p<0.05), but median overall survival did not differ significantly (60.4 vs. 37.3 months, p=0.621). In conclusion, this study indicates that the HDRA could provide useful information for designing individual treatment strategies for patients with epithelial ovarian cancer.

    Incorporating Language-Driven Appearance Knowledge Units with Visual Cues in Pedestrian Detection

    Large language models (LLMs) have shown their capability in understanding contextual and semantic information regarding the appearance of instances. In this paper, we introduce a novel approach that exploits an LLM's understanding of contextual appearance variations and transfers this knowledge into a vision model (here, pedestrian detection). Pedestrian detection is considered one of the crucial tasks directly related to our safety (e.g., in intelligent driving systems), yet it is challenging because of the varying appearances and poses found in diverse scenes. We therefore propose to formulate language-driven appearance knowledge units and to incorporate them with visual cues in pedestrian detection. To this end, we establish a description corpus containing numerous narratives describing various appearances of pedestrians and other objects. By feeding these descriptions through an LLM, we extract appearance knowledge sets that contain representations of appearance variations. We then perform a task-prompting process to obtain appearance knowledge units, i.e., representative appearance knowledge guided to be relevant to the downstream pedestrian detection task. Finally, we provide rich appearance information by integrating the language-driven knowledge units with visual cues. Through comprehensive experiments with various pedestrian detectors, we verify the effectiveness of our method, showing noticeable performance gains and achieving state-of-the-art detection performance.
    Comment: 11 pages, 4 figures, 9 tables
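
    As a rough illustration of the fusion step described above (integrating language-derived appearance knowledge with visual cues), the sketch below uses cross-attention between visual features and a pool of text embeddings. It is not the authors' implementation: the text embeddings stand in for the appearance knowledge sets, the learnable queries crudely approximate the task-prompting step, and all names and dimensions are assumptions.

```python
# Minimal sketch: select task-relevant "knowledge units" from text embeddings
# with learnable queries, then let visual features attend to those units.
import torch
import torch.nn as nn

class KnowledgeFusion(nn.Module):
    def __init__(self, vis_dim, text_dim, num_units=16, heads=4):
        super().__init__()
        # learnable queries pick out task-relevant units from the text embeddings
        # (a simplified stand-in for the task-prompting process)
        self.unit_queries = nn.Parameter(torch.randn(num_units, text_dim) * 0.02)
        self.select = nn.MultiheadAttention(text_dim, heads, batch_first=True)
        self.proj = nn.Linear(text_dim, vis_dim)
        # visual features attend to the selected knowledge units
        self.fuse = nn.MultiheadAttention(vis_dim, heads, batch_first=True)

    def forward(self, vis_feats, text_embeds):
        # vis_feats:   (B, N, vis_dim)  visual tokens from the detector backbone
        # text_embeds: (B, M, text_dim) embeddings of appearance descriptions
        b = vis_feats.size(0)
        q = self.unit_queries.unsqueeze(0).expand(b, -1, -1)
        units, _ = self.select(q, text_embeds, text_embeds)   # (B, num_units, text_dim)
        units = self.proj(units)                               # (B, num_units, vis_dim)
        fused, _ = self.fuse(vis_feats, units, units)
        return vis_feats + fused    # residual fusion, fed to the detection head

fusion = KnowledgeFusion(vis_dim=256, text_dim=768)
enriched = fusion(torch.randn(2, 100, 256), torch.randn(2, 32, 768))
```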