Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition
Visual Speech Recognition (VSR) aims to transcribe speech into text from lip
movements alone. Because it relies solely on visual information to model
speech, its performance is inherently sensitive to personal lip appearance and
movement, so VSR models degrade when applied to unseen speakers. In this
paper, to remedy this performance degradation on unseen speakers, we propose prompt tuning
methods of Deep Neural Networks (DNNs) for speaker-adaptive VSR. Specifically,
motivated by recent advances in Natural Language Processing (NLP), we finetune
prompts on adaptation data of target speakers instead of modifying the
pre-trained model parameters. Unlike previous prompt tuning methods, which are
mainly limited to Transformer-variant architectures, we explore three types of
prompts, in addition, padding, and concatenation form, that can be applied to a
VSR model composed of both a CNN and a Transformer. With the proposed prompt
tuning, we show that the performance of the
pre-trained VSR model on unseen speakers can be largely improved with a small
amount of adaptation data (e.g., less than 5 minutes), even when the
pre-trained model was already developed with large speaker variation. Moreover,
by analyzing the performance and parameters of different types of prompts, we
investigate when prompt tuning is preferable to finetuning. The effectiveness
of the proposed method is evaluated on both word- and sentence-level VSR
databases, LRW-ID and GRID.
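As a rough illustration of the three prompt forms named above, the following NumPy sketch shows where each prompt could attach in a CNN+Transformer pipeline. All dimensions and array names are hypothetical, and only the small prompt arrays would be trained per speaker while the backbone stays frozen; this is a minimal sketch, not the paper's implementation.

```python
import numpy as np

# Hypothetical dimensions for one video frame's features and its token sequence.
C, H, W, T, PAD, N_TOK = 32, 24, 24, 10, 2, 4
rng = np.random.default_rng(0)

frame = rng.standard_normal((C, H, W))   # CNN input features for one frame
tokens = rng.standard_normal((T, C))     # Transformer input sequence

# 1) Addition-form prompt: a learnable tensor added element-wise to the input.
add_prompt = np.zeros((C, H, W))
x_add = frame + add_prompt

# 2) Padding-form prompt: learnable values occupy the border region that
#    zero padding would normally fill around the frame features.
pad_prompt = np.zeros((C, H + 2 * PAD, W + 2 * PAD))
canvas = pad_prompt.copy()
canvas[:, PAD:-PAD, PAD:-PAD] = x_add    # interior = features, border = prompt

# 3) Concatenation-form prompt: extra learnable tokens prepended to the
#    Transformer's input sequence.
concat_prompt = np.zeros((N_TOK, C))
seq = np.concatenate([concat_prompt, tokens], axis=0)

print(canvas.shape)  # (32, 28, 28)
print(seq.shape)     # (14, 32)
```

During adaptation, gradients would flow only into `add_prompt`, `pad_prompt`, and `concat_prompt`, which is why a few minutes of speaker data can suffice.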
In Vitro Chemosensitivity Using the Histoculture Drug Response Assay in Human Epithelial Ovarian Cancer
The choice of chemotherapeutic drugs to treat patients with epithelial ovarian cancer has traditionally not been guided by individual patient characteristics. We have investigated the correlation between in vitro chemosensitivity, as determined by the histoculture drug response assay (HDRA), and clinical responses in epithelial ovarian cancer. Fresh tissue samples were obtained from 79 patients with epithelial
ovarian cancer. The sensitivity of these samples to 11 chemotherapeutic agents was tested with the HDRA according to established protocols, and the results were analyzed retrospectively. The HDRA showed that the samples were more chemosensitive to carboplatin, topotecan, and belotecan, with inhibition rates of 49.2%, 44.7%, and 39.7%, respectively, than to cisplatin, the traditional drug of choice in epithelial ovarian cancer. Among the 37 patients with FIGO stage III/IV serous adenocarcinoma
who were receiving carboplatin combined with paclitaxel, those with carboplatin-sensitive samples on HDRA had a significantly longer median disease-free interval than patients with carboplatin-
resistant samples (23.2 vs. 13.8 months, p<0.05), but median overall survival did not differ significantly
(60.4 vs. 37.3 months, p=0.621). In conclusion, this study indicates that the HDRA could provide useful information for designing individualized treatment strategies in patients with epithelial ovarian cancer.
Incorporating Language-Driven Appearance Knowledge Units with Visual Cues in Pedestrian Detection
Large language models (LLMs) have shown their capability in understanding
contextual and semantic information regarding appearance knowledge of
instances. In this paper, we introduce a novel approach that exploits an LLM's
strength in understanding contextual appearance variations and transfers that
knowledge into a vision model (here, a pedestrian detector). While pedestrian
detection is one of the crucial tasks directly related to our safety (e.g., in
intelligent driving systems), it is challenging because of the varying
appearances and poses of pedestrians in diverse scenes. Therefore, we propose to formulate
language-driven appearance knowledge units and incorporate them with visual
cues in pedestrian detection. To this end, we establish a description corpus
containing numerous narratives that describe various appearances of
pedestrians and other instances. By feeding these narratives through an LLM, we extract appearance
knowledge sets that contain the representations of appearance variations. After
that, we perform a task-prompting process to obtain appearance knowledge
units, i.e., representative appearance knowledge guided to be relevant to the
downstream pedestrian detection task. Finally, we provide rich appearance
information by integrating the language-driven knowledge units with visual
cues. Through comprehensive experiments with various pedestrian detectors, we
verify the effectiveness of our method, showing noticeable performance gains
and achieving state-of-the-art detection performance.

Comment: 11 pages, 4 figures, 9 tables