Efficient and effective human action recognition in video through motion boundary description with a compact set of trajectories
Human action recognition (HAR) is at the core of human-computer interaction and video scene understanding. However, achieving effective HAR in an unconstrained environment remains a challenging task. To that end, trajectory-based video representations are currently widely used. Despite the promising effectiveness achieved by these approaches, their computational complexity and the presence of redundant trajectories still need to be addressed satisfactorily. In this paper, we propose a method for trajectory rejection that reduces the number of redundant trajectories without degrading the effectiveness of HAR. Furthermore, to realize efficient optical flow estimation prior to trajectory extraction, we integrate a method for dynamic frame skipping. Experiments with four publicly available human action datasets show that the proposed approach outperforms state-of-the-art HAR approaches in terms of effectiveness while simultaneously mitigating the computational complexity.
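As a rough illustration of the dynamic frame-skipping idea, the sketch below keeps a frame for optical-flow estimation only when it differs enough from the last kept frame. The mean-absolute-difference motion score, the threshold, and the `max_skip` cap are illustrative assumptions, not the paper's actual criterion.

```python
import numpy as np

def select_frames(frames, motion_threshold=4.0, max_skip=3):
    """Dynamically skip frames with little inter-frame motion.

    Keeps a frame whenever the mean absolute pixel difference to the
    last kept frame exceeds `motion_threshold`, or when `max_skip`
    consecutive frames have already been skipped. Optical flow (and
    trajectory extraction) would then run only on the kept frames.
    """
    kept = [0]  # always keep the first frame
    skipped = 0
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(float)
                      - frames[kept[-1]].astype(float)).mean()
        if diff > motion_threshold or skipped >= max_skip:
            kept.append(i)
            skipped = 0
        else:
            skipped += 1  # low motion: skip optical-flow estimation here
    return kept
```

On a static clip this keeps only the forced every-`max_skip`-th frames, while on a fast-moving clip it degrades to keeping every frame, which is the intended trade-off.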
Comment on "Solution of Classical Stochastic One-Dimensional Many-Body Systems"
In a recent Letter, Bares and Mobilia proposed a method for finding solutions of a stochastic evolution operator with a non-trivial quartic term. They claim, "Because of the conservation of probability, an analog of the Wick theorem applies and all multipoint correlation functions can be computed." Using the Wick theorem, they expressed the density correlation functions as solutions of a closed set of integro-differential equations. In this Comment, however, we show that the applicability of the Wick theorem is restricted to the case only.
Comment: 1 page, REVTeX style; comment on Phys. Rev. Lett. 83, 5214 (1999).
Depression and PTSD in Pashtun Women in Kandahar, Afghanistan
Objectives: The objectives were (a) to establish the prevalence of depression and post-traumatic stress disorder (PTSD) in Afghanistan and (b) to investigate sociodemographic and quality-of-life variables that predict depression and PTSD.
Methods: Translated versions of the Beck Depression Inventory, Impact of Event Scale-Revised, and Quality of Life Inventory were administered to 125 Pashtun women in Kandahar and statistically analyzed.
Results: Approximately half of the participants showed moderate to severe levels of depression, and more than half exhibited symptoms of PTSD. Education and income showed significant associations with PTSD symptoms or depression. The way one spends time, general health status, and general feeling towards life predicted low levels of depression and PTSD.
Conclusions: The high prevalence of depression and PTSD indicates a continuing need for mental health intervention. While education has been found to be a protective factor for mental health in previous studies, the relationship between education and mental health appears to be more complex among Afghan women. Quality-of-life variables could be further investigated and incorporated into mental health interventions for Afghan women.
Fermentation characteristics of Korean pear (Pyrus pyrifolia Nakai) puree by the Leuconostoc mesenteroides 51-3 strain isolated from Kimchi
A lactic acid bacterial strain showing fast growth and high acid production when cultured in Korean pear puree was isolated from Kimchi. The strain was analyzed using the API 50 CHL kit and 16S rRNA sequencing and thus identified as Leuconostoc mesenteroides 51-3. Korean pear puree was fermented with the L. mesenteroides 51-3 strain at 30°C for 12 h, and the changes in pH, titratable acidity, and viable cell count during fermentation were investigated. The pH and titratable acidity of the pear puree were 4.06 and 0.66%, respectively, after 12 h of fermentation. The viable cell count of L. mesenteroides 51-3 rapidly increased to 3.7 × 10⁹ CFU/g after 12 h of cultivation. The contents of lactic acid and acetic acid were determined to be 0.138 and 0.162%, respectively, after 12 h of fermentation. When the fermented pear puree was stored at 4°C, the pH, titratable acidity, and viable cell count remained fairly constant for 14 days.
Keywords: Fermentation, Korean pear puree, Leuconostoc mesenteroides.
African Journal of Biotechnology Vol. 9(35), pp. 5735-5738, 30 August, 201
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper
This paper proposes a powerful Visual Speech Recognition (VSR) method for multiple languages, especially for low-resource languages that have only a limited amount of labeled data. Different from previous methods that tried to improve VSR performance for the target language by using knowledge learned from other languages, we explore whether we can increase the amount of training data itself for the different languages without human intervention. To this end, we
employ a Whisper model which can conduct both language identification and
audio-based speech recognition. It serves to filter data of the desired
languages and transcribe labels from the unannotated, multilingual audio-visual
data pool. By comparing the performance of VSR models trained on the automatic labels with that of models trained on human-annotated labels, we show that similar VSR performance can be achieved even without human annotations. Through the automated labeling process, we label the large-scale unlabeled multilingual databases VoxCeleb2 and AVSpeech, producing 1,002 hours of data for four low-resource VSR languages: French, Italian, Spanish, and Portuguese. With the automatic labels, we achieve new state-of-the-art
performance on mTEDx in four languages, significantly surpassing the previous
methods. The automatic labels are available online:
https://github.com/JeongHun0716/Visual-Speech-Recognition-for-Low-Resource-Languages
Comment: Accepted at ICASSP 202
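The filter-and-transcribe loop described above can be sketched independently of any particular model. Here `transcribe` is a stand-in callback for a model such as Whisper that jointly returns a language code and a transcript for a clip; the callback interface and the `(language, text)` tuple format are assumptions for illustration, not the actual Whisper API.

```python
def auto_label(clips, transcribe, target_languages):
    """Build automatic labels from an unannotated multilingual pool.

    `transcribe(clip)` must return (language_code, transcript),
    mimicking a model that performs both language identification and
    audio-based speech recognition. Clips whose detected language is
    outside `target_languages` are discarded; the rest are paired with
    their automatic transcript for VSR training.
    """
    labeled = []
    for clip in clips:
        lang, text = transcribe(clip)
        if lang in target_languages:
            labeled.append((clip, lang, text))
    return labeled
```

In the paper's setting, `target_languages` would be the four low-resource languages ({"fr", "it", "es", "pt"}) and `clips` the audio tracks of VoxCeleb2/AVSpeech videos.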
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
This paper proposes a novel lip reading framework, especially for
low-resource languages, which has not been well addressed in the previous
literature. Since low-resource languages lack enough video-text paired data to train a model powerful enough to capture both lip movements and language, developing lip reading models for them is regarded as challenging. To mitigate this challenge, we try to learn
general speech knowledge, the ability to model lip movements, from a
high-resource language through the prediction of speech units. It is known that
different languages partially share common phonemes, thus general speech
knowledge learned from one language can be extended to other languages. Then,
we try to learn language-specific knowledge, the ability to model language, by
proposing Language-specific Memory-augmented Decoder (LMDecoder). LMDecoder
saves language-specific audio features into memory banks and can be trained on
audio-text paired data, which is more easily accessible than video-text paired
data. Therefore, with LMDecoder, we can transform the input speech units into
language-specific audio features and translate them into texts by utilizing the
learned rich language knowledge. Finally, by combining general speech knowledge
and language-specific knowledge, we can efficiently develop lip reading models
even for low-resource languages. Through extensive experiments using five
languages, English, Spanish, French, Italian, and Portuguese, the effectiveness
of the proposed method is evaluated.
Comment: Accepted at ICCV 202
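A minimal sketch of the memory-bank lookup such a decoder might perform: a query derived from the input speech units attends over stored keys and returns a blend of the stored language-specific audio features. The softmax-attention form, the key/value split, and the shapes are assumptions for illustration; the actual LMDecoder architecture is more involved.

```python
import numpy as np

def memory_lookup(query, memory_keys, memory_values, temperature=1.0):
    """Soft lookup into a language-specific memory bank.

    Scores the query against every stored key, turns the scores into
    attention weights, and returns a weighted sum of the stored audio
    features — mapping an input representation onto language-specific
    audio features, as the memory bank above is described to do.
    """
    scores = memory_keys @ query / temperature  # (n_slots,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax attention
    return weights @ memory_values              # (d_value,)
```

Because the bank stores audio-derived features but is addressed by the query alone, the lookup needs no audio input at inference time, which is the property both memory-based papers in this list rely on.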
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
Visual Speech Recognition (VSR) is the task of predicting spoken words from
silent lip movements. VSR is regarded as a challenging task because of the
insufficient information on lip movements. In this paper, we propose an Audio
Knowledge empowered Visual Speech Recognition framework (AKVSR) to complement
the insufficient speech information of visual modality by using audio modality.
Different from previous methods, the proposed AKVSR 1) utilizes rich audio knowledge encoded by a large-scale pretrained audio model, 2) saves the linguistic information of the audio knowledge in a compact audio memory by discarding the non-linguistic information from the audio through quantization, and 3) includes an Audio Bridging Module that can find the best-matched audio features in the compact audio memory, which makes training possible without audio inputs once the compact audio memory has been composed. We validate the effectiveness of the proposed method through extensive experiments and achieve new state-of-the-art performance on the widely used datasets LRS2 and LRS3.
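The quantization step in 2) can be illustrated with plain nearest-neighbor vector quantization: each continuous audio feature is snapped to its closest codebook entry, keeping discrete (roughly linguistic) content while discarding fine-grained, e.g. speaker-specific, variation. The fixed codebook here is an assumption for illustration; in AKVSR it would be learned.

```python
import numpy as np

def quantize(features, codebook):
    """Replace each audio feature by its nearest codebook entry.

    features: (n, d) array of continuous audio features.
    codebook: (k, d) array of entries forming the compact memory.
    Returns the quantized features and the chosen entry indices.
    """
    # pairwise squared distances between features and codebook entries
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)  # best-matched entry per feature
    return codebook[idx], idx
```

A bridging module in the sense above would then match visual features against these few codebook entries rather than against raw audio, so no audio is needed once the memory is built.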