1,361 research outputs found

    Efficient and effective human action recognition in video through motion boundary description with a compact set of trajectories

    Get PDF
    Human action recognition (HAR) is at the core of human-computer interaction and video scene understanding. However, achieving effective HAR in an unconstrained environment is still a challenging task. To that end, trajectory-based video representations are currently widely used. Despite the promising levels of effectiveness achieved by these approaches, problems regarding computational complexity and the presence of redundant trajectories still need to be addressed in a satisfactory way. In this paper, we propose a method for trajectory rejection, reducing the number of redundant trajectories without degrading the effectiveness of HAR. Furthermore, to realize efficient optical flow estimation prior to trajectory extraction, we integrate a method for dynamic frame skipping. Experiments with four publicly available human action datasets show that the proposed approach outperforms state-of-the-art HAR approaches in terms of effectiveness, while simultaneously mitigating the computational complexity.
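    The abstract does not spell out the rejection criterion or the skipping rule, so the sketch below shows only one plausible reading of the two ideas: discard trajectories whose accumulated displacement is too small to carry motion information, and skip optical-flow computation for frames that barely differ from their predecessor. The thresholds and function names are illustrative assumptions, not the paper's method.

    ```python
    import numpy as np

    def reject_redundant(trajectories, min_disp=2.0):
        """Keep only trajectories whose accumulated point-to-point
        displacement exceeds a threshold (illustrative criterion)."""
        kept = []
        for traj in trajectories:          # traj: (T, 2) array of (x, y) points
            steps = np.diff(traj, axis=0)  # per-frame displacement vectors
            total = np.linalg.norm(steps, axis=1).sum()
            if total >= min_disp:
                kept.append(traj)
        return kept

    def should_skip_frame(prev_gray, curr_gray, diff_thresh=4.0):
        """Dynamic frame skipping: if the mean absolute intensity change
        is small, reuse the previous optical-flow field instead of
        recomputing it (again, a hypothetical skipping rule)."""
        mad = np.abs(curr_gray.astype(np.float32) -
                     prev_gray.astype(np.float32)).mean()
        return mad < diff_thresh
    ```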

    Comment on "Solution of Classical Stochastic One-Dimensional Many-Body Systems"

    Full text link
    In a recent Letter, Bares and Mobilia proposed a method to find solutions of the stochastic evolution operator $H = H_0 + \frac{\gamma}{L} H_1$ with a non-trivial quartic term $H_1$. They claim, "Because of the conservation of probability, an analog of the Wick theorem applies and all multipoint correlation functions can be computed." Using the Wick theorem, they expressed the density correlation functions as solutions of a closed set of integro-differential equations. In this Comment, however, we show that the applicability of the Wick theorem is restricted to the case $\gamma = 0$ only.
    Comment: 1 page, RevTeX style; comment on Phys. Rev. Lett. 83, 5214 (1999)
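    For reference, the operator in question and the Wick factorization at stake can be written out as below. The four-point factorization shown is the standard statement for quadratic (free) theories, not a formula quoted from the Letter; the Comment's point is precisely that this step is justified only at $\gamma = 0$.

    ```latex
    % Evolution operator: quadratic part plus a quartic perturbation
    H = H_0 + \frac{\gamma}{L} H_1
    % Wick factorization of a four-point correlator, valid for quadratic H
    % (i.e., \gamma = 0):
    \langle a_1 a_2 a_3 a_4 \rangle
      = \langle a_1 a_2 \rangle \langle a_3 a_4 \rangle
      + \langle a_1 a_3 \rangle \langle a_2 a_4 \rangle
      + \langle a_1 a_4 \rangle \langle a_2 a_3 \rangle
    ```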

    Depression and PTSD in Pashtun Women in Kandahar, Afghanistan

    Get PDF
    Objectives: The objectives were (a) to establish the prevalence of depression and post-traumatic stress disorder (PTSD) in Afghanistan and (b) to investigate sociodemographic and quality of life variables that predict depression and PTSD.
    Methods: Translated versions of the Beck Depression Inventory, Impact of Event Scale-Revised, and Quality of Life Inventory were administered to 125 Pashtun women in Kandahar and statistically analyzed.
    Results: Approximately half of the participants showed moderate to severe levels of depression, and more than half exhibited symptoms of PTSD. Education and income showed significant associations with PTSD symptoms or depression. The way one spends time, general health status, and general feeling towards life predicted low levels of depression and PTSD.
    Conclusions: The high prevalence of depression and PTSD indicates a continuing need for mental health intervention. While education has been found to be a protective factor for mental health in previous studies, the relationship between education and mental health appears to be more complex among Afghan women. Quality of life variables could be further investigated and incorporated into mental health interventions for Afghan women.

    Fermentation characteristics of Korean pear (Pyrus pyrifolia Nakai) puree by the Leuconostoc mesenteroides 51-3 strain isolated from Kimchi

    Get PDF
    A lactic acid bacterial strain showing fast growth and high acid production when cultured in Korean pear puree was isolated from Kimchi. This strain was analyzed using the API 50 CHL kit and 16S rRNA sequencing and was thus identified as Leuconostoc mesenteroides 51-3. Korean pear puree was fermented with the L. mesenteroides 51-3 strain at 30°C for 12 h. The changes in pH, titratable acidity and viable cell count during fermentation were investigated. The pH and titratable acidity of the pear puree were 4.06 and 0.66%, respectively, after 12 h of fermentation. The viable cell count of L. mesenteroides 51-3 rapidly increased to 3.7 × 10⁹ CFU/g after 12 h of cultivation. The content of lactic acid and acetic acid was determined to be 0.138 and 0.162%, respectively, after 12 h of fermentation. When the fermented pear puree was stored at 4°C, the pH, titratable acidity and viable cell count remained fairly constant for 14 days.
    Keywords: Fermentation, Korean pear puree, Leuconostoc mesenteroides.
    African Journal of Biotechnology Vol. 9(35), pp. 5735-5738, 30 August 2010

    Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper

    Full text link
    This paper proposes a powerful Visual Speech Recognition (VSR) method for multiple languages, especially for low-resource languages that have only a limited amount of labeled data. Different from previous methods that tried to improve the VSR performance for the target language by using knowledge learned from other languages, we explore whether we can increase the amount of training data itself for the different languages without human intervention. To this end, we employ a Whisper model, which can conduct both language identification and audio-based speech recognition. It serves to filter data of the desired languages and transcribe labels from the unannotated, multilingual audio-visual data pool. By comparing the performance of VSR models trained on automatic labels and on human-annotated labels, we show that we can achieve VSR performance similar to that of human-annotated labels even without utilizing human annotations. Through the automated labeling process, we label the large-scale unlabeled multilingual databases VoxCeleb2 and AVSpeech, producing 1,002 hours of data for four low-resource VSR languages: French, Italian, Spanish, and Portuguese. With the automatic labels, we achieve new state-of-the-art performance on mTEDx in four languages, significantly surpassing the previous methods. The automatic labels are available online: https://github.com/JeongHun0716/Visual-Speech-Recognition-for-Low-Resource-Languages
    Comment: Accepted at ICASSP 2024
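    As a rough illustration of such an automatic labeling pipeline, the sketch below uses the open-source openai-whisper package to first detect the spoken language of a clip and then transcribe it, keeping only clips in the target languages. The probability threshold, filtering rule, and function name are assumptions for illustration; the paper's exact pipeline may differ.

    ```python
    # pip install openai-whisper
    import whisper

    TARGET_LANGS = {"fr", "it", "es", "pt"}  # French, Italian, Spanish, Portuguese

    model = whisper.load_model("large-v2")

    def auto_label(path, lang_prob_thresh=0.8):
        """Return (language, transcript) for a clip, or None if the
        detected language is not a target or detection is uncertain."""
        audio = whisper.pad_or_trim(whisper.load_audio(path))
        mel = whisper.log_mel_spectrogram(audio).to(model.device)
        _, probs = model.detect_language(mel)   # dict: lang code -> probability
        lang = max(probs, key=probs.get)
        if lang not in TARGET_LANGS or probs[lang] < lang_prob_thresh:
            return None                         # filtered out of the data pool
        result = model.transcribe(path, language=lang)
        return lang, result["text"]
    ```

    Note the sketch assumes the 80-mel front end of large-v2 and earlier checkpoints; the large-v3 model instead expects log_mel_spectrogram(audio, n_mels=128).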

    Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge

    Full text link
    This paper proposes a novel lip reading framework, especially for low-resource languages, which have not been well addressed in the previous literature. Since low-resource languages do not have enough video-text paired data to train a model powerful enough to capture both lip movements and language, developing lip reading models for them is regarded as challenging. To mitigate this challenge, we first learn general speech knowledge, the ability to model lip movements, from a high-resource language through the prediction of speech units. Since different languages partially share common phonemes, general speech knowledge learned from one language can be extended to other languages. Then, we learn language-specific knowledge, the ability to model language, by proposing a Language-specific Memory-augmented Decoder (LMDecoder). LMDecoder saves language-specific audio features into memory banks and can be trained on audio-text paired data, which is more easily accessible than video-text paired data. Therefore, with LMDecoder, we can transform the input speech units into language-specific audio features and translate them into texts by utilizing the learned rich language knowledge. Finally, by combining general speech knowledge and language-specific knowledge, we can efficiently develop lip reading models even for low-resource languages. The effectiveness of the proposed method is evaluated through extensive experiments on five languages: English, Spanish, French, Italian, and Portuguese.
    Comment: Accepted at ICCV 2023
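    The abstract describes LMDecoder only at a high level, so the following PyTorch sketch captures just the general idea of a memory-augmented decoder layer: a learnable bank of language-specific feature vectors that incoming speech-unit embeddings attend to, which is why the bank can be trained from audio-text pairs alone. All sizes and names are assumptions; this is not the paper's implementation.

    ```python
    import torch
    import torch.nn as nn

    class MemoryAugmentedLayer(nn.Module):
        """Cross-attention from speech-unit embeddings into a learnable
        bank of language-specific feature vectors (illustrative only)."""
        def __init__(self, d_model=512, n_slots=256, n_heads=8):
            super().__init__()
            self.memory = nn.Parameter(torch.randn(n_slots, d_model) * 0.02)
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.norm = nn.LayerNorm(d_model)

        def forward(self, units):                 # units: (B, T, d_model)
            mem = self.memory.unsqueeze(0).expand(units.size(0), -1, -1)
            retrieved, _ = self.attn(query=units, key=mem, value=mem)
            return self.norm(units + retrieved)   # residual + layer norm
    ```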

    AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model

    Full text link
    Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip movements. VSR is regarded as a challenging task because of the insufficient information in lip movements. In this paper, we propose an Audio Knowledge empowered Visual Speech Recognition framework (AKVSR) to complement the insufficient speech information of the visual modality by using the audio modality. Different from previous methods, the proposed AKVSR 1) utilizes rich audio knowledge encoded by a large-scale pretrained audio model, 2) saves the linguistic information of the audio knowledge in a compact audio memory by discarding non-linguistic information through quantization, and 3) includes an Audio Bridging Module that can find the best-matched audio features in the compact audio memory, which makes training possible without audio inputs once the compact audio memory has been composed. We validate the effectiveness of the proposed method through extensive experiments, and achieve new state-of-the-art performance on the widely used datasets LRS2 and LRS3.
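    Again only as a hedged sketch of the mechanism the abstract outlines: features from a pretrained audio model are quantized into a small codebook (the "compact audio memory"), and at training time visual features retrieve their best-matched code by nearest-neighbor lookup, so no raw audio is needed. The codebook size, cosine distance, and linear projection are illustrative assumptions, not the paper's design.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CompactAudioMemory(nn.Module):
        """A frozen codebook of quantized audio features; visual features
        are projected into the same space and matched to their nearest
        code (a simplified stand-in for the Audio Bridging Module)."""
        def __init__(self, codebook, d_visual=512):
            super().__init__()
            self.register_buffer("codes", codebook)   # (K, d_audio), frozen
            self.proj = nn.Linear(d_visual, codebook.size(1))

        def forward(self, visual):                    # visual: (B, T, d_visual)
            q = F.normalize(self.proj(visual), dim=-1)
            c = F.normalize(self.codes, dim=-1)
            sim = q @ c.t()                           # cosine similarity (B, T, K)
            idx = sim.argmax(dim=-1)                  # best-matched code per frame
            return self.codes[idx]                    # retrieved audio features
    ```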