132,702 research outputs found

    Oops! Predicting Unintentional Action in Video

    From just a short glance at a video, we can often tell whether a person's action is intentional or not. Can we train a model to recognize this? We introduce a dataset of in-the-wild videos of unintentional action, as well as a suite of tasks for recognizing, localizing, and anticipating its onset. We train a supervised neural network as a baseline and analyze its performance compared to human consistency on the tasks. We also investigate self-supervised representations that leverage natural signals in our dataset, and show the effectiveness of an approach that uses the intrinsic speed of video to perform competitively with highly-supervised pretraining. However, a significant gap between machine and human performance remains. The project website is available at https://oops.cs.columbia.edu
    Comment: 11 pages, 9 figures
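    The abstract's self-supervised signal is the intrinsic speed of video. A minimal sketch of that kind of playback-speed pretext task is below; the backbone, shapes, and function names are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of a playback-speed pretext task: the label is the
# frame-sampling stride itself, so no human annotation is needed.
import torch
import torch.nn as nn
import torch.nn.functional as F

SPEEDS = [1, 2, 4, 8]  # sampling strides standing in for playback speeds

def sample_clip(frames, speed_idx, clip_len=16):
    """frames: (T, C, H, W); subsample every stride-th frame, keep clip_len."""
    return frames[::SPEEDS[speed_idx]][:clip_len]

class SpeedClassifier(nn.Module):
    def __init__(self, backbone, feat_dim, n_speeds=len(SPEEDS)):
        super().__init__()
        self.backbone = backbone              # any video encoder
        self.head = nn.Linear(feat_dim, n_speeds)

    def forward(self, clips):                 # clips: (B, C, T, H, W)
        return self.head(self.backbone(clips))

# Tiny stand-in encoder so the sketch runs end to end.
backbone = nn.Sequential(
    nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten())
model = SpeedClassifier(backbone, feat_dim=16)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def pretext_step(frames_batch):
    """frames_batch: list of (T, C, H, W) tensors with T >= 16 * max stride."""
    speed_idx = torch.randint(len(SPEEDS), (1,)).item()
    clips = torch.stack([sample_clip(f, speed_idx).permute(1, 0, 2, 3)
                         for f in frames_batch])          # (B, C, T, H, W)
    labels = torch.full((clips.size(0),), speed_idx, dtype=torch.long)
    loss = F.cross_entropy(model(clips), labels)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```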

    Leveraging TCN and Transformer for effective visual-audio fusion in continuous emotion recognition

    Human emotion recognition plays an important role in human-computer interaction. In this paper, we present our approach to the Valence-Arousal (VA) Estimation Challenge, Expression (Expr) Classification Challenge, and Action Unit (AU) Detection Challenge of the 5th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW). Specifically, we propose a novel multi-modal fusion model that leverages Temporal Convolutional Networks (TCN) and Transformers to enhance the performance of continuous emotion recognition. Our model aims to effectively integrate visual and audio information for improved accuracy in recognizing emotions. Our model outperforms the baseline and ranks 3rd in the Expression Classification challenge.
    Comment: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
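    The abstract describes per-modality temporal modeling with TCNs followed by Transformer-based fusion. Below is a hedged sketch of one plausible wiring of that pipeline; the feature dimensions, layer counts, and the simple concatenation fusion are assumptions, not the authors' architecture.

```python
# Hedged sketch of a TCN + Transformer visual-audio fusion model.
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    """Dilated temporal convolution with a residual connection."""
    def __init__(self, dim, dilation):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size=3,
                              padding=dilation, dilation=dilation)
        self.relu = nn.ReLU()

    def forward(self, x):                     # x: (B, dim, T)
        return self.relu(x + self.conv(x))    # padding keeps T unchanged

class FusionModel(nn.Module):
    def __init__(self, d_vis=512, d_aud=128, d_model=256, n_layers=2):
        super().__init__()
        self.vis_proj = nn.Conv1d(d_vis, d_model, 1)
        self.aud_proj = nn.Conv1d(d_aud, d_model, 1)
        self.vis_tcn = nn.Sequential(*[TCNBlock(d_model, 2 ** i) for i in range(3)])
        self.aud_tcn = nn.Sequential(*[TCNBlock(d_model, 2 ** i) for i in range(3)])
        layer = nn.TransformerEncoderLayer(d_model=2 * d_model, nhead=8,
                                           batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(2 * d_model, 2)   # frame-wise valence, arousal

    def forward(self, vis, aud):   # vis: (B, T, d_vis), aud: (B, T, d_aud)
        v = self.vis_tcn(self.vis_proj(vis.transpose(1, 2)))
        a = self.aud_tcn(self.aud_proj(aud.transpose(1, 2)))
        fused = torch.cat([v, a], dim=1).transpose(1, 2)    # (B, T, 2*d_model)
        return self.head(self.fusion(fused))                # (B, T, 2)

model = FusionModel()
vis, aud = torch.randn(4, 100, 512), torch.randn(4, 100, 128)
print(model(vis, aud).shape)   # torch.Size([4, 100, 2])
```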

    Wild Emptiness: A Zen Approach to Environmental Ethics

    When Buddhism took root in China and integrated with the nation’s Taoist intellectual climate, the tradition retained the orthodox central objective of overcoming suffering. While conserving this principal aspiration, the rise of Zen is associated with a deviation from the orthodox practice of monasticism toward the practical embodiment of emptiness while integrated in society, which can be likened to the practice of unwavering compassion. This piece offers a Zen approach to environmental ethics, attempting to explicate how and why an individual practicing Zen should compassionately engage with the earth. With respect to the Buddhist employment of skillful means, different approaches are offered as tailored ethical frameworks to appeal to individuals at different stages on their path to awakening. Furthermore, the environmental ethic appealed to by the awakened individual is explicated as spontaneous harmonization with the flow of emptiness, which Buddhists regard as the driving force of the phenomenal realm. The awakened individual is considered to non-deliberately take on a lifestyle that provides perpetual intimacy with the rhythms of wild emptiness.

    Facial Expression Recognition from World Wild Web

    Recognizing facial expressions in a wild setting remains a challenging task in computer vision. The World Wide Web is a good source of facial images, most of which are captured in uncontrolled conditions. In fact, the Internet is a World Wild Web of facial images with expressions. This paper presents the results of a new study on collecting, annotating, and analyzing wild facial expressions from the web. Three search engines were queried using 1250 emotion-related keywords in six different languages, and the retrieved images were mapped by two annotators to six basic expressions and neutral. Deep neural networks and noise modeling were used in three different training scenarios to find out how accurately facial expressions can be recognized when trained on noisy images collected from the web using query terms (e.g. happy face, laughing man, etc.). The results of our experiments show that deep neural networks can recognize wild facial expressions with an accuracy of 82.12%.
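    The abstract mentions noise modeling for training on weakly labeled web images. One standard formulation of that idea is a learned label-noise transition matrix stacked on the classifier; the sketch below illustrates it under assumed input shapes and is not the paper's exact method.

```python
# Hedged sketch of label-noise modeling for web-collected expression images:
# a learned transition matrix maps the classifier's "clean" distribution to
# the noisy web-label distribution.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_CLASSES = 7   # six basic expressions + neutral

class NoiseAdaptation(nn.Module):
    """p(noisy j | x) = sum_i p(clean i | x) * T[i, j], with T learned."""
    def __init__(self, n_classes=N_CLASSES):
        super().__init__()
        # Initialize near the identity: assume most web labels are correct.
        self.logits = nn.Parameter(torch.eye(n_classes) * 5.0)

    def forward(self, clean_probs):                  # (B, C)
        T = torch.softmax(self.logits, dim=1)        # rows sum to 1
        return clean_probs @ T

base = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64 * 3, N_CLASSES))  # stand-in net
noise = NoiseAdaptation()
optimizer = torch.optim.Adam(list(base.parameters()) + list(noise.parameters()))

def train_step(images, web_labels):      # images: (B, 3, 64, 64)
    clean = torch.softmax(base(images), dim=1)
    noisy = noise(clean)                 # fit the noisy web labels in training...
    loss = F.nll_loss(torch.log(noisy + 1e-8), web_labels)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()                   # ...but predict with `clean` at test time
```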

    AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

    This paper introduces a video dataset of spatio-temporally localized Atomic Visual Actions (AVA). The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently. The key characteristics of our dataset are: (1) the definition of atomic visual actions, rather than composite actions; (2) precise spatio-temporal annotations with possibly multiple annotations for each person; (3) exhaustive annotation of these atomic actions over 15-minute video clips; (4) people temporally linked across consecutive segments; and (5) using movies to gather a varied set of action representations. This departs from existing datasets for spatio-temporal action recognition, which typically provide sparse annotations for composite actions in short video clips. We will release the dataset publicly. AVA, with its realistic scene and action complexity, exposes the intrinsic difficulty of action recognition. To benchmark this, we present a novel approach for action localization that builds upon the current state-of-the-art methods, and demonstrates better performance on JHMDB and UCF101-24 categories. While setting a new state of the art on existing datasets, the overall results on AVA are low at 15.6% mAP, underscoring the need for developing new approaches for video understanding.
    Comment: To appear in CVPR 2018. Check the dataset page https://research.google.com/ava/ for details
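    For concreteness, here is a small sketch of loading AVA-style annotations, which are distributed as CSV rows of video id, keyframe timestamp, normalized box corners, action id, and person id; consult the dataset page for the authoritative schema.

```python
# Sketch of loading AVA-style annotations. The released CSVs use rows of
# video_id, middle_frame_timestamp, x1, y1, x2, y2, action_id, person_id,
# with box corners normalized to [0, 1] -- verify against the dataset page.
import csv
from collections import defaultdict

def load_ava(path):
    keyframes = defaultdict(list)   # (video_id, timestamp) -> annotations
    with open(path, newline="") as f:
        for video_id, ts, x1, y1, x2, y2, action_id, person_id in csv.reader(f):
            keyframes[(video_id, float(ts))].append({
                "box": (float(x1), float(y1), float(x2), float(y2)),
                "action": int(action_id),   # one of the 80 atomic actions
                "person": int(person_id),   # links a person across segments
            })
    return keyframes
```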