13,492 research outputs found

    Sign Language Translation from Instructional Videos

    Full text link
    The advances in automatic sign language translation (SLT) to spoken languages have been mostly benchmarked with datasets of limited size and restricted domains. Our work advances the state of the art by providing the first baseline results on How2Sign, a large and broad dataset. We train a Transformer over I3D video features, using the reduced BLEU as a reference metric for validation, instead of the widely used BLEU score. We report a result of 8.03 on the BLEU score, and publish the first open-source implementation of its kind to promote further advances.Comment: Paper accepted at WiCV @CVPR2

    Comedians without a Cause: The Politics and Aesthetics of Humour in Dutch Cabaret (1966-2020)

    Get PDF
    Comedians play an important role in society and public debate. While comedians have been considered important cultural critics for quite some time, comedy has acquired a new social and political significance in recent years, with humour taking centre stage in political and social debates around issues of identity, social justice, and freedom of speech. To understand the shifting meanings and political implications of humour within a Dutch context, this PhD thesis examines the political and aesthetic workings of humour in the highly popular Dutch cabaret genre, focusing on cabaret performances from the 1960s to the present. The central questions of the thesis are: how do comedians use humour to deliver social critique, and how does their humour resonate with political ideologies? These questions are answered by adopting a cultural studies approach to humour, which is used to analyse Dutch cabaret performances, and by studying related materials such as reviews and media interviews with comedians. This thesis shows that, from the 1960s onwards, Dutch comedians have been considered ‘progressive rebels’ – politically engaged, subversive, and carrying a left-wing political agenda – but that this image is in need of correction. While we tend to look for progressive political messages in the work of comedians who present themselves as being anti-establishment rebels – such as Youp van ‘t Hek, Hans Teeuwen, and Theo Maassen – this thesis demonstrates that their transgressive and provocative humour tends to protect social hierarchies and relationships of power. Moreover, it shows that, paradoxically, both the deliberately moderate and nuanced humour of Wim Kan and Claudia de Breij, and the seemingly past-oriented nostalgia of Alex Klaasen, are more radical and progressive than the transgressive humour of van ‘t Hek, Teeuwen and Maassen. Finally, comedians who present absurdist or deconstructionist forms of humour, such as the early student cabarets, Freek de Jonge, and Micha Wertheim, tend to disassociate themselves from an explicit political engagement. By challenging the dominant image of the Dutch comedian as a ‘progressive rebel,’ this thesis contributes to a better understanding of humour in the present cultural moment, in which humour is often either not taken seriously, or one-sidedly celebrated as being merely pleasurable, innocent, or progressively liberating. In so doing, this thesis concludes, the ‘dark’ and more conservative sides of humour tend to get obscured

    Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and Text Data

    Full text link
    Scaling up weakly-supervised datasets has shown to be highly effective in the image-text domain and has contributed to most of the recent state-of-the-art computer vision and multimodal neural networks. However, existing large-scale video-text datasets and mining techniques suffer from several limitations, such as the scarcity of aligned data, the lack of diversity in the data, and the difficulty of collecting aligned data. Currently popular video-text data mining approach via automatic speech recognition (ASR) used in HowTo100M provides low-quality captions that often do not refer to the video content. Other mining approaches do not provide proper language descriptions (video tags) and are biased toward short clips (alt text). In this work, we show how recent advances in image captioning allow us to pre-train high-quality video models without any parallel video-text data. We pre-train several video captioning models that are based on an OPT language model and a TimeSformer visual backbone. We fine-tune these networks on several video captioning datasets. First, we demonstrate that image captioning pseudolabels work better for pre-training than the existing HowTo100M ASR captions. Second, we show that pre-training on both images and videos produces a significantly better network (+4 CIDER on MSR-VTT) than pre-training on a single modality. Our methods are complementary to the existing pre-training or data mining approaches and can be used in a variety of settings. Given the efficacy of the pseudolabeling method, we are planning to publicly release the generated captions

    Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes

    Full text link
    Humans have long been recorded in a variety of forms since antiquity. For example, sculptures and paintings were the primary media for depicting human beings before the invention of cameras. However, most current human-centric computer vision tasks like human pose estimation and human image generation focus exclusively on natural images in the real world. Artificial humans, such as those in sculptures, paintings, and cartoons, are commonly neglected, making existing models fail in these scenarios. As an abstraction of life, art incorporates humans in both natural and artificial scenes. We take advantage of it and introduce the Human-Art dataset to bridge related tasks in natural and artificial scenarios. Specifically, Human-Art contains 50k high-quality images with over 123k person instances from 5 natural and 15 artificial scenarios, which are annotated with bounding boxes, keypoints, self-contact points, and text information for humans represented in both 2D and 3D. It is, therefore, comprehensive and versatile for various downstream tasks. We also provide a rich set of baseline results and detailed analyses for related tasks, including human detection, 2D and 3D human pose estimation, image generation, and motion transfer. As a challenging dataset, we hope Human-Art can provide insights for relevant research and open up new research questions.Comment: CVPR202

    Procedure-Aware Pretraining for Instructional Video Understanding

    Full text link
    Our goal is to learn a video representation that is useful for downstream procedure understanding tasks in instructional videos. Due to the small amount of available annotations, a key challenge in procedure understanding is to be able to extract from unlabeled videos the procedural knowledge such as the identity of the task (e.g., 'make latte'), its steps (e.g., 'pour milk'), or the potential next steps given partial progress in its execution. Our main insight is that instructional videos depict sequences of steps that repeat between instances of the same or different tasks, and that this structure can be well represented by a Procedural Knowledge Graph (PKG), where nodes are discrete steps and edges connect steps that occur sequentially in the instructional activities. This graph can then be used to generate pseudo labels to train a video representation that encodes the procedural knowledge in a more accessible form to generalize to multiple procedure understanding tasks. We build a PKG by combining information from a text-based procedural knowledge database and an unlabeled instructional video corpus and then use it to generate training pseudo labels with four novel pre-training objectives. We call this PKG-based pre-training procedure and the resulting model Paprika, Procedure-Aware PRe-training for Instructional Knowledge Acquisition. We evaluate Paprika on COIN and CrossTask for procedure understanding tasks such as task recognition, step recognition, and step forecasting. Paprika yields a video representation that improves over the state of the art: up to 11.23% gains in accuracy in 12 evaluation settings. Implementation is available at https://github.com/salesforce/paprika.Comment: CVPR 202

    SViTT: Temporal Learning of Sparse Video-Text Transformers

    Full text link
    Do video-text transformers learn to model temporal relationships across frames? Despite their immense capacity and the abundance of multimodal training data, recent work has revealed the strong tendency of video-text models towards frame-based spatial representations, while temporal reasoning remains largely unsolved. In this work, we identify several key challenges in temporal learning of video-text transformers: the spatiotemporal trade-off from limited network size; the curse of dimensionality for multi-frame modeling; and the diminishing returns of semantic information by extending clip length. Guided by these findings, we propose SViTT, a sparse video-text architecture that performs multi-frame reasoning with significantly lower cost than naive transformers with dense attention. Analogous to graph-based networks, SViTT employs two forms of sparsity: edge sparsity that limits the query-key communications between tokens in self-attention, and node sparsity that discards uninformative visual tokens. Trained with a curriculum which increases model sparsity with the clip length, SViTT outperforms dense transformer baselines on multiple video-text retrieval and question answering benchmarks, with a fraction of computational cost. Project page: http://svcl.ucsd.edu/projects/svitt.Comment: CVPR 202

    Wildlife trade in Latin America: people, economy and conservation

    Get PDF
    Wildlife trade is among the main threats to biodiversity conservation and may pose a risk to human health because of the spread of zoonotic diseases. To avoid social, economic and environmental consequences of illegal trade, it is crucial to understand the factors influencing the wildlife market and the effectiveness of policies already in place. I aim to unveil the biological and socioeconomic factors driving wildlife trade, the health risks imposed by the activity, and the effectiveness of certified captive-breeding as a strategy to curb the illegal market in Latin America through a multidisciplinary approach. I assess socioeconomic correlates of the emerging international trade in wild cat species from Latin America using a dataset of >1,000 seized cats, showing that high levels of corruption and Chinese private investment and low income per capita were related to higher numbers of jaguar seizures. I assess the effectiveness of primate captive-breeding programmes as an intervention to curb wildlife trafficking. Illegal sources held >70% of the primate market share. Legal primates are more expensive, and the production is not sufficiently high to fulfil the demand. I assess the scale of the illegal trade and ownership of venomous snakes in Brazil. Venomous snake taxa responsible for higher numbers of snakebites were those most often kept as pets. I uncover how online wildlife pet traders and consumers responded to campaigns associating the origin of the COVID-19 pandemic. Of 20,000 posts on Facebook groups, only 0.44% mentioned COVID-19 and several stimulated the trade in wild species during lockdown. Despite the existence of international and national wildlife trade regulations, I conclude that illegal wildlife trade is still an issue that needs further addressing in Latin America. I identify knowledge gaps and candidate interventions to amend the current loopholes to reduce wildlife trafficking. My aspiration with this thesis is to provide useful information that can inform better strategies to tackle illegal wildlife trade in Latin America

    Building body identities - exploring the world of female bodybuilders

    Get PDF
    This thesis explores how female bodybuilders seek to develop and maintain a viable sense of self despite being stigmatized by the gendered foundations of what Erving Goffman (1983) refers to as the 'interaction order'; the unavoidable presentational context in which identities are forged during the course of social life. Placed in the context of an overview of the historical treatment of women's bodies, and a concern with the development of bodybuilding as a specific form of body modification, the research draws upon a unique two year ethnographic study based in the South of England, complemented by interviews with twenty-six female bodybuilders, all of whom live in the U.K. By mapping these extraordinary women's lives, the research illuminates the pivotal spaces and essential lived experiences that make up the female bodybuilder. Whilst the women appear to be embarking on an 'empowering' radical body project for themselves, the consequences of their activity remains culturally ambivalent. This research exposes the 'Janus-faced' nature of female bodybuilding, exploring the ways in which the women negotiate, accommodate and resist pressures to engage in more orthodox and feminine activities and appearances

    TOWARDS AN UNDERSTANDING OF EFFORTFUL FUNDRAISING EXPERIENCES: USING INTERPRETATIVE PHENOMENOLOGICAL ANALYSIS IN FUNDRAISING RESEARCH

    Get PDF
    Physical-activity oriented community fundraising has experienced an exponential growth in popularity over the past 15 years. The aim of this study was to explore the value of effortful fundraising experiences, from the point of view of participants, and explore the impact that these experiences have on people’s lives. This study used an IPA approach to interview 23 individuals, recognising the role of participants as proxy (nonprofessional) fundraisers for charitable organisations, and the unique organisation donor dynamic that this creates. It also bought together relevant psychological theory related to physical activity fundraising experiences (through a narrative literature review) and used primary interview data to substantiate these. Effortful fundraising experiences are examined in detail to understand their significance to participants, and how such experiences influence their connection with a charity or cause. This was done with an idiographic focus at first, before examining convergences and divergences across the sample. This study found that effortful fundraising experiences can have a profound positive impact upon community fundraisers in both the short and the long term. Additionally, it found that these experiences can be opportunities for charitable organisations to create lasting meaningful relationships with participants, and foster mutually beneficial lifetime relationships with them. Further research is needed to test specific psychological theory in this context, including self-esteem theory, self determination theory, and the martyrdom effect (among others)
    • …
    corecore