
    SU Variety

    Get PDF

    Unsupervised video indexing on audiovisual characterization of persons

    Get PDF
This thesis proposes a method for the unsupervised characterization of the persons appearing in audiovisual documents, exploiting data related to their physical appearance and their voice. In general, automatic identification methods, whether in video or in audio, require a large amount of a priori knowledge about the content. In this work, the goal is to study the two modalities in a correlated way and to exploit their respective properties collaboratively and robustly, in order to produce a reliable result that is as independent as possible of any a priori knowledge.
More specifically, we studied the characteristics of the audio stream and proposed several methods for speaker segmentation and clustering, which we evaluated in a French evaluation campaign. We then carried out an in-depth study of visual descriptors (face, clothing) that allowed us to propose new approaches for detecting, tracking, and clustering the persons in a document. Finally, the work focused on audio-video fusion: we propose an approach based on the computation of a cooccurrence matrix, which allowed us to establish an association between the audio index and the video index and to correct both. This enables us to produce a dynamic audiovisual model of each speaker.
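The cooccurrence-matrix association described in this abstract can be sketched roughly as follows; the frame-level cluster labels, cluster counts, and function names are hypothetical illustrations of the general idea, not the thesis's actual implementation:

```python
import numpy as np

def cooccurrence_matrix(audio_labels, video_labels, n_speakers, n_persons):
    """Count how often each audio speaker label co-occurs in time
    with each video person label (one label per synchronized frame)."""
    m = np.zeros((n_speakers, n_persons), dtype=int)
    for a, v in zip(audio_labels, video_labels):
        m[a, v] += 1
    return m

# Hypothetical frame-level indexes: values are speaker/person cluster ids.
audio = [0, 0, 1, 1, 1, 0]
video = [1, 1, 0, 0, 0, 1]
m = cooccurrence_matrix(audio, video, 2, 2)
# Associate each speaker with the person they co-occur with most often.
assoc = m.argmax(axis=1)
print(assoc)  # speaker 0 -> person 1, speaker 1 -> person 0
```

Once such an association is established, disagreements between the two indexes can be flagged and corrected in either modality.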

    Self-supervised Face Representation Learning

    Get PDF
This thesis investigates fine-tuning deep face features in a self-supervised manner for discriminative face representation learning, wherein we develop methods to automatically generate pseudo-labels for training a neural network. Most importantly, solving this problem helps us advance the state of the art in representation learning and can benefit a variety of practical downstream tasks. Fortunately, there is a vast amount of video on the internet that machines can use to learn an effective representation. We present methods that can learn a strong face representation from large-scale data in the form of images or videos. While learning a good representation with a deep learning algorithm requires a large-scale dataset with manually curated labels, we propose self-supervised approaches that generate pseudo-labels by utilizing the temporal structure of the video data and similarity constraints, obtaining supervision from the data itself. We aim to learn a representation that exhibits small distances between samples from the same person and large inter-person distances in feature space. Metric learning can achieve this, as it comprises a pull term, pulling data points from the same class closer, and a push term, pushing data points from a different class further away. Metric learning improves feature quality but requires some form of external supervision to label pairs as same or different. In the case of face clustering in TV series, we may obtain this supervision from tracks and other cues. Tracking acts as a form of high-precision clustering (grouping detections within a shot) and is used to automatically generate positive and negative pairs of face images. Inspired by this, we propose two variants of discriminative approaches: the Track-supervised Siamese network (TSiam) and the Self-supervised Siamese network (SSiam).
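The pull/push behaviour described above can be illustrated with a standard contrastive loss; this is a generic sketch (the margin value and the toy feature vectors are assumptions for illustration, not the thesis's training setup):

```python
import numpy as np

def contrastive_loss(f1, f2, same, margin=1.0):
    """Pull-term for same-person pairs, push-term for different-person pairs."""
    d = np.linalg.norm(f1 - f2)
    if same:
        return d ** 2                      # pull: shrink the distance
    return max(0.0, margin - d) ** 2       # push: enforce at least `margin`

a = np.array([0.0, 0.0])
b = np.array([1.0, 0.0])                   # distance exactly 1.0
print(contrastive_loss(a, b, same=True))   # 1.0
print(contrastive_loss(a, b, same=False))  # 0.0 (already at the margin)
```

Minimizing this loss over many pairs drives same-person features together and pushes different-person features at least a margin apart, which is the geometry the abstract aims for.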
In TSiam, we utilize the tracking supervision to obtain the pairs; additionally, we include negative training pairs from singleton tracks, i.e. tracks that are not temporally co-occurring. As supervision from tracking may not always be available, we propose SSiam, an effective approach that enables metric learning without any supervision by generating the required pairs automatically during training. In SSiam, we leverage the dynamic generation of positive and negative pairs based on sorting distances (i.e. ranking) over a subset of frames, and do not have to rely solely on video/track-based supervision. Next, we present Clustering-based Contrastive Learning (CCL), a new clustering-based representation learning approach that utilizes automatically discovered partitions obtained from a clustering algorithm (FINCH) as weak supervision, along with inherent video constraints, to learn discriminative face features. As annotating datasets is costly and difficult, using label-free weak supervision obtained from a clustering algorithm as a proxy learning task is promising. Through our analysis, we show that creating positive and negative training pairs from clustering predictions helps improve the performance of video face clustering. We then propose face grouping on graphs (FGG), a method for unsupervised fine-tuning of deep face feature representations. We utilize a graph structure with positive and negative edges over a set of face tracks, based on the temporal structure of the video data and similarity-based constraints. Using graph neural networks, the features communicate over the edges, allowing each track's feature to exchange information with its neighbors, and thus push each representation in a direction in feature space that groups all representations of the same person together and separates representations of different persons.
Having developed these methods to generate weak labels for face representation learning, we next propose to encode face tracks in videos into compact yet effective descriptors that can complement the previous methods towards learning a more powerful face representation. Specifically, we propose Temporal Compact Bilinear Pooling (TCBP) to encode the temporal segments of a video into a compact descriptor. TCBP can capture interactions between each element of the feature representation and every other element over a long-range temporal context. We integrated our previous methods TSiam, SSiam, and CCL with TCBP and demonstrated that TCBP has excellent capabilities for learning a strong face representation. We further show that TCBP has exceptional transfer abilities to applications such as multimodal video-clip representation that jointly encodes images, audio, video, and text, as well as video classification. All of these contributions are demonstrated on benchmark video clustering datasets: The Big Bang Theory, Buffy the Vampire Slayer, and Harry Potter 1. We provide extensive evaluations on these datasets, achieving a significant boost in performance over the base features and in comparison to state-of-the-art results.
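Compact bilinear pooling is commonly approximated with the Tensor Sketch trick: count-sketch both inputs, then combine the sketches by circular convolution via the FFT. The following is a generic sketch of that idea applied over time, with hypothetical dimensions; it is not the actual TCBP implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def count_sketch(x, h, s, d):
    """Project x to d dims: coordinate j adds s[j] * x[j] to bucket h[j]."""
    out = np.zeros(d)
    np.add.at(out, h, s * x)
    return out

def compact_bilinear(x, y, h1, s1, h2, s2, d):
    """Approximate the outer product of x and y in d dims by circular
    convolution of two count sketches (the Tensor Sketch trick)."""
    fx = np.fft.rfft(count_sketch(x, h1, s1, d))
    fy = np.fft.rfft(count_sketch(y, h2, s2, d))
    return np.fft.irfft(fx * fy, n=d)

# Hypothetical setup: pool T frame features of a face track into one
# d-dimensional descriptor by averaging per-frame bilinear sketches.
T, n, d = 4, 16, 32
h1, h2 = rng.integers(0, d, n), rng.integers(0, d, n)
s1, s2 = rng.choice([-1.0, 1.0], n), rng.choice([-1.0, 1.0], n)
frames = rng.standard_normal((T, n))
desc = np.mean([compact_bilinear(f, f, h1, s1, h2, s2, d) for f in frames],
               axis=0)
print(desc.shape)  # (32,)
```

The appeal of the sketch is that the d-dimensional descriptor captures pairwise feature interactions without ever materializing the full n-by-n outer product.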

    Visual Concept Detection in Images and Videos

    Get PDF
    The rapidly increasing proliferation of digital images and videos leads to a situation where content-based search in multimedia databases becomes more and more important. A prerequisite for effective image and video search is to analyze and index media content automatically. Current approaches in the field of image and video retrieval focus on semantic concepts serving as an intermediate description to bridge the “semantic gap” between the data representation and the human interpretation. Due to the large complexity and variability in the appearance of visual concepts, the detection of arbitrary concepts represents a very challenging task. In this thesis, the following aspects of visual concept detection systems are addressed: First, enhanced local descriptors for mid-level feature coding are presented. Based on the observation that scale-invariant feature transform (SIFT) descriptors with different spatial extents yield large performance differences, a novel concept detection system is proposed that combines feature representations for different spatial extents using multiple kernel learning (MKL). A multi-modal video concept detection system is presented that relies on Bag-of-Words representations for visual and in particular for audio features. Furthermore, a method for the SIFT-based integration of color information, called color moment SIFT, is introduced. Comparative experimental results demonstrate the superior performance of the proposed systems on the Mediamill and on the VOC Challenge. Second, an approach is presented that systematically utilizes results of object detectors. Novel object-based features are generated based on object detection results using different pooling strategies. For videos, detection results are assembled to object sequences and a shot-based confidence score as well as further features, such as position, frame coverage or movement, are computed for each object class. 
These features are used as additional input for the support vector machine (SVM)-based concept classifiers, so that other related concepts can also profit from object-based features. Extensive experiments on the Mediamill, VOC, and TRECVid Challenges show significant improvements in retrieval performance, not only for the object classes but in particular for a large number of indirectly related concepts. Moreover, it has been demonstrated that a few object-based features are beneficial for a large number of concept classes. On the VOC Challenge, the additional use of object-based features led to a superior performance of 63.8% mean average precision (AP) on the image classification task. Furthermore, the generalization capabilities of concept models are investigated. It is shown that different source and target domains lead to a severe loss in concept detection performance. In these cross-domain settings, object-based features achieve a significant performance improvement. Since it is inefficient to run a large number of single-class object detectors, it is additionally demonstrated how a concurrent multi-class object detection system can be constructed to speed up the detection of many object classes in images. Third, a novel, purely web-supervised learning approach for modeling heterogeneous concept classes in images is proposed. Tags and annotations of multimedia data on the WWW are rich sources of information that can be employed for learning visual concepts. The presented approach is aimed at continuous long-term learning of appearance models and at improving these models periodically. For this purpose, several components have been developed: a crawling component, a multi-modal clustering component for spam detection and subclass identification, a novel learning component called “random savanna”, a validation component, an updating component, and a scalability manager.
Only a single word describing the visual concept is required to initiate the learning process. Experimental results demonstrate the capabilities of the individual components. Finally, a generic concept detection system is applied to support interdisciplinary research efforts in psychology and media science. The research question addressed in the behavioral sciences is whether and how playing violent computer games may induce aggression. To this end, novel semantic concepts, most notably “violence”, are detected in computer game videos to gain insights into the interrelationship between violent game events and the brain activity of the player. Experimental results demonstrate the excellent performance of the proposed automatic concept detection approach for such interdisciplinary research.
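The shot-level object-based features described in this abstract (confidence scores, frame coverage, and similar per-class statistics) could be assembled along these lines; the detection threshold, toy scores, and function name are assumptions for illustration, not the system's actual feature set:

```python
import numpy as np

def shot_object_features(frame_scores, thresh=0.5):
    """frame_scores: (n_frames, n_classes) detector confidences for a shot.
    Returns per-class [max score, mean score, frame coverage] features that
    could be fed to an SVM concept classifier alongside other cues."""
    mx = frame_scores.max(axis=0)
    mean = frame_scores.mean(axis=0)
    # Fraction of frames in which the class was confidently detected.
    coverage = (frame_scores > thresh).mean(axis=0)
    return np.stack([mx, mean, coverage], axis=1)

# Hypothetical shot: 4 frames, 2 object classes ("car", "person").
scores = np.array([[0.9, 0.1],
                   [0.8, 0.0],
                   [0.0, 0.2],
                   [0.7, 0.0]])
feats = shot_object_features(scores)
print(feats[0])  # "car" row: max 0.9, mean 0.6, coverage 0.75
```

Because the pooled statistics summarize an entire shot, concepts that merely correlate with an object class (e.g. "street" with "car") can also benefit from these features, as the abstract notes.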

    March 3, 1981

    Get PDF
    The Breeze is the student newspaper of James Madison University in Harrisonburg, Virginia

    Projecting Hitler : representations of Adolf Hitler in English-language film, 1968-1990

    Get PDF
In the post-Second World War period, the medium of film has arguably been the leading popular-culture protagonist of a demonized Adolf Hitler. Between 1968 and 1990, thirty-five English-language films featuring representations of Hitler were released in cinemas, on television, or on home video. In the 1968 to 1979 period, fifteen films were released, with the remaining twenty coming between 1980 and 1990. This increase reveals not only a growing popular fascination with Hitler, but also a tendency to use the Führer as a sign for demonic evil. These representations are broken into three categories – (1) prominent; (2) satirical; (3) contextualizing – which are then analyzed according to whether a representation is demonizing or humanizing. Of these thirty-five films, twenty-three can be labeled as demonizing and nine as humanizing, and three cannot be appropriately located in either category. In the 1968 to 1979 period, four films employed prominent Hitler representations and five satirized Hitler, with six contextualizing films. The 1980s played host to five prominent representations, six satires, and nine contextualizing films. In total, there are nine prominent representations, eleven satires, and fifteen contextualizing films. Holding that prominent representations are the most influential, this study argues that the 1968 to 1979 period formed and shaped the sign of a demonic Führer, and that its acceptance is demonstrated by the films released between 1980 and 1990. However, the appearance in the 1980s of two prominent films which humanized Hitler is significant, for these two films hint at the beginnings of a breakdown in the hegemony of the Hitler sign. The cinematic demonization of Hitler is accomplished in a variety of ways, all of which portray the National Socialist leader as an abstract figure outside of human behaviour and comprehension.
Scholarly history is also shown to have contributed to this mythologizing, as the “survival myth” and myth of “the last ten days” have their origins in historiography. However, since the 1970s film has arguably overtaken historiography in shaping popular conceptions of the National Socialist leader. In addition to pointing out the connections between film and historiography, this study also suggests other political, philosophical, and cultural reasons for the demonization of Adolf Hitler

    Current, December 04, 1980

    Get PDF
    https://irl.umsl.edu/current1980s/1027/thumbnail.jp

    Digital tools in media studies: analysis and research. An overview

    Get PDF
Digital tools are increasingly used in media studies, opening up new perspectives for research and analysis while creating new problems at the same time. In this volume, international media scholars and computer scientists present their projects, ranging from powerful film-historical databases to automatic video analysis software, discussing their application of digital tools and reporting on their results. This book is the first publication of its kind and a helpful guide for both media scholars and computer scientists who intend to use digital tools in their research, providing information on applications, standards, and problems.


    The bakla and the silver screen : queer cinema in the Philippines

    Full text link
This study looks at representations of the bakla in Philippine cinema from 1954 to 2015. I argue that the bakla, a local gender category that incorporates ideas of male homosexuality, effeminacy, cross-dressing, and transgenderism, has become a central figure in Philippine cinema. I examine the films of Dolphy, the actor who first popularized bakla roles in mainstream cinema, and I outline several tropes that create the bakla image in movies: “classical” kabaklaan (being bakla), which involves crossdressing, effeminacy, and being woman-hearted; the conversion trope, where the bakla stubbornly resists being forcibly masculinized by the men who surround him; and women’s enabling of the bakla to perform femininity despite his male body. The bakla also becomes trapped in the dialectic between the genres of comedy and melodrama – the bakla then becomes either a comic-relief character or a tragic character, someone who should be both pitied and admired for enduring tragic circumstances. Since the word bakla colloquially means both the gay man and the transgender woman, I also look at the intersections between the bakla and global transgender rights discourse in Philippine cinema. I argue that the bakla’s male body and his female heart mirror contemporary thinking about transgenderism, where a person’s birth-assigned sex does not match their gender identity. I look at films in the science fiction/fantasy adventure genre and examine how the male body is transformed into a female body – whether through medical or magical means – while the bakla gender category remains pervasive. I next look at independent films and how the camera constructs bakla sexuality in cinema by framing the male body as an object of desire. I also examine how the bakla have shifted their sexuality from desiring the otherness of the macho lalake (masculine man) to desiring sameness in the form of other bakla (masculine gay men).
Incidentally, this shift in the object of desire coincides with a shift in gender performance from “traditional” kabaklaan, with its elements of effeminacy and crossdressing, to a more homonormative image, mirroring the “modern” discourse of gay globality. Finally, I examine the contemporary Philippine celebrity star system and the popularity of bakla films within the last decade. I argue that Vice Ganda has become representative of the ideal bakla: one who is affluent but still ‘reachable’; opinionated and politically engaged with LGBT rights; and, finally, funny and comic in a way that is empowering for himself, but also in ways that are abrasive and offensive.