85 research outputs found

    Making Home: Directing Reintegration Plays

    This doctoral study explores three American post-combat reintegration plays written and produced in 2010 and 2011 in the United States. Using script analysis and production research, the dissertation describes some of the major themes of reintegration and the strengths of theatrical production as revealed by the critically alternative narratives in the scripts and performances of Time Stands Still by Donald Margulies, American Soldiers by Matt Morillo, and Oohrah! by Bekah Brunstetter.

    Review on recent Computer Vision Methods for Human Action Recognition

    Human activity recognition has been considered an important goal in computer vision since the beginning of the field's development and has reached new levels, yet it is sometimes treated as a simple procedure. Problems arise in fast-moving and complex scenes, and the use of artificial intelligence (AI) for numerical analysis and activity prediction has drawn increasing attention from researchers. Several datasets with substantial methodological and content-related variation have been created to evaluate these methods. Human activities play an important but challenging role in various fields, and many applications exist, such as smart homes, assistive AI, human-computer interaction (HCI), and improved protection in transportation, education, security, and medication management, including fall detection and helping the elderly with taking their medication. The positive impact of deep learning techniques on many vision applications has led to their deployment in video processing. Analysing human behaviour involves major challenges wherever human presence is concerned: a single individual can be represented across multiple video sequences through skeleton, motion, and/or abstract features. This work addresses human presence by combining these options and utilizing a new RNN structure for activity recognition. The paper focuses on recent advances in machine-learning-assisted action recognition; existing modern techniques for action recognition and prediction, together with their reported accuracy and the future scope for analysis, are covered within the review.
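
    The review's central ingredients, per-frame skeleton and motion features fed to an RNN for activity classification, can be illustrated with a minimal sketch. The feature dimensions, class count, and pooling choice below are illustrative assumptions, not details taken from the paper.

    ```python
    # Minimal sketch of an RNN-based action recogniser over per-frame features,
    # combining skeleton and motion descriptors as the review describes.
    # Dimensions and class count are illustrative assumptions.
    import torch
    import torch.nn as nn

    class ActionRNN(nn.Module):
        def __init__(self, skel_dim=75, motion_dim=128, hidden=256, n_classes=60):
            super().__init__()
            # Per-frame skeleton and motion descriptors are concatenated.
            self.rnn = nn.GRU(skel_dim + motion_dim, hidden,
                              num_layers=2, batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * hidden, n_classes)

        def forward(self, skel, motion):
            # skel:   (batch, frames, skel_dim), e.g. 25 joints x 3 coordinates
            # motion: (batch, frames, motion_dim), e.g. optical-flow statistics
            x = torch.cat([skel, motion], dim=-1)
            out, _ = self.rnn(x)                # (batch, frames, 2*hidden)
            return self.head(out.mean(dim=1))   # average over time, then classify

    logits = ActionRNN()(torch.randn(4, 32, 75), torch.randn(4, 32, 128))
    ```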

    Pictorial Primates: A Search for Iconic Abilities in Great Apes

    Pictures and other iconic media are used extensively in psychological experiments on nonhuman primate perception, categorisation, etc. They are also used in everyday interaction with primates, and as pure entertainment. But in what ways do primates understand iconic artefacts? What implications do these different ways have for the conclusions we can draw from those studies on perception and categorisation? What can pictures tell us about primate cognition, and what can primates tell us about pictures? The bulk of the thesis is a critical review of the primatological literature concerned with iconic artefacts. Drawing on work in developmental psychology, cross-cultural research, and semiotics, distinctions between different kinds of pictorial competence are made. The alternatives to viewing pictures as depictions are to view them as the real world is viewed, in which case only realistic pictures evoke recognition, or to view them as a set of disjoint properties, in which case recognition of categorisable motifs fails. It is argued that approaching a picture as a depiction entails a set of expectations on the picture, which affects attention to e.g. part-whole relationships, "filling in", and integration into context. This in turn allows recognition also of non-realistic similarity. The question, then, is whether such expectations can be formed in brains other than an exclusively human one. The different forms of pictorial competence are discussed in relation to research on similarity judgements, abstraction, and categorisation, and are applied to iconic media other than the picture, such as scale models, mirrors, toy replicas, and video. Two lines of original empirical investigation are presented: a study of photographic recognition in picture-naïve gorillas, and recognition of line drawings in picture-experienced and language-competent bonobos. Only the latter study yielded evidence for recognition. The failures in the former study are discussed in terms of experimental shortcomings, and suggestions for future improvements are made.

    Remote Visual Observation of Real Places Through Virtual Reality Headsets

    Virtual Reality has always represented a fascinating yet powerful opportunity that has attracted studies and technology developments, especially since the latest release on the market of powerful high-resolution and wide field-of-view VR headsets. While the great potential of such VR systems is common and accepted knowledge, issues remain related to how to design systems and setups capable of fully exploiting the latest hardware advances. The aim of the proposed research is to study and understand how to increase the perceived level of realism and sense of presence when remotely observing real places through VR headset displays, and hence to produce a set of guidelines that give directions to system designers about how to optimize the display-camera setup to enhance performance, focusing on remote visual observation of real places. The outcome of this investigation represents unique knowledge that is believed to be very beneficial for better VR headset designs towards improved remote observation systems. To achieve the proposed goal, this thesis presents a thorough investigation of existing literature and previous research, carried out systematically to identify the most important factors ruling realism, depth perception, comfort, and sense of presence in VR headset observation. Once identified, these factors are further discussed and assessed through a series of experiments and usability studies, based on a predefined set of research questions. More specifically, the roles of familiarity with the observed place, of the environment characteristics shown to the viewer, and of the display used for the remote observation of the virtual environment are further investigated. To gain more insights, two usability studies are proposed with the aim of defining guidelines and best practices. The main outcomes from the two studies demonstrate that test users experience a more realistic observation when natural features, higher-resolution displays, natural illumination, and high image contrast are used in Mobile VR. In terms of comfort, simple scene layouts and relaxing environments are considered ideal to reduce visual fatigue and eye strain. Furthermore, sense of presence increases when observed environments induce strong emotions, and depth perception improves in VR when several monocular cues such as lights and shadows are combined with binocular depth cues. Based on these results, this investigation then presents a focused evaluation of the outcomes and introduces an innovative eye-adapted High Dynamic Range (HDR) approach, which the author believes to be a great improvement in the context of remote observation when combined with eye-tracked VR headsets. To this end, a third user study is proposed to compare static HDR and eye-adapted HDR observation in VR, and to assess whether the latter can improve realism, depth perception, sense of presence, and in certain cases even comfort. Results from this last study confirmed the author's expectations, showing that eye-adapted HDR and eye tracking should be used to achieve the best visual performance for remote observation in modern VR systems.
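
    The eye-adapted HDR idea lends itself to a small illustration: exposure is driven by the luminance around the tracked gaze point rather than by the whole frame. The sketch below assumes a linear HDR radiance map and a Reinhard-style compression; the window size, key value, and function names are hypothetical, not the thesis's actual method.

    ```python
    # Illustrative sketch of gaze-driven ("eye-adapted") exposure for HDR viewing.
    # All parameters here are assumptions, not values from the thesis.
    import numpy as np

    def eye_adapted_tonemap(hdr, gaze_xy, window=64, key=0.18):
        # hdr: (H, W, 3) linear radiance map; gaze_xy: (x, y) gaze point in pixels.
        lum = 0.2126 * hdr[..., 0] + 0.7152 * hdr[..., 1] + 0.0722 * hdr[..., 2]
        x, y = gaze_xy
        h, w = lum.shape
        patch = lum[max(0, y - window):min(h, y + window),
                    max(0, x - window):min(w, x + window)]
        # Adaptation level: log-average luminance around the gaze point.
        l_adapt = np.exp(np.log(patch + 1e-6).mean())
        scaled = key / l_adapt * hdr          # expose for the fixated region
        return scaled / (1.0 + scaled)        # simple Reinhard-style compression

    ldr = eye_adapted_tonemap(np.random.rand(720, 1280, 3) * 10.0, (640, 360))
    ```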

    Knowledge Guided Entity-aware Video Captioning and A Basketball Benchmark

    Despite the recent emergence of video captioning models, generating text descriptions with specific entity names and fine-grained actions remains far from solved, yet it has important applications such as live text broadcasting of basketball games. In this paper, a new multimodal knowledge-graph-supported basketball benchmark for video captioning is proposed. Specifically, we construct a multimodal basketball game knowledge graph (KG_NBA_2022) to provide additional knowledge beyond the videos. A multimodal basketball game video captioning dataset (VC_NBA_2022), containing 9 types of fine-grained shooting events and the knowledge (i.e., images and names) of 286 players, is then constructed based on KG_NBA_2022. We develop a knowledge-guided entity-aware video captioning network (KEANet), based on a candidate player list and built in encoder-decoder form, for basketball live text broadcast. The temporal contextual information in the video is encoded by a bi-directional GRU (Bi-GRU) module, and an entity-aware module is designed to model the relationships among the players and highlight the key players. Extensive experiments on multiple sports benchmarks demonstrate that KEANet effectively leverages extra knowledge and outperforms advanced video captioning models. The proposed dataset and corresponding code will be publicly available soon.
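
    Two of the named components, the Bi-GRU temporal encoder and the entity-aware module that scores candidate players against the video context, can be sketched roughly as below. All names and dimensions are illustrative assumptions; this is not the authors' released KEANet code.

    ```python
    # Hedged sketch: Bi-GRU over frame features for temporal context, plus an
    # attention module weighting candidate-player embeddings (image + name).
    import torch
    import torch.nn as nn

    class EntityAwareEncoder(nn.Module):
        def __init__(self, feat_dim=1024, hidden=512, ent_dim=512):
            super().__init__()
            self.bigru = nn.GRU(feat_dim, hidden, batch_first=True,
                                bidirectional=True)
            self.ent_proj = nn.Linear(ent_dim, 2 * hidden)

        def forward(self, frames, entities):
            # frames:   (batch, T, feat_dim) per-frame video features
            # entities: (batch, K, ent_dim) candidate-player embeddings
            ctx, _ = self.bigru(frames)              # (batch, T, 2*hidden)
            video = ctx.mean(dim=1, keepdim=True)    # (batch, 1, 2*hidden)
            keys = self.ent_proj(entities)           # (batch, K, 2*hidden)
            # Attention over entities highlights key players for the decoder.
            attn = torch.softmax((keys @ video.transpose(1, 2)).squeeze(-1), dim=-1)
            ent_ctx = (attn.unsqueeze(-1) * keys).sum(dim=1)
            return ctx, ent_ctx

    ctx, ent = EntityAwareEncoder()(torch.randn(2, 40, 1024), torch.randn(2, 12, 512))
    ```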

    Town Called Malmö : Nostalgia and Urban Anxiety in Literature from the 1990s and 2000s


    Scalable Methodologies and Analyses for Modality Bias and Feature Exploitation in Language-Vision Multimodal Deep Learning

    Multimodal machine learning benchmarks have grown exponentially in both capability and popularity over the last decade. Language-vision question-answering tasks such as Visual Question Answering (VQA) and Video Question Answering (video-QA) have, thanks to their high difficulty, become a particularly popular means through which to develop and test new modelling designs and methodology for multimodal deep learning. The challenging nature of VQA and video-QA tasks leaves plenty of room for innovation at every component of the deep learning pipeline, from dataset to modelling methodology. Such circumstances are ideal for innovating in the space of language-vision multimodality. Furthermore, the wider field is currently undergoing an incredible period of growth and increasing interest. I therefore aim to contribute to multiple key components of the VQA and video-QA pipeline, specifically in a manner such that my contributions remain relevant, 'scaling' with the revolutionary new benchmark models and datasets of the near future instead of being rendered obsolete by them. The work in this thesis: highlights and explores the disruptive and problematic presence of language bias in the popular TVQA video-QA dataset, and proposes a dataset-invariant method to identify subsets that respond to different modalities; thoroughly explores the suitability of bilinear pooling as a language-vision fusion technique in video-QA, offering experimental and theoretical insight and highlighting parallels between multimodal processing and neurological theories; explores the nascent visual equivalent of language modelling ('visual modelling') in order to boost the power of visual features; and proposes a dataset-invariant, neurolinguistically inspired labelling scheme for use in multimodal question-answering. I explore the positive and negative results that my experiments across this thesis yield, discuss the limitations of my contributions, and close with proposals for future directions of study in the areas I contribute to.
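
    Bilinear pooling, the fusion technique the thesis examines, is commonly used in its low-rank factorised form for VQA, since the full outer product of language and vision features is usually impractically large. A minimal sketch under assumed dimensions follows; all names and sizes are placeholders rather than the thesis's configuration.

    ```python
    # Low-rank bilinear (Hadamard-product) fusion of question and image features,
    # approximating a full outer-product interaction with a factorisation.
    import torch
    import torch.nn as nn

    class LowRankBilinearFusion(nn.Module):
        def __init__(self, q_dim=768, v_dim=1024, rank=1200, out_dim=512):
            super().__init__()
            self.q_proj = nn.Linear(q_dim, rank)
            self.v_proj = nn.Linear(v_dim, rank)
            self.out = nn.Linear(rank, out_dim)

        def forward(self, q, v):
            # Element-wise product of the two projections stands in for the
            # full bilinear interaction between language and vision features.
            fused = torch.tanh(self.q_proj(q)) * torch.tanh(self.v_proj(v))
            return self.out(fused)

    z = LowRankBilinearFusion()(torch.randn(8, 768), torch.randn(8, 1024))
    ```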