
    Understanding and Modeling the Effects of Task and Context on Drivers' Gaze Allocation

    Understanding what drivers look at is important for many applications, including driver training, monitoring, and assistance, as well as self-driving. Traditionally, factors affecting human visual attention have been divided into bottom-up (involuntary attraction to salient regions) and top-down (task- and context-driven). Although both play a role in drivers' gaze allocation, most existing modeling approaches apply techniques developed for bottom-up saliency and do not consider task and context influences explicitly. Likewise, common driving attention benchmarks lack relevant task and context annotations. Therefore, to enable analysis and modeling of these factors for drivers' gaze prediction, we propose the following: 1) we address some shortcomings of the popular DR(eye)VE dataset and extend it with per-frame annotations for driving task and context; 2) we benchmark a number of baseline and SOTA models for saliency and driver gaze prediction and analyze them w.r.t. the new annotations; and finally, 3) we introduce a novel model that modulates drivers' gaze prediction with explicit action and context information and, as a result, significantly improves SOTA performance on DR(eye)VE overall (by 24% KLD and 89% NSS) and on a subset of action and safety-critical intersection scenarios (by 10-30% KLD). Extended annotations and code for the model and evaluation will be made publicly available. (12 pages, 8 figures, 8 tables)
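    The reported gains are in terms of two standard gaze-prediction metrics: KL divergence (KLD) between the ground-truth and predicted gaze density maps, and Normalized Scanpath Saliency (NSS) at fixated locations. A minimal sketch of how these metrics are commonly computed (array shapes and the epsilon constant are illustrative, not taken from the paper):

```python
import numpy as np

def kld(pred, gt, eps=1e-7):
    """KL divergence between ground-truth and predicted gaze density maps (lower is better)."""
    p = pred / (pred.sum() + eps)          # normalize prediction to a distribution
    q = gt / (gt.sum() + eps)              # normalize ground truth to a distribution
    return float(np.sum(q * np.log(q / (p + eps) + eps)))

def nss(pred, fixations):
    """Normalized Scanpath Saliency: mean of the z-scored map at fixated pixels (higher is better)."""
    s = (pred - pred.mean()) / (pred.std() + 1e-7)
    return float(s[fixations > 0].mean())

# Illustrative usage with random maps
pred = np.random.rand(90, 160)
gt = np.random.rand(90, 160)
fix = (np.random.rand(90, 160) > 0.999).astype(np.uint8)
print(kld(pred, gt), nss(pred, fix))
```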

    Driver Attention based on Deep Learning for a Smart Vehicle to Driver (V2D) Interaction

    Driver attention is an interesting topic within the intelligent-vehicle field for tasks ranging from driver monitoring to autonomous driving. This thesis addresses the topic using deep learning algorithms to achieve intelligent interaction between the vehicle and the driver. Driver monitoring requires an accurate estimation of the driver's gaze in a 3D environment to determine their attentional state. This thesis tackles that problem using a single camera, so that the approach can be used in real applications without high cost and without disturbing the driver. The developed tool was evaluated on a public database (DADA2000), obtaining results comparable to those of an expensive eye tracker that cannot be used in a real vehicle. It was also used in an application that evaluates driver attention during the transition from autonomous to manual mode in simulation, proposing a novel metric to assess the driver's situational awareness based on their attention to the different objects in the scene. In addition, a driver attention estimation algorithm is proposed that uses recent deep learning techniques, namely conditional Generative Adversarial Networks (cGANs) and Multi-Head Self-Attention, which allows certain regions of the scene to be emphasized as a human would. The model was trained and validated on two public databases (BDD-A and DADA2000), outperforming other state-of-the-art proposals and achieving inference times that allow its use in real applications. Finally, a model was developed that leverages our driver attention algorithm to understand a traffic scene, producing the decision taken by the vehicle and its explanation from the images captured by a camera mounted on the front of the vehicle. It was trained on a public database (BDD-OIA), proposing a model that understands the temporal sequence of events using a Transformer Encoder and outperforming other state-of-the-art proposals. Beyond its validation on the database, it was implemented in an application that interacts with the driver, advising on the decisions to take and their explanations in different use cases in a simulated environment. This thesis explores and demonstrates the benefits of driver attention for the intelligent-vehicle field, achieving vehicle-driver interaction through the latest deep learning techniques.
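    As an illustration of the kind of attention mechanism the thesis describes, the sketch below applies multi-head self-attention to the spatial positions of a convolutional feature map so that certain regions of the scene can be emphasized; the class name, layer sizes, and tensor shapes are assumptions for illustration, not the thesis implementation:

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """Multi-head self-attention over the spatial positions of a CNN feature map."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C): one token per spatial position
        attended, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attended)  # residual connection + layer norm
        return tokens.transpose(1, 2).reshape(b, c, h, w)

# Illustrative usage on a feature map from a generator backbone
feat = torch.randn(1, 64, 24, 40)
out = SelfAttention2d(64)(feat)
print(out.shape)  # torch.Size([1, 64, 24, 40])
```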

    Skins\screens\circuits: how technology remade the body

    This thesis is an analysis of the social, political and historical inter-relationships between moving image technologies, constructions of gender and sexuality, and theories of science and technology. Presented as a series of case-studies on film, video, medical imaging and computer technology in the work of five women artists, this thesis looks at the way in which artistic practice overturns traditional theories of technology as purely ‘instrumental’, theories of the subject in which identity is tied to the body, and the assumption that women do not access technology in a sophisticated way. It considers the various ways in which women artists have engaged with, and subverted, the explicit body in representation through deploying new moving image technologies at the historical moment of their widespread distribution across domestic, artistic, pornographic and medical spheres. It ends by asking what political potential there is in challenging the anthropomorphic and destabilizing the figurative through abstraction. Beginning with an investigation into the way in which Carolee Schneemann uses the material properties of film to establish a haptic encounter, in which female and feline bodies are caught up in a sexual economy of touch (pet/petting), this thesis then looks at the work of Kate Craig and the mutual expansion of pornography and home-video technology, questioning the emergence of the ‘amateur’ in relation to theories of power and gender; offers a technological and philosophical modeling of medical imaging technology (taking endoscopy in the work of Mona Hatoum as a case study); and re-evaluates the use of binary in information systems beyond a limiting analogy with ‘Western binaries’ through the work of Nell Tenhaaf. Using the languages of art history together with science & technology studies, medical discourse and feminism, this research theorises gender, technology and medicine as systems of representation that are all deeply inter-connected.

    Practical and Rich User Digitization

    A long-standing vision in computer science has been to evolve computing devices into proactive assistants that enhance our productivity, health and wellness, and many other facets of our lives. User digitization is crucial in achieving this vision as it allows computers to intimately understand their users, capturing activity, pose, routine, and behavior. Today's consumer devices, like smartphones and smartwatches, provide a glimpse of this potential, offering coarse digital representations of users with metrics such as step count, heart rate, and a handful of human activities like running and biking. Even these very low-dimensional representations are already bringing value to millions of people's lives, but there is significant potential for improvement. On the other end, professional, high-fidelity, comprehensive user digitization systems exist. For example, motion capture suits and multi-camera rigs digitize our full body and appearance, and scanning machines such as MRI capture our detailed anatomy. However, these carry significant user practicality burdens, such as financial, privacy, ergonomic, aesthetic, and instrumentation considerations, that preclude consumer use. In general, the higher the fidelity of capture, the lower the user's practicality. Most conventional approaches strike a balance between user practicality and digitization fidelity. My research aims to break this trend, developing sensing systems that increase user digitization fidelity to create new and powerful computing experiences while retaining or even improving user practicality and accessibility, allowing such technologies to have a societal impact. Armed with such knowledge, our future devices could offer longitudinal health tracking, more productive work environments, full body avatars in extended reality, and embodied telepresence experiences, to name just a few domains. (PhD thesis)

    Context-based Visual Feedback Recognition

    PhD thesis. During face-to-face conversation, people use visual feedback (e.g., head and eye gestures) to communicate relevant information and to synchronize rhythm between participants. When recognizing visual feedback, people often rely on more than their visual perception. For instance, knowledge about the current topic and from previous utterances helps guide the recognition of nonverbal cues. The goal of this thesis is to augment computer interfaces with the ability to perceive visual feedback gestures and to enable the exploitation of contextual information from the current interaction state to improve visual feedback recognition. We introduce the concept of visual feedback anticipation, where contextual knowledge from an interactive system (e.g., the last spoken utterance from the robot or system events from the GUI interface) is analyzed online to anticipate visual feedback from a human participant and improve visual feedback recognition. Our multi-modal framework for context-based visual feedback recognition was successfully tested on conversational and non-embodied interfaces for head and eye gesture recognition. We also introduce the Frame-based Hidden-state Conditional Random Field (FHCRF) model, a new discriminative model for visual gesture recognition which can model the sub-structure of a gesture sequence, learn the dynamics between gesture labels, and be directly applied to label unsegmented sequences. The FHCRF model outperforms previous approaches (i.e., HMM, SVM and CRF) for visual gesture recognition and can efficiently learn relevant contextual information necessary for visual feedback anticipation. A real-time visual feedback recognition library for interactive interfaces (called Watson) was developed to recognize head gaze, head gestures, and eye gaze using the images from a monocular or stereo camera and the context information from the interactive system. Watson was downloaded by more than 70 researchers around the world and was successfully used by MERL, USC, NTT, MIT Media Lab and many other research groups.
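    For context, the CRF baselines that the FHCRF is compared against assign a label to every frame by combining per-frame scores with label-transition scores and decoding the best sequence. A minimal Viterbi decoding sketch for such a linear-chain model follows; the score arrays and the two gesture labels are illustrative, not taken from the thesis:

```python
import numpy as np

def viterbi(emission, transition):
    """Most likely label sequence for a linear-chain model.

    emission:   (T, L) per-frame log-scores for each of L labels
    transition: (L, L) log-score for moving from label i to label j
    """
    T, L = emission.shape
    score = np.zeros((T, L))
    back = np.zeros((T, L), dtype=int)
    score[0] = emission[0]
    for t in range(1, T):
        cand = score[t - 1][:, None] + transition   # cand[i, j]: best path ending in i, then move to j
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + emission[t]
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):                   # follow back-pointers to recover the sequence
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Illustrative usage: 5 frames, labels {0: other, 1: head-nod}
emis = np.log(np.array([[.9, .1], [.4, .6], [.2, .8], [.3, .7], [.8, .2]]))
trans = np.log(np.array([[.8, .2], [.3, .7]]))
print(viterbi(emis, trans))  # [0, 1, 1, 1, 0]
```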

    Development of a Low-Cost 6 DOF Brick Tracking System for Use in Advanced Gas-Cooled Reactor Model Tests

    This paper presents the design of a low-cost, compact instrumentation system to enable six degree of freedom motion tracking of acetal bricks within an experimental model of a cracked Advanced Gas-Cooled Reactor (AGR) core. The system comprises optical and inertial sensors and capitalises on the advantages offered by data fusion techniques. The optical system tracks LED indicators, allowing a brick to be accurately located even in cluttered images. The LED positions are identified using a geometrical correspondence algorithm, which was optimised to be computationally efficient for shallow movements, and complex camera distortions are corrected using a versatile Incident Ray-Tracking calibration. Then, a Perspective-Ray-based Scaled Orthographic projection with Iteration (PRSOI) algorithm is applied to each LED position to determine the six degree of freedom pose. Results from experiments show that the system achieves a low Root Mean Squared (RMS) error of 0.2296 mm in x, 0.3943 mm in y, and 0.0703 mm in z. Although providing an accurate measurement solution, the optical tracking system has a low sample rate and requires the line of sight to be maintained throughout each test. To increase the robustness, accuracy, and sampling frequency of the system, the optical system can be augmented with an Inertial Measurement Unit (IMU). This paper presents a method to integrate the optical system and IMU data by accurately timestamping data from each set of sensors and aligning the two coordinate axes. Once miniaturised, the developed system will be used to track smaller components within the AGR models that cannot be tracked with current instrumentation, expanding reactor core modelling capabilities
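    The optical/IMU fusion described above is commonly realized with a filter that propagates the state from high-rate inertial samples and corrects it with the lower-rate, timestamp-aligned optical poses. A minimal one-axis constant-velocity Kalman sketch of that idea follows; all matrices, rates, and noise values are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

dt = 0.005                                   # 200 Hz IMU prediction step (illustrative)
F = np.array([[1.0, dt], [0.0, 1.0]])        # state: [position, velocity]
B = np.array([[0.5 * dt**2], [dt]])          # acceleration input model
H = np.array([[1.0, 0.0]])                   # optical system measures position only
Q = np.eye(2) * 1e-5                         # process noise (tuned per sensor)
R = np.array([[0.4**2]])                     # optical noise, ~0.4 mm RMS (illustrative)

x = np.zeros((2, 1))                         # initial state
P = np.eye(2)                                # initial covariance

def predict(accel):
    """Propagate the state with one IMU acceleration sample."""
    global x, P
    x = F @ x + B * accel
    P = F @ P @ F.T + Q

def update(optical_pos):
    """Correct the state with one timestamp-aligned optical position measurement."""
    global x, P
    y = optical_pos - H @ x                  # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P

for _ in range(40):                          # 40 IMU samples between optical frames ...
    predict(accel=0.0)
update(np.array([[1.0]]))                    # ... then one optical fix at 1.0 mm
print(x.ravel())
```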

    Videos in Context for Telecommunication and Spatial Browsing

    The research presented in this thesis explores the use of videos embedded in panoramic imagery to transmit spatial and temporal information describing remote environments and their dynamics. Virtual environments (VEs) through which users can explore remote locations are rapidly emerging as a popular medium of presence and remote collaboration. However, capturing a visual representation of locations to be used in VEs is usually a tedious process that requires either manual modelling of environments or the employment of specific hardware. Capturing environment dynamics is not straightforward either, and it is usually performed through specific tracking hardware. Similarly, browsing large unstructured video-collections with available tools is difficult, as the abundance of spatial and temporal information makes them hard to comprehend. At the same time, on a spectrum between 3D VEs and 2D images, panoramas lie in between, offering the same accessibility as 2D images while preserving the surrounding representation of 3D virtual environments. For this reason, panoramas are an attractive basis for videoconferencing and browsing tools, as they can relate several videos temporally and spatially. This research explores methods to acquire, fuse, render and stream data coming from heterogeneous cameras, with the help of panoramic imagery. Three distinct but interrelated questions are addressed. First, the thesis considers how spatially localised video can be used to increase the spatial information transmitted during video-mediated communication, and whether this improves quality of communication. Second, the research asks whether videos in panoramic context can be used to convey spatial and temporal information of a remote place and the dynamics within, and whether this improves users' performance in tasks that require spatio-temporal thinking. Finally, the thesis considers whether display type has an impact on reasoning about events within videos in panoramic context. These research questions were investigated over three experiments, covering scenarios common to computer-supported cooperative work and video browsing. To support the investigation, two distinct video+context systems were developed. The first telecommunication experiment compared our videos-in-context interface with fully-panoramic video and conventional webcam video conferencing in an object placement scenario. The second experiment investigated the impact of videos in panoramic context on the quality of spatio-temporal thinking during localization tasks. To support the experiment, a novel interface to video-collections in panoramic context was developed and compared with common video-browsing tools. The final experimental study investigated the impact of display type on reasoning about events, exploring three adaptations of our video-collection interface to three display types. The overall conclusion is that videos in panoramic context offer a valid solution to spatio-temporal exploration of remote locations. Our approach presents a richer visual representation in terms of space and time than standard tools, showing that providing panoramic contexts to video collections makes spatio-temporal tasks easier. As such, videos in context are a suitable alternative to more complex, and often expensive, solutions. These findings are beneficial to many applications, including teleconferencing, virtual tourism and remote assistance.
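    Embedding a video frame in a panorama, as described above, amounts to mapping viewing directions to panorama pixels. A minimal sketch of the equirectangular mapping typically used for this kind of compositing follows; the resolution, axis convention, and example direction are illustrative assumptions, not taken from the thesis:

```python
import numpy as np

def direction_to_equirect(d, width, height):
    """Map a 3D viewing direction to pixel coordinates in an equirectangular panorama.

    d: (3,) direction in a y-up, z-forward camera frame (assumed convention).
    """
    x, y, z = d / np.linalg.norm(d)
    lon = np.arctan2(x, z)                   # longitude in [-pi, pi]
    lat = np.arcsin(y)                       # latitude in [-pi/2, pi/2]
    u = (lon / (2 * np.pi) + 0.5) * width    # horizontal pixel
    v = (0.5 - lat / np.pi) * height         # vertical pixel (top of image = up)
    return u, v

# Illustrative usage: a direction slightly right of and above straight ahead
print(direction_to_equirect(np.array([0.2, 0.1, 1.0]), 4096, 2048))
```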

    Machinima Filmmaking: The Integration of Immersive Technology for Collaborative Machinima Filmmaking

    This Digital Media MS project proposes to create a flexible, intuitive, and low-entry-barrier virtual cinematography tool that will enable participants engaged in human-computer interaction (HCI) activities to quickly stage, choreograph, rehearse, and capture performances in real time. Heretofore, Machinima developers have used limited forms of expressive input devices to puppeteer characters and record in-game live performances, using a gamepad, keyboard, mouse, and joystick to produce content. This has stagnated Machinima development because machinimators have not embraced the current evolution of input devices to create, capture, and edit content, thereby omitting game engine programming possibilities that could exploit new 3D compositing techniques and alternatives to strengthen interactivity, collaboration, and efficiency in cinematic pipelines. Our system, leveraging consumer-affordable hardware and software, advances Machinima production by providing a foundation for alternative cinematic practices, creating a seamless form of human-computer interaction, and drawing more people toward Machinima filmmaking. We propose to produce an Unreal Engine 4 system plugin which integrates virtual reality and eye tracking, via the Oculus Rift DK2 and Tobii EyeX, respectively. The plugin will enable two people, in the roles of Director and Performer, to navigate and interact within a virtual 3D space and to engage productively in collaborative Machinima filmmaking. (M.S., Digital Media, Drexel University, 201)
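    One building block of the eye-tracking integration described above is converting a normalized on-screen gaze point into a ray in the virtual scene, which the engine can then trace against objects. A minimal camera-space sketch of that conversion follows; the field-of-view values, axis convention, and function name are illustrative assumptions, not the plugin's API:

```python
import numpy as np

def gaze_to_camera_ray(gx, gy, h_fov_deg=90.0, aspect=16 / 9):
    """Convert a normalized gaze point (0..1, origin at top-left) to a unit camera-space ray."""
    half_w = np.tan(np.radians(h_fov_deg) / 2)   # half-width of the image plane at depth 1
    half_h = half_w / aspect
    x = (gx * 2 - 1) * half_w                    # right
    y = (1 - gy * 2) * half_h                    # up (screen y grows downward)
    d = np.array([x, y, 1.0])                    # forward along +z
    return d / np.linalg.norm(d)                 # rotate by the camera pose to get a world-space ray

# Illustrative usage: the performer looks slightly left of, and above, screen center
print(gaze_to_camera_ray(0.45, 0.40))
```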