    A Survey on Human-aware Robot Navigation

    Intelligent systems are increasingly part of our everyday lives and have been integrated so seamlessly that it is difficult to imagine a world without them. Physical manifestations of those systems, on the other hand, in the form of embodied agents or robots, have so far been used only for specific applications and are often limited to functional roles (e.g. in industry, entertainment and the military). Given the current growth and innovation in the research communities concerned with robot navigation, human-robot interaction and human activity recognition, this may soon change. Robots are becoming easier to obtain and use, and their general acceptance is growing. However, designing a socially compliant robot that can function as a companion requires taking various areas of research into account. This paper is concerned with the navigation aspect of a socially compliant robot and provides a survey of existing solutions in the relevant research areas as well as an outlook on possible future directions. Comment: Robotics and Autonomous Systems, 202

    Spatial displays for visual awareness of remote locations

    Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. [113]-116). uCom enables remote users to be visually aware of each other using "spatial displays": live views of a remote space assembled according to an estimate of the remote space's layout. The main elements of the system design are a 3D representation of each space and a multi-display physical setup. The 3D image-based representation of a space is an aggregate of live video feeds acquired from multiple viewpoints and rendered in a graphical visualization resembling a 3D collage. Its navigation controls allow users to transition among the remote views while maintaining a sense of how the images relate in 3D space. Additionally, the system uses a configurable set of displays to portray always-on visual connections with a remote site, integrated into the local physical environment. The evaluation investigates to what extent the system improves users' understanding of the layout of a remote space. By Ana Luisa de Araujo Santos. S.M.

    From Vision-Language Multimodal Learning Towards Embodied Agents

    To build machine agents with intelligent capabilities that mimic human perception and cognition, vision and language stand out as two essential modalities and have fostered the fields of computer vision and natural language processing. Advances in these fields stimulate research in vision-language multimodal learning, which handles both visual and linguistic inputs and outputs. Due to the innate difference between the two modalities and the lack of large-scale fine-grained annotations, multimodal agents tend to inherit unimodal shortcuts. In this thesis, we develop various solutions to address unimodal shortcuts in multimodal generation and reasoning. For visual shortcuts, we introduce a linguistic prior and devise a syntax-aware action targeting module for dynamic description to rectify the correlation between subject and object in a sentence. We apply a concept hierarchy and propose a visual superordinate abstraction framework for unbiased concept learning that reduces the correlation among different attributes of an object. For linguistic shortcuts, we disentangle topic and syntax to reduce repetition in the paragraph descriptions generated for a given image. Given the ubiquity of large-scale pre-trained models, we leverage self-supervised learning during fine-tuning to increase the robustness of multimodal reasoning. The rapid development of multimodal learning promises embodied agents capable of interacting with physical environments. This thesis studies a typical embodied task, vision-and-language navigation in discrete scenarios, and proposes an episodic scene memory (ESceme) mechanism to balance generalization and efficiency. We identify one desirable instantiation of the mechanism, namely candidate enhancing, and validate its superiority in various settings. Without extra time or computational cost before inference, ESceme improves performance in unseen environments by a large margin.
We hope our findings can inspire more practical explorations of episodic memory in embodied AI.

    Videos in Context for Telecommunication and Spatial Browsing

    The research presented in this thesis explores the use of videos embedded in panoramic imagery to transmit spatial and temporal information describing remote environments and their dynamics. Virtual environments (VEs) through which users can explore remote locations are rapidly emerging as a popular medium for presence and remote collaboration. However, capturing visual representations of locations for use in VEs is usually a tedious process that requires either manual modelling of environments or specific hardware. Capturing environment dynamics is not straightforward either, and it usually requires dedicated tracking hardware. Similarly, browsing large unstructured video collections with available tools is difficult, as the abundance of spatial and temporal information makes them hard to comprehend. Panoramas lie on a spectrum between 3D VEs and 2D images: they offer the accessibility of 2D images while preserving the surrounding representation of 3D virtual environments. For this reason, panoramas are an attractive basis for videoconferencing and browsing tools, as they can relate several videos temporally and spatially. This research explores methods to acquire, fuse, render and stream data coming from heterogeneous cameras with the help of panoramic imagery. Three distinct but interrelated questions are addressed. First, the thesis considers how spatially localised video can be used to increase the spatial information transmitted during video-mediated communication, and whether this improves the quality of communication. Second, the research asks whether videos in panoramic context can convey spatial and temporal information about a remote place and the dynamics within it, and whether this improves users' performance in tasks that require spatio-temporal thinking. Finally, the thesis considers whether display type affects reasoning about events within videos in panoramic context.
These research questions were investigated over three experiments, covering scenarios common to computer-supported cooperative work and video browsing. To support the investigation, two distinct video+context systems were developed. The first experiment, on telecommunication, compared our videos-in-context interface with fully panoramic video and conventional webcam videoconferencing in an object placement scenario. The second experiment investigated the impact of videos in panoramic context on the quality of spatio-temporal thinking during localization tasks. To support this experiment, a novel interface to video collections in panoramic context was developed and compared with common video-browsing tools. The final study investigated the impact of display type on reasoning about events, exploring adaptations of our video-collection interface to three display types. The overall conclusion is that videos in panoramic context offer a valid solution for spatio-temporal exploration of remote locations. Our approach presents a richer visual representation of space and time than standard tools, showing that providing panoramic context to video collections makes spatio-temporal tasks easier. Thus, videos in context are a suitable alternative to more difficult, and often expensive, solutions. These findings are beneficial to many applications, including teleconferencing, virtual tourism and remote assistance.

    A basis for learning with desktop virtual environments

    The Eye in Motion: Mid-Victorian Fiction and Moving-Image Technologies

    This thesis reads selected works of fiction by three mid-Victorian writers (Charlotte Brontë, Charles Dickens, and George Eliot) alongside contemporaneous innovations and developments in moving-image technologies, or what film historians have referred to as ‘pre-cinematic devices’. It looks specifically at the moving panorama, the diorama, dissolving magic lantern slides, the kaleidoscope, and persistence-of-vision devices such as the phenakistiscope and zoetrope, and ranges across scientific writing, journalism, letters, and paintings to demonstrate the scope and popularity of visual motion devices. By exploring this history of optical technologies, I show how their display, mechanism, and manual operation contributed to a broader cultural and literary interest in the phenomenological experience of animation, decades before the establishment of cinematography as an industry, technology, and viewing practice. Through a close reading of a range of mid-Victorian novels, this thesis identifies and analyses the literary use of language closely associated with moving-image technologies to argue that the Victorian literary imagination reflected upon, drew from, and incorporated references to visual and technological animation many decades earlier than critics, who usually focus on early twentieth-century cinema and modernist literature, have allowed.
It develops current scholarship on Victorian visual culture and optical technologies through a close reading of the language of moving-image devices (found in advertisements, reviews, and descriptions of their physiological operation and spectacle) alongside the choices Victorian authors made to describe precisely how their characters perceived, imagined, remembered, and mentally relived particular scenes and images, and how readers of their texts were encouraged to imaginatively ‘see’ the animated unfolding of the plot and the material dimensionality of its world through a shared understanding of this language of moving images.

    Impact of Imaging and Distance Perception in VR Immersive Visual Experience

    Virtual reality (VR) headsets have evolved to offer unprecedented viewing quality. Meanwhile, they have become lightweight, wireless, and low-cost, which has opened them up to new applications and a much wider audience. VR headsets can now provide users with a greater understanding of events and more accurate observation, making decision-making faster and more effective. However, immersive technologies have been slow to spread, with the adoption of virtual reality limited to a few applications, typically related to entertainment. This reluctance appears to be due to the change of operating paradigm that is often required and to some scepticism towards the "VR advantage". There is therefore a need to evaluate the contribution that a VR system can make to user performance, for example in monitoring and decision-making. This will help system designers understand when immersive technologies can be proposed to replace or complement standard display systems such as a desktop monitor. In parallel with the evolution of VR headsets, 360° cameras have evolved and can now instantly acquire photographs and videos in stereoscopic 3D (S3D) at very high resolutions. 360° images are innately suited to VR headsets, where the captured view can be observed and explored through natural rotation of the head. Acquired views can even be experienced and navigated from the inside as they are captured. The combination of omnidirectional images and VR headsets has opened up a new way of creating immersive visual representations, which we call photo-based VR. This new methodology combines traditional model-based rendering with high-quality omnidirectional texture mapping. Photo-based VR is particularly suitable for applications related to remote visits and realistic scene reconstruction, useful for monitoring and surveillance systems, control panels and operator training.
The presented PhD study investigates the potential of photo-based VR representations. It starts by evaluating the role of immersion and user performance in today's graphical visual experience, and then uses this as a reference to develop and evaluate new photo-based VR solutions. Since the current literature on photo-based VR experience and associated user performance is very limited, this study builds new knowledge from the proposed assessments. We conduct five user studies on a few representative applications, examining how visual representations can be affected by system factors (camera- and display-related) and how they can influence human factors (such as realism, presence, and emotions). Particular attention is paid to realistic depth perception, which we support by developing targeted solutions for photo-based VR. These are intended to provide users with a correct perception of spatial dimensions and object sizes; we call this approach true-dimensional visualization. The presented work contributes to unexplored fields, including photo-based VR and true-dimensional visualization, offering immersive system designers a thorough understanding of the benefits, potential, and types of applications in which these new methods can make a difference. This thesis manuscript and its findings have been partly presented in scientific publications. In particular, five conference papers (Springer and IEEE symposia) [1], [2], [3], [4], [5] and one journal article in an IEEE periodical [6] have been published.

    Virtual Heritage: new technologies for edutainment

    Cultural heritage represents an enormous amount of information and knowledge. Accessing this treasure chest allows us not only to discover the legacy of physical and intangible attributes of the past but also to better understand the present. Museums and cultural institutions face the problem of providing access to these cultural contents and communicating them to a wide and varied audience, meeting the expectations and interests of the target end-users and relying on the most appropriate tools available. Given the large amount of existing tangible and intangible heritage, artistic, historical and cultural content, what can be done to preserve it and properly disseminate its heritage significance? How can these items be disseminated to the public in the proper way, taking into account their enormous heterogeneity? Answering these questions also requires dealing with another aspect of the problem: the evolution of culture, literacy and society during the last decades of the 20th century. Reflecting these transformations, this period witnessed a shift in museums' focus from the aesthetic value of artifacts to the historical and artistic information they encompass, and a change in museums' role from a mere "container" of cultural objects to a "narrative space" able to explain, describe, and revive historical material in order to attract and entertain visitors. These developments require creating novel exhibits that can tell stories about the objects and enable visitors to construct semantic meanings around them. The objective that museums presently pursue is reflected by the concept of Edutainment (Education + Entertainment). Nowadays, visitors are not satisfied with ‘learning something’, but would rather engage in an ‘experience of learning’, or ‘learning for fun’, being active actors and players in their own cultural experience.
As a result, institutions face several new problems, such as the need to communicate with people of different age groups and cultural backgrounds, the change in people's attitudes due to the massive and unexpected diffusion of technology into everyday life, and the need to design the visit from a personal point of view, leading to a high level of customization that allows visitors to shape their path according to their characteristics and interests. To cope with these issues, I investigated several approaches. In particular, I focused on Virtual Learning Environments (VLEs): real-time interactive virtual environments where visitors can experience a journey through time and space, immersed in the original historical, cultural and artistic context of the works of art on display. VLEs can greatly help archivists and exhibit designers, allowing them to create new, interesting and captivating ways to present cultural materials. In this dissertation I tackle many of the different dimensions related to the creation of a cultural virtual experience. During my research project, I investigated the entire pipeline involved in the development and deployment of VLEs. The approach followed was to analyze the main sub-problems in detail, in order to better focus on specific issues. I therefore first analyzed different approaches to effectively recreating the historical and cultural context of heritage contents, which is ultimately aimed at an effective transfer of knowledge to end-users. In particular, I identified the enhancement of users' sense of presence in VLEs as one of the main tools to reach this objective. Presence is generally expressed as the perception of 'being there', i.e. the subjective belief of users that they are in a certain place, even if they know that the experience is mediated by the computer. Presence is related to the number of senses engaged by the VLE and to the quality of the sensory stimuli.
But in a cultural scenario this is not sufficient, as cultural presence plays a relevant role. Cultural presence is not just a feeling of 'being there' but of being, not only physically but also socially and culturally, 'there and then'. In other words, the VLE must be able to transfer not only the appearance but also all the significance and characteristics of the context that make it a place, so that both the environment and the context become tools capable of transferring the cultural significance of a historic place. The attention that users pay to the mediated environment is another aspect that contributes to presence. Attention is related to users' focus and concentration and to their interests. Thus, to improve involvement and capture users' attention, in my work I investigated the adoption of narratives and storytelling experiences, which can help people make sense of history and culture, and of gamification approaches, which explore the use of game thinking and game mechanics in cultural contexts, thus engaging users while disseminating cultural contents and, why not, letting them have fun during the process. Another dimension related to the effectiveness of any VLE is the quality of the user experience (UX). User interaction, with both the virtual environment and its digital contents, is one of the main elements affecting UX. In this respect, I focused on one of the most recent and promising approaches: natural interaction, which is based on the idea that people should be able to interact with technology in the same way they are used to interacting with the real world in everyday life. Then, I focused on the problem of presenting, displaying and communicating contents. VLEs represent an ideal presentation layer, being multiplatform hypermedia applications where users are free to interact with the virtual reconstructions by choosing their own visiting path.
Cultural items embedded in the environment can be accessed by users according to their own curiosity and interests, with the support of narrative structures, which can guide them through the exploration of the virtual spaces, and conceptual maps, which help build meaningful connections between cultural items. Thus, VLEs can even be seen as visual interfaces to databases of cultural contents. Users can navigate the virtual environment as if they were browsing the database contents, exploiting both text-based queries and visual queries; the latter are provided by re-contextualizing the objects in their original spaces, whose virtual exploration can provide new insights on specific elements and improve awareness of the relationships between objects in the database. Finally, I explored the mobile dimension, which has become highly relevant in recent years. Nowadays, off-the-shelf consumer devices such as smartphones and tablets offer considerable computing capabilities, support for rich multimedia contents, geo-localization and high network bandwidth. Mobile devices can therefore support users on the move and detect the user context, enabling a plethora of location-based services, from way-finding to the contextualized communication of cultural contents, aimed at providing a meaningful exploration of exhibits and cultural or tourist sites according to visitors' personal interests and curiosity.