28 research outputs found

    Interaction-Driven Active 3D Reconstruction with Object Interiors

    We introduce an active 3D reconstruction method which integrates visual perception, robot-object interaction, and 3D scanning to recover both the exterior and interior, i.e., unexposed, geometries of a target 3D object. Unlike other works in active vision which focus on optimizing camera viewpoints to better investigate the environment, the primary feature of our reconstruction is an analysis of the interactability of various parts of the target object and the ensuing part manipulation by a robot to enable scanning of occluded regions. As a result, an understanding of part articulations of the target object is obtained on top of complete geometry acquisition. Our method operates fully automatically by a Fetch robot with built-in RGBD sensors. It iterates between interaction analysis and interaction-driven reconstruction, scanning and reconstructing detected moveable parts one at a time, where both the articulated part detection and mesh reconstruction are carried out by neural networks. In the final step, all the remaining, non-articulated parts, including all the interior structures that had been exposed by prior part manipulations and subsequently scanned, are reconstructed to complete the acquisition. We demonstrate the performance of our method via qualitative and quantitative evaluation, ablation studies, comparisons to alternatives, as well as experiments in a real environment. Comment: Accepted to SIGGRAPH Asia 2023, project page at https://vcc.tech/research/2023/InterReco

    State of the Art on Diffusion Models for Visual Computing

    The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes. In these domains, diffusion models are the generative AI architecture of choice. Within the last year alone, the literature on diffusion-based tools and applications has seen exponential growth, and relevant papers are published across the computer graphics, computer vision, and AI communities, with new works appearing daily on arXiv. This rapid growth of the field makes it difficult to keep up with all recent developments. The goal of this state-of-the-art report (STAR) is to introduce the basic mathematical concepts of diffusion models and the implementation details and design choices of the popular Stable Diffusion model, and to give an overview of important aspects of these generative AI tools, including personalization, conditioning, and inversion, among others. Moreover, we give a comprehensive overview of the rapidly growing literature on diffusion-based generation and editing, categorized by the type of generated medium, including 2D images, videos, 3D objects, locomotion, and 4D scenes. Finally, we discuss available datasets, metrics, open challenges, and social implications. This STAR provides an intuitive starting point to explore this exciting topic for researchers, artists, and practitioners alike.
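For readers unfamiliar with the "basic mathematical concepts" the abstract refers to, the block below sketches the widely used denoising diffusion (DDPM-style) formulation. The notation ($\beta_t$, $\bar{\alpha}_t$, $\epsilon_\theta$) is an assumption taken from common usage and may differ from the notation in the report itself.

```latex
% Forward (noising) process with an assumed variance schedule \beta_1, \dots, \beta_T
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\right)

% Closed form for sampling x_t directly from the clean sample x_0
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t) I\right),
\qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s)

% Simplified noise-prediction training objective for a network \epsilon_\theta
\mathcal{L}_{\text{simple}} = \mathbb{E}_{x_0,\ \epsilon \sim \mathcal{N}(0, I),\ t}
\left[\ \left\lVert \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon,\ t\right) \right\rVert^2\ \right]
```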

    Semantics-Driven Large-Scale 3D Scene Retrieval

    Let's Walk Up and Play! Design and Evaluation of Collaborative Interactive Musical Experiences for Public Settings

    This thesis focuses on the design and evaluation of interactive music systems that enable non-experts to experience collaborative music-making in public settings, such as museums, galleries and festivals. Although there has been previous research into music systems for non-experts, there is very limited research on how participants engage with collaborative music environments in public settings. Informed by a detailed assessment of related research, an interactive, multi-person music system is developed, which serves as a vehicle to conduct practice-based research in real-world settings. A central focus of the design is supporting each player's individual sense of control, in order to examine how this relates to their overall playing experience. Drawing on approaches from Human-Computer Interaction (HCI) and interactive art research, a series of user studies is conducted in public settings such as art exhibitions and festivals. Taking into account that the user experience and social dynamics around such new forms of interaction are considerably influenced by the context of use, this systematic assessment in real-world contexts contributes to a richer understanding of how people interact and behave in such new creative spaces. This research makes a number of contributions to the fields of HCI, interactive art and New Interfaces for Musical Expression (NIME). It provides a set of design implications to aid designers of future collaborative music systems. These are based on a number of empirical findings that describe and explain aspects of audience behaviour, engagement and mutual interaction around public, interactive multi-person systems. It provides empirical evidence that there is a correlation between participants' perceived level of control and their sense of creative participation and enjoyment. This thesis also develops and demonstrates the application of a mixed-method approach for studying technology-mediated collaborative creativity with live audiences. This research was funded by a Doctoral Studentship from Queen Mary University of London. The studies of this thesis were kindly supported by the Centre for Digital Music EPSRC Platform Grant (EP/E045235/1; EP/K009559/1), and by Hunan University, Changsha, China (Study II). The attendance at ACM Creativity & Cognition 2013 was kindly supported by the EPSRC funded DePIC project (EP/J017205/1).

    Proceedings. 9th 3DGeoInfo Conference 2014, [11-13 November 2014, Dubai]

    It is known that scientific disciplines such as geology, geophysics, and reservoir exploration intrinsically use 3D geo-information in their models and simulations. However, 3D geo-information is also urgently needed in many traditional 2D planning areas such as civil engineering, city and infrastructure modeling, architecture, and environmental planning. Altogether, 3DGeoInfo is an emerging technology that will greatly influence the market within the next few decades. The 9th International 3DGeoInfo Conference aims at bringing together international state-of-the-art researchers and practitioners, facilitating the dialogue on emerging topics in the field of 3D geo-information. The conference in Dubai offers an interdisciplinary forum of sub- and above-surface 3D geo-information researchers and practitioners dealing with data acquisition, modeling, management, maintenance, visualization, and analysis of 3D geo-information.

    Balancing User Experience for Mobile One-to-One Interpersonal Telepresence

    The COVID-19 virus disrupted all aspects of our daily lives, and though the world is finally returning to normalcy, the pandemic has shown us how ill-prepared we are to support social interactions when expected to remain socially distant. Family members missed major life events of their loved ones; face-to-face interactions were replaced with video chat; and the technologies used to facilitate interim social interactions caused an increase in depression, stress, and burn-out. It is clear that we need better solutions to address these issues, and one avenue showing promise is that of Interpersonal Telepresence. Interpersonal Telepresence is an interaction paradigm in which two people can share mobile experiences and feel as if they are together, even though geographically distributed. In this dissertation, we posit that this paradigm has significant value in one-to-one, asymmetrical contexts, where one user can live-stream their experiences to another who remains at home. We discuss a review of the recent Interpersonal Telepresence literature, highlighting research trends and opportunities that require further examination. Specifically, we show how current telepresence prototypes do not meet the social needs of the streamer, who often feels socially awkward when using obtrusive devices. To combat this negative finding, we present a qualitative co-design study in which end users worked together to design their ideal telepresence systems, overcoming value tensions that naturally arise between Viewer and Streamer. As expected, virtual reality techniques are desired to provide immersive views of the remote location; however, our participants noted that the devices to facilitate this interaction need to be hidden from the public eye. This suggests that 360° cameras should be used, but the lenses need to be embedded in wearable systems, which might affect the viewing experience. We thus present two quantitative studies in which we examine the effects of camera placement and height on the viewing experience, in an effort to understand how we can better design telepresence systems. We found that camera height is not a significant factor, meaning wearable cameras do not need to be positioned at the natural eye-level of the viewer; the streamer is able to place them according to their own needs. Lastly, we present a qualitative study in which we deploy a custom interpersonal telepresence prototype based on the co-design findings. Our participants preferred our prototype over simple video chat, even though it caused a somewhat increased sense of self-consciousness. Our participants indicated that they have their own preferences, even with simple design decisions such as style of hat, and we as a community need to consider ways to allow customization within our devices. Overall, our work contributes new knowledge to the telepresence field and helps system designers focus on the features that truly matter to users, in an effort to let people have richer experiences and virtually bridge the distance to their loved ones.

    A Cultural History of the Disneyland Theme Parks

    The first comparative historical study of the six Disneyland theme parks around the world in five distinct cultures: the USA, Tokyo, Paris, Hong Kong and Shanghai. Situates the parks in their respective historic contexts at the time of their opening, and considers the part that class plays in the success or failure of these ventures

    UAV or Drones for Remote Sensing Applications in GPS/GNSS Enabled and GPS/GNSS Denied Environments

    The design of novel UAV systems and the use of UAV platforms integrated with robotic sensing and imaging techniques, as well as the development of processing workflows and the capacity of ultra-high temporal and spatial resolution data, have enabled a rapid uptake of UAVs and drones across several industries and application domains. This book provides a forum for high-quality peer-reviewed papers that broaden awareness and understanding of single- and multiple-UAV developments for remote sensing applications, and associated developments in sensor technology, data processing and communications, and UAV system design and sensing capabilities in GPS-enabled and, more broadly, Global Navigation Satellite System (GNSS)-enabled and GPS/GNSS-denied environments. Contributions include: UAV-based photogrammetry, laser scanning, multispectral imaging, hyperspectral imaging, and thermal imaging; UAV sensor applications; spatial ecology; pest detection; reef; forestry; volcanology; precision agriculture; wildlife species tracking; search and rescue; target tracking; atmosphere monitoring; chemical, biological, and natural disaster phenomena; fire prevention; flood prevention; volcanic monitoring; pollution monitoring; microclimates; and land use; wildlife and target detection and recognition from UAV imagery using deep learning and machine learning techniques; UAV-based change detection.

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Deployed image classification pipelines are typically dependent on the images captured in real-world environments. This means that images might be affected by different sources of perturbations (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has, hence, attracted wide interest within the computer vision communities. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. Such an operation transforms an input image into a space that is more robust to noise before being processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free images comparable to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.
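The evaluation protocol described above (perturbing test images with different levels of Gaussian and uniform noise) could be sketched roughly as follows. This is only an illustration: the function names, the noise levels, the [0, 1] pixel range, and the random stand-in image are all assumptions, and the CORF operator and the AlexNet model are not reproduced here.

```python
import random


def clip01(value):
    """Clamp a pixel value to the assumed [0, 1] intensity range."""
    return min(1.0, max(0.0, value))


def add_gaussian_noise(pixels, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to each pixel."""
    return [clip01(p + rng.gauss(0.0, sigma)) for p in pixels]


def add_uniform_noise(pixels, half_width, rng):
    """Add noise drawn uniformly from [-half_width, half_width] to each pixel."""
    return [clip01(p + rng.uniform(-half_width, half_width)) for p in pixels]


rng = random.Random(0)

# Stand-in for one flattened 28x28 test image (hypothetical data, not Fashion MNIST).
image = [rng.random() for _ in range(28 * 28)]

# Hypothetical noise levels; the abstract does not state the levels that were used.
noisy_by_level = {
    level: (
        add_gaussian_noise(image, level, rng),
        add_uniform_noise(image, level, rng),
    )
    for level in (0.05, 0.1, 0.2)
}
```

In an evaluation like the one described, each perturbed copy would then be classified by both pipelines (with and without the delineation-map step) and the accuracies compared per noise level.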