Interaction-Driven Active 3D Reconstruction with Object Interiors
We introduce an active 3D reconstruction method which integrates visual
perception, robot-object interaction, and 3D scanning to recover both the
exterior and interior, i.e., unexposed, geometries of a target 3D object.
Unlike other works in active vision which focus on optimizing camera viewpoints
to better investigate the environment, the primary feature of our
reconstruction is an analysis of the interactability of various parts of the
target object and the ensuing part manipulation by a robot to enable scanning
of occluded regions. As a result, an understanding of part articulations of the
target object is obtained on top of complete geometry acquisition. Our method
operates fully automatically by a Fetch robot with built-in RGBD sensors. It
iterates between interaction analysis and interaction-driven reconstruction,
scanning and reconstructing detected moveable parts one at a time, where both
the articulated part detection and mesh reconstruction are carried out by
neural networks. In the final step, all the remaining, non-articulated parts,
including all the interior structures that had been exposed by prior part
manipulations and subsequently scanned, are reconstructed to complete the
acquisition. We demonstrate the performance of our method via qualitative and
quantitative evaluation, ablation studies, comparisons to alternatives, as well
as experiments in a real environment.
Comment: Accepted to SIGGRAPH Asia 2023; project page at
https://vcc.tech/research/2023/InterReco
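The pipeline described above (alternating interaction analysis with part-by-part reconstruction, then finishing with the remaining static geometry) can be sketched as a simple control loop. All names below are hypothetical stand-ins for illustration only; the paper's actual interfaces are not given in the abstract:

```python
# Hypothetical sketch of the interaction-driven reconstruction loop:
# detect a movable part, have the robot manipulate it to expose occluded
# regions, reconstruct it, and repeat until no movable parts remain.
def active_reconstruction(scan, detect_movable_part, manipulate, reconstruct):
    meshes = []
    while (part := detect_movable_part(scan())) is not None:
        manipulate(part)                  # expose interior regions behind the part
        meshes.append(reconstruct(part))  # neural mesh reconstruction of the part
    meshes.append(reconstruct("static"))  # all remaining non-articulated parts
    return meshes

# Toy stand-ins: two movable "drawers" are detected, then nothing is left to move.
queue = ["drawer_1", "drawer_2"]
meshes = active_reconstruction(
    scan=lambda: None,
    detect_movable_part=lambda _: queue.pop(0) if queue else None,
    manipulate=lambda p: None,
    reconstruct=lambda p: f"mesh({p})",
)
print(meshes)  # ['mesh(drawer_1)', 'mesh(drawer_2)', 'mesh(static)']
```

The loop structure mirrors the abstract's "one part at a time" iteration, with the final call covering the non-articulated remainder.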
State of the Art on Diffusion Models for Visual Computing
The field of visual computing is rapidly advancing due to the emergence of
generative artificial intelligence (AI), which unlocks unprecedented
capabilities for the generation, editing, and reconstruction of images, videos,
and 3D scenes. In these domains, diffusion models are the generative AI
architecture of choice. Within the last year alone, the literature on
diffusion-based tools and applications has seen exponential growth and relevant
papers are published across the computer graphics, computer vision, and AI
communities with new works appearing daily on arXiv. This rapid growth of the
field makes it difficult to keep up with all recent developments. The goal of
this state-of-the-art report (STAR) is to introduce the basic mathematical
concepts of diffusion models, implementation details and design choices of the
popular Stable Diffusion model, and to survey important aspects of these
generative AI tools, including personalization, conditioning, and inversion, among
others. Moreover, we give a comprehensive overview of the rapidly growing
literature on diffusion-based generation and editing, categorized by the type
of generated medium, including 2D images, videos, 3D objects, locomotion, and
4D scenes. Finally, we discuss available datasets, metrics, open challenges,
and social implications. This STAR provides an intuitive starting point to
explore this exciting topic for researchers, artists, and practitioners alike.
Let's Walk Up and Play! Design and Evaluation of Collaborative Interactive Musical Experiences for Public Settings
This thesis focuses on the design and evaluation of interactive music systems
that enable non-experts to experience collaborative music-making in public
settings, such as museums, galleries, and festivals. Although there has been
previous research into music systems for non-experts, there is very limited
research on how participants engage with collaborative music environments in
public settings. Informed by a detailed assessment of related research, an
interactive, multi-person music system is developed, which serves as a vehicle
to conduct practice-based research in real-world settings. A central focus of
the design is supporting each player's individual sense of control, in order to
examine how this relates to their overall playing experience.
Drawing on approaches from Human-Computer Interaction (HCI) and interactive
art research, a series of user studies is conducted in public settings such as
art exhibitions and festivals. Taking into account that the user experience and
social dynamics around such new forms of interaction are considerably
influenced by the context of use, this systematic assessment in real-world
contexts contributes to a richer understanding of how people interact and
behave in such new creative spaces.
This research makes a number of contributions to the fields of HCI, interactive
art, and New Interfaces for Musical Expression (NIME). It provides a set of
design implications to aid designers of future collaborative music systems.
These are based on a number of empirical findings that describe and explain
aspects of audience behaviour, engagement, and mutual interaction around
public, interactive multi-person systems. It provides empirical evidence of a
correlation between participants' perceived level of control and their sense of
creative participation and enjoyment. This thesis also develops and demonstrates
the application of a mixed-method approach for studying technology-mediated
collaborative creativity with live audiences.
This research was funded by a Doctoral Studentship from Queen Mary University of London. The studies of this thesis were kindly supported by the Centre for Digital Music EPSRC Platform Grant (EP/E045235/1; EP/K009559/1) and by Hunan University, Changsha, China (Study II). Attendance at ACM Creativity & Cognition 2013 was kindly supported by the EPSRC-funded DePIC project (EP/J017205/1).
Proceedings. 9th 3DGeoInfo Conference 2014, [11-13 November 2014, Dubai]
Scientific disciplines such as geology, geophysics, and reservoir exploration intrinsically use 3D geo-information in their models and simulations. However, 3D geo-information is also urgently needed in many traditionally 2D planning areas such as civil engineering, city and infrastructure modeling, architecture, and environmental planning. Altogether, 3DGeoInfo is an emerging technology that will greatly influence the market within the next few decades. The 9th International 3DGeoInfo Conference aims at bringing together international state-of-the-art researchers and practitioners to facilitate the dialogue on emerging topics in the field of 3D geo-information. The conference in Dubai offers an interdisciplinary forum for sub- and above-surface 3D geo-information researchers and practitioners dealing with data acquisition, modeling, management, maintenance, visualization, and analysis of 3D geo-information.
Balancing User Experience for Mobile One-to-One Interpersonal Telepresence
The COVID-19 virus disrupted all aspects of our daily lives, and though the world is finally returning to normalcy, the pandemic has shown us how ill-prepared we are to support social interactions when expected to remain socially distant. Family members missed major life events of their loved ones; face-to-face interactions were replaced with video chat; and the technologies used to facilitate interim social interactions caused an increase in depression, stress, and burn-out. It is clear that we need better solutions to address these issues, and one avenue showing promise is that of Interpersonal Telepresence. Interpersonal Telepresence is an interaction paradigm in which two people can share mobile experiences and feel as if they are together, even though geographically distributed. In this dissertation, we posit that this paradigm has significant value in one-to-one, asymmetrical contexts, where one user can live-stream their experiences to another who remains at home. We discuss a review of the recent Interpersonal Telepresence literature, highlighting research trends and opportunities that require further examination. Specifically, we show how current telepresence prototypes do not meet the social needs of the streamer, who often feels socially awkward when using obtrusive devices. To combat this negative finding, we present a qualitative co-design study in which end users worked together to design their ideal telepresence systems, overcoming value tensions that naturally arise between Viewer and Streamer. Expectedly, virtual reality techniques are desired to provide immersive views of the remote location; however, our participants noted that the devices to facilitate this interaction need to be hidden from the public eye. This suggests that 360 cameras should be used, but the lenses need to be embedded in wearable systems, which might affect the viewing experience. 
We thus present two quantitative studies in which we examine the effects of camera placement and height on the viewing experience, in an effort to understand how we can better design telepresence systems. We found that camera height is not a significant factor, meaning wearable cameras do not need to be positioned at the natural eye level of the viewer; the streamer is able to place them according to their own needs. Lastly, we present a qualitative study in which we deploy a custom interpersonal telepresence prototype built on the co-design findings. Our participants preferred our prototype to simple video chat, even though it caused a somewhat increased sense of self-consciousness. Our participants indicated that they have their own preferences, even with simple design decisions such as the style of hat, and we as a community need to consider ways to allow customization within our devices. Overall, our work contributes new knowledge to the telepresence field and helps system designers focus on the features that truly matter to users, in an effort to let people have richer experiences and virtually bridge the distance to their loved ones.
A Cultural History of the Disneyland Theme Parks
The first comparative historical study of the six Disneyland theme parks around the world in five distinct cultures: the USA, Tokyo, Paris, Hong Kong, and Shanghai. It situates the parks in their respective historical contexts at the time of their opening, and considers the part that class plays in the success or failure of these ventures.
UAV or Drones for Remote Sensing Applications in GPS/GNSS Enabled and GPS/GNSS Denied Environments
The design of novel UAV systems and the use of UAV platforms integrated with robotic sensing and imaging techniques, as well as the development of processing workflows and the capacity for ultra-high temporal and spatial resolution data, have enabled a rapid uptake of UAVs and drones across several industries and application domains. This book provides a forum for high-quality peer-reviewed papers that broaden awareness and understanding of single- and multiple-UAV developments for remote sensing applications, and associated developments in sensor technology, data processing and communications, and UAV system design and sensing capabilities in GPS-enabled and, more broadly, Global Navigation Satellite System (GNSS)-enabled and GPS/GNSS-denied environments. Contributions include: UAV-based photogrammetry, laser scanning, multispectral imaging, hyperspectral imaging, and thermal imaging; UAV sensor applications, spanning spatial ecology, pest detection, reefs, forestry, volcanology, precision agriculture, wildlife species tracking, search and rescue, target tracking, atmosphere monitoring, chemical, biological, and natural disaster phenomena, fire prevention, flood prevention, volcanic monitoring, pollution monitoring, microclimates, and land use; wildlife and target detection and recognition from UAV imagery using deep learning and machine learning techniques; and UAV-based change detection.
On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator
Deployed image classification pipelines typically depend on images captured in real-world environments. This means that images might be affected by different sources of perturbation (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks, and this challenge has therefore attracted wide interest within the computer vision community. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. Such an operation transforms an input image into a space that is more robust to noise before it is processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free images comparable to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.
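The preprocessing idea, mapping a (possibly noisy) image to an edge-like delineation map before it reaches the classifier, can be sketched as follows. Note that this uses a simple difference-of-Gaussians edge map purely as a stand-in for the CORF push-pull inhibition operator, whose actual definition is not given in the abstract; the function and parameter names are hypothetical:

```python
import numpy as np

def delineation_map(img, s1=1.0, s2=2.0):
    """Toy stand-in for the delineation step: a difference-of-Gaussians
    edge map, NOT the actual CORF push-pull operator."""
    def gaussian_blur(x, sigma):
        # Separable Gaussian blur via 1-D convolution along each axis.
        r = int(3 * sigma)
        t = np.arange(-r, r + 1)
        k = np.exp(-t**2 / (2 * sigma**2))
        k /= k.sum()
        out = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 0, x)
        return np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 1, out)
    edges = gaussian_blur(img, s1) - gaussian_blur(img, s2)
    # Normalise to [0, 1] so the map is a drop-in input for the CNN.
    return (edges - edges.min()) / (np.ptp(edges) + 1e-8)

# A noisy test image is transformed before classification, as in the pipeline.
rng = np.random.default_rng(0)
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0                           # a bright square
noisy = img + rng.normal(0.0, 0.2, img.shape)   # Gaussian test-time noise
dmap = delineation_map(noisy)                   # fed to the CNN instead of `noisy`
```

In the paper's pipeline, `dmap` (computed with the real CORF operator) replaces the raw image as the CNN's input at both training and test time, which is what yields the robustness to unseen noise.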