Enhancing Perception and Immersion in Pre-Captured Environments through Learning-Based Eye Height Adaptation
Pre-captured immersive environments using omnidirectional cameras provide a
wide range of virtual reality applications. Previous research has shown that
manipulating the eye height in egocentric virtual environments can
significantly affect distance perception and immersion. However, the influence
of eye height in pre-captured real environments has received less attention due
to the difficulty of altering the perspective after finishing the capture
process. To explore this influence, we first propose a pilot study that
captures real environments with multiple eye heights and asks participants to
judge the egocentric distances and immersion. If a significant influence is
confirmed, an effective image-based approach to adapt pre-captured real-world
environments to the user's eye height would be desirable. Motivated by the
study, we propose a learning-based approach for synthesizing novel views for
omnidirectional images with altered eye heights. This approach employs a
multitask architecture that learns depth and semantic segmentation in two
formats, and generates high-quality depth and semantic segmentation to
facilitate the inpainting stage. With the improved omnidirectional-aware
layered depth image, our approach synthesizes natural and realistic visuals for
eye height adaptation. Quantitative and qualitative evaluation shows favorable
results against state-of-the-art methods, and an extensive user study verifies
improved perception and immersion for pre-captured real-world environments.
Comment: 10 pages, 13 figures, 3 tables, submitted to ISMAR 202
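To make the multitask architecture concrete, here is a minimal sketch of a shared-encoder network with separate depth and semantic-segmentation heads operating on an equirectangular panorama. The layer sizes, class count, and decoder design are illustrative assumptions rather than the authors' model, and the two-format learning and LDI-based inpainting stage are omitted.

```python
import torch
import torch.nn as nn

class MultiTaskPanoNet(nn.Module):
    """Illustrative shared-encoder, two-head network that predicts depth and
    semantic segmentation from an equirectangular (omnidirectional) image."""
    def __init__(self, num_classes=13):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.depth_head = nn.Sequential(                     # per-pixel depth
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )
        self.seg_head = nn.Sequential(                       # per-pixel class logits
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, pano):                                 # pano: (B, 3, H, 2H)
        feat = self.encoder(pano)
        return self.depth_head(feat), self.seg_head(feat)

# Both outputs would then feed a layered-depth-image construction and an
# inpainting stage to synthesise the view from the new eye height.
```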
3D Reconstruction of Sculptures from Single Images via Unsupervised Domain Adaptation on Implicit Models
Acquiring the virtual equivalent of exhibits, such as sculptures, in virtual
reality (VR) museums can be labour-intensive and sometimes infeasible. Deep
learning based 3D reconstruction approaches allow us to recover 3D shapes from
2D observations, among which single-view-based approaches can reduce the need
for human intervention and specialised equipment in acquiring 3D sculptures for
VR museums. However, there exist two challenges when attempting to use the
well-researched human reconstruction methods: limited data availability and
domain shift. Considering sculptures are usually related to humans, we propose
our unsupervised 3D domain adaptation method for adapting a single-view 3D
implicit reconstruction model from the source (real-world humans) to the target
(sculptures) domain. We have compared the generated shapes with other methods
and conducted ablation studies as well as a user study to demonstrate the
effectiveness of our adaptation method. We also deploy our results in a VR
application.
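As a rough illustration of what a single-view implicit reconstruction model looks like, the sketch below maps an image and a batch of 3D query points to occupancy probabilities. The encoder, layer widths, and sigmoid occupancy output are assumptions for illustration; the paper's actual unsupervised domain-adaptation procedure is only indicated in a closing comment.

```python
import torch
import torch.nn as nn

class ImplicitOccupancy(nn.Module):
    """Illustrative single-view implicit model: an image plus 3D query points
    are mapped to occupancy probabilities (inside/outside the surface)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),           # global image feature
        )
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim + 3, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, image, points):                        # points: (B, N, 3)
        feat = self.image_encoder(image)                     # (B, feat_dim)
        feat = feat.unsqueeze(1).expand(-1, points.shape[1], -1)
        logits = self.decoder(torch.cat([feat, points], dim=-1))
        return torch.sigmoid(logits)                         # (B, N, 1) occupancy

# Unsupervised domain adaptation would additionally align the feature
# distributions of the source (real-world humans) and target (sculptures)
# domains, e.g. with an adversarial domain classifier; that part is omitted here.
```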
Enhancing surgical performance in cardiothoracic surgery with innovations from computer vision and artificial intelligence: a narrative review
When technical requirements are high and patient outcomes are critical, opportunities for monitoring and improving surgical skills via objective motion analysis feedback may be particularly beneficial. This narrative review synthesises work on technical and non-technical surgical skills, collaborative task performance, and pose estimation to illustrate new opportunities to advance cardiothoracic surgical performance with innovations from computer vision and artificial intelligence. These technological innovations are critically evaluated in terms of the benefits they could offer the cardiothoracic surgical community, and any barriers to the uptake of the technology are elaborated upon. Like some other specialities, cardiothoracic surgery has relatively few opportunities to benefit from tools with data capture technology embedded within them (as is possible with robotic-assisted laparoscopic surgery, for example). In such cases, pose estimation techniques that allow for movement tracking across a conventional operating field without using specialist equipment or markers offer considerable potential. With video data from either simulated or real surgical procedures, these tools can (1) provide insight into the development of expertise and surgical performance over a surgeon’s career, (2) provide feedback to trainee surgeons regarding areas for improvement, and (3) provide the opportunity to investigate what aspects of skill may be linked to patient outcomes, which can (4) inform the aspects of surgical skill that should be focused on within training or mentoring programmes. Classifier or assessment algorithms that use artificial intelligence to ‘learn’ what expertise is from expert surgical evaluators could further assist educators in determining if trainees meet competency thresholds. With collaborative efforts between surgical teams, medical institutions, computer scientists and researchers to ensure this technology is developed with usability and ethics in mind, the developed feedback tools could improve cardiothoracic surgical practice in a data-driven way.
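As a small, hedged illustration of the kind of objective motion-analysis feedback described above, the snippet below computes two simple metrics (path length and mean speed) from a sequence of tracked wrist positions. The keypoint source is deliberately left out of scope, and the synthetic trace and frame rate are assumptions; real systems would sit a classifier or assessment algorithm on top of many such measures.

```python
import numpy as np

def motion_metrics(wrist_xy, fps=30.0):
    """Objective motion metrics from per-frame 2D keypoints.

    wrist_xy: (T, 2) array of wrist positions from any pose estimator.
    Returns total path length (pixels) and mean speed (pixels/second).
    """
    steps = np.diff(wrist_xy, axis=0)                 # frame-to-frame displacement
    step_len = np.linalg.norm(steps, axis=1)
    path_length = float(step_len.sum())
    mean_speed = float(step_len.mean() * fps)
    return path_length, mean_speed

# Example with a synthetic 10-second trace at 30 fps; shorter, smoother paths
# are the kind of signal typically associated with greater expertise.
trace = np.cumsum(np.random.randn(300, 2), axis=0)
print(motion_metrics(trace))
```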
On the Design Fundamentals of Diffusion Models: A Survey
Diffusion models are generative models that gradually add and then remove
noise to learn the underlying distribution of training data for data
generation. The components of diffusion models have attracted significant
attention, with many design choices proposed. Existing reviews have primarily
focused on higher-level solutions and therefore cover less of the design
fundamentals of the components. This study seeks to address this gap by
providing a comprehensive and coherent review of component-wise design
choices in diffusion models. Specifically, we organize the review around the
three key components, namely the forward process, the reverse process, and
the sampling procedure. This allows us to provide a fine-grained perspective
of diffusion models, benefiting future studies in the analysis of individual
components, the applicability of design choices, and the implementation of
diffusion models.
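As a concrete illustration of the three components the review is organised around, here is a minimal DDPM-style sketch of the forward (noising) process, a placeholder epsilon-predictor standing in for the reverse process, and an ancestral sampling loop. The linear beta schedule, the toy denoiser, and all hyperparameter values are assumptions for illustration, not design choices endorsed by the survey.

```python
import torch
import torch.nn as nn

# Illustrative linear noise schedule (an assumption, not from the survey).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def forward_noise(x0, t, eps):
    """Forward process: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps.
    Used at training time to corrupt data; the model learns to predict eps."""
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps

class SimpleDenoiser(nn.Module):
    """Placeholder epsilon-predictor; real designs use a U-Net or transformer."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
    def forward(self, x, t):
        return self.net(x)

@torch.no_grad()
def sample(model, shape):
    """Reverse process: ancestral sampling from pure noise back towards data."""
    x = torch.randn(shape)
    for t in reversed(range(T)):
        eps_hat = model(x, torch.full((shape[0],), t))
        a, a_bar = alphas[t], alpha_bars[t]
        mean = (x - (1 - a) / (1 - a_bar).sqrt() * eps_hat) / a.sqrt()
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
    return x
```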
HINT: High-quality INpainting Transformer with Mask-Aware Encoding and Enhanced Attention
Existing image inpainting methods leverage convolution-based downsampling approaches to reduce spatial dimensions. This may result in information loss from corrupted images, where the available information is inherently sparse, especially in the case of large missing regions. Recent advances in self-attention mechanisms within transformers have led to significant improvements in many computer vision tasks, including inpainting. However, limited by computational costs, existing methods cannot fully exploit the long-range modelling capabilities of such models. In this paper, we propose an end-to-end High-quality INpainting Transformer, abbreviated as HINT, which consists of a novel mask-aware pixel-shuffle downsampling module (MPD) to preserve the visible information extracted from the corrupted image while maintaining the integrity of the information available for high-level inferences made within the model. Moreover, we propose a Spatially-activated Channel Attention Layer (SCAL), an efficient self-attention mechanism that incorporates spatial awareness to model the corrupted image at multiple scales. To further enhance the effectiveness of SCAL, motivated by recent advances in speech recognition, we introduce a sandwich structure that places feed-forward networks before and after the SCAL module. We demonstrate the superior performance of HINT compared to contemporary state-of-the-art models on four datasets: CelebA, CelebA-HQ, Places2, and Dunhuang.
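The mask-aware downsampling idea can be sketched with a standard pixel-unshuffle: spatial resolution is traded for channels so that no observed pixels are discarded, and the validity mask is rearranged in the same way so later layers know which sub-pixels were visible. This is a minimal sketch of the general mechanism under assumed layer sizes, not the authors' MPD module, and the SCAL attention is not reproduced.

```python
import torch
import torch.nn as nn

class MaskAwarePixelShuffleDown(nn.Module):
    """Illustrative mask-aware downsampling: pixel-unshuffle keeps every input
    pixel (as extra channels) instead of discarding pixels via strided conv."""
    def __init__(self, in_channels=3, out_channels=64, scale=2):
        super().__init__()
        self.unshuffle = nn.PixelUnshuffle(scale)    # (C, H, W) -> (C*s*s, H/s, W/s)
        self.proj = nn.Conv2d((in_channels + 1) * scale * scale, out_channels, 1)

    def forward(self, image, mask):
        # Zero out missing regions, then fold space into channels losslessly.
        x = self.unshuffle(image * mask)             # rearranged image content
        m = self.unshuffle(mask)                     # per-sub-pixel validity
        feat = self.proj(torch.cat([x, m], dim=1))   # mask-aware features
        return feat, m

# Example: a half-resolution feature map that still carries every visible pixel.
down = MaskAwarePixelShuffleDown()
feat, m = down(torch.randn(1, 3, 256, 256), torch.ones(1, 1, 256, 256))
```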
U3DS: Unsupervised 3D Semantic Scene Segmentation
Contemporary point cloud segmentation approaches largely rely on richly
annotated 3D training data. However, it is both time-consuming and challenging
to obtain consistently accurate annotations for such 3D scene data. Moreover,
there is still a lack of investigation into fully unsupervised scene
segmentation for point clouds, especially for holistic 3D scenes. This paper
presents U3DS, a step towards completely unsupervised point cloud
segmentation for any holistic 3D scene. To achieve this, U3DS employs a
generalized unsupervised segmentation method for both objects and background
across indoor and outdoor static 3D point clouds, relying only on the
inherent information of the point cloud and requiring no model pre-training
to achieve full 3D scene segmentation. The initial step of our proposed
approach involves generating superpoints based on the geometric characteristics
of each scene. Subsequently, it undergoes a learning process through a spatial
clustering-based methodology, followed by iterative training with
pseudo-labels derived from the cluster centroids. Moreover, by
leveraging the invariance and equivariance of the volumetric representations,
we apply the geometric transformation on voxelized features to provide two sets
of descriptors for robust representation learning. Finally, our evaluation
achieves state-of-the-art results on the ScanNet and SemanticKITTI benchmark
datasets, and competitive results on S3DIS.
Comment: 10 pages, 4 figures, accepted to IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 202
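The spatial-clustering and pseudo-labelling step can be illustrated with a plain k-means pass over per-point descriptors, where each point is labelled by its nearest centroid and those labels supervise the next training round. The cluster count, iteration budget, and random stand-in features below are assumptions; superpoint generation and the equivariance-based voxel augmentation are not shown.

```python
import numpy as np

def pseudo_labels_from_centroids(features, num_clusters=20, iters=10, seed=0):
    """Illustrative pseudo-label generation: cluster per-point features and
    label each point with its nearest centroid (a plain k-means sketch)."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), num_clusters, replace=False)]
    for _ in range(iters):
        # Assign every point to its closest centroid.
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Re-estimate centroids from the current assignment.
        for k in range(num_clusters):
            if np.any(labels == k):
                centroids[k] = features[labels == k].mean(axis=0)
    return labels  # used as supervision targets in the next training round

# Example with random per-point descriptors standing in for learned features.
labels = pseudo_labels_from_centroids(np.random.rand(5000, 32).astype(np.float32))
```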