Satellite Image Based Cross-view Localization for Autonomous Vehicle
Existing spatial localization techniques for autonomous vehicles mostly rely on a pre-built 3D-HD map, often constructed using a survey-grade 3D mapping vehicle, which is both expensive and laborious. This paper shows that by using an off-the-shelf high-definition satellite image as a ready-to-use map, we can achieve cross-view vehicle localization with satisfactory accuracy, providing a cheaper and more practical route to localization. While the use of satellite imagery for cross-view localization is well established, conventional methods focus primarily on image retrieval; our approach departs from this retrieval paradigm. Specifically, our method develops
(1) a Geometric-align Feature Extractor (GaFE) that leverages measured 3D
points to bridge the geometric gap between ground and overhead views, (2) a
Pose Aware Branch (PAB) adopting a triplet loss to encourage pose-aware feature
extraction, and (3) a Recursive Pose Refine Branch (RPRB) that uses the
Levenberg-Marquardt (LM) algorithm to iteratively refine the initial pose
toward the true vehicle pose. Our method is validated with the KITTI and Ford
Multi-AV Seasonal datasets providing the ground views and Google Maps the
satellite views. The results demonstrate the superiority of our method in
cross-view localization, with median spatial errors within a meter and small
median angular errors.
Comment: Accepted by ICRA 2023
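The recursive LM refinement in (3) can be illustrated with a toy example. The sketch below is purely illustrative (the function names and the point-correspondence residual model are hypothetical; the actual method operates on learned cross-view features): damped Gauss-Newton steps iteratively pull a 3-DoF planar pose toward the minimizer of a residual.

```python
import numpy as np

def residuals(pose, src, dst):
    """Residual between source points transformed by pose and target points.
    pose = (tx, ty, yaw); src, dst are (N, 2) arrays."""
    tx, ty, yaw = pose
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s], [s, c]])
    return (src @ R.T + [tx, ty] - dst).ravel()

def lm_refine(pose, src, dst, iters=50, lam=1e-2):
    """Levenberg-Marquardt: damped Gauss-Newton steps on the pose."""
    pose = np.asarray(pose, float)
    for _ in range(iters):
        r = residuals(pose, src, dst)
        # Numerical Jacobian of the residual w.r.t. the 3 pose parameters.
        J = np.empty((r.size, 3))
        eps = 1e-6
        for k in range(3):
            d = np.zeros(3); d[k] = eps
            J[:, k] = (residuals(pose + d, src, dst) - r) / eps
        A = J.T @ J + lam * np.eye(3)        # damped normal equations
        step = np.linalg.solve(A, -J.T @ r)
        new_pose = pose + step
        if np.sum(residuals(new_pose, src, dst)**2) < np.sum(r**2):
            pose, lam = new_pose, lam * 0.5  # accept step, trust model more
        else:
            lam *= 2.0                       # reject step, increase damping
    return pose
```

The damping parameter interpolates between gradient descent (large `lam`) and Gauss-Newton (small `lam`), which is what makes LM robust to a poor initial pose.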
The Metaverse: Survey, Trends, Novel Pipeline Ecosystem & Future Directions
The Metaverse offers a second world beyond reality, where boundaries are
non-existent, and possibilities are endless through engagement and immersive
experiences using virtual reality (VR) technology. Many disciplines can
benefit from the advancement of the Metaverse when accurately developed,
including the fields of technology, gaming, education, art, and culture.
Nevertheless, developing the Metaverse environment to its full potential is an
ambiguous task that needs proper guidance and directions. Existing surveys on
the Metaverse focus only on a specific aspect and discipline of the Metaverse
and lack a holistic view of the entire process. To this end, a more holistic,
multi-disciplinary, in-depth, and academic and industry-oriented review is
required to provide a thorough study of the Metaverse development pipeline. To
address these issues, we present in this survey a novel multi-layered pipeline
ecosystem composed of (1) the Metaverse computing, networking, communications
and hardware infrastructure, (2) environment digitization, and (3) user
interactions. For every layer, we discuss the components that detail the steps
of its development. Also, for each of these components, we examine the impact
of a set of enabling technologies and empowering domains (e.g., Artificial
Intelligence, Security & Privacy, Blockchain, Business, Ethics, and Social) on
its advancement. In addition, we explain the importance of these technologies
to support decentralization, interoperability, user experiences, interactions,
and monetization. Our presented study highlights the existing challenges for
each component, followed by research directions and potential solutions. To the
best of our knowledge, this survey is the most comprehensive to date, allowing
users, scholars, and entrepreneurs to gain an in-depth understanding of the
Metaverse ecosystem and to identify their opportunities for contribution.
Sensitivity analysis for ReaxFF reparameterization using the Hilbert-Schmidt independence criterion
We apply a global sensitivity method, the Hilbert-Schmidt independence
criterion (HSIC), to the reparameterization of a Zn/S/H ReaxFF force field in
order to identify the most appropriate parameters to optimize. Parameter
selection remains a challenge in this context as high dimensional optimizations
are prone to overfitting and take a long time, but selecting too few parameters
leads to poor quality force fields. We show that the HSIC correctly and quickly
identifies the most sensitive parameters, and that optimizations done using a
small number of sensitive parameters outperform those done using a higher
dimensional reasonable-user parameter selection. Optimizations using only
sensitive parameters: 1) converge faster, 2) have loss values comparable to
those found with the naive selection, 3) have similar accuracy in validation
tests, and 4) do not suffer from overfitting. We demonstrate that HSIC-based
global sensitivity analysis is a cheap optimization pre-processing step with
both qualitative and quantitative benefits that can substantially simplify and
speed up ReaxFF reparameterizations.
Comment: author accepted manuscript
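A minimal sketch of the kind of HSIC screening step described above (function names are hypothetical, and the toy samples stand in for real ReaxFF parameter/loss data): each candidate parameter is scored by the empirical HSIC between its sampled values and the resulting loss, and the top-scoring parameters are kept for optimization.

```python
import numpy as np

def gaussian_gram(x, sigma=None):
    """Gaussian-kernel Gram matrix for a 1-D sample x."""
    d2 = (x[:, None] - x[None, :])**2
    if sigma is None:  # median heuristic for the bandwidth
        med = np.median(np.sqrt(d2[d2 > 0])) if np.any(d2 > 0) else 1.0
        sigma = med if med > 0 else 1.0
    return np.exp(-d2 / (2 * sigma**2))

def hsic(x, y):
    """Biased empirical HSIC estimate between 1-D samples x and y."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    K, L = gaussian_gram(x), gaussian_gram(y)
    return np.trace(K @ H @ L @ H) / (n - 1)**2

def rank_parameters(params, loss):
    """Rank parameters (columns of params) by HSIC sensitivity to the loss."""
    scores = np.array([hsic(params[:, j], loss)
                       for j in range(params.shape[1])])
    return np.argsort(scores)[::-1], scores
```

Because HSIC only needs samples of (parameter, loss) pairs, the screening cost is a modest number of force-field evaluations rather than a full high-dimensional optimization.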
Offline and Online Models for Learning Pairwise Relations in Data
Pairwise relations between data points are essential for numerous machine learning algorithms, and many representation learning methods use them to identify latent features and patterns in the data. This thesis investigates the learning of pairwise relations from two perspectives: offline learning and online learning.

The first part of the thesis focuses on offline learning. It begins with performance modeling of a synchronization method in concurrent programming, using a Markov chain whose state transition matrix models pairwise relations between the cores involved in a computer process. The thesis then turns to a particular pairwise distance measure, the minimax distance, and explores memory-efficient approaches to computing it: it proposes a hierarchical representation of the data with a memory requirement linear in the number of data points, from which the exact pairwise minimax distances can be derived in a memory-efficient manner. Next, a memory-efficient sampling method is proposed that follows this hierarchical representation and samples the data points such that the minimax distances between all data points are maximally preserved. Finally, the first part proposes a practical non-parametric clustering of vehicle motion trajectories to annotate traffic scenarios based on transitive relations between trajectories in an embedded space.

The second part of the thesis takes an online learning perspective. It presents an online learning method for identifying bottlenecks in a road network by extracting the minimax path, where bottlenecks are road segments with the highest cost, e.g., in the sense of travel time. Inspired by real-world road networks, the thesis assumes a stochastic traffic environment in which the road-specific probability distribution of travel time is unknown. The parameters of this distribution must therefore be learned from observations, and the bottleneck identification task is modeled as a combinatorial semi-bandit problem. The proposed approach incorporates prior knowledge and follows a Bayesian approach to updating the parameters. It develops a combinatorial variant of Thompson Sampling and derives an upper bound on the corresponding Bayesian regret, and it proposes an approximate algorithm to address the associated computational intractability. Finally, the thesis incorporates contextual information about road network segments by extending the proposed model to a contextual combinatorial semi-bandit framework, and it investigates and develops various algorithms for this contextual setting.
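The minimax-path notion underlying the bottleneck identification task can be sketched as follows, assuming edge costs are known (in the semi-bandit setting above, each round's costs would instead be drawn from a Bayesian posterior, Thompson Sampling style). The cost of a path is the maximum edge cost along it, and the bottleneck is the edge attaining that maximum; a modified Dijkstra search finds the path minimizing this quantity.

```python
import heapq

def minimax_path(graph, src, dst):
    """Path minimizing the maximum edge cost (modified Dijkstra).
    graph: {node: [(neighbor, cost), ...]}. Returns (bottleneck_cost, path)."""
    best = {src: 0.0}                 # best known max-edge cost to each node
    heap = [(0.0, src, [src])]
    while heap:
        mx, u, path = heapq.heappop(heap)
        if u == dst:
            return mx, path
        if mx > best.get(u, float("inf")):
            continue                  # stale queue entry
        for v, c in graph.get(u, []):
            new_mx = max(mx, c)       # path cost = max edge cost so far
            if new_mx < best.get(v, float("inf")):
                best[v] = new_mx
                heapq.heappush(heap, (new_mx, v, path + [v]))
    return float("inf"), []
```

Replacing the fixed costs with per-round posterior samples and updating the posteriors from the observed edge travel times yields the combinatorial Thompson Sampling loop described in the abstract.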
Loop Closure Detection Based on Object-level Spatial Layout and Semantic Consistency
Visual simultaneous localization and mapping (SLAM) systems face challenges
in detecting loop closure under the circumstance of large viewpoint changes. In
this paper, we present an object-based loop closure detection method based on
the spatial layout and semantic consistency of the 3D scene graph. Firstly, we
propose an object-level data association approach based on the semantic
information from semantic labels, intersection over union (IoU), object color,
and object embedding. Subsequently, multi-view bundle adjustment with the
associated objects is utilized to jointly optimize the poses of objects and
cameras. We represent the refined objects as a 3D spatial graph with semantics
and topology. Then, we propose a graph matching approach to select
corresponding objects based on the structural layout and the semantic-property
similarity of vertices' neighbors. Finally, we jointly optimize camera
trajectories and object poses in an object-level pose graph optimization, which
results in a globally consistent map. Experimental results demonstrate that our
proposed data association approach can construct more accurate 3D semantic
maps, and our loop closure method is more robust than point-based and
object-based methods in circumstances with large viewpoint changes.
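As an illustration of the object-level data association idea, here is a deliberately simplified sketch that uses only two of the four cues named above (semantic label and 2D IoU; the actual method also uses object color and embeddings, and all names here are hypothetical):

```python
def iou(a, b):
    """IoU of axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def associate(detections, landmarks, iou_thresh=0.3):
    """Greedy association: match each detection (label, box) to the unused
    landmark with the same semantic label and highest IoU above threshold."""
    matches, used = [], set()
    for i, (label, box) in enumerate(detections):
        best, best_iou = None, iou_thresh
        for j, (l_label, l_box) in enumerate(landmarks):
            if j in used or l_label != label:
                continue  # semantic labels must agree
            s = iou(box, l_box)
            if s > best_iou:
                best, best_iou = j, s
        if best is not None:
            matches.append((i, best))
            used.add(best)
    return matches
```

Gating by semantic label before scoring overlap is what keeps the association robust when geometrically similar objects of different classes sit close together.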
Monocular 3D Human Pose Estimation for Sports Broadcasts using Partial Sports Field Registration
The filming of sporting events projects and flattens the movement of athletes
in the world onto a 2D broadcast image. The pixel locations of joints in these
images can be detected with high validity. Recovering the actual 3D movement of
the limbs (kinematics) of the athletes requires lifting these 2D pixel
locations back into a third dimension, implying a certain scene geometry. The
well-known line markings of sports fields allow for the calibration of the
camera and for determining the actual geometry of the scene. Close-up shots of
athletes are required to extract detailed kinematics, but such shots in turn
obscure the pertinent field markers needed for camera calibration. We suggest
partial sports
field registration, which determines a set of scene-consistent camera
calibrations up to a single degree of freedom. Through joint optimization of 3D
pose estimation and camera calibration, we demonstrate the successful
extraction of 3D running kinematics on a 400m track. In this work, we combine
advances in 2D human pose estimation and camera calibration via partial sports
field registration to demonstrate an avenue for collecting valid large-scale
kinematic datasets. We generate a synthetic dataset of more than 10k images in
Unreal Engine 5 with different viewpoints, running styles, and body types, to
show the limitations of existing monocular 3D HPE methods. Synthetic data and
code are available at https://github.com/tobibaum/PartialSportsFieldReg_3DHPE.
Comment: Accepted at the 9th International Workshop on Computer Vision in Sports (CVsports) at CVPR 2023
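The role of scene geometry in lifting 2D detections can be illustrated with standard pinhole back-projection onto the ground plane. This is textbook projective geometry, not the paper's partial-registration method, and all names are hypothetical: given a calibrated camera, a pixel's viewing ray is intersected with the plane z = 0 to recover a 3D point.

```python
import numpy as np

def backproject_to_ground(px, K, R, t):
    """Intersect the camera ray through pixel px with the ground plane z = 0.
    K: 3x3 intrinsics; R, t: world-to-camera rotation and translation."""
    ray_cam = np.linalg.inv(K) @ np.array([px[0], px[1], 1.0])
    ray_world = R.T @ ray_cam          # ray direction in the world frame
    cam_center = -R.T @ t              # camera center in the world frame
    s = -cam_center[2] / ray_world[2]  # scale at which the ray hits z = 0
    return cam_center + s * ray_world
```

This is exactly why the field's line markings matter: they pin down K, R, and t, after which a single 2D observation on the ground plane determines a unique 3D location.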
VIVE3D: Viewpoint-Independent Video Editing using 3D-Aware GANs
We introduce VIVE3D, a novel approach that extends the capabilities of
image-based 3D GANs to video editing and is able to represent the input video
in an identity-preserving and temporally consistent way. We propose two new
building blocks. First, we introduce a novel GAN inversion technique
specifically tailored to 3D GANs by jointly embedding multiple frames and
optimizing for the camera parameters. Second, besides traditional semantic face
edits (e.g. for age and expression), we are the first to demonstrate edits that
show novel views of the head enabled by the inherent properties of 3D GANs and
our optical flow-guided compositing technique to combine the head with the
background video. Our experiments demonstrate that VIVE3D generates
high-fidelity face edits at consistent quality from a range of camera
viewpoints which are composited with the original video in a temporally and
spatially consistent manner.
Comment: CVPR 2023. Project webpage and video available at http://afruehstueck.github.io/vive3
mSPD-NN: A Geometrically Aware Neural Framework for Biomarker Discovery from Functional Connectomics Manifolds
Connectomics has emerged as a powerful tool in neuroimaging and has spurred
recent advancements in statistical and machine learning methods for
connectivity data. Despite connectomes inhabiting a matrix manifold, most
analytical frameworks ignore the underlying data geometry. This is largely
because simple operations, such as mean estimation, do not have easily
computable closed-form solutions. We propose a geometrically aware neural
framework for connectomes, i.e., the mSPD-NN, designed to estimate the geodesic
mean of a collection of symmetric positive definite (SPD) matrices. The
mSPD-NN comprises bilinear fully connected layers with tied weights and
utilizes a novel loss function to optimize the matrix-normal equation arising
from Fr\'echet mean estimation. Via experiments on synthetic data, we
demonstrate the efficacy of our mSPD-NN against common alternatives for SPD
mean estimation, providing competitive performance in terms of scalability and
robustness to noise. We illustrate the real-world flexibility of the mSPD-NN in
multiple experiments on rs-fMRI data and demonstrate that it uncovers stable
biomarkers associated with subtle network differences among patients with
ADHD-ASD comorbidities and healthy controls.
Comment: Accepted into IPMI 2023
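The geodesic (Fréchet) mean that mSPD-NN is trained to estimate has a classical fixed-point iteration under the affine-invariant metric, sketched below as a reference implementation of the target quantity (not of the network itself; function names are ours):

```python
import numpy as np

def _sym_fun(A, f):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.T

def karcher_mean(mats, iters=50, tol=1e-10):
    """Fixed-point iteration for the Frechet (geodesic) mean of SPD
    matrices under the affine-invariant Riemannian metric."""
    M = np.mean(mats, axis=0)  # initialize at the Euclidean mean
    for _ in range(iters):
        Ms = _sym_fun(M, np.sqrt)                  # M^{1/2}
        Mi = _sym_fun(M, lambda w: 1 / np.sqrt(w)) # M^{-1/2}
        # Average the data in the tangent space at M via the matrix log.
        T = np.mean([_sym_fun(Mi @ A @ Mi, np.log) for A in mats], axis=0)
        M = Ms @ _sym_fun(T, np.exp) @ Ms          # map back to the manifold
        if np.linalg.norm(T) < tol:
            break
    return M
```

The iteration has no closed form in general, which is precisely the motivation stated above for replacing it with a learned, differentiable estimator.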
Hi4D: 4D Instance Segmentation of Close Human Interaction
We propose Hi4D, a method and dataset for the automatic analysis of
physically close human-human interaction under prolonged contact. Robustly
disentangling several in-contact subjects is a challenging task due to
occlusions and complex shapes. Hence, existing multi-view systems typically
fuse 3D surfaces of close subjects into a single, connected mesh. To address
this issue we leverage i) individually fitted neural implicit avatars; ii) an
alternating optimization scheme that refines pose and surface through periods
of close proximity; and thus iii) segment the fused raw scans into individual
instances. From these instances we compile the Hi4D dataset of 4D textured scans of
20 subject pairs, 100 sequences, and a total of more than 11K frames. Hi4D
contains rich interaction-centric annotations in 2D and 3D alongside accurately
registered parametric body models. We define varied human pose and shape
estimation tasks on this dataset and provide results from state-of-the-art
methods on these benchmarks.
Comment: Project page: https://yifeiyin04.github.io/Hi4D
DefGraspNets: Grasp Planning on 3D Fields with Graph Neural Nets
Robotic grasping of 3D deformable objects is critical for real-world
applications such as food handling and robotic surgery. Unlike rigid and
articulated objects, 3D deformable objects have infinite degrees of freedom.
Fully defining their state requires 3D deformation and stress fields, which are
exceptionally difficult to analytically compute or experimentally measure.
Thus, evaluating grasp candidates for grasp planning typically requires
accurate, but slow 3D finite element method (FEM) simulation. Sampling-based
grasp planning is often impractical, as it requires evaluation of a large
number of grasp candidates. Gradient-based grasp planning can be more
efficient, but requires a differentiable model to synthesize optimal grasps
from initial candidates. Differentiable FEM simulators may fill this role, but
are typically no faster than standard FEM. In this work, we propose learning a
predictive graph neural network (GNN), DefGraspNets, to act as our
differentiable model. We train DefGraspNets to predict 3D stress and
deformation fields based on FEM-based grasp simulations. DefGraspNets not only
runs up to 1500 times faster than the FEM simulator, but also enables fast
gradient-based grasp optimization over 3D stress and deformation metrics. We
design DefGraspNets to align with real-world grasp planning practices and
demonstrate generalization across multiple test sets, including real-world
experiments.
Comment: To be published in the IEEE Conference on Robotics and Automation (ICRA), 2023
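A toy sketch of the kind of message passing a GNN like DefGraspNets performs over a mesh graph (layer shapes and untrained weights are hypothetical; the real network is trained on FEM grasp simulations to predict stress and deformation fields):

```python
import numpy as np

def mp_layer(X, edges, W_self, W_nbr):
    """One message-passing step: each node combines its own features with
    the mean of its neighbors', then applies a linear map and ReLU."""
    n = X.shape[0]
    agg = np.zeros_like(X)
    deg = np.zeros(n)
    for u, v in edges:  # undirected mesh edges
        agg[u] += X[v]; agg[v] += X[u]
        deg[u] += 1; deg[v] += 1
    agg /= np.maximum(deg, 1)[:, None]
    return np.maximum(X @ W_self + agg @ W_nbr, 0.0)

def predict_stress(X, edges, layers, W_out):
    """Stack message-passing layers, then read out a per-node scalar,
    standing in for a predicted stress value at each mesh vertex."""
    for W_self, W_nbr in layers:
        X = mp_layer(X, edges, W_self, W_nbr)
    return X @ W_out
```

Because each layer is a few dense products over local neighborhoods, a forward pass costs far less than an FEM solve, which is the source of the speedup claimed above, and the whole pipeline stays differentiable for gradient-based grasp optimization.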