RecRecNet: Rectangling Rectified Wide-Angle Images by Thin-Plate Spline Model and DoF-based Curriculum Learning
The wide-angle lens shows appealing applications in VR technologies, but it
introduces severe radial distortion into its captured image. To recover the
realistic scene, previous works have focused on rectifying the content of the
wide-angle image. However, such a rectification solution inevitably distorts
the image boundary, which changes related geometric distributions and misleads
the current vision perception models. In this work, we explore constructing a
win-win representation on both content and boundary by contributing a new
learning model, i.e., Rectangling Rectification Network (RecRecNet). In
particular, we propose a thin-plate spline (TPS) module to formulate the
non-linear and non-rigid transformation for rectangling images. By learning the
control points on the rectified image, our model can flexibly warp the source
structure to the target domain and achieves an end-to-end unsupervised
deformation. To reduce the complexity of structure approximation, we then
guide our RecRecNet to learn gradual deformation rules with a DoF (Degree
of Freedom)-based curriculum learning. By increasing the DoF in each curriculum
stage, namely, from similarity transformation (4-DoF) to homography
transformation (8-DoF), the network is capable of investigating more detailed
deformations, offering fast convergence on the final rectangling task.
Experiments show the superiority of our solution over the compared methods on
both quantitative and qualitative evaluations. The code and dataset are
available at https://github.com/KangLiao929/RecRecNet.
Comment: Accepted to ICCV 2023
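The TPS module's core operation can be pictured with a generic NumPy sketch (not the authors' code; a standard thin-plate spline interpolant, with invented example points): solve for the warp that maps a set of source control points exactly onto target control points, then evaluate it at arbitrary query locations.

```python
import numpy as np

def tps_warp(src_pts, dst_pts, query_pts):
    """Thin-plate spline warp: finds the minimum-bending-energy mapping
    that sends src_pts exactly onto dst_pts, then applies it to query_pts."""
    def U(r2):
        # TPS radial basis U(r) = r^2 * log(r^2), with U(0) = 0
        with np.errstate(divide="ignore", invalid="ignore"):
            return np.nan_to_num(r2 * np.log(r2))

    n = len(src_pts)
    d2 = np.sum((src_pts[:, None] - src_pts[None]) ** 2, axis=-1)
    P = np.hstack([np.ones((n, 1)), src_pts])      # affine part [1, x, y]
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = U(d2)
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst_pts
    coeffs = np.linalg.solve(A, b)                 # kernel weights + affine terms
    q2 = np.sum((query_pts[:, None] - src_pts[None]) ** 2, axis=-1)
    Pq = np.hstack([np.ones((len(query_pts), 1)), query_pts])
    return U(q2) @ coeffs[:n] + Pq @ coeffs[n:]
```

By construction the warp interpolates the control points exactly, which is the property a model exploits when it predicts control points on the rectified image and warps the source structure toward the target domain.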
Automatic Detection of Calibration Grids in Time-of-Flight Images
It is convenient to calibrate time-of-flight cameras by established methods,
using images of a chequerboard pattern. The low resolution of the amplitude
image, however, makes it difficult to detect the board reliably. Heuristic
detection methods, based on connected image-components, perform very poorly on
this data. An alternative, geometrically-principled method is introduced here,
based on the Hough transform. The projection of a chequerboard is represented
by two pencils of lines, which are identified as oriented clusters in the
gradient-data of the image. A projective Hough transform is applied to each of
the two clusters, in axis-aligned coordinates. The range of each transform is
properly bounded, because the corresponding gradient vectors are approximately
parallel. Each of the two transforms contains a series of collinear peaks; one
for every line in the given pencil. This pattern is easily detected, by
sweeping a dual line through the transform. The proposed Hough-based method is
compared to the standard OpenCV detection routine, by application to several
hundred time-of-flight images. It is shown that the new method detects
significantly more calibration boards, over a greater variety of poses, without
any overall loss of accuracy. This conclusion is based on an analysis of both
geometric and photometric error.
Comment: 11 pages, 11 figures, 1 table
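For background, the classical line Hough transform underlying the method can be sketched in NumPy (a generic accumulator, not the paper's projective variant; the resolution parameters are arbitrary): each point votes for every line passing through it, and points on a common line produce a dominant peak.

```python
import numpy as np

def hough_lines(points, n_theta=180, n_rho=101, rho_max=2.0):
    """Vote in (theta, rho) space: each point (x, y) votes for every line
    x*cos(theta) + y*sin(theta) = rho that passes through it."""
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rhos = np.linspace(-rho_max, rho_max, n_rho)
    acc = np.zeros((n_theta, n_rho), dtype=int)
    for x, y in points:
        r = x * np.cos(thetas) + y * np.sin(thetas)   # rho for every theta
        idx = np.round((r + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
        acc[np.arange(n_theta), idx] += 1
    return acc, thetas, rhos
```

In the paper a projective variant of this transform is applied to each of the two gradient clusters, and a pencil of chequerboard lines appears as a series of collinear peaks in the accumulator.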
DarSwin: Distortion Aware Radial Swin Transformer
Wide-angle lenses are commonly used in perception tasks requiring a large
field of view. Unfortunately, these lenses produce significant distortions
making conventional models that ignore the distortion effects unable to adapt
to wide-angle images. In this paper, we present a novel transformer-based model
that automatically adapts to the distortion produced by wide-angle lenses. We
leverage the physical characteristics of such lenses, which are analytically
defined by the radial distortion profile (assumed to be known), to develop a
distortion aware radial swin transformer (DarSwin). In contrast to conventional
transformer-based architectures, DarSwin comprises a radial patch partitioning,
a distortion-based sampling technique for creating token embeddings, and a
polar position encoding for radial patch merging. We validate our method on
classification tasks using synthetically distorted ImageNet data and show
through extensive experiments that DarSwin can perform zero-shot adaptation to
unseen distortions of different wide-angle lenses. Compared to other baselines,
DarSwin achieves the best results (in terms of Top-1 and Top-5 accuracy) when
tested on in-distribution data, with almost a 2% (6%) gain in Top-1 accuracy
under medium (high) distortion levels, and is comparable to the state of the art
under low and very low distortion levels (perspective-like images).
Comment: 8 pages, 8 figures
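A toy version of radial patch partitioning can be written in a few lines of NumPy. This is a simplified stand-in (the function name and parameters are invented, and it ignores the known distortion profile that DarSwin additionally uses to place ring boundaries): each pixel is assigned to a (ring, sector) patch in polar coordinates around the image center.

```python
import numpy as np

def radial_partition(h, w, n_rings=4, n_sectors=8):
    """Assign every pixel of an h-by-w image to a (ring, sector) patch
    around the image center; returns an integer patch id per pixel."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    r = np.hypot(ys - cy, xs - cx)                  # radius from center
    theta = np.arctan2(ys - cy, xs - cx)            # angle in (-pi, pi)
    ring = np.minimum((r / (r.max() + 1e-9) * n_rings).astype(int),
                      n_rings - 1)
    sector = ((theta + np.pi) / (2 * np.pi) * n_sectors).astype(int) % n_sectors
    return ring * n_sectors + sector
```

Tokens built over such polar patches follow the lens's radial symmetry, which is what allows a distortion-aware sampling scheme to adapt to different distortion profiles.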
Geometric Inference with Microlens Arrays
This dissertation explores an alternative to traditional fiducial markers, in which geometric information is inferred from the observed positions of 3D points seen in an image. We offer an approach that instead enables geometric inference based on the relative orientation of markers in an image. We present markers fabricated from microlenses whose appearance changes depending on the marker's orientation relative to the camera. First, we show how to manufacture and calibrate chromo-coding lenticular arrays to create a known relationship between the observed hue and the orientation of the array. Second, we use two small chromo-coding lenticular arrays to estimate the pose of an object. Third, we use three large chromo-coding lenticular arrays to calibrate a camera from a single image. Finally, we create another type of fiducial marker from lenslet arrays that encode orientation with discrete black-and-white appearances. Collectively, these approaches offer new opportunities for pose estimation and camera calibration that are relevant to robotics, virtual reality, and augmented reality.
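The hue-to-orientation calibration can be pictured as a simple inverse lookup. The sketch below uses piecewise-linear interpolation over an invented calibration table (the dissertation builds the actual relationship from calibration images of the lenticular array):

```python
import numpy as np

def calibrate_hue_to_angle(hues_deg, angles_deg):
    """Build a lookup from observed hue to array orientation by
    piecewise-linear interpolation of (hue, angle) calibration samples."""
    order = np.argsort(hues_deg)
    h = np.asarray(hues_deg, float)[order]
    a = np.asarray(angles_deg, float)[order]
    return lambda hue: float(np.interp(hue, h, a))
```

Once such a lookup is calibrated, the hue observed at each marker directly yields the marker's orientation relative to the camera, which is the measurement the pose-estimation and calibration methods build on.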
Algorithms for trajectory integration in multiple views
This thesis addresses the problem of deriving a coherent and accurate localization
of moving objects from partial visual information when data are generated by cameras
placed at different view angles with respect to the scene. The framework is built around
applications of scene monitoring with multiple cameras. Firstly, we demonstrate how a
geometric-based solution exploits the relationships between corresponding feature points
across views and improves accuracy in object localization. Then, we improve the estimation
of object locations with geometric transformations that account for lens distortions.
Additionally, we study the integration of the partial visual information generated by each
individual sensor and their combination into one single frame of observation that considers
object association and data fusion. Our approach is fully image-based, only relies on 2D
constructs and does not require any complex computation in 3D space. We exploit the
continuity and coherence in objects' motion when crossing cameras' fields of view. Additionally,
we work under the assumptions of a planar ground plane and a wide baseline (i.e.,
cameras' viewpoints are far apart). The main contributions are: i) the development of a
framework for distributed visual sensing that accounts for inaccuracies in the geometry
of multiple views; ii) the reduction of trajectory mapping errors using a statistical-based
homography estimation; iii) the integration of a polynomial method for correcting inaccuracies
caused by the cameras' lens distortion; iv) a global trajectory reconstruction
algorithm that associates and integrates fragments of trajectories generated by each camera.
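As background for contribution (ii), a homography relating two views of the planar ground can be estimated from point correspondences with the standard DLT (direct linear transform) algorithm. The NumPy sketch below is generic, with invented correspondences, and is not the thesis's statistics-based estimator:

```python
import numpy as np

def estimate_homography(src, dst):
    """DLT: least-squares 3x3 homography H with dst ~ H @ src in
    homogeneous coordinates, from >= 4 point correspondences."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)            # null-space vector of A
    return H / H[2, 2]

def apply_h(H, pts):
    """Apply a homography to an (n, 2) array of points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return pts_h[:, :2] / pts_h[:, 2:3]
```

Mapping each camera's trajectory fragments through such a homography places them in a single common ground-plane frame, the prerequisite for the association and fusion steps described above.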
Deep learning applied to 2D video data for the estimation of clamp reaction forces acting on running prosthetic feet and experimental validation after bench and track tests
Carbon fiber Running Specific Prostheses (RSPs) have allowed athletes with lower-extremity amputations to recover their functional capability of running. RSPs are designed to replicate the spring-like nature of biological legs: they are passive components that mimic the tendons' elastic storage and release of potential energy during ground contact.
Knowledge of the loads acting on the prosthesis is crucial for evaluating athletes' running technique, preventing injuries, and designing Running Prosthetic Feet (RPF).
The aim of the present work is to investigate a method for estimating the forces acting on an RPF from its geometric configuration. First, kinematic data acquired from 2D videos were assessed to determine whether they are a good approximation to the gold standard of motion capture (MOCAP). This was done by evaluating steps acquired during two running sessions (OS1 and OS3) with elite Paralympic athletes. The problem was then formulated with a deep learning approach, training a neural network on data collected from in vitro bench tests carried out on a hydraulic test bench. Two models were built: the first was trained on data from standard procedures and validated on two steps of OS1; then, to improve the performance of the prototype, a second model was built and trained on data from newly studied procedures and validated on three steps from OS3.
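The abstract does not specify the network architecture or the bench-test data, so purely as an illustration of the regression setup (synthetic data, an invented feature-to-force relationship, and a tiny one-hidden-layer network), a training loop of this kind might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for bench-test data: 3 geometric features -> reaction force.
X = rng.uniform(-1.0, 1.0, (200, 3))
w_true = np.array([0.5, -0.3, 0.2])                 # invented relationship
y = X @ w_true + 0.01 * rng.normal(size=200)

# One-hidden-layer MLP trained by full-batch gradient descent on MSE loss.
W1 = rng.normal(scale=0.5, size=(3, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=16);      b2 = 0.0
lr = 0.05
for _ in range(3000):
    h = np.tanh(X @ W1 + b1)                        # hidden activations
    pred = h @ W2 + b2
    g = 2.0 * (pred - y) / len(y)                   # dMSE/dpred
    gh = np.outer(g, W2) * (1.0 - h ** 2)           # backprop through tanh
    W2 -= lr * (h.T @ g); b2 -= lr * g.sum()
    W1 -= lr * (X.T @ gh); b1 -= lr * gh.sum(axis=0)

mse = float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2))
```

In the work described above, the inputs would instead be the prosthesis's geometric configuration extracted from video or bench instrumentation, and the targets the clamp reaction forces measured on the hydraulic test bench.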