25 research outputs found

    RecRecNet: Rectangling Rectified Wide-Angle Images by Thin-Plate Spline Model and DoF-based Curriculum Learning

    The wide-angle lens has appealing applications in VR technologies, but it introduces severe radial distortion into the captured image. To recover the realistic scene, previous works have been devoted to rectifying the content of the wide-angle image. However, such a rectification solution inevitably distorts the image boundary, which changes the related geometric distributions and misleads current vision perception models. In this work, we explore constructing a win-win representation on both content and boundary by contributing a new learning model, i.e., the Rectangling Rectification Network (RecRecNet). In particular, we propose a thin-plate spline (TPS) module to formulate the non-linear and non-rigid transformation for rectangling images. By learning the control points on the rectified image, our model can flexibly warp the source structure to the target domain and achieve an end-to-end unsupervised deformation. To relieve the complexity of structure approximation, we then guide our RecRecNet to learn the gradual deformation rules with DoF (Degree of Freedom)-based curriculum learning. By increasing the DoF in each curriculum stage, namely from similarity transformation (4-DoF) to homography transformation (8-DoF), the network is capable of investigating more detailed deformations, offering fast convergence on the final rectangling task. Experiments show the superiority of our solution over the compared methods in both quantitative and qualitative evaluations. The code and dataset are available at https://github.com/KangLiao929/RecRecNet. Comment: Accepted to ICCV 2023
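
    To make the TPS warping step concrete, here is a minimal NumPy sketch of the classical thin-plate spline fit: given matched source and target control points, it solves the standard TPS linear system and warps arbitrary points. The function names and the dense linear solve are illustrative assumptions, not the authors' implementation.

        import numpy as np

        def tps_kernel(r2):
            # TPS radial basis U(r) = r^2 log(r^2), with U(0) = 0
            out = np.zeros_like(r2)
            mask = r2 > 0
            out[mask] = r2[mask] * np.log(r2[mask])
            return out

        def fit_tps(src, dst):
            """Solve for TPS parameters mapping 2D control points src -> dst."""
            n = src.shape[0]
            d2 = np.sum((src[:, None, :] - src[None, :, :]) ** 2, axis=-1)
            K = tps_kernel(d2)                     # (n, n) radial terms
            P = np.hstack([np.ones((n, 1)), src])  # (n, 3) affine terms
            A = np.zeros((n + 3, n + 3))
            A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
            b = np.zeros((n + 3, 2))
            b[:n] = dst
            return np.linalg.solve(A, b)           # (n+3, 2) parameters

        def warp_points(params, src_ctrl, pts):
            """Apply a fitted TPS to arbitrary 2D points."""
            n = src_ctrl.shape[0]
            d2 = np.sum((pts[:, None, :] - src_ctrl[None, :, :]) ** 2, axis=-1)
            U = tps_kernel(d2)
            P = np.hstack([np.ones((len(pts), 1)), pts])
            return U @ params[:n] + P @ params[n:]

        # Tiny check: nudge one control point and warp that same location
        src = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]], float)
        dst = src.copy()
        dst[-1] += 0.1
        print(warp_points(fit_tps(src, dst), src, np.array([[0.5, 0.5]])))

    In RecRecNet the target control points are predicted by the network itself, which is what keeps the deformation trainable end to end.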

    Automatic Detection of Calibration Grids in Time-of-Flight Images

    It is convenient to calibrate time-of-flight cameras by established methods, using images of a chequerboard pattern. The low resolution of the amplitude image, however, makes it difficult to detect the board reliably. Heuristic detection methods, based on connected image components, perform very poorly on this data. An alternative, geometrically principled method is introduced here, based on the Hough transform. The projection of a chequerboard is represented by two pencils of lines, which are identified as oriented clusters in the gradient data of the image. A projective Hough transform is applied to each of the two clusters, in axis-aligned coordinates. The range of each transform is properly bounded, because the corresponding gradient vectors are approximately parallel. Each of the two transforms contains a series of collinear peaks, one for every line in the given pencil. This pattern is easily detected by sweeping a dual line through the transform. The proposed Hough-based method is compared to the standard OpenCV detection routine by applying both to several hundred time-of-flight images. It is shown that the new method detects significantly more calibration boards, over a greater variety of poses, without any overall loss of accuracy. This conclusion is based on an analysis of both geometric and photometric error. Comment: 11 pages, 11 figures, 1 table
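
    As a rough illustration of the two-pencil idea, the NumPy sketch below separates strong gradient pixels into two orientation clusters and accumulates an ordinary (rho, theta) Hough transform for each; the paper's projective, axis-aligned transform and its dual-line peak sweep are replaced by this simpler stand-in, and the clustering threshold is a placeholder.

        import numpy as np

        def pencil_hough(gray, n_rho=256, n_theta=180):
            """Hough-vote each of the two gradient-orientation clusters."""
            gy, gx = np.gradient(gray.astype(float))
            mag = np.hypot(gx, gy)
            ys, xs = np.nonzero(mag > 0.2 * mag.max())  # strong edges only
            angle = np.mod(np.arctan2(gy[ys, xs], gx[ys, xs]), np.pi)
            # A chequerboard projects to two pencils, hence two dominant
            # gradient directions; this median split is a crude stand-in
            # for the oriented clustering used in the paper.
            cluster = np.abs(angle - np.median(angle)) < np.pi / 4
            diag = np.hypot(*gray.shape)
            rhos = np.linspace(-diag, diag, n_rho)
            thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
            accs = []
            for sel in (cluster, ~cluster):
                acc = np.zeros((n_rho, n_theta))
                for x, y in zip(xs[sel], ys[sel]):
                    rho = x * np.cos(thetas) + y * np.sin(thetas)
                    ri = np.clip(np.searchsorted(rhos, rho), 0, n_rho - 1)
                    acc[ri, np.arange(n_theta)] += 1  # vote along the sinusoid
                accs.append(acc)
            return accs  # peaks in each accumulator are one pencil's lines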

    DarSwin: Distortion Aware Radial Swin Transformer

    Wide-angle lenses are commonly used in perception tasks requiring a large field of view. Unfortunately, these lenses produce significant distortions, making conventional models that ignore the distortion effects unable to adapt to wide-angle images. In this paper, we present a novel transformer-based model that automatically adapts to the distortion produced by wide-angle lenses. We leverage the physical characteristics of such lenses, which are analytically defined by the radial distortion profile (assumed to be known), to develop a distortion-aware radial swin transformer (DarSwin). In contrast to conventional transformer-based architectures, DarSwin comprises a radial patch partitioning, a distortion-based sampling technique for creating token embeddings, and a polar position encoding for radial patch merging. We validate our method on classification tasks using synthetically distorted ImageNet data and show through extensive experiments that DarSwin can perform zero-shot adaptation to unseen distortions of different wide-angle lenses. Compared to other baselines, DarSwin achieves the best results (in terms of Top-1 and Top-5 accuracy) when tested on in-distribution data, with an almost 2% (6%) gain in Top-1 accuracy under medium (high) distortion levels, and is comparable to the state of the art under low and very low distortion levels (perspective-like images). Comment: 8 pages, 8 figures
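
    The radial partitioning can be pictured with a short NumPy sketch that assigns every pixel to a (radial, angular) patch, with radial bin edges placed uniformly in incidence angle and pushed through the known lens profile; the identity profile and the bin counts below are placeholder assumptions, and the real DarSwin additionally performs distortion-based token sampling and polar position encoding.

        import numpy as np

        def radial_partition(h, w, n_r=8, n_phi=16, profile=lambda t: t):
            """Map each pixel to a patch index on a polar, lens-aware grid.

            `profile` maps incidence angle (normalized to [0, 1]) to a
            normalized image radius in [0, 1]; the identity used here
            stands in for the known radial distortion curve of the lens."""
            yy, xx = np.mgrid[:h, :w]
            cx, cy = (w - 1) / 2, (h - 1) / 2
            r = np.hypot(xx - cx, yy - cy)
            r = r / r.max()                          # radius in [0, 1]
            phi = np.mod(np.arctan2(yy - cy, xx - cx), 2 * np.pi)
            # Edges uniform in incidence angle, so patches cover equal
            # angular slices of the scene rather than of the image.
            r_edges = profile(np.linspace(0, 1, n_r + 1))
            r_bin = np.clip(np.searchsorted(r_edges, r) - 1, 0, n_r - 1)
            phi_bin = (phi / (2 * np.pi) * n_phi).astype(int) % n_phi
            return r_bin * n_phi + phi_bin           # patch id per pixel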

    Geometric Inference with Microlens Arrays

    This dissertation explores an alternative to traditional fiducial markers, where geometric information is inferred from the observed position of 3D points seen in an image. We offer an approach that instead enables geometric inference based on the relative orientation of markers in an image. We present markers fabricated from microlenses whose appearance changes depending on the marker's orientation relative to the camera. First, we show how to manufacture and calibrate chromo-coding lenticular arrays to create a known relationship between the observed hue and the orientation of the array. Second, we use two small chromo-coding lenticular arrays to estimate the pose of an object. Third, we use three large chromo-coding lenticular arrays to calibrate a camera from a single image. Finally, we create another type of fiducial marker from lenslet arrays that encodes orientation with discrete black-and-white appearances. Collectively, these approaches offer new opportunities for pose estimation and camera calibration that are relevant to robotics, virtual reality, and augmented reality.
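
    A minimal sketch of the hue-to-orientation idea, assuming the calibration yields a monotonic relationship between observed hue and viewing angle over the working range; the tabulated sweep and linear interpolation below are illustrative stand-ins for the dissertation's calibration procedure.

        import numpy as np

        def calibrate_hue_to_angle(hues, angles):
            """Sort a calibration sweep into a hue -> angle lookup table."""
            order = np.argsort(hues)
            return np.asarray(hues)[order], np.asarray(angles)[order]

        def angle_from_hue(observed_hue, hue_tab, angle_tab):
            """Invert the calibrated map by interpolation."""
            return np.interp(observed_hue, hue_tab, angle_tab)

        # Hypothetical sweep: hue 0.1..0.6 as the array tilts -30..30 degrees
        hue_tab, angle_tab = calibrate_hue_to_angle(
            np.linspace(0.1, 0.6, 13), np.linspace(-30, 30, 13))
        print(angle_from_hue(0.35, hue_tab, angle_tab))  # -> 0.0

    Each observed hue then constrains one array's orientation relative to the camera, which is the measurement that the pose-estimation and single-image calibration chapters build on.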

    Algorithms for trajectory integration in multiple views

    This thesis addresses the problem of deriving a coherent and accurate localization of moving objects from partial visual information when data are generated by cameras placed at different view angles with respect to the scene. The framework is built around applications of scene monitoring with multiple cameras. Firstly, we demonstrate how a geometry-based solution exploits the relationships between corresponding feature points across views and improves accuracy in object location. Then, we improve the estimation of object locations with geometric transformations that account for lens distortions. Additionally, we study the integration of the partial visual information generated by each individual sensor and its combination into one single frame of observation that considers object association and data fusion. Our approach is fully image-based, relies only on 2D constructs and does not require any complex computation in 3D space. We exploit the continuity and coherence of objects' motion when crossing cameras' fields of view. Additionally, we work under the assumptions of a planar ground plane and a wide baseline (i.e. cameras' viewpoints are far apart). The main contributions are: i) the development of a framework for distributed visual sensing that accounts for inaccuracies in the geometry of multiple views; ii) the reduction of trajectory mapping errors using a statistics-based homography estimation; iii) the integration of a polynomial method for correcting inaccuracies caused by the cameras' lens distortion; iv) a global trajectory reconstruction algorithm that associates and integrates the fragments of trajectories generated by each camera.
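
    A compact NumPy sketch of the two geometric ingredients, under simple assumptions: a two-coefficient polynomial radial correction applied to image points, followed by a DLT homography that transfers the corrected trajectory points onto a common ground-plane view. The statistical refinement of the homography described in the thesis is not reproduced here.

        import numpy as np

        def undistort(pts, k, center):
            """Polynomial radial correction:
            p_u = c + (p_d - c) * (1 + k1*r^2 + k2*r^4)."""
            d = pts - center
            r2 = np.sum(d ** 2, axis=1, keepdims=True)
            return center + d * (1 + k[0] * r2 + k[1] * r2 ** 2)

        def dlt_homography(src, dst):
            """Direct linear transform from >= 4 point correspondences."""
            rows = []
            for (x, y), (u, v) in zip(src, dst):
                rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
                rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
            _, _, Vt = np.linalg.svd(np.asarray(rows, float))
            H = Vt[-1].reshape(3, 3)
            return H / H[2, 2]

        def map_track(H, pts):
            """Transfer 2D trajectory points into the common view."""
            ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
            return ph[:, :2] / ph[:, 2:3]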

    Deep learning applied to 2D video data for the estimation of clamp reaction forces acting on running prosthetic feet and experimental validation after bench and track tests

    Carbon-fiber Running Specific Prostheses (RSPs) have allowed athletes with lower-extremity amputations to recover their functional capability of running. RSPs are designed to replicate the spring-like nature of biological legs: they are passive components that mimic the tendons' elastic potential-energy storage and release during ground contact. Knowledge of the loads acting on the prosthesis is crucial for evaluating athletes' running technique, preventing injuries and designing Running Prosthetic Feet (RPF). The aim of the present work is to investigate a method to estimate the forces acting on an RPF based on its geometrical configuration. Firstly, the use of kinematic data acquired from 2D videos was assessed, to understand whether they can be a good approximation to the gold standard represented by motion capture (MOCAP). This was done by evaluating steps acquired during two running sessions (OS1 and OS3) with elite paralympic athletes. Then, the problem was formulated using a deep-learning approach, training a neural network on data collected from in vitro bench tests carried out on a hydraulic test bench. Two models were built: the first was trained on data from standard procedures and validated on two steps of OS1; then, in order to improve the performance of the prototype, a second model was built and trained with data from newly studied procedures. It was then validated on three steps from OS3.
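
    The abstract leaves the network architecture unspecified; the PyTorch sketch below is a deliberately generic regression setup, where the input size (per-frame geometry features extracted from video) and the three output force components at the clamp are assumptions made purely for illustration.

        import torch
        import torch.nn as nn

        class ForceNet(nn.Module):
            """Small MLP regressing clamp reaction forces from geometry."""
            def __init__(self, n_in=8, n_out=3):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(n_in, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, n_out))       # e.g. Fx, Fy, Fz at the clamp

            def forward(self, x):
                return self.net(x)

        model = ForceNet()
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.MSELoss()

        def train_step(geom, force):
            """One supervised step on (geometry, bench-measured force) pairs."""
            opt.zero_grad()
            loss = loss_fn(model(geom), force)
            loss.backward()
            opt.step()
            return loss.item()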

    Audio and visual perceptions for mobile robot

    Ph.D. thesis.