118 research outputs found

    An adversarial optimization approach to efficient outlier removal

    This paper proposes a novel adversarial optimization approach to efficient outlier removal in computer vision. We characterize the outlier removal problem as a game that involves two players of conflicting interests, namely, optimizer and outlier. Such an adversarial view not only brings new insights into various existing methods, but also gives rise to a general optimization framework that provably unifies them. Under the proposed framework, we develop a new outlier removal approach that is able to offer a much-needed control over the trade-off between reliability and speed, which is otherwise not available in previous methods. The proposed approach is driven by a mixed-integer minmax (convex-concave) optimization process. Although a minmax problem is generally not amenable to efficient optimization, we show that for some commonly used vision objective functions, an equivalent Linear Program reformulation exists. We demonstrate our method on two representative multiview geometry problems. Experiments on real image data illustrate superior practical performance of our method over recent techniques. Jin Yu, Anders Eriksson, Tat-Jun Chin, David Suter. http://www.iccv2011.org

    Novel perspectives and approaches to video summarization

    The increasing volume of videos requires efficient and effective techniques to index and structure videos. Video summarization is such a technique: it extracts the essential information from a video, so that tasks such as comprehension by users and video content analysis can be conducted more effectively and efficiently. The research presented in this thesis investigates three novel perspectives of the video summarization problem and provides approaches to such perspectives. Our first perspective is to employ local keypoints to perform keyframe selection. Two criteria, namely Coverage and Redundancy, are introduced to guide the keyframe selection process in order to identify the keyframes that represent maximum video content and share minimum redundancy. To efficiently deal with long videos, a top-down strategy is proposed, which splits the summarization problem into two sub-problems: scene identification and scene summarization. Our second perspective is to formulate the task of video summarization as a problem of sparse dictionary reconstruction. Our method utilizes the true sparse constraint, the L0 norm, instead of the relaxed constraint, the L2,1 norm, such that keyframes are directly selected as a sparse dictionary that can reconstruct the video frames. In addition, a Percentage Of Reconstruction (POR) criterion is proposed to intuitively guide users in selecting an appropriate length of the summary. An L2,0 constrained sparse dictionary selection model is also proposed to further verify the effectiveness of sparse dictionary reconstruction for video summarization. Lastly, we further investigate the multi-modal perspective of multimedia content summarization and enrichment. There are abundant images and videos on the Web, so it is highly desirable to effectively organize such resources for textual content enrichment. With the support of web-scale images, our proposed system, namely StoryImaging, is capable of enriching arbitrary textual stories with visual content.

    Towards Reliable and Accurate Global Structure-from-Motion

    Reconstruction of objects or scenes from sparse point detections across multiple views is one of the most tackled problems in computer vision. Given the coordinates of 2D points tracked in multiple images, the problem consists of estimating the corresponding 3D points and cameras' calibrations (intrinsic and pose), and can be solved by minimizing reprojection errors using bundle adjustment. However, given bundle adjustment's nonlinear objective function and iterative nature, a good starting guess is required to converge to global minima. Global and Incremental Structure-from-Motion methods appear as ways to provide good initializations to bundle adjustment, each with different properties. While Global Structure-from-Motion has been shown to result in more accurate reconstructions compared to Incremental Structure-from-Motion, the latter has better scalability by starting with a small subset of images and sequentially adding new views, allowing reconstruction of sequences with millions of images. Additionally, both Global and Incremental Structure-from-Motion methods rely on accurate models of the scene or object, and under noisy conditions or high model uncertainty might result in poor initializations for bundle adjustment. Recently pOSE, a class of matrix factorization methods, has been proposed as an alternative to conventional Global SfM methods. These methods use VarPro - a second-order optimization method - to minimize a linear combination of an approximation of reprojection errors and a regularization term based on an affine camera model, and have been shown to converge to global minima with a high rate even when starting from random camera calibration estimations. This thesis aims at improving the reliability and accuracy of global SfM through different approaches. First, by studying conditions for global optimality of point set registration, a point cloud averaging method is proposed that can be used when (incomplete) 3D point clouds of the same scene in different coordinate systems are available. Second, by extending pOSE methods to different Structure-from-Motion problem instances, such as Non-Rigid SfM or radial distortion invariant SfM. Third and finally, by replacing the regularization term of pOSE methods with an exponential regularization on the projective depth of the 3D point estimations, resulting in a loss that achieves reconstructions with accuracy close to bundle adjustment.

    Study of Computational Image Matching Techniques: Improving Our View of Biomedical Image Data

    Image matching techniques are proven to be necessary in various fields of science and engineering, with many new methods and applications introduced over the years. In this PhD thesis, several computational image matching methods are introduced and investigated for improving the analysis of various biomedical image data. These improvements include the use of matching techniques for enhancing visualization of cross-sectional imaging modalities such as Computed Tomography (CT) and Magnetic Resonance Imaging (MRI), denoising of retinal Optical Coherence Tomography (OCT), and high quality 3D reconstruction of surfaces from Scanning Electron Microscope (SEM) images. This work greatly improves the process of data interpretation of image data with far-reaching consequences for basic sciences research. The thesis starts with a general notion of the problem of image matching followed by an overview of the topics covered in the thesis. This is followed by introduction and investigation of several applications of image matching/registration in biomedical image processing: a) registration-based slice interpolation, b) fast mesh-based deformable image registration and c) use of simultaneous rigid registration and Robust Principal Component Analysis (RPCA) for speckle noise reduction of retinal OCT images. Moving towards a different notion of image matching/correspondence, the problem of view synthesis and 3D reconstruction, with a focus on 3D reconstruction of microscopic samples from 2D images captured by SEM, is considered next. Starting from sparse feature-based matching techniques, an extensive analysis is provided for using several well-known feature detector/descriptor techniques, namely ORB, BRIEF, SURF and SIFT, for the problem of multi-view 3D reconstruction. This chapter contains qualitative and quantitative comparisons in order to reveal the shortcomings of the sparse feature-based techniques. This is followed by introduction of a novel framework using sparse-dense matching/correspondence for high quality 3D reconstruction of SEM images. As will be shown, the proposed framework results in better reconstructions when compared with state-of-the-art sparse feature-based techniques. Even though the proposed framework produces satisfactory results, there is room for improvement. Improvements become more necessary when dealing with higher complexity microscopic samples imaged by SEM as well as in cases with large displacements between corresponding points in micrographs. Therefore, based on the proposed framework, a new approach is proposed for high quality 3D reconstruction of microscopic samples. While for simpler microscopic samples the performance of the two proposed techniques is comparable, the new technique results in more truthful reconstruction of highly complex samples. The thesis is concluded with an overview of the thesis and also pointers regarding future directions of the research using both multi-view and photometric techniques for 3D reconstruction of SEM images.

    Cable Tension Monitoring using Non-Contact Vision-based Techniques

    In cable-stayed bridges, the structural systems of tensioned cables play a critical role in structural and functional integrity. Tensile forces in the cables are therefore one of the essential indicators in structural health monitoring (SHM). In this thesis, a video image processing technology integrated with cable dynamic analysis is proposed as a non-contact vision-based measurement technique, which provides a user-friendly, cost-effective, and computationally efficient solution to displacement extraction, frequency identification, and cable tension monitoring. In contrast to conventional contact sensors, the vision-based system is capable of taking remote measurements of cable dynamic response while having flexible sensing capability. Since cable detection is a substantial step in displacement extraction, a comprehensive study on the feasibility of the adopted feature detector is conducted under various testing scenarios. The performance of the feature detector is quantified by developing evaluation parameters. Enhancement methods for the feature detector in cable detection are investigated as well under complex testing environments. Threshold-dependent image matching approaches, which optimize the functionality of the feature-based video image processing technology, are proposed for noise-free and noisy background scenarios. The vision-based system is validated through experimental studies of free vibration tests on a single undamped cable in laboratory settings. The maximum percentage difference of the identified cable fundamental frequency is found to be 0.74% compared with accelerometer readings, while the maximum percentage difference of the estimated cable tensile force is 4.64% compared to direct measurement by a load cell.

    Model-free Consensus Maximization for Non-Rigid Shapes

    Many computer vision methods use consensus maximization to relate measurements containing outliers with the correct transformation model. In the context of rigid shapes, this is typically done using Random Sampling and Consensus (RANSAC) by estimating an analytical model that agrees with the largest number of measurements (inliers). However, small parameter models may not always be available. In this paper, we formulate the model-free consensus maximization as an Integer Program in a graph using 'rules' on measurements. We then provide a method to solve it optimally using the Branch and Bound (BnB) paradigm. We focus its application on non-rigid shapes, where we apply the method to remove outlier 3D correspondences and achieve performance superior to the state of the art. Our method works with outlier ratio as high as 80%. We further derive a similar formulation for 3D template to image matching, achieving similar or better performance compared to the state of the art. Comment: ECCV1