
    Selection of Temporal Aligned Video Frames for Video Stitching Application

    Multi-view image/video stitching is an extensive research area in computer vision and image-based rendering. Most research focuses on stitching images from different views under the assumption that those images have already been aligned in the temporal domain. However, this is often not the case in practice. If the images from different views are not aligned in the temporal domain, or in other words not time-synchronized, the corresponding feature points or regions will not be located correctly across views, which results in ghost objects in the final stitching/rendering result. In this paper, we present an epipolar geometry consistency scoring scheme to guide the selection of temporally aligned video frame pairs for multi-view video stitching. Essentially, the proposed scheme allows us to determine whether a given pair of video frames is sufficiently well aligned in time for video stitching. Experimental results confirm that better video stitching results can be obtained with the proposed scheme in place.
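    A minimal sketch of how such an epipolar consistency score could be computed with OpenCV is shown below. It assumes matched feature points are already available and uses the mean symmetric epipolar distance as the score; this is an illustrative choice, not the authors' exact formulation.

```python
import cv2
import numpy as np

def epipolar_consistency_score(pts1, pts2):
    """Mean symmetric epipolar distance for matched points of two views.

    pts1, pts2: (N, 2) float arrays of matched feature locations.
    A low score suggests the frame pair is consistent with a single
    epipolar geometry, i.e. likely to be temporally aligned.
    """
    F, _ = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
    if F is None:
        return np.inf
    x1 = np.hstack([pts1, np.ones((len(pts1), 1))])  # homogeneous coordinates
    x2 = np.hstack([pts2, np.ones((len(pts2), 1))])
    l2 = x1 @ F.T   # epipolar lines in image 2 (batch form of F @ x1)
    l1 = x2 @ F     # epipolar lines in image 1 (batch form of F.T @ x2)
    d2 = np.abs(np.sum(x2 * l2, axis=1)) / np.linalg.norm(l2[:, :2], axis=1)
    d1 = np.abs(np.sum(x1 * l1, axis=1)) / np.linalg.norm(l1[:, :2], axis=1)
    return float(np.mean(d1 + d2))

def select_best_aligned_pair(candidate_pairs):
    """candidate_pairs: list of (pts1, pts2) match arrays, one per candidate
    frame pairing; returns the index of the most consistent pair."""
    scores = [epipolar_consistency_score(p1, p2) for p1, p2 in candidate_pairs]
    return int(np.argmin(scores))
```

    In such a scheme, frames that are temporally misaligned tend to violate a single epipolar geometry (moving objects shift between exposures), which inflates the score and lets the best-aligned pair be picked by a simple argmin.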

    View and depth preprocessing for view synthesis enhancement

    In the paper, two preprocessing methods for virtual view synthesis are presented. In the first approach, both the horizontal and vertical resolutions of the real views and the corresponding depth maps are doubled in order to perform view synthesis on images with densely arranged points. In the second method, the real views are filtered in order to eliminate blurred or improperly shifted object edges. Both methods are performed prior to synthesis, so they may be applied with different Depth-Image-Based Rendering algorithms. For both proposed methods, the achieved quality gains are presented.
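    A minimal sketch of the first preprocessing step, assuming OpenCV and a simple choice of interpolation filters (the paper does not dictate these specific filters), is given below.

```python
import cv2

def upsample_view_and_depth(view, depth, factor=2):
    """Double the horizontal and vertical resolution of a real view and its
    depth map so that a DIBR algorithm can warp densely arranged points.
    Illustrative stand-in for the paper's first preprocessing method."""
    h, w = view.shape[:2]
    view_up = cv2.resize(view, (w * factor, h * factor),
                         interpolation=cv2.INTER_CUBIC)
    # Nearest-neighbour keeps depth discontinuities sharp instead of
    # inventing intermediate depths across object boundaries.
    depth_up = cv2.resize(depth, (w * factor, h * factor),
                          interpolation=cv2.INTER_NEAREST)
    return view_up, depth_up
```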

    On the Use of UAVs in Mining and Archaeology - Geo-Accurate 3D Reconstructions Using Various Platforms and Terrestrial Views

    During the last decades, photogrammetric computer vision systems have become well established in scientific and commercial applications. In particular, the increasing affordability of unmanned aerial vehicles (UAVs), in conjunction with automated multi-view processing pipelines, has provided an easy way of acquiring spatial data and creating realistic and accurate 3D models. With multicopter UAVs, which can navigate slowly, hover and capture images at nearly any position, it is possible to record highly overlapping images ranging from almost terrestrial camera positions to oblique and nadir aerial views. Multicopter UAVs thus bridge the gap between terrestrial and traditional aerial image acquisition and are therefore ideally suited to enable easy and safe data collection and inspection tasks in complex or hazardous environments. In this paper we present a fully automated processing pipeline for precise, metric and geo-accurate 3D reconstructions of complex geometries using various imaging platforms. Our workflow allows for georeferencing of UAV imagery based on GPS measurements of camera stations from an on-board GPS receiver as well as tie and control point information. Ground control points (GCPs) are integrated directly in the bundle adjustment to refine the georegistration and correct for systematic distortions of the image block. We discuss our approach based on three case studies in mining and archaeology and present several accuracy-related analyses investigating georegistration, camera network configuration and ground sampling distance. Our approach is furthermore suited for seamlessly matching and integrating images from different viewpoints and cameras (aerial and terrestrial as well as inside views) into one single reconstruction. Together with aerial images from a UAV, we are able to enrich 3D models by jointly processing terrestrial images and inside views of an object, generating highly detailed, accurate and complete reconstructions.
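    As a rough illustration of georegistration: the paper integrates GCPs directly into the bundle adjustment, which is not reproduced here; the sketch below instead aligns reconstructed model coordinates to surveyed GCP coordinates with a post-hoc least-squares similarity transform (Umeyama alignment). The function name and interface are assumptions for this example.

```python
import numpy as np

def similarity_transform(model_pts, world_pts):
    """Least-squares similarity transform (scale s, rotation R, translation t)
    mapping reconstructed model coordinates onto surveyed GCP coordinates,
    so that world ~= s * R @ model + t.

    model_pts, world_pts: (N, 3) arrays of corresponding points (N >= 3,
    non-degenerate). This is only a simpler alternative to integrating GCP
    residuals directly in the bundle adjustment."""
    mu_m, mu_w = model_pts.mean(axis=0), world_pts.mean(axis=0)
    Xm, Xw = model_pts - mu_m, world_pts - mu_w
    # Cross-covariance between centred point sets
    Sigma = Xw.T @ Xm / len(model_pts)
    U, S, Vt = np.linalg.svd(Sigma)
    D = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:   # avoid reflections
        D[2, 2] = -1.0
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / Xm.var(axis=0).sum()
    t = mu_w - s * R @ mu_m
    return s, R, t
```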

    Panoramic UAV Views for Landscape Heritage Analysis Integrated with Historical Maps Atlases

    Analysis of landscape heritage and of the territorial transformations dedicated to its protection and preservation relies increasingly on the contribution of integrated disciplines. In 2000, the European Landscape Convention established the necessity 'to integrate landscape into its regional and town planning policies and in its cultural, environmental, agricultural, social and economic policies'. Such an articulated territorial dimension requires an approach able to consider multi-dimensional data and information from different spatial and temporal series, supporting territorial analysis and spatial planning from different points of view. Most landscape representation instruments are based on 3D models built from top-down images/views, with still limited possibilities to reproduce views similar to the human eye or to develop map surfaces along preferential directions (e.g. water front views). A methodological approach that rediscovers the long tradition of historical water front view maps, itinerary maps and human-eye perspective maps could improve the decoding of cultural heritage content with its environmental dimension and its knowledge transfer to planners and citizens. The research described here experiments with multiple view models that can simulate real scenarios at the height of the observer or along a view front. The paper investigates the possibilities of panoramic view simulation and reconstruction from images acquired by RC/UAV platforms and multi-sensor systems, testing orthoimage generation for riparian landscape areas and water front view representation, verifying the application of automatic algorithms for image orientation and DTM extraction (AtiPE, ATE) on such complex image models, and identifying critical aspects for future development. The sample landscape portion along an ancient water corridor, with stratified values of the anthropogenic environment, shows the potential of future achievements in supporting sustainable planning through technical water front view maps and 3D panoramic views, for Environmental Impact Assessment (EIA) purposes and for the improvement of acknowledged tourism within geo-atlases based on multi-dimensional and multi-temporal Spatial Data Infrastructures (SDI).

    Modelos y representaciones visuales en la ciencia

    An outstanding feature of modern science is its use of images. Their increasing use, both in scientific research and in communication media, contrasts with the scant attention that the philosophy of science has paid to them. The many criticisms of the received view have left virtually untouched an assessment of scientific images that persists even in the historiography of science furthest removed from positivist views. In recent years, however, authors from very different research fields have begun to show a growing interest in scientific images, thus opening new ways of analysing their production and their function in scientific knowledge. In this article, I suggest an approach to the question that does not try to reach closed conclusions or to establish a general interpretative thesis applicable to all types of scientific images. My aim is to show some of the elements that hindered the approach to images in twentieth-century philosophy of science, and then to analyse the usefulness and weaknesses of some philosophical alternatives for understanding non-verbal representation in science. Here an interesting approach comes from some versions of the semantic view which, by highlighting the non-linguistic nature of scientific models, point to the possibility of interpreting images as representational models. Even though I acknowledge the value of some interpretative keys offered by these semantic views for studying non-verbal representation in science, I try to show how they suffer from weaknesses arising basically from two points: overly general and vague notions of scientific image and of similarity. A classification of scientific images by their functions, their diagrammatic or naturalistic form, and the visibility or invisibility of the object or phenomenon they represent is a necessary condition for investigating how they are actually made and used in scientific practice. At the same time, this diversity reveals the plurality of uses of the concept of similarity, thus raising once more one of the most classical questions of theories of scientific representation.

    A Neural Height-Map Approach for the Binocular Photometric Stereo Problem

    In this work we propose a novel, highly practical, binocular photometric stereo (PS) framework which has the same acquisition speed as single-view PS, yet significantly improves the quality of the estimated geometry. As in recent neural multi-view shape estimation frameworks such as NeRF, SIREN and inverse-graphics approaches to multi-view photometric stereo (e.g. PS-NeRF), we formulate the shape estimation task as learning a differentiable surface and texture representation, minimising both the discrepancy between surface normals estimated from multiple varying-light images of the two views and the discrepancy between rendered surface intensity and the observed images. Our method differs from typical multi-view shape estimation approaches in two key ways. First, our surface is represented not as a volume but as a neural height-map in which the heights of points on the surface are computed by a deep neural network. Second, instead of predicting an average intensity as PS-NeRF does, or introducing Lambertian material assumptions as Guo et al. do, we use a learnt BRDF and perform near-field per-point intensity rendering. Our method achieves state-of-the-art performance on the DiLiGenT-MV dataset adapted to the binocular stereo setup, as well as on a new binocular photometric stereo dataset, LUCES-ST.
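    A toy sketch of the height-map idea in PyTorch is given below: a small MLP maps image-plane coordinates to heights, and surface normals follow from autograd gradients. The network size, activations and loss terms mentioned in the comments are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class NeuralHeightMap(nn.Module):
    """Surface as an explicit graph z = f(x, y) computed by an MLP,
    rather than a volumetric field."""
    def __init__(self, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2, hidden), nn.Softplus(),
            nn.Linear(hidden, hidden), nn.Softplus(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xy):
        # xy: (N, 2) image-plane coordinates -> (N,) heights
        return self.mlp(xy).squeeze(-1)

    def normals(self, xy):
        """Normals of z = f(x, y) via autograd: n ∝ (-dz/dx, -dz/dy, 1)."""
        xy = xy.detach().requires_grad_(True)
        z = self.forward(xy)
        grad = torch.autograd.grad(z.sum(), xy, create_graph=True)[0]
        n = torch.cat([-grad, torch.ones_like(z)[..., None]], dim=-1)
        return n / n.norm(dim=-1, keepdim=True)

# Training (omitted) would minimise, per view, a normal-discrepancy term
# against photometric-stereo normal estimates plus a rendered-vs-observed
# intensity term using a learnt BRDF with near-field per-point rendering.
```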

    Image Based View Synthesis

    This dissertation deals with the image-based approach to synthesizing a virtual scene using sparse images or a video sequence, without the use of 3D models. In our scenario, a real dynamic or static scene is captured by a set of uncalibrated images from different viewpoints. After automatically recovering the geometric transformations between these images, a series of photo-realistic virtual views can be rendered and a virtual environment covered by these static cameras can be synthesized. This image-based approach has applications in object recognition, object transfer, video synthesis and video compression. In this dissertation, I have contributed to several sub-problems related to image-based view synthesis. Before image-based view synthesis can be performed, images need to be segmented into individual objects. Assuming that a scene can be approximately described by multiple planar regions, I have developed a robust and novel approach to automatically extract the set of affine or projective transformations induced by these regions, correctly detect occluded pixels over multiple consecutive frames, and accurately segment the scene into several motion layers. First, a number of seed regions are determined using correspondences in two frames; these seed regions are then expanded and outliers rejected using a graph-cuts method integrated with a level-set representation. Next, the initial regions are merged into several initial layers according to motion similarity. Third, occlusion-order constraints over multiple frames are exploited; these guarantee that the occlusion area increases with temporal order over a short period and effectively maintain segmentation consistency across consecutive frames. The correct layer segmentation is then obtained with a graph-cuts algorithm, and the occlusions between overlapping layers are explicitly determined. Several experimental results demonstrate that our approach is effective and robust. Recovering the geometric transformations among images of a scene is a prerequisite for image-based view synthesis. I have developed a wide-baseline matching algorithm to identify correspondences between two uncalibrated images and to further determine the geometric relationship between them, such as the epipolar geometry or a projective transformation. In our approach, a set of salient features, edge-corners, is detected to provide robust and consistent matching primitives. Then, based on the singular value decomposition (SVD) of an affine matrix, we effectively quantize the search space into two independent subspaces, for rotation angle and scaling factor, and use a two-stage affine matching algorithm to obtain robust matches between the two frames. Experimental results on a number of wide-baseline image pairs strongly demonstrate that our matching method outperforms state-of-the-art algorithms even under significant camera motion, illumination variation, occlusion and self-similarity. Given the wide-baseline matches among images, I have developed a novel method for dynamic view morphing. Dynamic view morphing deals with scenes containing moving objects in the presence of camera motion. The objects can be rigid or non-rigid, and each of them can move in any orientation or direction. The proposed method can generate a series of continuous and physically accurate intermediate views from only two reference images, without any knowledge of the 3D scene; a sketch of the affine decomposition underlying the matching step is shown after this abstract.
    The procedure consists of three steps: segmentation, morphing and post-warping. Given a boundary connection constraint, the source and target scenes are segmented into several layers for morphing. Based on the decomposition of the affine transformation between corresponding points, we uniquely determine a physically correct path for post-warping using the least-distortion method. I have generalized the dynamic scene synthesis problem from simple scenes with only rotation to dynamic scenes containing non-rigid objects; my method can handle dynamic rigid or non-rigid objects, including complicated objects such as humans. Finally, I have also developed a novel algorithm for tri-view morphing. This is an efficient image-based method to navigate a scene based on only three wide-baseline uncalibrated images, without the explicit use of a 3D model. After automatically recovering corresponding points between each pair of images using our wide-baseline matching method, an accurate trifocal plane is extracted from the trifocal tensor implied by the three images. Next, employing a trinocular-stereo algorithm and a barycentric blending technique, we generate arbitrary novel views to navigate the scene in a 2D space. Furthermore, after self-calibration of the cameras, a 3D model can also be correctly augmented into the virtual environment synthesized by the tri-view morphing algorithm. We have applied our view morphing framework to several interesting applications: 4D video synthesis, automatic target recognition and multi-view morphing.
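    The sketch below illustrates the SVD-based splitting of a 2x2 affine matrix into rotation and scale, the idea behind quantizing the wide-baseline match search space into independent rotation and scaling subspaces; the two-stage matcher itself is not reproduced, and the function name is a hypothetical example.

```python
import numpy as np

def decompose_affine(A):
    """Split the 2x2 linear part of an affine transform into a rotation angle
    and scale factors via SVD (A = U @ diag(S) @ Vt, with rotation U @ Vt
    up to a possible reflection)."""
    U, S, Vt = np.linalg.svd(A)
    R = U @ Vt                             # closest pure rotation to A
    theta = np.arctan2(R[1, 0], R[0, 0])   # rotation angle in radians
    return theta, S                        # angle, singular values (scales)
```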

    UMFuse: Unified Multi View Fusion for Human Editing applications

    Numerous pose-guided human editing methods have been explored by the vision community because of their extensive practical applications. However, most of these methods still use an image-to-image formulation in which a single image is given as input to produce an edited image as output. This objective becomes ill-defined when the target pose differs significantly from the input pose, and existing methods then resort to in-painting or style transfer to handle occlusions and preserve content. In this paper, we explore the use of multiple views to minimize the issue of missing information and to generate an accurate representation of the underlying human model. To fuse knowledge from multiple viewpoints, we design a multi-view fusion network that takes the pose keypoints and texture from multiple source images and generates an explainable per-pixel appearance retrieval map. Thereafter, the encodings from a separate network (trained on a single-view human reposing task) are merged in the latent space. This enables us to generate accurate, precise and visually coherent images for different editing tasks. We show the application of our network on two newly proposed tasks: multi-view human reposing and Mix&Match human image generation. Additionally, we study the limitations of single-view editing and the scenarios in which multi-view input provides a better alternative.
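    A toy sketch of fusing features from multiple source views with a per-pixel retrieval map is shown below: a 1x1 convolution predicts one logit per view per pixel, and a softmax over views yields interpretable blending weights. The channel sizes and the module interface are assumptions, not UMFuse's actual architecture.

```python
import torch
import torch.nn as nn

class AppearanceRetrievalFusion(nn.Module):
    """Fuse per-view feature maps with a per-pixel appearance retrieval map."""
    def __init__(self, feat_ch=64, num_views=3):
        super().__init__()
        # One logit per source view at every pixel
        self.weight_head = nn.Conv2d(feat_ch * num_views, num_views, kernel_size=1)

    def forward(self, view_feats):
        # view_feats: list of (B, C, H, W) feature maps, one per source view
        stacked = torch.stack(view_feats, dim=1)                  # (B, V, C, H, W)
        logits = self.weight_head(torch.cat(view_feats, dim=1))   # (B, V, H, W)
        retrieval_map = torch.softmax(logits, dim=1)              # per-pixel view weights
        fused = (stacked * retrieval_map.unsqueeze(2)).sum(dim=1) # (B, C, H, W)
        return fused, retrieval_map
```

    Because the weights sum to one over the source views at every pixel, the retrieval map can be visualized directly to see which view supplied the appearance of each region.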