Image stitching algorithm based on feature extraction
This paper proposes a novel edge-based stitching method to detect moving objects and construct mosaics from images. The method is a coarse-to-fine scheme that first estimates a good initialization of the camera parameters with two complementary methods and then refines the solution through an optimization process. The two complementary methods are an edge-alignment approach and a correspondence-based approach. The edge-alignment method estimates the desired image translations by checking the consistency of edge positions between images; it copes better with large displacements and lighting variations between images. The correspondence-based approach estimates the desired parameters from a set of correspondences obtained with a new feature extraction scheme and a new correspondence-building method; it can handle more general camera motions than the edge-alignment method. Since the two methods complement each other, the desired initial estimate can be obtained more robustly. A Monte-Carlo-style method is then proposed to integrate the two methods, in which a grid-partition scheme increases the accuracy of each trial at finding the correct parameters. Finally, an optimization process refines the initial parameters. Unlike other optimization methods that minimize errors over whole images, the proposed scheme minimizes errors only at the positions of feature points. Because the found initialization is very close to the exact solution and only errors at feature positions are considered, the optimization converges very quickly. Experimental results are provided to verify the superiority of the proposed method.
TwinTex: Geometry-aware Texture Generation for Abstracted 3D Architectural Models
Coarse architectural models are often generated at scales ranging from
individual buildings to scenes for downstream applications such as Digital Twin
City, Metaverse, LODs, etc. Such piece-wise planar models can be abstracted as
twins from 3D dense reconstructions. However, these models typically lack
realistic texture relative to the real building or scene, making them
unsuitable for vivid display or direct reference. In this paper, we present
TwinTex, the first automatic texture mapping framework to generate a
photo-realistic texture for a piece-wise planar proxy. Our method addresses
most challenges occurring in such twin texture generation. Specifically, for
each primitive plane, we first select a small set of photos with greedy
heuristics considering photometric quality, perspective quality and facade
texture completeness. Then, different levels of line features (LoLs) are
extracted from the set of selected photos to generate guidance for later steps.
With LoLs, we employ optimization algorithms to align texture with geometry
from local to global. Finally, we fine-tune a diffusion model with a multi-mask
initialization component and a new dataset to inpaint the missing region.
Experimental results on many buildings, indoor scenes and man-made objects of
varying complexity demonstrate the generalization ability of our algorithm. Our
approach surpasses state-of-the-art texture mapping methods in terms of
high-fidelity quality and reaches a human-expert production level with much less effort. Project page: https://vcc.tech/research/2023/TwinTex.
Comment: Accepted to SIGGRAPH Asia 2023
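The per-plane photo selection described above can be sketched as greedy set cover with a quality bonus. The coverage/quality formulation and the weight `alpha` are illustrative assumptions; TwinTex's actual heuristics (photometric quality, perspective quality, facade completeness) are richer:

```python
import numpy as np

def select_views(coverage, quality, k=3, alpha=0.5):
    """Greedy view selection for one planar primitive (illustrative).

    coverage: (n_photos, n_cells) boolean matrix, photo i sees cell j
              of the facade.
    quality:  (n_photos,) per-photo score, e.g. sharpness and viewing
              angle combined.
    Repeatedly picks the photo maximizing newly covered cells plus a
    quality bonus, up to k photos.
    """
    n = coverage.shape[0]
    covered = np.zeros(coverage.shape[1], dtype=bool)
    chosen = []
    for _ in range(min(k, n)):
        gains = [coverage[i][~covered].sum() + alpha * quality[i]
                 if i not in chosen else -np.inf
                 for i in range(n)]
        best = int(np.argmax(gains))
        if gains[best] <= 0:
            break
        chosen.append(best)
        covered |= coverage[best]
    return chosen
```

Greedy marginal-gain selection is a natural fit here because coverage is submodular: each added photo helps less once most of the plane is already seen.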
Architectural Digital Photogrammetry
This study exploits the texturing techniques of a common modelling software package to create virtual models of existing architecture from oriented panoramas. Panoramic image-based interactive modelling is introduced as a meeting point of photography, topography, photogrammetry, and modelling techniques: an interactive system for generating photorealistic, textured 3D models of architectural structures and urban scenes.
The technique is suitable for architectural survey because it is not a point-by-point survey, and it exploits the geometric constraints in the architecture to simplify modelling.
Many factors are presented as critical features affecting modelling quality and accuracy, such as the manner and position of shooting the photos, the stitching of multi-image panoramas, the orientation, and the texturing techniques.
During the last few years, many image-based modelling programs have been released. In this research, however, photo-modelling programs were not used: the aim was to confront the fundamentals of photogrammetry and to go beyond the limitations of such software by avoiding its automatism, and to exploit the powerful commands of a program such as 3DsMax to obtain the final representation of the architecture. Such a representation can be used in different fields, from detailed architectural survey to architectural representation in cinema and video games, with correspondingly varying accuracy and quality.
After a theoretical study of the technique, it was applied in four applications to different types of close-range survey. This practice exposed the practical problems of the whole process, from photographing through to modelling, and suggested methods to improve it and to avoid complications. The technique was compared with laser scanning to study its accuracy.
It emerged that the accuracy of the technique is not only linked to the size of the surveyed object; the size also changes the way in which the survey should be approached.
Since the 3D modelling program is not dedicated to image-based modelling, texturing problems were encountered. These were analysed in terms of how the program handles the bitmap, how to project it, how the projection can be made interactive, and what the limitations are.
Vide-omics : a genomics-inspired paradigm for video analysis
With the development of applications associated with ego-vision systems, smartphones, and autonomous cars, automated analysis of videos generated by freely moving cameras has become a major challenge for the computer vision community. Current techniques are still not suitable for real-life situations owing, in particular, to wide scene variability and the large range of camera motions. Whereas most approaches attempt to control those factors, this paper introduces a novel video analysis paradigm, 'vide-omics', inspired by the principles of genomics, where variability is the expected norm. The new concept is validated by designing an implementation addressing foreground extraction from videos captured by freely moving cameras. Evaluation on a set of standard videos demonstrates both robust performance that is largely independent of camera motion and scene, and state-of-the-art results on the most challenging videos. Those experiments underline not only the validity of the 'vide-omics' paradigm but also its potential.
Augmented Reality and Artificial Intelligence in Image-Guided and Robot-Assisted Interventions
In minimally invasive orthopedic procedures, the surgeon places wires, screws, and surgical implants through the muscles and bony structures under image guidance. These interventions require alignment of the pre- and intra-operative patient data, the intra-operative scanner, surgical instruments, and the patient. Suboptimal interaction with patient data and challenges in mastering 3D anatomy based on ill-posed 2D interventional images are essential concerns in image-guided therapies.
State-of-the-art approaches often support the surgeon with external navigation systems or ill-conditioned image-based registration methods, both of which have certain drawbacks. Augmented reality (AR) has been introduced into operating rooms in the last decade; however, in image-guided interventions it has often been considered only as a visualization device improving traditional workflows. Consequently, the technology has yet to gain the maturity it requires to redefine new procedures, user interfaces, and interactions.
This dissertation investigates the applications of AR, artificial intelligence, and robotics in interventional medicine. Our solutions were applied in a broad spectrum of problems for various tasks, namely improving imaging and acquisition, image computing and analytics for registration and image understanding, and enhancing the interventional visualization. The benefits of these approaches were also discovered in robot-assisted interventions.
We revealed how exemplary workflows can be redefined via AR by taking full advantage of head-mounted displays that are entirely co-registered with the imaging systems and the environment at all times. The proposed AR landscape is enabled by co-localizing the users and the imaging devices via the operating room environment and exploiting all involved frustums to move spatial information between different bodies. The system's awareness of the geometric and physical characteristics of X-ray imaging allows the exploration of different human-machine interfaces. We also leveraged the principles governing image formation and combined them with deep learning and RGBD sensing to fuse images and reconstruct interventional data.
We hope that our holistic approaches towards improving the interface of surgery and enhancing the usability of interventional imaging not only augment the surgeon's capabilities but also the surgical team's experience in carrying out an effective intervention with reduced complications.
Camera positioning for 3D panoramic image rendering
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London.
Virtual camera realisation and the proposition of a trapezoidal camera architecture are the two broad contributions of this thesis. Firstly, multiple cameras and their arrangement constitute a critical component affecting the integrity of visual content acquisition for multi-view video. Currently, linear, convergent, and divergent arrays are the prominent camera topologies adopted; however, the large number of cameras required and their synchronisation are two of the prominent challenges usually encountered. The use of virtual cameras can significantly reduce the number of physical cameras used with respect to any of the known camera structures, hence reducing some of the other implementation issues. This thesis explores image-based rendering, with and without geometry, in implementations leading to the realisation of virtual cameras. The virtual cameras were implemented both from the perspective of a depth map (geometry) and through the use of multiple image samples (no geometry). Prior to the virtual camera realisation, the generation of depth maps was investigated using region match measures widely known for solving the image point correspondence problem. The constructed depth maps were compared with those generated using the dynamic programming approach. In both the geometry and no-geometry approaches, the virtual cameras lead to the rendering of views from a textured depth map, the construction of a 3D panoramic image of a scene by stitching multiple image samples and performing superposition on them, and the computation of a virtual scene from a stereo pair of panoramic images. The quality of the rendered images was assessed through objective or subjective analysis in the Imatest software. Furthermore, metric reconstruction of a scene was performed by re-projecting the pixel points from multiple image samples with a single centre of projection, using a sparse bundle adjustment algorithm. The statistical summary obtained after applying this algorithm provides a gauge of the efficiency of the optimisation step. The optimised data were then visualised in the Meshlab software environment, providing the reconstructed scene. Secondly, with any of the well-established camera arrangements, all cameras are usually constrained to the same horizontal plane. Occlusion therefore becomes an extremely challenging problem, and a robust camera set-up is required to resolve the hidden parts of scene objects. To adequately meet the visibility condition for scene objects, given that occlusion of the same scene objects can occur, a multi-plane camera structure is highly desirable. This thesis therefore also explores a trapezoidal camera structure for image acquisition. The approach is to assess the feasibility and potential of several physical cameras of the same model sparsely arranged on the edges of an efficient trapezoid graph. This is implemented in both Matlab and Maya; the depth maps rendered in Matlab are of better quality.
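The region-match depth estimation mentioned above can be illustrated with classic sum-of-absolute-differences (SAD) block matching on a rectified stereo pair; SAD is one of the widely known region match measures the thesis refers to, but the window size and disparity range below are illustrative assumptions, not the thesis's settings:

```python
import numpy as np

def disparity_map(left, right, max_disp=16, win=3):
    """Block-matching disparity for a rectified stereo pair using the
    SAD region match measure: for each left-image pixel, slide a window
    along the same row of the right image and keep the best match."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    pad = win // 2
    L = np.pad(left.astype(np.float64), pad, mode='edge')
    R = np.pad(right.astype(np.float64), pad, mode='edge')
    for y in range(h):
        for x in range(w):
            patch = L[y:y + win, x:x + win]
            best, best_d = np.inf, 0
            for d in range(min(max_disp, x) + 1):
                cand = R[y:y + win, x - d:x - d + win]
                cost = np.abs(patch - cand).sum()
                if cost < best:
                    best, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

Normalised cross-correlation or sum-of-squared-differences could be substituted for the SAD cost without changing the structure of the search.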
Error-Concealing Image-Based Rendering Methods (Fehlerkaschierte bildbasierte Darstellungsverfahren)
Creating photo-realistic images has been one of the major goals of computer graphics since its early days. Instead of modeling the complexity of nature with standard modeling tools, image-based approaches aim to exploit real-world footage directly, as it is photo-realistic by definition. A drawback of these approaches has always been that the composition or combination of different sources is a non-trivial task, often resulting in annoying visible artifacts. This thesis focuses on different techniques to diminish visible artifacts when combining multiple images in a common image domain. The results are either novel images, when dealing with the composition of multiple images, or novel video sequences rendered in real time, when dealing with video footage from multiple cameras.
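A minimal example of such artifact-diminishing composition is feathered blending across the overlap of two sources: instead of a hard cut, each image's weight ramps down towards its border, which conceals small photometric differences. This linear-ramp sketch is an assumption standing in for the thesis's more elaborate techniques:

```python
import numpy as np

def feather_blend(img_a, img_b, overlap):
    """Blend img_a (left) and img_b (right), which share `overlap`
    columns, with a linear feathering ramp across the overlap instead
    of a hard seam (a minimal illustrative sketch)."""
    h, wa = img_a.shape[:2]
    wb = img_b.shape[1]
    out = np.zeros((h, wa + wb - overlap), dtype=np.float64)
    out[:, :wa - overlap] = img_a[:, :wa - overlap]
    out[:, wa:] = img_b[:, overlap:]
    # Weight falls linearly from 1 to 0 for img_a across the overlap.
    w = np.linspace(1.0, 0.0, overlap)
    out[:, wa - overlap:wa] = (w * img_a[:, wa - overlap:] +
                               (1.0 - w) * img_b[:, :overlap])
    return out
```

Multi-band (Laplacian pyramid) blending generalizes the same idea by feathering each frequency band over a different width.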
Document and Text Rectification Using Text- and Feature-Point-Based Objective Function Optimization
Doctoral dissertation, Department of Electrical and Computer Engineering, Graduate School, Seoul National University, August 2014. Advisor: Nam Ik Cho.
There are many techniques and applications that detect and recognize text information in images, e.g., document retrieval using camera-captured document images, book readers for the visually impaired, and augmented reality based on text recognition. In these applications, the planar surfaces containing the text are often distorted in the captured image by the perspective view (e.g., road signs), curvature (e.g., unfolded books), and wrinkles (e.g., old documents). Recovering the original document texture by removing these distortions from camera-captured document images is called document rectification. In this dissertation, new text-surface rectification algorithms are proposed for improving text recognition accuracy and visual quality. The proposed methods fall into three types depending on the input, and their contributions can be summarized as follows.
In the first rectification algorithm, the dense text-lines in documents are employed to rectify the images. Unlike conventional approaches, the proposed method does not use the text-lines directly; instead, it uses a discrete representation of text-lines and text-blocks as sets of connected components. The geometric distortions caused by page curl and perspective view are modeled as generalized cylindrical surfaces and camera rotation, respectively. With this distortion model and the discrete representation of the features, a cost function is developed whose minimization yields the parameters of the distortion model. The cost function encodes page properties such as text-block alignment, line spacing, and the straightness of text-lines. By describing the text features as sets of discrete points, the cost function can be easily defined and solved well by the Levenberg-Marquardt algorithm. Experiments show that the proposed method works well for various layouts and curved surfaces, and compares favorably with conventional methods on the standard dataset.
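The optimization step can be illustrated with a minimal Levenberg-Marquardt loop applied to a toy curl model: points on a text-line should become straight once a quadratic page-curl component is removed. The dissertation's actual cost combines alignment, spacing, and straightness terms over whole text-blocks, so everything below is a simplified sketch with assumed parameter names:

```python
import numpy as np

def levenberg_marquardt(residual, p0, iters=50, lam=1e-3):
    """Minimal Levenberg-Marquardt loop with forward-difference
    Jacobians; `residual(p)` returns the vector of per-point errors."""
    p = np.asarray(p0, dtype=np.float64)
    for _ in range(iters):
        r = residual(p)
        J = np.column_stack(
            [(residual(p + 1e-6 * np.eye(len(p))[j]) - r) / 1e-6
             for j in range(len(p))])
        step = np.linalg.solve(J.T @ J + lam * np.eye(len(p)), -J.T @ r)
        if np.linalg.norm(step) < 1e-12:
            break
        new_p = p + step
        if np.sum(residual(new_p) ** 2) < np.sum(r ** 2):
            p, lam = new_p, lam * 0.5   # accept step, trust the model more
        else:
            lam *= 10.0                 # reject step, damp harder
    return p

# Toy residual in the spirit of the straightness term: one text-line
# whose observed baseline is offset plus a quadratic page curl.
x = np.linspace(-1.0, 1.0, 30)
y_obs = 0.4 + 0.25 * x ** 2            # synthetic curled baseline
curl = lambda p: y_obs - (p[0] + p[1] * x ** 2)
p = levenberg_marquardt(curl, [0.0, 0.0])
```

Because the residuals are evaluated only at discrete feature points rather than over whole images, each iteration stays cheap, which mirrors the speed argument made in the abstract.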
The second algorithm is a unified framework to rectify and stitch multiple document images using visual feature points instead of text-lines. This is similar to the approach of general image stitching algorithms; however, those algorithms usually assume a fixed camera center, which cannot be taken for granted when capturing documents. To deal with the camera motion between images, a new parametric family of motion models is proposed in this dissertation. In addition, to remove the ambiguity in the reference plane, a new cost function is developed that imposes constraints on the reference plane, enabling the estimation of the physically correct reference plane without prior knowledge. The estimated reference plane can also be used to rectify the stitching result. Furthermore, since the proposed method employs general features, it can be applied not only to camera-captured document images but also to other planar objects such as building facades and mural paintings.
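Estimating the homography between an image and the reference plane from feature correspondences is a standard building block of such feature-based stitching. A plain (unnormalized) Direct Linear Transform sketch follows; it is a generic textbook routine, not the dissertation's motion model:

```python
import numpy as np

def homography_dlt(src, dst):
    """Direct Linear Transform: estimate the 3x3 homography mapping
    src -> dst from >= 4 point correspondences. Each correspondence
    contributes two linear equations on the 9 entries of H; the
    solution is the right singular vector of the smallest singular
    value."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=np.float64))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]
```

In practice the correspondences would first be filtered with a robust estimator such as RANSAC, and the coordinates normalized for numerical conditioning.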
The third rectification method is based on a scene text detection algorithm that is independent of the language model. Conventional methods assume that a character consists of a single connected component (CC), as in the English alphabet. This assumption breaks down for Asian characters such as Korean, Chinese, and Japanese, where a single character consists of several CCs, making it difficult to group CCs into text lines without a language model. To alleviate this problem, the proposed method clusters candidate regions based on a similarity measure that considers inter-character relations. The adjacency measure is trained on a dataset labeled with bounding boxes of text regions. Non-text regions that remain after clustering are filtered out in a text/non-text classification step. Final text regions are merged or divided into text lines considering orientation and location, and the detected text is rectified using the orientation of the text-line and the vertical strokes. The proposed method outperforms state-of-the-art algorithms on English as well as Asian characters in extensive experiments.

1 Introduction
1.1 Document rectification via text-line based optimization
1.2 A unified approach of rectification and stitching for document images
1.3 Rectification via scene text detection
1.4 Contents
2 Related work
2.1 Document rectification
2.1.1 Document dewarping without text-lines
2.1.2 Document dewarping with text-lines
2.1.3 Text-block identification and text-line extraction
2.2 Document stitching
2.3 Scene text detection
3 Document rectification based on text-lines
3.1 Proposed approach
3.1.1 Image acquisition model
3.1.2 Proposed approach to document dewarping
3.2 Proposed cost function and its optimization
3.2.1 Design of Estr(·)
3.2.2 Minimization of Estr(·)
3.2.3 Alignment type classification
3.2.4 Design of Ealign(·)
3.2.5 Design of Espacing(·)
3.3 Extension to unfolded book surfaces
3.4 Experimental result
3.4.1 Experiments on synthetic data
3.4.2 Experiments on real images
3.4.3 Comparison with existing methods
3.4.4 Limitations
4 Document rectification based on feature detection
4.1 Proposed approach
4.2 Proposed cost function and its optimization
4.2.1 Notations
4.2.2 Homography between the i-th image and E
4.2.3 Proposed cost function
4.2.4 Optimization
4.2.5 Relation to the model in [17]
4.3 Post-processing
4.3.1 Classification of two cases
4.3.2 Skew removal
4.4 Experimental results
4.4.1 Quantitative evaluation on metric reconstruction performance
4.4.2 Experiments on real images
5 Scene text detection and rectification
5.1 Introduction
5.1.1 Contribution
5.1.2 Proposed approach
5.2 Candidate region detection
5.2.1 CC extraction
5.2.2 Computation of similarity between CCs
5.2.3 CC clustering
5.3 Rectification of candidate region
5.4 Text/non-text classification
5.5 Experimental result
5.5.1 Experimental results on ICDAR 2011 dataset
5.5.2 Experimental results on the Asian character dataset
6 Conclusion
Bibliography
Abstract (Korean)