
    Novel Methods and Algorithms for Presenting 3D Scenes

    In recent years, improvements in the acquisition and creation of 3D models have given rise to an increasing availability of 3D content and to a widening of the audience such content is created for, which has brought into focus the need for effective ways to visualize and interact with it. Until recently, virtual inspection of a 3D object or navigation inside a 3D scene was carried out using human-machine interaction (HMI) metaphors controlled through mouse and keyboard events. However, this interaction approach may be cumbersome for a general audience. Furthermore, the inception and spread of touch-based mobile devices, such as smartphones and tablets, redefined the interaction problem entirely, since neither mouse nor keyboard is available anymore. The problem is made even worse by the fact that these devices are typically far less powerful than desktop machines, while high-quality rendering is a computationally intensive task. In this thesis, we present a series of novel methods for the easy presentation of 3D content, both when it is already available in digitized form and when it must be acquired from the real world by image-based techniques. In the first case, we propose a method which takes as input the 3D scene of interest and an example video, and automatically produces a video of the input scene that resembles the given example. In other words, our algorithm allows the user to replicate an existing video, for example one created by a professional animator, on a different 3D scene. In the context of image-based techniques, exploiting the inherent spatial organization of photographs taken for the 3D reconstruction of a scene, we propose an intuitive interface for smooth stereoscopic navigation of the acquired scene, providing an immersive experience without the need for a complete 3D reconstruction.
Finally, we propose an interactive framework for improving low-quality 3D reconstructions obtained through image-based reconstruction algorithms. Using a few strokes on the input images, the user can specify high-level geometric hints to improve incomplete or noisy reconstructions caused by common conditions that often arise for objects such as buildings, streets, and numerous other human-made functional elements.

    Adaptation of Images and Videos for Different Screen Sizes

    With the increasing popularity of smartphones and similar mobile devices, the demand for media to consume on the go rises. As most images and videos today are captured at HD or even higher resolutions, there is a need to adapt them in a content-aware fashion before they can be watched comfortably on screens with small sizes and varying aspect ratios. This process is called retargeting. Most distortions during this process are caused by a change of the aspect ratio, so retargeting mainly focuses on adapting the aspect ratio of a video while the rest can be scaled uniformly. The main objective of this dissertation is to contribute to modern image and video retargeting, especially regarding the potential of the seam carving operator. There are still unsolved problems in this research field that should be addressed in order to improve the quality of the results or speed up the retargeting process. This dissertation presents novel algorithms that retarget images, videos, and stereoscopic videos while dealing with problems like the preservation of straight lines or the reduction of the required memory space and computation time. Additionally, a GPU implementation is used to achieve video retargeting in real time. Furthermore, an enhancement of face detection is presented which is able to distinguish between faces that are important for the retargeting and faces that are not. Results show that the developed techniques are suitable for the desired scenarios.
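The seam carving operator central to this dissertation can be illustrated with a minimal sketch (not the author's implementation): an energy map is swept top-to-bottom with dynamic programming to find the cheapest 8-connected vertical seam, which is then removed to shrink the image by one column.

```python
import numpy as np

def find_vertical_seam(energy):
    """Dynamic-programming search for the minimum-energy vertical seam."""
    h, w = energy.shape
    cost = energy.astype(np.float64).copy()
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 2, w)
            cost[y, x] += cost[y - 1, lo:hi].min()
    # Backtrack from the cheapest bottom-row pixel.
    seam = [int(np.argmin(cost[-1]))]
    for y in range(h - 2, -1, -1):
        x = seam[-1]
        lo, hi = max(x - 1, 0), min(x + 2, w)
        seam.append(lo + int(np.argmin(cost[y, lo:hi])))
    return seam[::-1]  # one column index per row

def remove_seam(img, seam):
    """Delete one pixel per row, narrowing the image by one column."""
    h, w = img.shape
    out = np.empty((h, w - 1), dtype=img.dtype)
    for y, x in enumerate(seam):
        out[y] = np.delete(img[y], x)
    return out
```

Repeating this until the target width is reached retargets the aspect ratio while preserving high-energy (salient) content; real systems add constraints such as the straight-line preservation discussed above.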

    Selected Topics in Video Coding and Computer Vision

    Video applications ranging from multimedia communication to computer vision have been extensively studied in the past decades. However, the emergence of new applications continues to raise questions that are only partially answered by existing techniques. This thesis studies three selected topics related to video: intra prediction in block-based video coding, pedestrian detection and tracking in infrared imagery, and multi-view video alignment. In the state-of-the-art video coding standard H.264/AVC, intra prediction is defined on a hierarchical quad-tree-based block partitioning structure, which fails to exploit the geometric constraint of edges. We propose a geometry-adaptive block partitioning structure and a new intra prediction algorithm named geometry-adaptive intra prediction (GAIP). A new texture prediction algorithm named geometry-adaptive intra displacement prediction (GAIDP) is also developed by extending the original intra displacement prediction (IDP) algorithm with the geometry-adaptive block partitions. Simulations on various test sequences demonstrate that the intra coding performance of H.264/AVC can be significantly improved by incorporating the proposed geometry-adaptive algorithms. In recent years, due to the decreasing cost of thermal sensors, pedestrian detection and tracking in infrared imagery has become a topic of interest for night vision and all-weather surveillance applications. We propose a novel approach for detecting and tracking pedestrians in infrared imagery based on a layered representation of infrared images. Pedestrians are detected from the foreground layer by a Principal Component Analysis (PCA) based scheme using the appearance cue. To facilitate the task of pedestrian tracking, we formulate the problem of shot segmentation and present a graph-matching-based tracking algorithm.
Simulations with both the OSU Infrared Image Database and the WVU Infrared Video Database are reported to demonstrate the accuracy and robustness of our algorithms. Multi-view video alignment is a process to facilitate the fusion of non-synchronized multi-view video sequences for various applications, including automatic video-based surveillance and video metrology. In this thesis, we propose an accurate multi-view video alignment algorithm that iteratively aligns two sequences in space and time. To achieve an accurate sub-frame temporal alignment, we generalize the existing phase-correlation algorithm to the 3-D case. We also present a novel method to obtain the ground truth of the temporal alignment by using supplementary audio signals sampled at a much higher rate. The accuracy of our algorithm is verified by simulations using real-world sequences.
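The phase-correlation idea that the alignment algorithm generalizes to 3-D can be sketched in one dimension (a toy illustration under the assumption of a circular shift, not the thesis's sub-frame 3-D version): the normalized cross-power spectrum of two shifted copies of a signal is a pure phase ramp, whose inverse FFT is a delta at the shift.

```python
import numpy as np

def phase_correlation_shift(a, b):
    """Estimate the circular shift s such that b == np.roll(a, s)."""
    A, B = np.fft.fft(a), np.fft.fft(b)
    r = np.conj(A) * B
    r /= np.maximum(np.abs(r), 1e-12)   # keep the phase, discard magnitude
    corr = np.fft.ifft(r).real          # delta located at the shift
    shift = int(np.argmax(corr))
    if shift > len(a) // 2:             # map to a signed shift
        shift -= len(a)
    return shift
```

Sub-sample (and, in the thesis, sub-frame) accuracy is obtained by interpolating the correlation peak rather than taking a plain argmax.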

    Motion and Emotion: Semantic Knowledge for Hollywood Film Indexing

    Ph.D. (Doctor of Philosophy)

    Irish Machine Vision and Image Processing Conference Proceedings 2017


    Multi-Object 3D Reconstruction from Dynamic Scenes and Learning-Based Depth Super-Resolution

    Thesis (Ph.D.) -- Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2014. Advisor: Kyoung Mu Lee (이경무). In this dissertation, a framework for reconstructing the 3-dimensional shape of multiple objects and a method for enhancing the resolution of 3-dimensional models, especially of the human face, are proposed. Conventional 3D reconstruction from multiple views is applicable to static scenes, in which the configuration of objects is fixed while the images are taken. In the proposed framework, the main goal is to reconstruct 3D models of multiple objects in a more general setting where the configuration of the objects varies among views. This problem is solved by object-centered decomposition of the dynamic scenes using an unsupervised co-recognition approach. Unlike conventional motion segmentation algorithms that require a small-motion assumption between consecutive views, the co-recognition method provides reliable, accurate correspondences of the same object among unordered and wide-baseline views. In order to segment each object region, the sparse 3D points obtained from structure-from-motion are utilized. These points are relatively reliable, since both their geometric relations and photometric consistency are considered simultaneously when they are generated. The sparse points serve as automatic seed points for a seeded-segmentation algorithm, which makes the interactive segmentation work in a non-interactive way. Experiments on various challenging real image sequences demonstrate the effectiveness of the proposed approach, especially in the presence of abrupt independent motions of objects. Obtaining a high-density 3D model is also an important issue. Since the multi-view images used to reconstruct a 3D model, as well as 3D imaging hardware such as time-of-flight cameras or laser scanners, have a natural upper limit of resolution, a super-resolution method is required to increase the resolution of 3D data.
This dissertation presents an algorithm to super-resolve a single human face model represented as a 3D point cloud. The point cloud data is considered an object-centered 3D data representation, in contrast to camera-centered depth images. While much research has been done on the super-resolution of intensity images, and some prior work exists on depth image data, this is the first attempt to super-resolve a single set of 3D point cloud data without an additional intensity or depth image observation of the object. The problem is solved by querying a previously learned database that associates high-resolution 3D data with the corresponding low-resolution data. A Markov Random Field (MRF) model is constructed on the 3D points, and a suitable energy function is formulated as a multi-class labeling problem on the MRF. Experimental results show that the proposed method solves the super-resolution problem with high accuracy. (Contents: 1 Introduction; 2 Background: Motion Segmentation, Image Super Resolution; 3 Multi-Object Reconstruction from Dynamic Scenes; 4 Super Resolution for 3D Face Reconstruction; 5 Conclusion.)
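The database-query step at the core of the example-based super-resolution can be sketched as a nearest-neighbour lookup (a toy illustration with a hypothetical random database, omitting the MRF labeling that the dissertation adds on top):

```python
import numpy as np

# Hypothetical toy database: each low-res patch descriptor is paired with
# the high-res detail it was observed with during training.
rng = np.random.default_rng(0)
lowres_db  = rng.standard_normal((100, 8))   # 100 training patches, 8-D
highres_db = rng.standard_normal((100, 32))  # matching high-res patches

def best_highres_patch(query):
    """Return the high-res patch whose low-res counterpart is nearest
    to the query descriptor (Euclidean distance over the database)."""
    d = np.linalg.norm(lowres_db - query, axis=1)
    return highres_db[int(np.argmin(d))]
```

In the dissertation, the candidate patches retrieved this way become the label set of an MRF over the 3D points, so that neighbouring patches are chosen consistently rather than independently.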

    A Pipeline for Modelling of Ice-Hockey Stick Shape Deformation Using Actual Shot Video

    In ice hockey, the performance of a player's shots depends on their skill level and body strength as well as the stick's construction and stiffness. In fact, research suggests that one of the primary reasons elite players generate much faster shots is their ability to flex their hockey stick. Thus, reconstructing the deformable 3D shape of the stick during the course of a player's shot has important applications in the performance analysis of ice-hockey sticks. We present a new, low-cost, portable system to acquire videos of a player's shot and to automatically reconstruct the stick's shape deformation in 3D. This thesis contributes towards the ultimate goal of that pipeline in several ways. First, it designs a mobile stereo-vision setup and its calibration, and captures many data acquisitions of different players shooting in different styles. Second, it develops a two-step pruning methodology to extract the structurally thin, fast-moving ice-hockey stick from the noisy reconstructed 3D point cloud. Third, it automates the initial rigid alignment of the stick template to the noisy reconstruction. Fourth, it reduces the effect of noise using a medial-axis approximation and suppresses the effect of hand occlusion on the final template bending with a curve-fitting approach. The pipeline is also robust across different ice-hockey sticks, players, and shooting styles.
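The curve-fitting step used to smooth the noisy medial-axis samples can be sketched as a low-order least-squares polynomial fit (a minimal stand-in; the function name and degree are illustrative assumptions, not the thesis's exact formulation):

```python
import numpy as np

def fit_medial_axis(x, y, degree=3):
    """Least-squares polynomial fit through noisy medial-axis samples.

    x, y: coordinates of samples along the stick's medial axis.
    Returns the smoothed y values evaluated at the same x stations.
    """
    coeffs = np.polyfit(x, y, degree)
    return np.polyval(coeffs, x)
```

Fitting a smooth curve through the noisy samples both suppresses reconstruction noise and bridges the gap left where the player's hands occlude the stick.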

    Motion Segmentation from Clustering of Sparse Point Features Using Spatially Constrained Mixture Models

    Motion is one of the strongest cues available for segmentation. While motion segmentation finds wide-ranging applications in object detection, tracking, surveillance, robotics, image and video compression, scene reconstruction, and video editing, it faces various challenges: accurate motion recovery from noisy data, the varying complexity of the models required to describe the computed image motion, the dynamic nature of scenes that may include many independently moving objects undergoing occlusions, and the need to make high-level decisions while dealing with long image sequences. With sparse point features as the pivotal element, this thesis presents three distinct approaches that address some of these motion segmentation challenges. The first part deals with the detection and tracking of sparse point features in image sequences. A framework is proposed in which point features can be tracked jointly. Traditionally, sparse features have been tracked independently of one another. Combining ideas from Lucas-Kanade and Horn-Schunck, this thesis presents a technique in which the estimated motion of a feature is influenced by the motion of neighboring features. The joint feature tracking algorithm leads to improved tracking performance over the standard Lucas-Kanade approach, especially when tracking features in untextured regions. The second part is related to motion segmentation using sparse point feature trajectories. The approach utilizes a spatially constrained mixture model framework and a greedy EM algorithm to group point features. In contrast to previous work, the algorithm is incremental in nature and allows an arbitrary number of objects traveling at different relative speeds to be segmented, thus eliminating the need for an explicit initialization of the number of groups.
The primary parameter used by the algorithm is the amount of evidence that must be accumulated before the features are grouped. A statistical goodness-of-fit test monitors the change in the motion parameters of a group over time in order to automatically update the reference frame. The approach works in real time and is able to segment various challenging sequences, captured from still and moving cameras, that contain multiple independently moving objects and motion blur. The third part of this thesis deals with the use of specialized models for motion segmentation. Articulated human motion is chosen as a representative example that requires a complex model to be described accurately. A motion-based approach for segmentation, tracking, and pose estimation of articulated bodies is presented. The human body is represented using the trajectories of a number of sparse points. A novel motion descriptor encodes the spatial relationships of the motion vectors representing various parts of the person and can discriminate between articulated and non-articulated motions, as well as between various poses and view angles. Furthermore, a nearest-neighbor search for the closest motion descriptor in labeled training data, consisting of the human gait cycle in multiple views, is performed, and this distance is fed to a Hidden Markov Model defined over multiple poses and viewpoints to obtain temporally consistent pose estimates. Experimental results on various sequences of walking subjects at multiple viewpoints and scales demonstrate the effectiveness of the approach. In particular, the purely motion-based approach is able to track people in night-time sequences, even when appearance-based cues are not available. Finally, an application of image segmentation is presented in the context of iris segmentation. The iris is a widely used biometric for recognition and is known to be highly accurate if the segmentation of the iris region is near perfect.
Non-ideal situations arise when the iris is occluded by eyelashes or eyelids, or when the overall quality of the segmented iris is affected by illumination changes or by out-of-plane rotation of the eye. The proposed iris segmentation approach combines the appearance and the geometry of the eye to segment iris regions from non-ideal images. The image is modeled as a Markov random field, and a graph-cuts-based energy minimization algorithm is applied to label each pixel as eyelash, pupil, iris, or background using texture and image intensity information. The iris shape is modeled as an ellipse and is used to refine the pixel-based segmentation. The results indicate the effectiveness of the segmentation algorithm in handling non-ideal iris images.
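The mixture-model grouping at the heart of the second part can be sketched with a toy EM over per-feature velocity vectors (isotropic Gaussians, fixed K = 2, no spatial prior or incremental evidence accumulation, so this is only a simplified stand-in for the thesis's spatially constrained, greedy EM):

```python
import numpy as np

def em_group_velocities(v, iters=50):
    """Group feature velocities into two translational motions with EM.

    v: (N, 2) array of per-feature velocity vectors; returns hard labels.
    """
    # Farthest-point initialization of the two component means.
    mu = np.stack([v[0], v[((v - v[0]) ** 2).sum(1).argmax()]])
    var = np.full(2, v.var() + 1e-6)
    pi = np.full(2, 0.5)
    for _ in range(iters):
        # E-step: responsibilities under 2-D isotropic Gaussians.
        d2 = ((v[:, None, :] - mu[None]) ** 2).sum(-1)        # (N, 2)
        logp = np.log(pi) - d2 / (2 * var) - np.log(var)
        r = np.exp(logp - logp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and variances.
        nk = r.sum(axis=0) + 1e-9
        mu = (r.T @ v) / nk[:, None]
        var = (r * d2).sum(axis=0) / (2 * nk) + 1e-6
        pi = nk / len(v)
    return r.argmax(axis=1)
```

Features moving with the background and features on an independently moving object then fall into separate components; the thesis additionally grows the number of components incrementally and constrains neighbouring features to prefer the same group.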

    Motion Segmentation Aided Super Resolution Image Reconstruction

    This dissertation addresses Super Resolution (SR) Image Reconstruction with a focus on motion segmentation. The main thrust is Information Complexity guided Gaussian Mixture Models (GMMs) for Statistical Background Modeling. In the process of developing our framework, we also focus on two other topics: motion trajectory estimation towards global and local scene change detection, and image reconstruction to obtain high-resolution (HR) representations of the moving regions. Such a framework is used for dynamic scene understanding and for the recognition of individuals and threats from image sequences recorded with either stationary or non-stationary camera systems. We introduce a new technique called Information Complexity guided Statistical Background Modeling, employing GMMs that are optimal with respect to information complexity criteria. Moving objects are segmented out through background subtraction, which utilizes the computed background model. This technique produces superior results to competing background modeling strategies. State-of-the-art SR Image Reconstruction studies combine the information from a set of only slightly different low-resolution (LR) images of a static scene to construct an HR representation. The crucial challenge not handled in these studies is accumulating the corresponding information from highly displaced moving objects. In this respect, a framework for SR Image Reconstruction of moving objects with such large displacements is developed. Our assumption is that the LR images differ from each other due to the local motion of the objects and the global motion of the scene imposed by a non-stationary imaging system. Contrary to traditional SR approaches, we employ several steps.
These steps are: suppression of the global motion; motion segmentation, accompanied by background subtraction, to extract the moving objects; suppression of the local motion of the segmented regions; and super-resolving the accumulated information coming from the moving objects rather than the whole scene. This results in a reliable offline SR Image Reconstruction tool which handles several types of dynamic scene changes, compensates for the effects of the camera system, and provides data redundancy by removing the background. The framework proved superior to state-of-the-art algorithms, which put no significant effort toward the dynamic scene representation of non-stationary camera systems.
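The background-subtraction step above can be sketched with a per-pixel running Gaussian model (a deliberately simplified stand-in for the dissertation's information-complexity-guided GMMs: one Gaussian per pixel instead of a mixture, with illustrative parameter values):

```python
import numpy as np

class RunningGaussianBackground:
    """Toy per-pixel background model: one running Gaussian per pixel."""

    def __init__(self, first_frame, alpha=0.05, k=2.5):
        self.mean = first_frame.astype(np.float64)
        self.var = np.full_like(self.mean, 25.0)  # assumed initial variance
        self.alpha, self.k = alpha, k

    def apply(self, frame):
        """Return a boolean foreground mask and update the model."""
        frame = frame.astype(np.float64)
        d = frame - self.mean
        fg = d ** 2 > (self.k ** 2) * self.var    # outlier => foreground
        # Update the model only where the pixel matched the background.
        a = np.where(fg, 0.0, self.alpha)
        self.mean += a * d
        self.var += a * (d ** 2 - self.var)
        return fg
```

A full GMM keeps several such Gaussians per pixel and matches each incoming value against all of them, which copes with multi-modal backgrounds (e.g., swaying trees); the dissertation further selects the number of components by an information complexity criterion.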

    A Survey on Ear Biometrics

    Recognizing people by their ears has recently received significant attention in the literature. Several reasons account for this trend: first, ear recognition does not suffer from some problems associated with other non-contact biometrics, such as face recognition; second, it is the most promising candidate for combination with the face in the context of multi-pose face recognition; and third, the ear can be used for human recognition in surveillance videos where the face may be occluded completely or in part. Further, the ear appears to degrade little with age. Even though current ear detection and recognition systems have reached a certain level of maturity, their success is limited to controlled indoor conditions. In addition to variation in illumination, other open research problems include hair occlusion, earprint forensics, ear symmetry, ear classification, and ear individuality. This paper provides a detailed survey of research conducted in ear detection and recognition. It provides an up-to-date review of the existing literature, revealing the current state of the art not only for those who are working in this area but also for those who might exploit this new approach. Furthermore, it offers insights into some unsolved ear recognition problems as well as ear databases available to researchers.