
    Video modeling via implicit motion representations

    Video modeling refers to the development of analytical representations for explaining the intensity distribution in video signals. From such a representation we can develop algorithms for particular video-related tasks, so video modeling provides a foundation that bridges video data and the tasks built on it. Although many video models have been proposed over the past decades, the rise of new applications calls for more efficient and accurate video modeling approaches.

    Most existing video modeling approaches are based on explicit motion representations, where motion information is expressed by correspondence-based representations (i.e., motion velocity or displacement). Although conceptually simple, the limitations of those representations and the suboptimality of motion estimation techniques can degrade such video modeling approaches, especially for complex motion or non-ideal observation data. In this thesis, we investigate video modeling without explicit motion representation: motion information is implicitly embedded in the spatio-temporal dependency among pixels or patches instead of being described by motion vectors.

    First, we propose a parametric model based on spatio-temporal adaptive localized learning (STALL). We formulate video modeling as a linear regression problem in which motion information is embedded within the regression coefficients. The coefficients are adaptively learned within a local space-time window based on the LMMSE criterion. By incorporating spatio-temporal resampling and a Bayesian fusion scheme, we can enhance the modeling capability of STALL on more general videos. Under the STALL framework, video processing algorithms for a variety of applications can be developed by adjusting model parameters (i.e., the size and topology of the model support and training window). We apply STALL to three video processing problems. Simulation results show that motion information can be efficiently exploited by our implicit motion representation and that resampling and fusion do enhance the modeling capability of STALL.

    Second, we propose a nonparametric video modeling approach that does not depend on explicit motion estimation. Assuming the video sequence is composed of many overlapping space-time patches, we embed motion-related information in the relationships among video patches and develop a generic sparsity-based prior for typical video sequences. First, we extend block matching to more general kNN-based patch clustering, which provides an implicit and distributed representation of motion information. We enforce a sparsity constraint on a higher-dimensional data array formed by packing the patches of each similar-patch set, and then solve the inference problem by updating the kNN array and the desired signal iteratively. Finally, we present a Bayesian fusion approach to fuse multiple-hypothesis inferences. Simulation results on video error concealment, denoising, and deartifacting demonstrate its modeling capability.

    Finally, we summarize the two proposed video modeling approaches and point out the prospects of implicit motion representations in applications ranging from low- to high-level problems.
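    To make the STALL formulation concrete, the following is a minimal sketch of local linear regression with LMMSE-style (regularised least-squares) coefficients, as described above. The function name, the window handling, and the assumption that the target pixel lies away from frame borders are illustrative choices, not the thesis's implementation.

    ```python
    import numpy as np

    def stall_predict(frames, t, y, x, sup=1, win=4, reg=1e-3):
        """Predict pixel (t, y, x) of a (T, H, W) video from its
        spatio-temporal neighbourhood in frame t-1.

        sup : half-width of the model support in frame t-1.
        win : half-width of the local training window.
        """
        prev, cur = frames[t - 1], frames[t]

        # Collect training pairs inside the local space-time window:
        # each known pixel of frame t is regressed onto its
        # (2*sup+1)^2 support patch in frame t-1.
        X, d = [], []
        for j in range(y - win, y + win + 1):
            for i in range(x - win, x + win + 1):
                if (j, i) == (y, x):      # the target pixel is unknown
                    continue
                patch = prev[j - sup:j + sup + 1, i - sup:i + sup + 1]
                X.append(patch.ravel())
                d.append(cur[j, i])
        X, d = np.asarray(X), np.asarray(d)

        # LMMSE-style coefficients via regularised least squares: motion
        # is captured implicitly in how the support pixels are weighted.
        a = np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ d)

        # Apply the learned coefficients at the target location.
        return prev[y - sup:y + sup + 1, x - sup:x + sup + 1].ravel() @ a
    ```

    Because the coefficients are re-learned for every local window, no motion vector is ever estimated; a shifted support patch simply receives large weights where the content has moved.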

    Flood Forecasting Using Machine Learning Methods

    This book is a printed edition of the Special Issue Flood Forecasting Using Machine Learning Methods that was published in Water.

    Deep Video Compression


    Texture Structure Analysis

    Texture analysis plays an important role in applications such as automated pattern inspection, image and video compression, content-based image retrieval, remote sensing, medical imaging and document processing, to name a few. Texture structure analysis is the process of studying the structure present in textures, which can be expressed in terms of perceived regularity. The human visual system (HVS) uses perceived regularity as one of the important pre-attentive cues in low-level image understanding. Similar to the HVS, image processing and computer vision systems can make fast and efficient decisions if they can quantify this regularity automatically. In this work, the problem of quantifying the degree of perceived regularity when looking at an arbitrary texture is introduced and addressed.

    One key contribution of this work is an objective no-reference perceptual texture regularity metric based on visual saliency. Other key contributions include an adaptive texture synthesis method based on texture regularity, and a low-complexity reduced-reference visual quality metric for assessing the quality of synthesized textures.

    In order to use the best-performing visual attention model on textures, the ability of the most popular visual attention models to predict visual saliency on textures is evaluated. Since there is no publicly available database with ground-truth saliency maps on images with exclusively texture content, a new eye-tracking database is systematically built. Using the Visual Saliency Map (VSM) generated by the best visual attention model, the proposed texture regularity metric is computed. The metric is based on the observation that VSM characteristics differ between textures of differing regularity, and it combines two texture regularity scores: a textural similarity score and a spatial distribution score.

    To evaluate the performance of the proposed regularity metric, a texture regularity database called RegTEX is built as part of this work. Subjective testing shows that the proposed metric has a strong correlation with the Mean Opinion Score (MOS) for the perceived regularity of textures. The method is also shown to be robust to geometric and photometric transformations, and it outperforms some popular texture regularity metrics in predicting perceived regularity.

    The impact of the proposed metric on many image-processing applications is also presented. The influence of perceived texture regularity on the perceptual quality of synthesized textures is demonstrated by building a synthesized-textures database named SynTEX. Subjective testing shows that textures with different degrees of perceived regularity exhibit different degrees of vulnerability to artifacts resulting from different texture synthesis approaches. This work also proposes an algorithm for adaptively selecting the appropriate texture synthesis method based on the perceived regularity of the original texture.

    A reduced-reference texture quality metric for texture synthesis is also proposed, based on the change in perceived regularity and the change in perceived granularity between the original and the synthesized textures; perceived granularity is quantified through a new granularity metric proposed in this work. Subjective testing shows that the proposed quality metric, using just two parameters, has a strong correlation with the MOS for the fidelity of synthesized textures and outperforms state-of-the-art full-reference quality metrics on three different texture databases. Finally, the ability of the proposed regularity metric to predict the perceived degradation of textures due to compression and blur artifacts is established.
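    The abstract only names the two component scores, so the following is a hypothetical sketch of how a saliency-based regularity score of this shape could be composed: salient points are taken as local maxima of the VSM, a spatial-distribution score rewards evenly spaced points, and a textural-similarity score rewards mutually similar image patches around them. All names, thresholds and the combination rule are assumptions, not the thesis's actual metric.

    ```python
    import numpy as np
    from scipy.ndimage import maximum_filter

    def regularity_score(img, vsm, patch=8, thresh=0.5):
        """Toy saliency-based regularity score for a texture image.

        img : (H, W) grayscale texture; vsm : (H, W) saliency map in [0, 1].
        """
        # Salient points: local maxima of the VSM above a threshold.
        peaks = (vsm == maximum_filter(vsm, size=patch)) & (vsm > thresh)
        pts = np.argwhere(peaks)
        if len(pts) < 2:
            return 0.0

        # Spatial-distribution score: regular textures place salient points
        # on a near-lattice, so nearest-neighbour distances vary little.
        d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        nn = d.min(axis=1)
        s_spatial = 1.0 / (1.0 + nn.std() / nn.mean())

        # Textural-similarity score: mean pairwise correlation between the
        # image patches centred on the salient points.
        h = patch // 2
        crops = [img[y - h:y + h, x - h:x + h].ravel() for y, x in pts
                 if h <= y < img.shape[0] - h and h <= x < img.shape[1] - h]
        if len(crops) < 2:
            return s_spatial
        C = np.nan_to_num(np.corrcoef(np.asarray(crops)))
        s_texture = C[np.triu_indices(len(crops), k=1)].mean()

        # Combine: both cues must be high for a texture to look regular.
        return s_spatial * max(s_texture, 0.0)
    ```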

    3D Remote Sensing Applications in Forest Ecology: Composition, Structure and Function

    The composition, structure and function of forest ecosystems are the key features characterizing their ecological properties, and can thus be crucially shaped and changed by various biotic and abiotic factors on multiple spatial scales. The magnitude and extent of these changes in recent decades call for enhanced mitigation and adaptation measures. Remote sensing data and methods are the main complementary sources of up-to-date, synoptic and objective information on forest ecology. Due to the inherent 3D nature of forest ecosystems, the analysis of 3D remote sensing data is considered most appropriate for recreating a forest's compositional, structural and functional dynamics. In this Special Issue of Forests, we published a set of state-of-the-art scientific works, including experimental studies, methodological developments and model validations, all dealing with the general topic of 3D remote-sensing-assisted applications in forest ecology. The published studies cover a broad collection of method and sensor combinations, including fusion schemes. All in all, the studies and their focuses are as broad as a forest's ecology or the field of remote sensing itself, and thus reflect the diverse uses and directions toward which future research and practice will be directed.

    Dynamical models and machine learning for supervised segmentation

    This thesis is concerned with the problem of how to outline regions of interest in medical images when the boundaries are weak or ambiguous and the region shapes are irregular. The focus on machine learning and interactivity leads to a common theme: the need to balance conflicting requirements. First, any machine learning method must strike a balance between how much it can learn and how well it generalises. Second, interactive methods must balance minimal user demand with maximal user control.

    To address the problem of weak boundaries, methods of supervised texture classification are investigated that do not use explicit texture features. These methods enable prior knowledge about the image to benefit any segmentation framework. A chosen dynamic contour model, based on probabilistic boundary tracking, combines these image priors with efficient modes of interaction. We show the benefits of the texture classifiers over intensity- and gradient-based image models, in both classification and boundary extraction.

    To address the problem of irregular region shape, we devise a new type of statistical shape model (SSM) that does not use explicit boundary features or assume high-level similarity between region shapes. First, the models are used for shape discrimination, to constrain any segmentation framework by way of regularisation. Second, the SSMs are used for shape generation, allowing probabilistic segmentation frameworks to draw shapes from a prior distribution; a generic sketch of this idea is given below. The generative models also include novel methods to constrain shape generation according to information from both the image and user interactions. The shape models are first evaluated in terms of discrimination capability and shown to outperform other shape descriptors. Experiments also show that the shape models can benefit a standard type of segmentation algorithm by providing shape regularisers. We finally show how to exploit the shape models in supervised segmentation frameworks, and evaluate their benefits in user trials.
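    As a point of reference for "drawing shapes from a prior distribution", here is a sketch of sampling from a classical landmark-based PCA shape model. Note the hedge: the thesis's SSM deliberately avoids explicit boundary features, so this shows only the conventional construction it departs from, with hypothetical names.

    ```python
    import numpy as np

    def sample_shapes(shapes, n_modes=5, n_samples=3, seed=0):
        """Draw plausible shapes from a PCA-based statistical shape prior.

        shapes : (N, 2K) aligned landmark vectors (x1, y1, ..., xK, yK),
        with N > n_modes training examples.
        """
        rng = np.random.default_rng(seed)
        mean = shapes.mean(axis=0)
        _, s, Vt = np.linalg.svd(shapes - mean, full_matrices=False)
        lam = s[:n_modes] ** 2 / (len(shapes) - 1)   # per-mode variances

        # b_i ~ N(0, lambda_i) keeps samples within the learned variation.
        b = rng.standard_normal((n_samples, n_modes)) * np.sqrt(lam)
        return mean + b @ Vt[:n_modes]
    ```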

    Energy-efficient circuits and systems for computational imaging and vision on mobile devices

    Thesis: Ph.D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018. By Priyanka Raina.

    Eighty-five percent of images today are taken by cell phones. These images are not merely projections of light from the scene onto the camera sensor but result from deep calculation. This calculation involves a number of computational imaging algorithms, such as high dynamic range (HDR) imaging, panorama stitching, image deblurring and low-light imaging, that compensate for camera limitations, and a number of deep-learning-based vision algorithms, such as face recognition, object recognition and scene understanding, that make inferences on these images for a variety of emerging applications. However, because of their high computational complexity, mobile CPU- or GPU-based implementations of these algorithms do not achieve real-time performance. Moreover, offloading these algorithms to the cloud is not a viable solution, because wirelessly transmitting large amounts of image data results in long latency and high energy consumption, making it unsuitable for mobile devices.

    This work solves these problems by designing energy-efficient hardware accelerators targeted at these applications. It presents the architecture of two complete computational imaging systems for energy-constrained mobile environments: (1) an energy-scalable accelerator for blind image deblurring, with an on-chip implementation, and (2) a low-power processor for real-time motion magnification in videos, with an FPGA implementation. It also presents a 3D imaging platform and image-processing workflow for 3D surface area assessment of dermatologic lesions. It demonstrates that such accelerator-based systems can enable energy-efficient integration of computational imaging and vision algorithms into mobile and wearable devices.
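    For context on what the second accelerator computes, here is a heavily simplified sketch of Eulerian motion magnification (temporal band-pass plus amplification). Real systems, presumably including the thesis's processor, operate on a spatial pyramid decomposition rather than raw pixels; the function name and all parameters here are illustrative assumptions.

    ```python
    import numpy as np

    def magnify_motion(frames, alpha=20.0, lo=0.4, hi=3.0, fps=30.0):
        """Simplified Eulerian motion magnification.

        frames : (T, H, W) grayscale video, float in [0, 1].
        Amplifies temporal variations in the band [lo, hi] Hz by alpha.
        """
        # Temporal band-pass via FFT along the time axis.
        F = np.fft.fft(frames, axis=0)
        freqs = np.fft.fftfreq(len(frames), d=1.0 / fps)
        band = (np.abs(freqs) >= lo) & (np.abs(freqs) <= hi)
        filtered = np.fft.ifft(F * band[:, None, None], axis=0).real

        # Add the amplified band-passed signal back onto the input.
        return np.clip(frames + alpha * filtered, 0.0, 1.0)
    ```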

    Depth-Map-Assisted Texture and Depth Map Super-Resolution

    With the development of video technology, high-definition video and 3D video applications are becoming increasingly accessible to customers. The interactive and vivid 3D video experience of realistic scenes relies greatly on the amount and quality of the texture and depth map data. However, due to the limitations of video capturing hardware and transmission bandwidth, transmitted video has to be compressed, which in general degrades the received video quality. This makes it hard to meet users' requirements for high definition and visual experience, and it also limits the development of future applications. Therefore, image/video super-resolution techniques have been proposed to address this issue.

    Image super-resolution aims to reconstruct a high-resolution image from single or multiple low-resolution images captured of the same scene under different conditions. Based on the type of image to be super-resolved, image super-resolution divides into texture and depth image super-resolution. Classified by implementation method, there are three main categories: interpolation-based, reconstruction-based and learning-based super-resolution algorithms. This thesis focuses on exploiting depth data in interpolation-based super-resolution algorithms for texture video and depth maps. Two texture and one depth super-resolution algorithms are proposed as the main contributions of this thesis.

    The first texture super-resolution algorithm operates in the Mixed Resolution (MR) multiview video system, where at least one of the views is captured at Low Resolution (LR) while the others are captured at Full Resolution (FR). In order to reduce visual discomfort and adapt the MR video format for free-viewpoint television, the low-resolution views are super-resolved to the target full resolution by the proposed virtual-view-assisted super-resolution algorithm. Inter-view similarity is used to determine whether a missing pixel in the super-resolved frame is filled by a virtual-view pixel or by a spatially interpolated pixel, with the decision steered by the texture characteristics of the neighbours of each missing pixel. Thus, the proposed method can recover details in regions with edges while maintaining good quality in smooth areas, by properly exploiting the high-quality virtual-view pixels and the directional correlation of pixels.

    The second texture super-resolution algorithm is based on the Multiview Video plus Depth (MVD) system, which consists of textures and the associated per-pixel depth data. In order to further reduce the transmitted data and the quality degradation of received video, a systematic framework is proposed to downsample the original MVD data and later super-resolve the LR views. At the encoder side, the rows of two adjacent views are downsampled in an interlacing and complementary fashion, whereas at the decoder side the discarded pixels are recovered by fusing the virtual-view pixels with directionally interpolated pixels from the complementary downsampled views. Consequently, with the assistance of virtual views, the proposed approach can effectively achieve both goals.

    The previous two contributions show that depth data has great potential for 3D video enhancement. However, the low spatial resolution of depth images generated by Time-of-Flight (ToF) depth cameras has limited their applications. Hence, in the last contribution of this thesis, a planar-surface-based depth map super-resolution approach is presented, which interpolates depth images by exploiting the equation of each detected planar surface. Both quantitative and qualitative experimental results demonstrate the effectiveness and robustness of the proposed approach over benchmark methods.
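    To illustrate the planar-surface idea in the last contribution: within a detected planar region a depth map satisfies a plane equation z = ax + by + c, so high-resolution depth can be obtained by fitting the plane to the LR samples and evaluating it on the HR grid. A minimal sketch follows, assuming the planar region has already been detected; the function name and interface are hypothetical.

    ```python
    import numpy as np

    def upscale_planar_region(depth_lr, mask_lr, scale):
        """Fit z = a*x + b*y + c to one detected planar region of a
        low-resolution depth map and evaluate it on the high-res grid.

        depth_lr : (h, w) LR depth map; mask_lr : boolean mask of the
        detected planar surface; scale : integer upscaling factor.
        """
        ys, xs = np.nonzero(mask_lr)
        z = depth_lr[ys, xs]

        # Least-squares plane fit in LR pixel coordinates.
        A = np.column_stack([xs, ys, np.ones_like(xs)])
        (a, b, c), *_ = np.linalg.lstsq(A, z, rcond=None)

        # Evaluate the plane equation on the HR grid (coordinates are
        # mapped back into the LR frame so the fit stays valid).
        h, w = depth_lr.shape
        yy, xx = np.mgrid[0:h * scale, 0:w * scale]
        return a * (xx / scale) + b * (yy / scale) + c
    ```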

    Combining shape and color. A bottom-up approach to evaluate object similarities

    The objective of the present work is to develop a bottom-up approach to estimate the similarity between two unknown objects. Given a set of digital images, we want to identify the main objects and determine whether they are similar or not. In the last decades many object recognition and classification strategies, driven by higher-level activities, have been successfully developed. The peculiarity of this work, instead, is the attempt to work without any training phase or a priori knowledge about the objects or their context. Indeed, if we suppose we are in an unstructured and completely unknown environment, we usually have to deal with novel objects never seen before; under these hypotheses, it would be very useful to define some kind of similarity among the instances under analysis (even if we do not know which category they belong to).

    To obtain this result, we start by observing that human beings use a lot of information and analyze very different aspects to achieve object recognition: shape, position, color and so on. Hence we try to reproduce part of this process, combining different methodologies (each working on a specific characteristic) to obtain a more meaningful idea of similarity. Mainly inspired by the human conception of representation, we identify two main characteristics, which we call the implicit and explicit models. The term "explicit" accounts for the main traits of what, in human representation, connotes a principal source of information about a category, a sort of visual synecdoche (corresponding to the shape); the term "implicit", on the other hand, accounts for the object rendered by shadows and lights, colors and volumetric impression, a sort of visual metonymy (corresponding to the chromatic characteristics).

    During the work we faced several problems and defined specific solutions. In particular, our contributions concern:

    - defining a bottom-up approach to image segmentation (which does not rely on any a priori knowledge);
    - combining different features to evaluate object similarity (particularly focusing on shape and color), as sketched below;
    - defining a generic distance (similarity) measure between objects (without any attempt to identify the category they belong to);
    - analyzing the consequences of using the number of modes as an estimate of the number of mixture components (in the Expectation-Maximization algorithm).
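    As a concrete, hypothetical illustration of combining a shape cue and a color cue into one distance, the sketch below pairs scale- and translation-invariant moment features with a normalised colour histogram and mixes the two distances with a weight alpha. The descriptors and the weighting are assumptions chosen for illustration, not the paper's actual pipeline.

    ```python
    import numpy as np

    def shape_descriptor(mask):
        """Shape ('explicit') cue: normalised central moments of the mask."""
        ys, xs = np.nonzero(mask)
        xs = xs - xs.mean()
        ys = ys - ys.mean()
        mu = lambda p, q: (xs ** p * ys ** q).sum()
        m00 = mask.sum()
        eta = lambda p, q: mu(p, q) / m00 ** (1 + (p + q) / 2)
        return np.array([eta(2, 0), eta(0, 2), eta(1, 1),
                         eta(3, 0), eta(0, 3)])

    def color_histogram(pixels, bins=8):
        """Chromatic ('implicit') cue: normalised joint RGB histogram."""
        h, _ = np.histogramdd(pixels.reshape(-1, 3),
                              bins=(bins,) * 3, range=[(0, 256)] * 3)
        return (h / h.sum()).ravel()

    def object_distance(mask_a, rgb_a, mask_b, rgb_b, alpha=0.5):
        """Bottom-up similarity: weighted sum of a shape and a colour
        distance, with no category model involved."""
        d_shape = np.linalg.norm(shape_descriptor(mask_a) -
                                 shape_descriptor(mask_b))
        ha = color_histogram(rgb_a[mask_a])
        hb = color_histogram(rgb_b[mask_b])
        d_color = 0.5 * np.abs(ha - hb).sum()   # total-variation distance
        return alpha * d_shape + (1 - alpha) * d_color
    ```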