Search CORE

8 research outputs found

CNN based Learning using Reflection and Retinex Models for Intrinsic Image Decomposition

Author: Baslamisli Anil S.
Gevers Theo
Le Hoang-An
Publication venue
Publication date: 01/01/2018
Field of study

Most of the traditional work on intrinsic image decomposition rely on deriving priors about scene characteristics. On the other hand, recent research use deep learning models as in-and-out black box and do not consider the well-established, traditional image formation process as the basis of their intrinsic learning process. As a consequence, although current deep learning approaches show superior performance when considering quantitative benchmark results, traditional approaches are still dominant in achieving high qualitative results. In this paper, the aim is to exploit the best of the two worlds. A method is proposed that (1) is empowered by deep learning capabilities, (2) considers a physics-based reflection model to steer the learning process, and (3) exploits the traditional approach to obtain intrinsic images by exploiting reflectance and shading gradient information. The proposed model is fast to compute and allows for the integration of all intrinsic components. To train the new model, an object centered large-scale datasets with intrinsic ground-truth images are created. The evaluation results demonstrate that the new model outperforms existing methods. Visual inspection shows that the image formation loss function augments color reproduction and the use of gradient information produces sharper edges. Datasets, models and higher resolution images are available at https://ivi.fnwi.uva.nl/cv/retinet.Comment: CVPR 201

arXiv.org e-Print Archive

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Learning Gaussian Conditional Random Fields for Low-Level Vision

Author: Adels Edward H.
Freeman William T.
Liu Ce
Tappen Marshall F.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 11/10/2007
Field of study

Markov Random Field (MRF) models are a popular tool for vision and image processing. Gaussian MRF models are particularly convenient to work with because they can be implemented using matrix and linear algebra routines. However, recent research has focused on on discrete-valued and non-convex MRF models because Gaussian models tend to over-smooth images and blur edges. In this paper, we show how to train a Gaussian Conditional Random Field (GCRF) model that overcomes this weakness and can outperform the non-convex Field of Experts model on the task of denoising images. A key advantage of the GCRF model is that the parameters of the model can be optimized efficiently on relatively large images. The competitive performance of the GCRF model and the ease of optimizing its parameters make the GCRF model an attractive option for vision and image processing applications. ©2007 IEEE

Crossref

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

A Simple Model for Intrinsic Image Decomposition with Depth Cues

Author
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref

Modeling Pedestrian Behavior in Video

Author: Scovanner Paul
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2011
Field of study

The purpose of this dissertation is to address the problem of predicting pedestrian movement and behavior in and among crowds. Specifically, we will focus on an agent based approach where pedestrians are treated individually and parameters for an energy model are trained by real world video data. These learned pedestrian models are useful in applications such as tracking, simulation, and artificial intelligence. The applications of this method are explored and experimental results show that our trained pedestrian motion model is beneficial for predicting unseen or lost tracks as well as guiding appearance based tracking algorithms. The method we have developed for training such a pedestrian model operates by optimizing a set of weights governing an aggregate energy function in order to minimize a loss function computed between a model\u27s prediction and annotated ground-truth pedestrian tracks. The formulation of the underlying energy function is such that using tight convex upper bounds, we are able to efficiently approximate the derivative of the loss function with respect to the parameters of the model. Once this is accomplished, the model parameters are updated using straightforward gradient descent techniques in order to achieve an optimal solution. This formulation also lends itself towards the development of a multiple behavior model. The multiple pedestrian behavior styles, informally referred to as stereotypes , are common in real data. In our model we show that it is possible, due to the unique ability to compute the derivative of the loss function, to build a new model which utilizes a soft-minimization of single behavior models. This allows unsupervised training of multiple different behavior models in parallel. This novel extension makes our method unique among other methods in the attempt to accurately describe human pedestrian behavior for the myriad of applications that exist. The ability to describe multiple behaviors shows significant improvements in the task of pedestrian motion prediction

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Perceptually inspired image estimation and enhancement

Author: Li Yuanzhen, Ph. D. Massachusetts Institute of Technology
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2009
Field of study

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2009.Includes bibliographical references (p. 137-144).In this thesis, we present three image estimation and enhancement algorithms inspired by human vision. In the first part of the thesis, we propose an algorithm for mapping one image to another based on the statistics of a training set. Many vision problems can be cast as image mapping problems, such as, estimating reflectance from luminance, estimating shape from shading, separating signal and noise, etc. Such problems are typically under-constrained, and yet humans are remarkably good at solving them. Classic computational theories about the ability of the human visual system to solve such under-constrained problems attribute this feat to the use of some intuitive regularities of the world, e.g., surfaces tend to be piecewise constant. In recent years, there has been considerable interest in deriving more sophisticated statistical constraints from natural images, but because of the high-dimensional nature of images, representing and utilizing the learned models remains a challenge. Our techniques produce models that are very easy to store and to query. We show these techniques to be effective for a number of applications: removing noise from images, estimating a sharp image from a blurry one, decomposing an image into reflectance and illumination, and interpreting lightness illusions. In the second part of the thesis, we present an algorithm for compressing the dynamic range of an image while retaining important visual detail. The human visual system confronts a serious challenge with dynamic range, in that the physical world has an extremely high dynamic range, while neurons have low dynamic ranges.(cont.) The human visual system performs dynamic range compression by applying automatic gain control, in both the retina and the visual cortex. Taking inspiration from that, we designed techniques that involve multi-scale subband transforms and smooth gain control on subband coefficients, and resemble the contrast gain control mechanism in the visual cortex. We show our techniques to be successful in producing dynamic-range-compressed images without compromising the visibility of detail or introducing artifacts. We also show that the techniques can be adapted for the related problem of "companding", in which a high dynamic range image is converted to a low dynamic range image and saved using fewer bits, and later expanded back to high dynamic range with minimal loss of visual quality. In the third part of the thesis, we propose a technique that enables a user to easily localize image and video editing by drawing a small number of rough scribbles. Image segmentation, usually treated as an unsupervised clustering problem, is extremely difficult to solve. With a minimal degree of user supervision, however, we are able to generate selection masks with good quality. Our technique learns a classifier using the user-scribbled pixels as training examples, and uses the classifier to classify the rest of the pixels into distinct classes. It then uses the classification results as per-pixel data terms, combines them with a smoothness term that respects color discontinuities, and generates better results than state-of-art algorithms for interactive segmentation.by Yuanzhen Li.Ph.D

DSpace@MIT