The Rehabilitation of MaxRGB
The poor performance of the MaxRGB illumination-estimation method is often used in the literature as a foil when promoting some new illumination-estimation method. However, the results presented here show that MaxRGB in fact works surprisingly well when tested on a new dataset of 105 high dynamic range images, and also better than previously reported when some simple pre-processing is applied to the images of the standard 321-image set [1]. The HDR images in the dataset for color constancy research were constructed in the standard way from multiple exposures of the same scene. The color of the scene illumination was determined by photographing an extra HDR image of the scene with 4 GretagMacbeth mini ColorCheckers placed in it at 45 degrees relative to one another. With pre-processing, MaxRGB's performance is statistically equivalent to that of Color by Correlation [2] and statistically superior to that of the Grey-Edge [3] algorithm on the 321 set (null hypothesis rejected at the 5% significance level). It also performs as well as Grey-Edge on the HDR set. These results demonstrate that MaxRGB is far more effective than it has been reputed to be, so long as it is applied to image data that encodes the full dynamic range of the original scene.
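The statistic at the heart of MaxRGB is a one-liner: the illuminant estimate is the per-channel maximum response. A minimal sketch (function names are ours, not from the paper), assuming linear, full-dynamic-range pixel data:

```python
import numpy as np

def max_rgb_estimate(image: np.ndarray) -> np.ndarray:
    """Estimate the illuminant as the per-channel maximum response.

    `image` is an H x W x 3 array of linear sensor responses. MaxRGB
    assumes the brightest response in each channel comes from a surface
    reflecting the illuminant more or less directly, which is why clipped
    (low dynamic range) highlights are exactly what breaks it.
    """
    estimate = image.reshape(-1, 3).max(axis=0)
    return estimate / np.linalg.norm(estimate)  # unit-length estimate

def white_balance(image: np.ndarray, illuminant: np.ndarray) -> np.ndarray:
    """Diagonal (von Kries) correction: divide out the estimated illuminant."""
    return image / illuminant
```

The simple pre-processing the abstract refers to (applied before taking the maxima) is not reproduced here.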
Reducing Worst-Case Illumination Estimates for Better Automatic White Balance
Automatic white balancing works quite well on average, but seriously fails some of the time. These failures lead to completely unacceptable images. Can the number, or severity, of these failures be reduced, perhaps at the expense of slightly poorer white balancing on average, with the overall goal of increasing the overall acceptability of a collection of images? Since the main source of error in automatic white balancing is misidentification of the overall scene illuminant, a new illumination-estimation algorithm is presented that minimizes the high-percentile error of its estimates. The algorithm combines illumination estimates from standard existing algorithms and chromaticity-gamut characteristics of the image as features in a feature space. Illuminant chromaticities are quantized into chromaticity bins. Given a test image of a real scene, its feature vector is computed, and for each chromaticity bin, the probability of the illuminant chromaticity falling into that bin given the feature vector is estimated. The probability estimation is based on Loftsgaarden-Quesenberry multivariate density function estimation over the feature vectors derived from a set of synthetic training images. Once the probability distribution estimate for a given chromaticity channel is known, the smallest interval likely to contain the right answer with a desired probability (i.e., the smallest chromaticity interval whose sum of probabilities is greater than or equal to the desired probability) is chosen. The point in the middle of that interval is then reported as the chromaticity of the illuminant. Testing on a dataset of real images shows that the error at the 90th and 98th percentiles can be reduced by roughly half, with minimal impact on the mean error.
Intersecting Color Manifolds
Logvinenko’s color atlas theory provides a structure in which a complete set of color-equivalent material and illumination pairs can be generated to match any given input RGB color. In chromaticity space, the set of such pairs forms a 2-dimensional manifold embedded in a 4-dimensional space. For single-illuminant scenes, the illumination for different input RGB values must be contained in all the corresponding manifolds. The proposed illumination-estimation method estimates the scene illumination by calculating the intersection of the illuminant components of the respective manifolds through a Hough-like voting process. Overall, the performance on the two datasets for which camera sensitivity functions are available is comparable to existing methods. The advantage of formulating illumination estimation in terms of manifold intersection is that it expresses the constraints provided by each available RGB measurement within a sound theoretical foundation.
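The Hough-like voting idea can be sketched as follows, assuming each manifold's illuminant component has already been discretized to a set of candidate chromaticities in [0, 1)^2 (all names are ours; the atlas-theory machinery that produces the candidates is not reproduced):

```python
import numpy as np

def hough_vote_illuminant(candidate_sets, grid_size=64):
    """Accumulate votes from per-measurement candidate illuminant chromaticities.

    Each element of `candidate_sets` is an (N_i, 2) array of illuminant
    chromaticities consistent with one observed RGB (one manifold's
    illuminant component, discretized). The bin collecting votes from the
    most manifolds approximates their common intersection.
    """
    accumulator = np.zeros((grid_size, grid_size))
    for candidates in candidate_sets:
        bins = np.clip((candidates * grid_size).astype(int), 0, grid_size - 1)
        # one vote per manifold per bin, even if several candidates land in it
        unique_bins = np.unique(bins, axis=0)
        accumulator[unique_bins[:, 0], unique_bins[:, 1]] += 1
    best = np.unravel_index(np.argmax(accumulator), accumulator.shape)
    return (np.array(best) + 0.5) / grid_size  # bin-center chromaticity
```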
Extending Minkowski norm illuminant estimation
The ability to obtain colour images invariant to changes of illumination is called colour constancy. An algorithm for colour constancy takes sensor responses (digital images) as input, estimates the ambient light, and returns a corrected image in which the illuminant's influence over the colours has been removed. In this thesis we investigate the illuminant-estimation step of colour constancy and aim to extend the state of the art in this field.

We first revisit the Minkowski Family Norm framework for illuminant estimation because, of all the simple statistical approaches, it is the most general formulation and, crucially, delivers the best results. This thesis makes four technical contributions. First, we reformulate the Minkowski approach to provide better estimation when a constraint on illumination is employed. Second, we show how the method can be implemented to run much faster (by orders of magnitude) than previous algorithms. Third, we show how a simple edge-based variant delivers improved estimation compared with the state of the art across many datasets. In contradistinction to the prior state of the art, our definition of edges is fixed (a simple combination of first and second derivatives), i.e. we do not tune our algorithm to particular image datasets. This performance is further improved by incorporating a gamut constraint on surface colour, our fourth contribution.

The thesis finishes by considering our approach in the context of a recent OSA competition run to benchmark computational algorithms operating on physiologically relevant cone-based input data. Here we find that Constrained Minkowski Norms operating on spectrally sharpened cone sensors (linear combinations of the cones that behave more like camera sensors) support competition-leading illuminant estimation.
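The Minkowski (L^p) family generalizes the classic statistical estimators: p = 1 recovers Grey-World (channel means) and p → ∞ recovers MaxRGB (channel maxima), with intermediate p interpolating between them. A minimal, unconstrained sketch (names ours; the thesis's constrained, fast, and edge-based variants are not reproduced):

```python
import numpy as np

def minkowski_illuminant(image: np.ndarray, p: float = 6.0) -> np.ndarray:
    """Shades-of-Grey-style estimate: the p-th Minkowski norm per channel.

    `image` is an H x W x 3 array of linear responses in [0, 1].
    p = 1 gives Grey-World; large p approaches MaxRGB.
    """
    pixels = image.reshape(-1, 3).astype(float)
    estimate = (pixels ** p).mean(axis=0) ** (1.0 / p)
    return estimate / np.linalg.norm(estimate)  # unit-length estimate
```

The edge-based variant mentioned above would apply the same norm to image derivatives rather than to raw pixel values.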
Multimodal Indexing of Presentation Videos
This thesis presents four novel methods to help users efficiently and effectively retrieve information from unstructured and unsourced multimedia sources, in particular the increasing amount and variety of presentation videos such as those in e-learning, conference recordings, corporate talks, and student presentations. We demonstrate a system to summarize, index, and cross-reference such videos, and measure the quality of the produced indexes as perceived by the end users. We introduce four major semantic indexing cues: text, speaker faces, graphics, and mosaics, going beyond standard tag-based searches and simple video playback. This work aims at recognizing visual content "in the wild", where the system cannot rely on any additional information besides the video itself. For text, within a scene-text detection and recognition framework, we present a novel locally optimal adaptive binarization algorithm, implemented with integral histograms. It determines an optimal threshold that maximizes the between-class variance within a subwindow, with computational complexity independent of the size of the window itself. We obtain character recognition rates of 74%, as validated against ground truth from 8 presentation videos spanning 1 hour and 45 minutes, which almost doubles the baseline performance of an open-source OCR engine. For speaker faces, we detect, track, match, and finally select a humanly preferred face icon per speaker, based on three quality measures: resolution, amount of skin, and pose. We register an 87% accordance (51 out of 58 speakers) between the face indexes automatically generated from three unstructured presentation videos of approximately 45 minutes each and human preferences recorded through Mechanical Turk experiments.
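The between-class-variance criterion for the binarization threshold is the Otsu criterion; a per-patch sketch is below (names ours; the thesis's integral-histogram machinery, which makes each subwindow's histogram cost constant in the window size, is only summarized in the docstring):

```python
import numpy as np

def otsu_threshold(patch: np.ndarray, bins: int = 256) -> int:
    """Pick the gray level that maximizes between-class variance.

    A per-subwindow version of this, fed by integral histograms, yields a
    locally adaptive binarization: each window's histogram reduces to a few
    lookups per bin, independent of the window's pixel count.
    """
    hist = np.bincount(patch.ravel(), minlength=bins).astype(float)
    total = hist.sum()
    levels = np.arange(bins)
    omega = np.cumsum(hist) / total        # class-0 probability at each level
    mu = np.cumsum(hist * levels) / total  # cumulative (unnormalized) mean
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)       # degenerate splits score zero
    return int(np.argmax(sigma_b))
```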
For diagrams, we locate graphics inside frames showing a projected slide, cluster them according to an online algorithm based on a combination of visual and temporal information, and select and color-correct their representatives to match human preferences recorded through Mechanical Turk experiments. We register 71% accuracy (57 out of 81 unique diagrams properly identified, selected, and color-corrected) on three hours of video containing five different presentations. For mosaics, we combine two existing stitching measures to extend video images into an in-the-world coordinate system. The set of frames to be registered into a mosaic is sampled according to the PTZ camera movement, which is computed through least-squares estimation starting from the luminance-constancy assumption. A local-feature-based stitching algorithm is then applied to estimate the homographies among the video frames, and median blending is used to render pixels in overlapping regions of the mosaic. For two of these indexes, namely faces and diagrams, we present two novel MTurk-derived user data collections to determine viewer preferences, and show that our methods' selections match them. The net result of this thesis is to allow users to search, inside a video collection as well as within a single video clip, for a segment of a presentation by professor X on topic Y, containing graph Z.
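The median-blending step is simple enough to sketch directly, assuming the frames have already been warped into the mosaic's coordinate system with pixels a frame does not cover marked as NaN (names and the NaN convention are ours):

```python
import numpy as np

def median_blend(warped_frames: np.ndarray) -> np.ndarray:
    """Blend registered frames into a mosaic by per-pixel median.

    `warped_frames` has shape (num_frames, H, W); pixels outside a frame's
    footprint are NaN and are ignored, so each mosaic pixel is the median
    of only the frames that actually observed it.
    """
    return np.nanmedian(warped_frames, axis=0)
```

The median (rather than mean) suppresses transient foreground content such as the speaker crossing in front of the projection.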
Space, time and acoustics
Thesis (M. Arch.)--Massachusetts Institute of Technology, Dept. of Architecture, 1988. Includes bibliographical references (leaves 156-159). This thesis describes the development of new concepts in acoustical analysis from their inception to implementation as a computer design tool. Research is focused on a computer program which aids the designer to visually conceive the interactions of acoustics within a geometrically defined environment by synthesizing the propagation of sound in a three-dimensional space over time. Information is communicated through a unique use of images that are better suited for interfacing with the design process. The first part of this thesis describes the concepts behind the development of a graphic acoustical rendering program to a working level. This involves the development of a computer ray-tracing prototype that is sufficiently powerful to explore the issues facing this new design and analysis methodology. The second part uses this program to evaluate existing performance spaces in order to establish qualitative criteria in a new visual format. Representational issues relating to the visual perception of acoustic spaces are also explored. In the third part, the program is integrated into the design process. I apply this acoustical tool to an actual design situation by remodeling a large performance hall in Medford, Massachusetts. Chevalier Auditorium is a real project, commissioned by the city of Medford, whose program requirements closely match my intentions in scope, scale, and nature of a design for exploring this new acoustical analysis and design methodology. Finally, I summarize this program's effectiveness and discuss its potential in more sophisticated future design environments. by Philip R.Z. Thompson. M.Arch.
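An acoustic ray tracer of the kind described rests on two simple primitives: specular reflection of a sound ray off a surface, and conversion of accumulated path length into arrival time. A minimal sketch of those two primitives (names and details are ours, not from the thesis):

```python
import numpy as np

def reflect(direction: np.ndarray, normal: np.ndarray) -> np.ndarray:
    """Specular reflection of a ray direction about a surface normal."""
    n = normal / np.linalg.norm(normal)
    return direction - 2.0 * np.dot(direction, n) * n

def arrival_time(path_length: float, speed_of_sound: float = 343.0) -> float:
    """Propagation delay in seconds for a ray path of `path_length` metres."""
    return path_length / speed_of_sound
```

Tracing many such rays from a source, reflecting them off the hall geometry, and binning their arrival times at a listener position yields the time-resolved picture of sound propagation the program renders visually.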