33 research outputs found

    Bayesian Dictionary Learning for Single and Coupled Feature Spaces

    Over-complete bases offer the flexibility to represent a much wider range of signals, with more elementary basis atoms than the signal dimension. The use of over-complete dictionaries for sparse representation has become a recent trend and is increasingly recognized as providing high performance for applications such as denoising, image super-resolution, inpainting, compression, blind source separation, and linear unmixing. This dissertation studies dictionary learning for single and coupled feature spaces and its application to image restoration tasks. A Bayesian strategy using a beta process prior is applied to solve both problems. First, we illustrate how to generalize the existing beta process dictionary learning method (BP) to learn a dictionary for a single feature space. The advantage of this approach is that the number of dictionary atoms and their relative importance may be inferred non-parametrically. Next, we propose a new beta process joint dictionary learning method (BP-JDL) for coupled feature spaces, where the learned dictionaries also reflect the relationship between the two spaces. Compared to previous coupled-feature-space dictionary learning algorithms, our algorithm not only provides dictionaries customized to each feature space, but also yields a more consistent and accurate mapping between the two feature spaces. This is due to a unique property of the beta process model: the sparse representation can be decomposed into values and dictionary atom indicators. The proposed algorithm is able to learn sparse representations that correspond to the same dictionary atoms, with the same sparsity but different values, in the coupled feature spaces, thus providing a consistent and accurate mapping between them. Two applications, single image super-resolution and inverse halftoning, are chosen to evaluate the performance of the proposed Bayesian approach.
In both cases, the Bayesian approach, whether for a single feature space or for coupled feature spaces, outperforms state-of-the-art methods in the respective domains.
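The decomposition property mentioned above can be sketched in a few lines. This is a minimal illustration of a beta-process-style sparse code (dictionary size, data, and values are made up for the example), not the BP-JDL inference itself:

```python
import numpy as np

# Illustrative sketch (not the authors' implementation): in a beta process
# model, the sparse code for a signal decomposes as s = z * w, where z is a
# binary vector of dictionary-atom indicators and w holds the values.
rng = np.random.default_rng(0)

K, d = 8, 5                      # number of atoms, signal dimension
D = rng.normal(size=(d, K))      # dictionary (columns are atoms)

z = np.zeros(K); z[[1, 4]] = 1   # atom indicators: atoms 1 and 4 active
w = rng.normal(size=K)           # values (weights) for every atom

s = z * w                        # sparse representation: indicator * value
x = D @ s                        # reconstructed signal

# For coupled feature spaces, two codes can share the SAME indicators z
# (same sparsity pattern) while carrying different values w and w_hr.
w_hr = rng.normal(size=K)
s_hr = z * w_hr
assert np.array_equal(s != 0, s_hr != 0)  # identical support in both spaces
```

Because the support (which atoms are used) is carried by `z` alone, coupling two spaces through a shared `z` enforces the consistent atom correspondence described in the abstract.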

    Black-box printer models and their applications

    In the electrophotographic printing process, the deposition of toner within the area of a given printer-addressable pixel is strongly influenced by the values of its neighboring pixels. The interaction between neighboring pixels, commonly referred to as dot gain, is complicated. Printer models developed from a pre-designed test page can either be embedded in the halftoning algorithm, or used to predict the printed halftone image at the input to an algorithm being used to assess print quality. In our research, we examine the potential influence of a larger neighborhood (45×45) of the digital halftone image on the measured value of a printed pixel at the center of that neighborhood, by introducing a feasible strategy for modeling this contribution. We developed a series of six models, with different levels of accuracy and computational complexity, to account for local neighborhood effects and the influence of a 45×45 neighborhood of pixels on the tone development of the central printer-addressable pixel. All these models are referred to as Black Box Models (BBMs), since they are based solely on measuring what is on the printed page and do not incorporate any information about the marking process itself. We developed two types of printer models, Standard Definition (SD) BBM and High Definition (HD) BBM, using an Epson Expression 10000XL (Epson America, Inc., Long Beach, CA, USA) flatbed scanner operated at 2400 dpi as the capture device, under different analysis resolutions. The experimental results show that the larger neighborhood models yield a significant improvement in the accuracy of the prediction of the pixel values of the printed halftone image. The sample function generation black box model (SFG-BBM) is an extension of SD-BBM that adds the printing variation to the mean prediction, improving the prediction by more accurately matching the characteristics of the actual printed image.
We also followed a structure similar to that used to develop our series of BBMs to develop a two-stage toner usage predictor for electrophotographic printers. We first obtain, on a pixel-by-pixel basis, the predicted absorptance of the printed and scanned page from the digital input using a BBM. We then form a weighted sum of these predicted pixel values to predict the overall toner usage on the printed page. Our two-stage predictor significantly outperforms the existing method based on a simple pixel-counting strategy, in terms of both accuracy and robustness of the prediction.
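The two-stage idea can be sketched as follows. Here a simple neighborhood average stands in for the actual BBM prediction of stage one; the stand-in model, uniform weights, and sizes are illustrative assumptions, not the dissertation's models:

```python
import numpy as np

# Hedged sketch: stage 1 predicts per-pixel absorptance of the printed page
# from the digital halftone (here mimicked by a dot-gain-like neighborhood
# blur, NOT the actual BBM); stage 2 forms a weighted sum of the predictions.
rng = np.random.default_rng(1)

halftone = (rng.random((64, 64)) < 0.3).astype(float)  # 1 = toner dot

def predict_absorptance(h):
    """Stand-in for a black-box model: 3x3 average mimics dot gain."""
    padded = np.pad(h, 1, mode="edge")
    out = np.zeros_like(h)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + 64, dx:dx + 64]
    return out / 9.0

absorptance = predict_absorptance(halftone)

# Stage 2: weighted sum of predicted pixel values -> toner usage estimate.
weights = np.full_like(absorptance, 1.0)   # uniform weights for the sketch
toner_weighted = float((weights * absorptance).sum())

# Baseline for comparison: simple pixel counting ignores dot gain entirely.
toner_counted = float(halftone.sum())
```

The point of the structure is that stage two operates on *predicted printed* absorptances rather than on the digital dot count, which is what lets it account for neighborhood effects.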

    Studies on Imaging System and Machine Learning: 3D Halftoning and Human Facial Landmark Localization

    In this dissertation, studies on digital halftoning and human facial landmark localization are discussed. 3D printing is becoming increasingly popular around the world. With 3D printing technology, customized products can be manufactured much more quickly and efficiently, at much lower cost. However, 3D printing still suffers from low-quality surface reproduction compared with 2D printing. One approach to improving it is to develop an advanced halftoning algorithm for 3D printing. We describe a novel 3D halftoning method that can cooperate with 3D printing technology to generate a high-quality surface reproduction. In addition, a new method named direct element swap is proposed for creating a threshold matrix for halftoning. This method directly swaps the elements of a threshold matrix to find the best element arrangement by minimizing a designated perceived error metric. Experimental results show that the new method yields halftone quality competitive with the conventional level-by-level matrix design method. Moreover, using the direct element swap method, a threshold matrix can for the first time be designed by training on real images. In the second part of the dissertation, a novel facial landmark detection system is presented. Facial landmark detection plays a critical role in many face analysis tasks, yet it remains a very challenging problem. The challenges come from the large variations of face appearance caused by different illuminations, facial expressions, yaw, pitch, and roll angles of the head, and image qualities. To tackle this problem, a novel coarse-to-fine cascaded convolutional neural network system for robust facial landmark detection of faces in the wild is presented. Experimental results show that our method outperforms other state-of-the-art methods on public test datasets.
In addition, a frontal and profile landmark localization system is proposed and designed. Using a frontal/profile face classifier, either the frontal or the profile landmark configuration is employed for facial landmark prediction, based on the yaw angle of the input face.
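The direct element swap idea can be sketched as a greedy search over element arrangements. The error metric below is a simple stand-in that rewards dispersed thresholds, not the perceived error metric used in the dissertation:

```python
import numpy as np

# Minimal sketch of direct element swap: swap two entries of a threshold
# matrix and keep the swap only if a designated error metric decreases.
rng = np.random.default_rng(2)

N = 8
matrix = rng.permutation(N * N).reshape(N, N).astype(float)

def error(m):
    """Proxy metric (illustrative): penalize similar thresholds in
    horizontally adjacent cells, encouraging a dispersed arrangement."""
    diff = np.abs(np.diff(m, axis=1))
    return float(np.sum((N * N - diff) ** 2))

initial_error = error(matrix)

for _ in range(2000):
    (y1, x1), (y2, x2) = rng.integers(0, N, size=(2, 2))
    before = error(matrix)
    matrix[y1, x1], matrix[y2, x2] = matrix[y2, x2], matrix[y1, x1]
    if error(matrix) > before:   # swap made things worse: undo it
        matrix[y1, x1], matrix[y2, x2] = matrix[y2, x2], matrix[y1, x1]
```

Because swapping only permutes existing entries, the matrix remains a valid threshold matrix throughout; only the arrangement, and hence the metric, changes.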

    Hardware-accelerated algorithms in visual computing

    This thesis presents new parallel algorithms which accelerate computer vision methods through the use of graphics processors (GPUs), and evaluates them with respect to their speed, scalability, and the quality of their results. It covers the fields of homogeneous and anisotropic diffusion processes, diffusion image inpainting, optic flow, and halftoning. To this end, it compares different solvers for homogeneous diffusion and presents a novel 'extended' box filter. Moreover, it suggests using the fast explicit diffusion scheme (FED) as an efficient and flexible solver for nonlinear, and in particular anisotropic, parabolic diffusion problems on graphics hardware. For elliptic diffusion-like processes, it recommends cascadic FED or Fast Jacobi schemes. The presented optic flow algorithm represents one of the fastest yet very accurate techniques. Finally, it presents a novel halftoning scheme which yields state-of-the-art results for many applications in image processing and computer graphics.
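FED owes its speed to cycles of varying explicit step sizes, some of which individually exceed the stability limit while the cycle as a whole remains stable. A small sketch of the standard FED step-size formula; the choices of tau_max and n here are illustrative:

```python
import math

# Sketch of Fast Explicit Diffusion (FED) step sizes: one cycle of n
# varying time steps tau_i = tau_max / (2 cos^2(pi (2i+1) / (4n+2))).
def fed_step_sizes(n, tau_max):
    return [tau_max / (2.0 * math.cos(math.pi * (2 * i + 1) / (4 * n + 2)) ** 2)
            for i in range(n)]

taus = fed_step_sizes(10, 0.25)   # tau_max = 0.25: 2D explicit stability limit
cycle_time = sum(taus)            # one cycle covers tau_max * n(n+1)/3
```

One cycle of n steps advances the diffusion time by tau_max·n(n+1)/3, i.e. quadratically in n, which is the source of the speed-up over n uniform stable steps.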

    Compression, pose tracking, and halftoning

    In this thesis, we discuss image compression, pose tracking, and halftoning. Although these areas seem unrelated at first glance, they can be connected through video coding as an application scenario. Our first contribution is an image compression algorithm based on a rectangular subdivision scheme which stores only a small subset of the image points. From these points, the remainder of the image is reconstructed using partial differential equations. Afterwards, we present a pose tracking algorithm that is able to follow the 3-D position and orientation of multiple objects simultaneously. The algorithm can deal with noisy sequences, and naturally handles both occlusions between different objects and occlusions occurring in kinematic chains. Our third contribution is a halftoning algorithm based on electrostatic principles, which can easily be adjusted to different settings through a number of extensions. Examples include modifications to handle varying dot sizes or hatching. In the final part of the thesis, we show how to combine our image compression, pose tracking, and halftoning algorithms into novel video compression codecs. In each of these four topics, our algorithms yield excellent results that outperform those of other state-of-the-art algorithms.
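The electrostatic principle behind such halftoning can be caricatured in a few lines: dots repel one another like equal charges and are attracted by image darkness, so they drift toward dark areas while spreading out evenly. All parameters below are illustrative toy choices, not the thesis' algorithm:

```python
import numpy as np

# Toy sketch of electrostatic halftoning; a single stand-in "dark region"
# center replaces the image-derived attraction field of the real method.
rng = np.random.default_rng(3)

dots = rng.random((16, 2))                    # dot positions in [0,1]^2
attractor = np.array([0.25, 0.25])            # stand-in dark-region center
initial_spread = np.linalg.norm(dots - attractor, axis=1).mean()

for _ in range(200):
    force = np.zeros_like(dots)
    for i in range(len(dots)):
        delta = dots[i] - dots                # vectors from all dots to dot i
        dist = np.linalg.norm(delta, axis=1)
        mask = dist > 1e-9                    # skip self-interaction
        # Coulomb-like repulsion from every other dot (unit vector / d^2)
        force[i] += 1e-5 * (delta[mask] / dist[mask, None] ** 3).sum(axis=0)
        # attraction toward the dark region
        force[i] += 0.05 * (attractor - dots[i])
    dots = np.clip(dots + force, 0.0, 1.0)

final_spread = np.linalg.norm(dots - attractor, axis=1).mean()
```

The equilibrium between attraction and mutual repulsion is what yields evenly spaced dots whose density follows image darkness.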

    Perceptually inspired image estimation and enhancement

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2009. Includes bibliographical references (p. 137-144). In this thesis, we present three image estimation and enhancement algorithms inspired by human vision. In the first part of the thesis, we propose an algorithm for mapping one image to another based on the statistics of a training set. Many vision problems can be cast as image mapping problems, such as estimating reflectance from luminance, estimating shape from shading, and separating signal and noise. Such problems are typically under-constrained, and yet humans are remarkably good at solving them. Classic computational theories about the ability of the human visual system to solve such under-constrained problems attribute this feat to the use of some intuitive regularities of the world, e.g., surfaces tend to be piecewise constant. In recent years, there has been considerable interest in deriving more sophisticated statistical constraints from natural images, but because of the high-dimensional nature of images, representing and utilizing the learned models remains a challenge. Our techniques produce models that are very easy to store and to query. We show these techniques to be effective for a number of applications: removing noise from images, estimating a sharp image from a blurry one, decomposing an image into reflectance and illumination, and interpreting lightness illusions. In the second part of the thesis, we present an algorithm for compressing the dynamic range of an image while retaining important visual detail. The human visual system confronts a serious challenge with dynamic range, in that the physical world has an extremely high dynamic range, while neurons have low dynamic ranges. The human visual system performs dynamic range compression by applying automatic gain control, in both the retina and the visual cortex.
Taking inspiration from this, we designed techniques that involve multi-scale subband transforms and smooth gain control on subband coefficients, resembling the contrast gain control mechanism in the visual cortex. We show our techniques to be successful in producing dynamic-range-compressed images without compromising the visibility of detail or introducing artifacts. We also show that the techniques can be adapted for the related problem of "companding", in which a high dynamic range image is converted to a low dynamic range image and saved using fewer bits, and later expanded back to high dynamic range with minimal loss of visual quality. In the third part of the thesis, we propose a technique that enables a user to easily localize image and video editing by drawing a small number of rough scribbles. Image segmentation, usually treated as an unsupervised clustering problem, is extremely difficult to solve. With a minimal degree of user supervision, however, we are able to generate selection masks of good quality. Our technique learns a classifier using the user-scribbled pixels as training examples, and uses the classifier to classify the rest of the pixels into distinct classes. It then uses the classification results as per-pixel data terms, combines them with a smoothness term that respects color discontinuities, and generates better results than state-of-the-art algorithms for interactive segmentation. by Yuanzhen Li. Ph.D.
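The smooth gain control idea can be sketched on a single subband: large coefficients are attenuated more than small ones via a compressive power-law gain, shrinking dynamic range while preserving small detail. The gain form and its parameters below are illustrative assumptions, not the exact operator used in the thesis:

```python
import numpy as np

# Hedged sketch of smooth gain control on subband coefficients.
def gain_control(coeffs, gamma=0.6, eps=1e-4):
    """Gain g = ((|c| + eps) / sigma)^(gamma - 1); gamma < 1 compresses,
    since coefficients above the band average sigma get gain < 1."""
    sigma = np.abs(coeffs).mean() + eps
    gain = ((np.abs(coeffs) + eps) / sigma) ** (gamma - 1.0)
    return coeffs * gain

band = np.array([-8.0, -0.5, 0.0, 0.5, 8.0])  # toy subband coefficients
compressed = gain_control(band)
```

Because the gain is a smooth function of coefficient magnitude rather than a hard clip, the mapping avoids the halo artifacts that abrupt attenuation would introduce.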

    Appearance Matching and Fabrication Using Differentiable Material Models

    Computing derivatives of code - with code - is one of the key enablers of the machine learning revolution. In computer graphics, automatic differentiation allows one to solve inverse rendering problems, where parameters such as an object's reflectance, position, or the scattering and absorption coefficients of a volume are recovered from one or several input images. In this work, we consider appearance matching and fabrication problems that can be cast as instances of inverse rendering problems. While the gradient-based optimization enabled by differentiable programs has the potential to yield very good results, it requires proper handling: differentiable rendering is not a shotgun-type problem solver. We discuss both theoretical concepts and the practical implementation of differentiable rendering algorithms, and show how they connect to different appearance matching problems. Department of Software and Computer Science Education, Faculty of Mathematics and Physics
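The premise that derivatives of code are computed *with* code can be illustrated by a minimal forward-mode autodiff: dual numbers carry a derivative through ordinary arithmetic, and gradient descent recovers a parameter from an observation. The one-line "renderer" and the albedo parameter are hypothetical stand-ins for a real differentiable renderer:

```python
# Toy sketch of forward-mode automatic differentiation via dual numbers,
# used to recover a scalar reflectance (albedo) from an observed pixel.
class Dual:
    """Number a + b*eps with eps^2 = 0; the b component carries d/dx."""
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.a + o.a, self.b + o.b)
    __radd__ = __add__
    def __sub__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.a - o.a, self.b - o.b)
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.a * o.a, self.a * o.b + self.b * o.a)  # product rule
    __rmul__ = __mul__

def render(albedo, light=0.8):
    """Trivially simple stand-in 'renderer': pixel = albedo * light."""
    return albedo * light

target = 0.6                      # observed pixel value
albedo = 0.1                      # initial guess for the unknown parameter

for _ in range(100):
    a = Dual(albedo, 1.0)         # seed derivative d(albedo)/d(albedo) = 1
    loss = (render(a) - target) * (render(a) - target)
    albedo -= 0.5 * loss.b        # gradient step using the autodiff derivative
```

The derivative flows through `render` without that function knowing anything about differentiation, which is exactly the property that makes inverse rendering pipelines composable.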

    Robust image steganography method suited for printing

    This doctoral dissertation presents a robust steganographic method developed for, and adapted to, the printing process. The primary goal of the method is to provide protection against the counterfeiting of packaging. Packaging protection is achieved by embedding multiple bits of information into an image at the encoder, and then masking the information so that it is invisible to the human eye. At the decoder, the information is detected using an infrared camera. Preliminary research showed that the relevant literature lacks methods developed for the print domain. The reason for this gap is that developing steganographic methods for print requires a larger amount of resources and materials than developing similar methods for the digital domain. Moreover, methods for print often require a higher level of complexity, since various forms of processing occur during reproduction that can compromise the information in the image [1]. To preserve the hidden information, the method must be robust to the processing that takes place during reproduction. To achieve a high level of robustness, the information can be embedded in the frequency domain of the image [2], [3]. The frequency domain of an image is accessed through mathematical transforms; the most commonly used are the discrete cosine transform (DCT), the discrete wavelet transform (DWT), and the discrete Fourier transform (DFT) [2], [4]. Each of these transforms has certain advantages and disadvantages, depending on the context in which a method is developed [5]. For methods adapted to the printing process, the discrete Fourier transform is the optimal choice, since DFT-based methods provide robustness to the geometric transformations that occur during reproduction [5], [6]. In this research, images in the CMYK color space were used. Each image is first divided into blocks, and the information is embedded into each block individually. Using the DFT, the k channel of an image block is transformed into the frequency domain, where the information is embedded. Achromatic replacement is used to mask the visible artifacts created by embedding the information; examples of its successful use for masking artifacts can be found in [7] and [8]. After the information has been embedded into every image block, the blocks are reassembled into a single image. Achromatic replacement then modifies the values of the c, m, and y channels of the image, while the k channel, which carries the embedded information, remains unchanged. As a result, after masking by achromatic replacement, the marked image has the same visual properties as the image before marking. In the experimental part of the work, 1000 images in the CMYK color space were used. In the digital environment, the robustness of the method was tested against image attacks specific to the reproduction process: scaling, blur, noise, rotation, and compression. The robustness of the method to the reproduction process itself was also tested, using printed samples. The objective metric bit error rate (BER) was used for evaluation. The potential for optimizing the method was tested by image processing (an unsharp filter) and by using error correction codes (ECC). The image quality after embedding the information was also investigated. For this evaluation, the objective metrics peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) were used. PSNR and SSIM are so-called full-reference metrics; in other words, both the unmarked and the marked image are required simultaneously in order to determine the level of similarity between them [9], [10]. A subjective analysis was conducted with 36 participants, using a total of 144 image samples. The participants rated the visibility of artifacts on a scale from zero (invisible) to three (highly visible). The results show that the method is highly robust to the reproduction process, and that it was indeed improved by the unsharp filter and ECC. Image quality remains high regardless of the embedded information, as confirmed by the experiments with objective metrics and by the subjective analysis.
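The DFT-domain embedding step can be sketched as follows: a bit is written into the magnitude of a mid-frequency coefficient of an image block, with its conjugate-symmetric partner updated so the inverse transform stays real-valued. The coefficient position, embedding strength, and thresholding decoder are illustrative assumptions, not the dissertation's exact scheme:

```python
import numpy as np

# Hedged sketch of frequency-domain embedding for one image block.
rng = np.random.default_rng(4)

block = rng.random((32, 32))          # stand-in for the k channel of a block
F = np.fft.fft2(block)

u, v = 5, 7                           # illustrative mid-frequency position
STRENGTH = 60.0                       # target magnitude encoding bit "1"

phase = np.angle(F[u, v])             # keep the phase, set the magnitude
F[u, v] = STRENGTH * np.exp(1j * phase)
F[-u, -v] = np.conj(F[u, v])          # preserve conjugate symmetry

marked = np.real(np.fft.ifft2(F))     # marked block (imaginary part ~ 0)

# Decoder: read the bit back by thresholding the coefficient magnitude.
bit = int(np.abs(np.fft.fft2(marked)[u, v]) > STRENGTH / 2)
```

Because the bit lives in a coefficient magnitude rather than in pixel values, moderate geometric and tonal distortions of the printed block perturb it far less than they would a spatial-domain mark, which is the robustness argument made above.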