16 research outputs found

    A PCA approach to the object constancy for faces using view-based models of the face

    The analysis of object and face recognition by humans attracts a great deal of interest, mainly because of its many applications in fields including psychology, security, computing, medicine and computer graphics. The aim of this work is to investigate whether a PCA-based mapping approach can offer a new perspective on models of object constancy for faces in human vision. An existing system for facial motion capture and animation, originally developed for performance-driven animation of avatars, is adapted, improved and repurposed to study face representation in the context of viewpoint and lighting invariance. The main goal of the thesis is to develop and evaluate a new, view-based approach to viewpoint invariance that maps facial variation between different views to construct a multi-view representation of the face. The thesis describes a computer implementation of a model that uses PCA to generate example-based models of the face. The work explores the joint encoding of expression and viewpoint using PCA and the mapping between view-specific PCA spaces. Simultaneous, synchronised video recording of six views of the face was used to construct multi-view representations, which helped to investigate how well multiple views could be recovered from a single view via the content-addressable memory property of PCA. A similar approach was taken to lighting invariance. Finally, the possibility of constructing a multi-view representation from asynchronous view-based data was explored. The results of this thesis have implications for a continuing research problem in computer vision – recognising faces and objects from different viewpoints and under different lighting. It also provides a new approach to understanding viewpoint and lighting invariance in human observers
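
The "content-addressable memory" property of PCA mentioned above can be sketched in a few lines: build a joint PCA basis over concatenated views, then recover the unseen view of a sample from a single observed view. The following numpy toy (random low-rank data, two "views" of five features each) is illustrative only, not the thesis's implementation; all names and dimensions are invented.

```python
import numpy as np

# Toy multi-view data: each sample concatenates 2 views driven by a
# shared 3-dimensional latent structure (so the data is exactly low-rank).
rng = np.random.default_rng(0)
n_samples, view_dim = 50, 5
latent = rng.normal(size=(n_samples, 3))           # shared across views
mixing = rng.normal(size=(3, view_dim * 2))
X = latent @ mixing                                # rows = multi-view vectors

# Joint PCA over the concatenated views.
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:3]                                # keep 3 principal components

# Content-addressable recovery: observe only view 0 of one sample,
# estimate the PCA coefficients by least squares on the observed block,
# then reconstruct the full multi-view vector, including the unseen view.
x_full = X[0]
obs = slice(0, view_dim)                           # indices of the observed view
coeffs, *_ = np.linalg.lstsq(components[:, obs].T,
                             x_full[obs] - mean[obs], rcond=None)
x_rec = mean + coeffs @ components
err = np.linalg.norm(x_rec - x_full) / np.linalg.norm(x_full)
```

Because the toy data is exactly low-rank, the least-squares coefficients estimated from the observed view reconstruct the hidden view almost perfectly; with real face images the recovery is only approximate.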

    Inferring surface shape from specular reflections


    Scene and crowd analysis using synthetic data generation with 3D quality improvements and deep network architectures

    In this thesis, scene analysis techniques, mainly vision-based, are explored. Vision-based scene analysis has a wide range of applications, from surveillance and security to agriculture. A vision sensor provides rich information about the environment, such as colour, depth, shape and size, which can be processed further for in-depth knowledge of the scene, such as the type of environment, the objects present and their distances. The thesis therefore begins with background on human detection, in particular pedestrian and crowd detection, and introduces the vision-based techniques used. This is followed by a detailed analysis of the use of synthetic data to improve the performance of state-of-the-art deep learning techniques, and a multi-purpose synthetic data generation tool is proposed. The tool is a real-time graphics simulator that generates multiple types of synthetic data applicable to pedestrian detection, crowd density estimation, image segmentation, depth estimation and 3D pose estimation. In the second part of the thesis, a novel technique is proposed to improve the quality of the synthetic data. Inter-reflection, also known as global illumination, is a naturally occurring phenomenon and a major problem for 3D scene generation from an image. The proposed method therefore uses a reverted ray-tracing technique to reduce the effect of inter-reflection and increase the quality of the generated data. In addition, a method to improve the quality of the density map is discussed. The density map is the most commonly used representation for crowd estimation; however, the standard procedure for generating it is not content-aware, i.e. the map does not weight people's heads according to their size in the image. A novel method to generate a content-aware density map is therefore proposed, and it is demonstrated that such maps can raise the performance of an existing deep learning architecture. In the final part, a deep learning architecture is proposed to estimate crowds in the wild. The architecture tackles challenging aspects such as perspective distortion by combining several techniques, including pyramid-style inputs, a scale aggregation method and a self-attention mechanism, to estimate a crowd density map, and achieved state-of-the-art results at the time
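
The content-aware density map idea can be sketched as follows: each annotated head contributes a Gaussian whose width scales with a proxy for head size, here the distance to its nearest neighbour (a common geometry-adaptive heuristic, not necessarily the thesis's exact rule). The function name and parameters are illustrative.

```python
import numpy as np

def content_aware_density_map(points, shape, beta=0.3):
    """Build a crowd density map with one adaptive Gaussian per head.

    Each head's sigma is proportional to the distance to its nearest
    neighbour, a proxy for apparent head size in a perspective image.
    Each (truncated) kernel is normalised so every head integrates to 1.
    """
    h, w = shape
    density = np.zeros((h, w), dtype=np.float64)
    pts = np.asarray(points, dtype=np.float64)
    yy, xx = np.mgrid[0:h, 0:w]
    for i, (y, x) in enumerate(pts):
        if len(pts) > 1:
            dists = np.linalg.norm(pts - pts[i], axis=1)
            sigma = beta * np.partition(dists, 1)[1]   # nearest-neighbour distance
        else:
            sigma = beta * min(h, w)                   # fallback for a lone head
        kernel = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma ** 2))
        density += kernel / kernel.sum()
    return density

heads = [(10, 10), (12, 14), (40, 40)]                 # (row, col) head centres
dmap = content_aware_density_map(heads, (64, 64))
```

Normalising after truncation at the image border keeps the map's integral equal to the head count, which is what makes density maps usable as a counting target.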

    Factor Graphs for Computer Vision and Image Processing

    Factor graphs have been used extensively in the decoding of error-correcting codes such as turbo codes, and in signal processing. However, while computer vision and pattern recognition are awash with graphical model usage, factor graphs remain somewhat under-researched in these communities. This is surprising because factor graphs naturally generalise both Markov random fields and Bayesian networks. Moreover, they are useful in modelling relationships between variables that are not necessarily probabilistic, and allow for efficient marginalisation via a sum-product of probabilities. In this thesis, we present and illustrate the utility of factor graphs in the vision community through some of the field's popular problems, with a particular focus on maximum a posteriori (MAP) inference in graphical structures with layers. To this end, we break down complex problems into factored representations and more computationally realisable constructions. Firstly, we present a sum-product framework that uses the explicit factorisation in local subgraphs from the partitioned factor graph of a layered structure to perform inference. This is efficient because exact inference is attainable in the resulting local subtrees. Secondly, we extend this framework to the entire graphical structure without partitioning, and discuss preliminary ways to combine outputs from a multilevel construction. Lastly, we further our endeavour to combine evidence from different methods through a simplicial spanning tree reparameterisation of the factor graph that ensures consistency, producing an ensembled and improved result. Throughout the thesis, the underlying feature we exploit is the enforcement of adjacency constraints using Delaunay triangulations, computed by adding points dynamically or using a convex hull algorithm. These adjacency relationships help the factor graph approaches in this thesis to be both efficient and competitive for computer vision tasks, because of the low treewidth they provide in local subgraphs and the reparameterised interpretation of the graph they form through the spanning tree of simplexes. While exact inference is known to be intractable for junction trees obtained from the loopy graphs in computer vision, in this thesis we are able to effect exact inference on our spanning tree of simplexes. More importantly, the approaches presented here are not restricted to computer vision and image processing, but extend to more general applications that involve distributed computations
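
Sum-product marginalisation, exact on trees, can be sketched directly. The following numpy toy (a three-variable chain with unary and pairwise factors, all values invented) computes a marginal by message passing and checks it against brute-force enumeration; it is an illustration of the algorithm, not code from the thesis.

```python
import numpy as np

# Tiny tree-structured factor graph: x0 -- f01 -- x1 -- f12 -- x2,
# binary variables, one unary factor per variable (all values invented).
unary = [np.array([0.6, 0.4]), np.array([0.5, 0.5]), np.array([0.3, 0.7])]
pair01 = np.array([[0.9, 0.1], [0.2, 0.8]])   # f01(x0, x1)
pair12 = np.array([[0.7, 0.3], [0.4, 0.6]])   # f12(x1, x2)

# Sum-product: sweep messages inwards to x1 from both leaves.
msg0_to_1 = unary[0] @ pair01                 # sum over x0
msg2_to_1 = pair12 @ unary[2]                 # sum over x2
belief1 = unary[1] * msg0_to_1 * msg2_to_1
belief1 /= belief1.sum()                      # normalised marginal p(x1)

# Brute-force marginal over the full joint, for comparison.
joint = np.einsum('i,j,k,ij,jk->ijk',
                  unary[0], unary[1], unary[2], pair01, pair12)
brute1 = joint.sum(axis=(0, 2))
brute1 /= brute1.sum()
```

On a tree every message is computed once, so the cost is linear in the number of factors, whereas brute-force enumeration is exponential in the number of variables; this is the efficiency the local-subtree construction above relies on.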

    Integrating Shape-from-Shading & Stereopsis

    This thesis is concerned with inferring scene shape by combining two specific techniques: shape-from-shading and stereopsis. Shape-from-shading calculates shape using the lighting equation, which maps surface orientation and lighting information to irradiance. As irradiance and lighting information are provided, this is the problem of inverting a many-to-one function to get surface orientation; surface orientation may then be integrated to get depth. Stereopsis matches pixels between two images of the same scene taken from different locations – the correspondence problem. Depth can then be calculated via triangulation, using camera calibration information. Both methods fail for certain inputs; the advantage of combining them is that where one fails the other may continue to work. Notably, shape-from-shading requires a smoothly shaded surface without texture, whilst stereopsis requires texture – each works where the other does not. The first work of this thesis tackles the problem directly: a novel modular solution is proposed that combines both methods, with the combination itself done using Gaussian belief propagation. This modular approach highlights missing and weak modules; the rest of the thesis is then concerned with providing a new module and an improved module. The improved module, given in the second research chapter, is a new shape-from-shading algorithm. It again uses belief propagation, but this time with directional statistics to represent surface orientation. Message passing is performed using a novel analytical method, which makes the algorithm particularly fast. In the final research chapter a new module is provided to estimate the light source direction. Without such a module the user of the system has to provide it, which is tedious, error-prone and impedes automation. It is a probabilistic method that uniquely estimates the light source direction using a stereo pair as input
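
A minimal sketch of the two forward models being inverted, assuming a Lambertian lighting equation and a rectified stereo pair (all numbers illustrative). The many-to-one nature of shading is visible directly: distinct surface normals at the same angle to the light produce identical irradiance.

```python
import numpy as np

# Lambertian lighting equation: irradiance I = albedo * max(0, n . l).
# Shape-from-shading inverts this many-to-one map, so a single intensity
# only constrains the normal to a cone of directions around the light.
light = np.array([0.0, 0.0, 1.0])              # assumed unit light direction
albedo = 0.9                                   # assumed constant albedo

def irradiance(normal):
    n = normal / np.linalg.norm(normal)
    return albedo * max(0.0, float(n @ light))

# Two different normals, same angle to the light -> same irradiance:
# the ambiguity that shape-from-shading must resolve.
n_a = np.array([0.5, 0.0, 1.0])
n_b = np.array([0.0, 0.5, 1.0])

# Stereopsis side: once correspondence gives a disparity, calibrated
# triangulation for a rectified pair reduces to depth = f * B / d.
def stereo_depth(focal_px, baseline_m, disparity_px):
    return focal_px * baseline_m / disparity_px
```

For example, a 35-pixel disparity with a 700-pixel focal length and 10 cm baseline triangulates to a depth of 2 m; where disparity cannot be measured (textureless regions), the shading cue above can take over, which is the complementarity the thesis exploits.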

    Colour coded

    This 300-word publication, to be published by the Society of Dyers and Colourists (SDC), is a collection of the best papers from a four-year European project that considered colour from the perspective of both the arts and the sciences. The notion of art and science, and the crossovers between the two, resulted in an application and funding for cross-disciplinary research to host a series of training events between 2006 and 2010: Marie Curie Conferences & Training Courses (SCF), Call Identifier FP6-Mobility-4, EUR 532,363.80, CREATE – Colour Research for European Advanced Technology Employment. The research crossovers between the fields of art, science and technology were also a subject initiated through Bristol's Festival of Ideas events in May 2009: the author coordinated and chaired an event at which C. P. Snow's 1959 lecture 'The Two Cultures' was re-presented by the actor Simon Cook, followed by a lecture by Raymond Tallis on the notion of the polymath. The CREATE project has a worldwide impact for researchers, academics and scientists. Between January and October 2009 the site received 221,414 visits, with the welcome page the most popular route in. The main groups of visitors originate in the UK (including Northern Ireland), Italy, France, Finland, Norway, Hungary, the USA and Spain. A basic percentage breakdown of the traffic over ten months indicates: USA 15%; UK 16%; Italy 13%; France 12%; Hungary 10%; Finland 9%; Spain 6%; Norway 5%. The remaining approximately 14% of visitors are from other countries, including Belgium, the Netherlands and Germany (approx. 3%). A discussion group has been initiated by the author as part of the CREATE project to facilitate an ongoing dialogue between artists and scientists: http://createcolour.ning.com/group/artandscience and www.create.uwe.ac.uk. Related papers to this research: a report on the CREATE Italian event, 'Colour in cultural heritage'; C. Parraman and A. Rizzi, 'Developing the CREATE network in Europe', in Colour in Art, Design and Nature, Edinburgh, 24 October 2008; C. Parraman, 'Mixing and describing colour', CREATE (Training event 1), France, 2008

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial positions of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) that Gestalt grouping is not used as a strategy in these tasks, and (ii) that objects may be stored in and retrieved from a pre-attentional store during this task

    Reconstruction from Spatio-Spectrally Coded Multispectral Light Fields

    This thesis investigates spectrally coded multispectral light fields, as captured by a light field camera with a spectrally coded microlens array. Two methods for reconstructing the coded light fields are developed and evaluated in detail. First, a full reconstruction of the spectral light field is developed, based on the principles of compressed sensing. To represent the spectral light fields sparsely, 5D DCT bases as well as a dictionary learning approach are investigated. The conventional vectorised dictionary learning approach is generalised to a tensor notation in order to factorise the light-field dictionary tensorially. Owing to the reduced number of parameters to be learned, this approach enables larger effective atom sizes. Second, a deep-learning-based reconstruction of the spectral central view and the corresponding disparity map from the coded light field is developed, estimating the desired information directly from the coded measurements. Different strategies for the corresponding multi-task training are compared. To further improve reconstruction quality, a novel method for incorporating auxiliary loss functions based on their respective normalised gradient similarity is developed and shown to outperform previous adaptive methods. To train and evaluate the different reconstruction approaches, two datasets are created. First, a large synthetic spectral light-field dataset with available disparity ground truth is created using a ray tracer. This dataset, containing about 100k spectral light fields with corresponding disparity, is split into a training, validation and test set. To assess quality further, seven hand-crafted scenes, so-called dataset challenges, are created. Finally, a real spectral light-field dataset is captured with a custom-built spectral light-field reference camera. The radiometric and geometric calibration of the camera is discussed in detail. Using the new datasets, the proposed reconstruction approaches are evaluated in detail. Different coding masks are investigated – random and regular masks, as well as end-to-end optimised coding masks generated with a novel differentiable fractal generation. Furthermore, additional studies are carried out, for example concerning the dependence on noise, angular resolution or depth. Overall, the results are convincing and show high reconstruction quality. The deep-learning-based reconstruction, especially when trained with adaptive multi-task and auxiliary loss strategies, outperforms the compressed-sensing-based reconstruction with subsequent state-of-the-art disparity estimation
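
The compressed-sensing side can be illustrated in 1-D: a signal sparse in an orthonormal DCT basis, observed through a random coding matrix, and recovered with orthogonal matching pursuit. This is a hypothetical stand-in for the thesis's 5D light-field setting; matrix sizes, sparsity level and the solver choice are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 64, 32, 3                             # signal dim, measurements, sparsity

# Orthonormal DCT-II basis: column f is the f-th DCT atom.
t = np.arange(n)
B = np.cos(np.pi * (t[:, None] + 0.5) * np.arange(n)[None, :] / n)
B *= np.sqrt(2.0 / n)
B[:, 0] /= np.sqrt(2.0)

# A signal that is exactly k-sparse in the DCT domain.
coeffs = np.zeros(n)
coeffs[rng.choice(n, size=k, replace=False)] = np.array([5.0, -4.0, 3.0])
signal = B @ coeffs

Phi = rng.normal(size=(m, n)) / np.sqrt(m)      # random coding/measurement matrix
A = Phi @ B                                     # sensing matrix in DCT coordinates
y = Phi @ signal                                # coded measurements

# Orthogonal matching pursuit: greedily select the atom most correlated
# with the residual, then re-fit all selected atoms by least squares.
residual, support = y.copy(), []
for _ in range(k):
    support.append(int(np.argmax(np.abs(A.T @ residual))))
    sol, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
    residual = y - A[:, support] @ sol
rec = np.zeros(n)
rec[support] = sol
err = np.linalg.norm(rec - coeffs) / np.linalg.norm(coeffs)
```

With half as many measurements as signal samples, the sparse DCT coefficients are still recovered, which is the premise that makes reconstruction from spatio-spectrally coded measurements feasible at all; the thesis's dictionary learning replaces the fixed DCT basis with a learned one.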

    Multimedia Forensics

    This book is open access. Media forensics has never been more relevant to societal life. Not only does media content represent an ever-increasing share of the data travelling on the net and the preferred means of communication for most users, it has also become an integral part of the most innovative applications in the digital information ecosystem serving various sectors of society, from entertainment to journalism to politics. Undoubtedly, advances in deep learning and computational imaging have contributed significantly to this outcome. The underlying technologies that drive this trend, however, also pose a profound challenge to establishing trust in what we see, hear and read, and make media content the preferred target of malicious attacks. In this new threat landscape, powered by innovative imaging technologies and sophisticated tools based on autoencoders and generative adversarial networks, this book fills an important gap. It presents a comprehensive review of state-of-the-art forensic capabilities relating to media attribution, integrity and authenticity verification, and counter-forensics. Its content is developed to give practitioners, researchers, photo and video enthusiasts, and students a holistic view of the field