    Model Learning in Iconic Vision

    Institute of Perception, Action and Behaviour. Generally, object recognition research falls into three main categories: (a) geometric, symbolic or structure-based recognition, which is usually associated with CAD-based vision and 3-D object recognition; (b) property, vector or feature-based recognition, involving techniques that range from specific feature vectors and multiple filtering to global descriptors for shape, texture and colour; and (c) iconic or image-based recognition, which either complies with the traditional sensor architecture of a uniform array of sampling units or uses alternative representations. An example is the log-polar image, which is inspired by the human visual system and, besides requiring fewer pixels, has some useful mathematical properties. The context of this thesis is a combination of the above categories in the sense that it investigates the area of iconic-based recognition using image features and geometric relationships. It expands an existing vision system that operates by fixating at interesting regions in a scene, extracting a number of raw primal sketch features from a log-polar image and matching new regions to previously seen ones. Primal sketch features like edges, bars, blobs and ends are believed to take part in early visual processes in humans, providing cues for an attention mechanism and more compact representations for the image data. In an earlier work, logic operators were defined to extract these features, but the results were not satisfactory. This thesis initially investigates the question of whether or not primal sketch features could be learned from log-polar images, and gives an affirmative answer. The feature extraction process was implemented using a neural network which learns examples of features in a window of receptive fields of the log-polar image. An architecture designed to encode the feature’s class, position, orientation and contrast has been proposed and tested.
Success depended on the incorporation of a function that normalises the feature’s orientation and a PCA pre-processing module to produce better separation in the feature space. A strategy that combines synthetic and real features is used for the learning process. This thesis also provides an answer to the important, but so far not well explored, question of how to learn relationships from sets of iconic object models obtained from a set of images. An iconic model is defined as a set of regions, or object instances, that are similar to each other, organised into a geometric model specified by the relative scales, orientations, positions and similarity scores for each pair of image regions. Similarities are measured with a cross-correlation metric, and relative scales and orientations are obtained from the best matched translational variants generated in the log-polar space. A solution to the structure learning problem is presented in terms of a graph-based representation and algorithm. Vertices represent instances of an image neighbourhood found in the scenes. An edge in the graph represents a relationship between two neighbourhoods. Intra- and inter-model relationships are inferred by means of the cliques found in the graph, which leads to rigid geometric models inferred from the image evidence.
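The remark that relative scales and orientations can be read off as translations in log-polar space can be illustrated with a minimal sketch. It is not taken from the thesis: the function name `to_log_polar`, the fixation centre `(cx, cy)` and the inner radius `r_min` are assumptions made for illustration only.

```python
import math

def to_log_polar(x, y, cx, cy, r_min=1.0):
    """Map a Cartesian point to log-polar coordinates (rho, theta).

    (cx, cy) is an assumed fixation centre and r_min an assumed inner
    radius; neither value comes from the abstract.
    """
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    if r < r_min:
        raise ValueError("point lies inside the assumed inner radius")
    rho = math.log(r / r_min)   # radial coordinate on a logarithmic scale
    theta = math.atan2(dy, dx)  # angular coordinate in radians
    return rho, theta

# Scaling a point about the centre by s adds log(s) to rho and leaves
# theta unchanged; rotating it adds the rotation angle to theta.
rho_a, theta_a = to_log_polar(3.0, 4.0, 0.0, 0.0)
rho_b, theta_b = to_log_polar(6.0, 8.0, 0.0, 0.0)   # same point scaled by 2
rho_c, theta_c = to_log_polar(-4.0, 3.0, 0.0, 0.0)  # same point rotated by 90 degrees
```

Under this mapping, a change of scale or orientation of a region around the fixation point becomes a shift along the `rho` or `theta` axis, which is why a translational search in log-polar space can recover both.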

    A survey of visual preprocessing and shape representation techniques

    Many recent theories and methods proposed for visual preprocessing and shape representation are summarized. The survey brings together research from the fields of biology, psychology, computer science, electrical engineering, and most recently, neural networks. It was motivated by the need to preprocess images for a sparse distributed memory (SDM), but the techniques presented may also prove useful for applying other associative memories to visual pattern recognition. The material of this survey is divided into three sections: an overview of biological visual processing; methods of preprocessing (extracting parts of shape, texture, motion, and depth); and shape representation and recognition (form invariance, primitives and structural descriptions, and theories of attention)

    Medical image enhancement

    Each image acquired from a medical imaging system is often part of a two-dimensional (2-D) image set whose total presents a three-dimensional (3-D) object for diagnosis. Unfortunately, sometimes these images are of poor quality. These distortions cause an inadequate object-of-interest presentation, which can result in inaccurate image analysis. Blurring is considered a serious problem. Therefore, “deblurring” an image to obtain better quality is an important issue in medical image processing. In our research, the image is initially decomposed. Contrast improvement is achieved by modifying the coefficients obtained from the decomposed image. Small coefficient values represent subtle details and are amplified to improve the visibility of the corresponding details. The stronger image density variations make a major contribution to the overall dynamic range, and have large coefficient values. These values can be reduced without much information loss
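The coefficient modification described above can be sketched as follows. The abstract names neither the decomposition nor the mapping, so the power-law remapping below, the function name `remap_coefficient` and the parameters `c_max` and `p` are all assumptions chosen for illustration, not the authors' method.

```python
import math

def remap_coefficient(c, c_max, p=0.7):
    """Compress the dynamic range of one detail coefficient.

    Assumed mapping: sign(c) * c_max * (|c| / c_max) ** p with 0 < p <= 1.
    Small magnitudes are raised relative to large ones, and the largest
    magnitude c_max is left unchanged.
    """
    if c_max <= 0:
        raise ValueError("c_max must be positive")
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    return math.copysign(c_max * (abs(c) / c_max) ** p, c)

# Hypothetical coefficients from one decomposition level.
coefficients = [0.01, -0.25, 1.0]
enhanced = [remap_coefficient(c, c_max=1.0, p=0.5) for c in coefficients]
```

With `p = 0.5`, the ratio between the largest and smallest magnitude in this toy example drops from 100 to 10, so subtle details take up a larger share of the available range once the image is reconstructed and rescaled.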

    Improving Bags-of-Words model for object categorization

    In the past decade, Bags-of-Words (BOW) models have become popular for the task of object recognition, owing to their good performance and simplicity. Some of the most effective recent methods for computer-based object recognition work by detecting and extracting local image features, quantizing them according to a codebook rule such as k-means clustering, and classifying the results with conventional classifiers such as Support Vector Machines and Naive Bayes. In this thesis, a Spatial Object Recognition Framework is presented that consists of the four main contributions of the research. The first contribution, frequent keypoint pattern discovery, works by combining pairs and triplets of frequent keypoints in order to discover intermediate representations for object classes. Based on the same frequent keypoints principle, algorithms for locating the region-of-interest in training images are then discussed. Extensions to the successful Spatial Pyramid Matching scheme, in order to better capture spatial relationships, are then proposed. The pairs frequency histogram and shapes frequency histogram work by capturing more refined spatial information between local image features. Finally, alternative techniques to Spatial Pyramid Matching for capturing spatial information are presented. The proposed techniques, variations of binned log-polar histograms, divide the image into grids of different scale and different orientation, and thus capture the distribution of image features in both distance and orientation explicitly. Evaluations of the framework focus on several recent and popular datasets covering image retrieval, object recognition, and object categorization. Overall, while the effectiveness of the framework is limited on some of the datasets, the proposed contributions are nevertheless powerful improvements of the BOW model.
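The baseline quantization step of a Bags-of-Words pipeline can be sketched in a few lines. This shows only the standard BOW histogram, not the spatial extensions proposed in the thesis; the function names and the toy two-dimensional descriptors are hypothetical, and the codebook is assumed to have been learned beforehand (for example with k-means).

```python
def nearest_codeword(descriptor, codebook):
    """Return the index of the codeword closest to the descriptor
    under squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )

def bow_histogram(descriptors, codebook):
    """Quantize local descriptors against the codebook and return the
    normalised histogram of codeword occurrences."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [count / total for count in counts]

# Toy data: a two-word codebook and four local descriptors.
codebook = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 0.0), (0.0, 1.0), (9.0, 10.0), (1.0, 1.0)]
histogram = bow_histogram(descriptors, codebook)
```

The resulting histogram is the fixed-length vector that a conventional classifier would consume. It discards where each feature occurred in the image, which is the limitation the spatial contributions of the thesis aim to address.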

    Multi-Technique Fusion for Shape-Based Image Retrieval

    Content-based image retrieval (CBIR) is still in its early stages, although several attempts have been made to solve or minimize challenges associated with it. CBIR techniques use such visual contents as color, texture, and shape to represent and index images. Of these, shapes contain richer information than color or texture. However, retrieval based on shape contents remains more difficult than that based on color or texture due to the diversity of shapes and the natural occurrence of shape transformations such as deformation, scaling and orientation. This thesis presents an approach for fusing several shape-based image retrieval techniques for the purpose of achieving reliable and accurate retrieval performance. An extensive investigation of notable existing shape descriptors is reported. Two new shape descriptors have been proposed as means to overcome limitations of current shape descriptors. The first descriptor is based on a novel shape signature that includes corner information in order to enhance the performance of shape retrieval techniques that use Fourier descriptors. The second descriptor is based on the curvature of the shape contour. This invariant descriptor takes an unconventional view of the curvature-scale-space map of a contour by treating it as a 2-D binary image. The descriptor is then derived from the 2-D Fourier transform of the 2-D binary image. This technique allows the descriptor to capture the detailed dynamics of the curvature of the shape and enhances the efficiency of the shape-matching process. Several experiments have been conducted in order to compare the proposed descriptors with several notable descriptors. The new descriptors not only speed up the online matching process, but also lead to improved retrieval accuracy. The complexity and variety of the content of real images make it impossible for a particular choice of descriptor to be effective for all types of images. 
Therefore, a data-fusion formulation based on a team consensus approach is proposed as a means of achieving high accuracy performance. In this approach a select set of retrieval techniques form a team. Members of the team exchange information so as to complement each other’s assessment of a database image candidate as a match to query images. Several experiments have been conducted based on the MPEG-7 contour-shape databases; the results demonstrate that the performance of the proposed fusion scheme is superior to that achieved by any technique individually.
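For readers unfamiliar with Fourier descriptors, the following is a minimal sketch of the conventional variant built on a centroid-distance signature. It is not the corner-aware signature or the curvature-scale-space descriptor proposed in the thesis; the function names and the toy contour are assumptions made for illustration.

```python
import cmath
import math

def centroid_distance_signature(contour):
    """Distance from each contour point to the contour centroid."""
    n = len(contour)
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    return [math.hypot(x - cx, y - cy) for x, y in contour]

def fourier_descriptor(contour, n_coeffs=4):
    """Magnitudes of the first n_coeffs DFT coefficients of the signature,
    divided by the magnitude of the DC term."""
    signature = centroid_distance_signature(contour)
    n = len(signature)
    magnitudes = []
    for k in range(n_coeffs + 1):
        coeff = sum(
            s * cmath.exp(-2j * math.pi * k * i / n)
            for i, s in enumerate(signature)
        )
        magnitudes.append(abs(coeff))
    if magnitudes[0] == 0:
        raise ValueError("degenerate contour: all points coincide")
    return [m / magnitudes[0] for m in magnitudes[1:]]

# Toy contour: a square sampled at 8 points, plus a copy that is scaled
# by 3, translated, and traversed from a different starting point.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
moved = [(3 * x + 5, 3 * y - 2) for x, y in square]
moved = moved[3:] + moved[:3]
d_square = fourier_descriptor(square)
d_moved = fourier_descriptor(moved)
```

The two descriptors agree up to floating-point error: measuring from the centroid removes translation, dividing by the DC term removes scale, and keeping only magnitudes removes the dependence on the starting point.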

    An introduction to continuous optimization for imaging

    A large number of imaging problems reduce to the optimization of a cost function with typical structural properties. The aim of this paper is to describe the state of the art in continuous optimization methods for such problems, and present the most successful approaches and their interconnections. We place particular emphasis on optimal first-order schemes that can deal with typical non-smooth and large-scale objective functions used in imaging problems. We illustrate and compare the different algorithms using classical non-smooth problems in imaging, such as denoising and deblurring. Moreover, we present applications of the algorithms to more advanced problems, such as magnetic resonance imaging, multilabel image segmentation, optical flow estimation, stereo matching, and classification.

    Algorithms for the Analysis of Spatio-Temporal Data from Team Sports

    Modern object tracking systems are able to simultaneously record trajectories—sequences of time-stamped location points—for large numbers of objects with high frequency and accuracy. The availability of trajectory datasets has resulted in a consequent demand for algorithms and tools to extract information from these data. In this thesis, we present several contributions intended to do this, and in particular, to extract information from trajectories tracking football (soccer) players during matches. Football player trajectories have particular properties that both facilitate and present challenges for the algorithmic approaches to information extraction. The key property that we look to exploit is that the movement of the players reveals information about their objectives through cooperative and adversarial coordinated behaviour, and this, in turn, reveals the tactics and strategies employed to achieve the objectives. While the approaches presented here naturally deal with the application-specific properties of football player trajectories, they also apply to other domains where objects are tracked, for example behavioural ecology, traffic and urban planning

    Sparse machine learning methods with applications in multivariate signal processing

    This thesis details theoretical and empirical work that draws from two main subject areas: Machine Learning (ML) and Digital Signal Processing (DSP). A unified general framework is given for the application of sparse machine learning methods to multivariate signal processing. In particular, methods that enforce sparsity will be employed for reasons of computational efficiency, regularisation, and compressibility. The methods presented can be seen as modular building blocks that can be applied to a variety of applications. Application specific prior knowledge can be used in various ways, resulting in a flexible and powerful set of tools. The motivation for the methods is to be able to learn and generalise from a set of multivariate signals. In addition to testing on benchmark datasets, a series of empirical evaluations on real world datasets were carried out. These included: the classification of musical genre from polyphonic audio files; a study of how the sampling rate in a digital radar can be reduced through the use of Compressed Sensing (CS); analysis of human perception of different modulations of musical key from Electroencephalography (EEG) recordings; classification of genre of musical pieces to which a listener is attending from Magnetoencephalography (MEG) brain recordings. These applications demonstrate the efficacy of the framework and highlight interesting directions of future research

    Applied microlocal analysis of deep neural networks for inverse problems

    Deep neural networks have recently shown state-of-the-art performance in different imaging tasks. As an example, EfficientNet is today the best image classifier on the ImageNet challenge. They are also very powerful for image reconstruction; for example, deep learning currently yields the best methods for CT reconstruction. Most imaging problems, such as CT reconstruction, are ill-posed inverse problems, which hence require regularization techniques typically based on a-priori information. Also, due to the human visual system, singularities such as edge-like features are the governing structures of images. This leads to the question of how to incorporate such information into a solver of an inverse problem in imaging and how deep neural networks operate on singularities. The main research theme of this thesis is to introduce theoretically founded approaches to use deep neural networks in combination with model-based methods to solve inverse problems from imaging science. We do this by heavily exploring the singularity structure of images as a-priori information. We then develop a comprehensive analysis of how neural networks act on singularities using predominantly methods from microlocal analysis. For analyzing the interaction of deep neural networks with singularities, we introduce a novel technique to compute the propagation of wavefront sets through convolutional residual neural networks (conv-ResNet). This is achieved in a two-fold manner: we first study the continuous case where the neural network is defined in an infinite-dimensional continuous space. This problem is tackled by using the structure of these networks as a sequential application of continuous convolutional operators and ReLU non-linearities and applying microlocal analysis techniques to track the propagation of the wavefront set through the layers.
This then leads to the so-called microcanonical relation that describes the propagation of the wavefront set under the action of such a neural network. Secondly, for studying real-world discrete problems, we digitize the necessary microlocal analysis methods via the digital shearlet transform. The key idea is that the shearlet transform optimally represents Fourier integral operators; hence such a discretization decays rapidly, allowing a finite approximation. Fourier integral operators play an important role in microlocal analysis, since it is well known that they preserve singularities of functions and, in addition, have a closed-form microcanonical relation. Also, based on the newly developed theoretical analysis, we introduce a method that uses digital shearlet coefficients to compute the digital wavefront set of images by a convolutional neural network. Our approach is then used for a similar analysis of the microlocal behavior of the learned primal-dual architecture, which is formed by a sequence of conv-ResNet blocks. This architecture has shown state-of-the-art performance in inverse problem regularization, in particular in computed tomography reconstruction related to the Radon transform. Since the Radon operator is a Fourier integral operator, our microlocal techniques can be applied. Therefore, we can study with high precision the propagation of singularities through this architecture. Aiming to empirically analyze our theoretical approach, we focus on the reconstruction of X-ray tomographic data. We approach this problem by using a task-adapted reconstruction framework, in which we combine the task of reconstruction with the task of computing the wavefront set of the original image as a-priori information.
Our numerical results show superior performance with respect to current state-of-the-art tomographic reconstruction methods; hence we anticipate our work to also be a significant contribution to the biomedical imaging community.