232 research outputs found

    A Decade of Neural Networks: Practical Applications and Prospects

    Get PDF
    The Jet Propulsion Laboratory Neural Network Workshop, sponsored by NASA and DOD, brings together sponsoring agencies, active researchers, and the user community to formulate a vision for the next decade of neural network research and application prospects. While the speed and computing power of microprocessors continue to grow at an ever-increasing pace, the demand to intelligently and adaptively deal with the complex, fuzzy, and often ill-defined world around us remains to a large extent unaddressed. Powerful, highly parallel computing paradigms such as neural networks promise to have a major impact in addressing these needs. Papers in the workshop proceedings highlight benefits of neural networks in real-world applications compared to conventional computing techniques. Topics include fault diagnosis, pattern recognition, and multiparameter optimization

    Auto-adaptive multi-scale Laplacian Pyramids for modeling non-uniform data

    Full text link
    Kernel-based techniques have become a common way for describing the local and global relationships of data samples that are generated in real-world processes. In this research, we focus on a multi-scale kernel based technique named Auto-adaptive Laplacian Pyramids (ALP). This method can be useful for function approximation and interpolation. ALP is an extension of the standard Laplacian Pyramids model that incorporates a modified Leave-One-Out Cross Validation procedure, which makes the method stable and automatic in terms of parameters selection without extra cost. This paper introduces a new algorithm that extends ALP to fit datasets that are non-uniformly distributed. In particular, the optimal stopping criterion will be point-dependent with respect to the local noise level and the sample rate. Experimental results over real datasets highlight the advantages of the proposed multi-scale technique for modeling and learning complex, high dimensional dataThey wish to thank Prof. Ronald R. Coifman for helpful remarks. They 525 also gratefully acknowledge the use of the facilities of Centro de Computación Científica (CCC) at Universidad Autónoma de Madrid. Funding: This work was supported by Spanish grants of the Ministerio de Ciencia, Innovación y Universidades [grant numbers: TIN2013-42351-P, TIN2015-70308-REDT, TIN2016-76406-P]; project CASI-CAM-CM supported by Madri+d 530 [grant number: S2013/ICE-2845]; project FACIL supported by Fundación BBVA (2016); and the UAM–ADIC Chair for Data Science and Machine Learnin

    Evaluation of face recognition algorithms under noise

    Get PDF
    One of the major applications of computer vision and image processing is face recognition, where a computerized algorithm automatically identifies a person’s face from a large image dataset or even from a live video. This thesis addresses facial recognition, a topic that has been widely studied due to its importance in many applications in both civilian and military domains. The application of face recognition systems has expanded from security purposes to social networking sites, managing fraud, and improving user experience. Numerous algorithms have been designed to perform face recognition with good accuracy. This problem is challenging due to the dynamic nature of the human face and the different poses that it can take. Regardless of the algorithm, facial recognition accuracy can be heavily affected by the presence of noise. This thesis presents a comparison of traditional and deep learning face recognition algorithms under the presence of noise. For this purpose, Gaussian and salt-andpepper noises are applied to the face images drawn from the ORL Dataset. The image recognition is performed using each of the following eight algorithms: principal component analysis (PCA), two-dimensional PCA (2D-PCA), linear discriminant analysis (LDA), independent component analysis (ICA), discrete cosine transform (DCT), support vector machine (SVM), convolution neural network (CNN) and Alex Net. The ORL dataset was used in the experiments to calculate the evaluation accuracy for each of the investigated algorithms. Each algorithm is evaluated with two experiments; in the first experiment only one image per person is used for training, whereas in the second experiment, five images per person are used for training. The investigated traditional algorithms are implemented with MATLAB and the deep learning algorithms approaches are implemented with Python. The results show that the best performance was obtained using the DCT algorithm with 92% dominant eigenvalues and 95.25 % accuracy, whereas for deep learning, the best performance was using a CNN with accuracy of 97.95%, which makes it the best choice under noisy conditions

    Efficient Point-Cloud Processing with Primitive Shapes

    Get PDF
    This thesis presents methods for efficient processing of point-clouds based on primitive shapes. The set of considered simple parametric shapes consists of planes, spheres, cylinders, cones and tori. The algorithms developed in this work are targeted at scenarios in which the occurring surfaces can be well represented by this set of shape primitives which is the case in many man-made environments such as e.g. industrial compounds, cities or building interiors. A primitive subsumes a set of corresponding points in the point-cloud and serves as a proxy for them. Therefore primitives are well suited to directly address the unavoidable oversampling of large point-clouds and lay the foundation for efficient point-cloud processing algorithms. The first contribution of this thesis is a novel shape primitive detection method that is efficient even on very large and noisy point-clouds. Several applications for the detected primitives are subsequently explored, resulting in a set of novel algorithms for primitive-based point-cloud processing in the areas of compression, recognition and completion. Each of these application directly exploits and benefits from one or more of the detected primitives' properties such as approximation, abstraction, segmentation and continuability

    From Pixels to Spikes: Efficient Multimodal Learning in the Presence of Domain Shift

    Get PDF
    Computer vision aims to provide computers with a conceptual understanding of images or video by learning a high-level representation. This representation is typically derived from the pixel domain (i.e., RGB channels) for tasks such as image classification or action recognition. In this thesis, we explore how RGB inputs can either be pre-processed or supplemented with other compressed visual modalities, in order to improve the accuracy-complexity tradeoff for various computer vision tasks. Beginning with RGB-domain data only, we propose a multi-level, Voronoi based spatial partitioning of images, which are individually processed by a convolutional neural network (CNN), to improve the scale invariance of the embedding. We combine this with a novel and efficient approach for optimal bit allocation within the quantized cell representations. We evaluate this proposal on the content-based image retrieval task, which constitutes finding similar images in a dataset to a given query. We then move to the more challenging domain of action recognition, where a video sequence is classified according to its constituent action. In this case, we demonstrate how the RGB modality can be supplemented with a flow modality, comprising motion vectors extracted directly from the video codec. The motion vectors (MVs) are used both as input to a CNN and as an activity sensor for providing selective macroblock (MB) decoding of RGB frames instead of full-frame decoding. We independently train two CNNs on RGB and MV correspondences and then fuse their scores during inference, demonstrating faster end-to-end processing and competitive classification accuracy to recent work. In order to explore the use of more efficient sensing modalities, we replace the MV stream with a neuromorphic vision sensing (NVS) stream for action recognition. NVS hardware mimics the biological retina and operates with substantially lower power and at significantly higher sampling rates than conventional active pixel sensing (APS) cameras. Due to the lack of training data in this domain, we generate emulated NVS frames directly from consecutive RGB frames and use these to train a teacher-student framework that additionally leverages on the abundance of optical flow training data. In the final part of this thesis, we introduce a novel unsupervised domain adaptation method for further minimizing the domain shift between emulated (source) and real (target) NVS data domains

    Hyperspectral Image Unmixing Incorporating Adjacency Information

    Get PDF
    While the spectral information contained in hyperspectral images is rich, the spatial resolution of such images is in many cases very low. Many pixel spectra are mixtures of pure materials’ spectra and therefore need to be decomposed into their constituents. This work investigates new decomposition methods taking into account spectral, spatial and global 3D adjacency information. This allows for faster and more accurate decomposition results

    Kernel learning approaches for image classification

    Get PDF
    This thesis extends the use of kernel learning techniques to specific problems of image classification. Kernel learning is a paradigm in the eld of machine learning that generalizes the use of inner products to compute similarities between arbitrary objects. In image classification one aims to separate images based on their visual content. We address two important problems that arise in this context: learning with weak label information and combination of heterogeneous data sources. The contributions we report on are not unique to image classification, and apply to a more general class of problems. We study the problem of learning with label ambiguity in the multiple instance learning framework. We discuss several different image classification scenarios that arise in this context and argue that the standard multiple instance learning requires a more detailed disambiguation. Finally we review kernel learning approaches proposed for this problem and derive a more efficcient algorithm to solve them. The multiple kernel learning framework is an approach to automatically select kernel parameters. We extend it to its infinite limit and present an algorithm to solve the resulting problem. This result is then applied in two directions. We show how to learn kernels that adapt to the special structure of images. Finally we compare different ways of combining image features for object classification and present significant improvements compared to previous methods.In dieser Dissertation verwenden wir Kernmethoden für spezielle Probleme der Bildklassifikation. Kernmethoden generalisieren die Verwendung von inneren Produkten zu Distanzen zwischen allgemeinen Objekten. Das Problem der Bildklassifikation ist es, Bilder anhand des visuellen Inhaltes zu unterscheiden. Wir beschäftigen uns mit zwei wichtigen Aspekten, die in diesem Problem auftreten: lernen mit mehrdeutiger Annotation und die Kombination von verschiedenartigen Datenquellen. Unsere Ansätze sind nicht auf die Bildklassififikation beschränkt und für einen grösseren Problemkreis verwendbar. Mehrdeutige Annotationen sind ein inhärentes Problem der Bildklassifikation. Wir diskutieren verschiedene Instanzen und schlagen eine neue Unterteilung in mehrere Szenarien vor. Danach stellen wir Kernmethoden für dieses Problem vor und entwickeln einen Algorithmus, der diese effizient löst. Mit der Methode der Kernkombination werden Kernfunktionen anhand von Daten automatisch bestimmt. Wir generalisieren diesen Ansatz indem wir den Suchraum auf kontinuierlich parametrisierte Kernklassen ausgedehnen. Diese Methode wird in zwei verschiedenen Anwendungen eingesetzt. Wir betrachten spezifische Kerne für Bilddaten und lernen diese anhand von Beispielen. Schließlich vergleichen wir verschiedene Verfahren der Merkmalskombination und zeigen signifikante Verbesserungen im Bereich der Objekterkennung gegenüber bestehenden Methoden

    Acoustic Features for Environmental Sound Analysis

    Get PDF
    International audienceMost of the time it is nearly impossible to differentiate between particular type of sound events from a waveform only. Therefore, frequency domain and time-frequency domain representations have been used for years providing representations of the sound signals that are more inline with the human perception. However, these representations are usually too generic and often fail to describe specific content that is present in a sound recording. A lot of work have been devoted to design features that could allow extracting such specific information leading to a wide variety of hand-crafted features. During the past years, owing to the increasing availability of medium scale and large scale sound datasets, an alternative approach to feature extraction has become popular, the so-called feature learning. Finally, processing the amount of data that is at hand nowadays can quickly become overwhelming. It is therefore of paramount importance to be able to reduce the size of the dataset in the feature space. The general processing chain to convert an sound signal to a feature vector that can be efficiently exploited by a classifier and the relation to features used for speech and music processing are described is this chapter
    corecore