    Neural Networks and the Natural Gradient

    Neural network training algorithms have always suffered from the problem of local minima. The advent of natural gradient algorithms promised to overcome this shortcoming by finding better local minima. However, they require additional training parameters and computational overhead. Using a new formulation of the natural gradient, we describe an algorithm that uses less memory and processing time than previous algorithms while achieving comparable performance.
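
    As a point of reference for the update the abstract contrasts with plain gradient descent, here is a minimal sketch of a natural gradient step built on an empirical Fisher estimate. The damping term and learning rate are illustrative assumptions, and this naive version materializes the full Fisher matrix rather than using the paper's memory-efficient formulation.

        import numpy as np

        def natural_gradient_step(theta, per_sample_grads, lr=0.1, damping=1e-3):
            # theta: (dim,) parameters; per_sample_grads: (n, dim) per-sample gradients.
            g = per_sample_grads.mean(axis=0)                # average gradient
            n = len(per_sample_grads)
            F = per_sample_grads.T @ per_sample_grads / n    # empirical Fisher estimate
            F += damping * np.eye(theta.size)                # damping keeps F invertible
            return theta - lr * np.linalg.solve(F, g)        # theta - lr * F^{-1} g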

    Multimodal Three Dimensional Scene Reconstruction, The Gaussian Fields Framework

    The focus of this research is on building 3D representations of real-world scenes and objects using different imaging sensors: primarily range acquisition devices (such as laser scanners and stereo systems) that allow the recovery of 3D geometry, and multi-spectral image sequences, including visual and thermal IR images, that provide additional scene characteristics. The crucial technical challenge we addressed is the automatic point-set registration task. In this context our main contribution is the development of an optimization-based method at the core of which lies a unified criterion that solves simultaneously for the dense point correspondence and transformation recovery problems. The new criterion has a straightforward expression in terms of the datasets and the alignment parameters and was used primarily for 3D rigid registration of point-sets; however, it also proved useful for feature-based multimodal image alignment. We derived our method from simple Boolean matching principles by approximation and relaxation. One of the main advantages of the proposed approach, compared to the widely used class of Iterative Closest Point (ICP) algorithms, is convexity in the neighborhood of the registration parameters and continuous differentiability, allowing the use of standard gradient-based optimization techniques. Physically, the criterion is interpreted in terms of a Gaussian force field exerted by one point-set on the other. This formulation proved useful for controlling and increasing the region of convergence, allowing for more autonomy in correspondence tasks. Furthermore, the criterion can be computed with linear complexity using recently developed Fast Gauss Transform numerical techniques. In addition, we introduced a new local feature descriptor, derived from visual saliency principles, which significantly enhanced the performance of the registration algorithm. The resulting technique was subjected to a thorough experimental analysis that highlighted its strengths and showed its limitations. Our current applications are in the field of 3D modeling for inspection, surveillance, and biometrics. However, since this matching framework can be applied to any type of data that can be represented as N-dimensional point-sets, the scope of the method reaches many more pattern analysis applications.
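
    To make the criterion concrete, the sketch below sums Gaussian affinities between a rigidly transformed moving point-set and a fixed one; maximizing it with any gradient-based optimizer aligns the sets. The naive double sum is O(NM), whereas the linear-complexity evaluation mentioned above uses the Fast Gauss Transform; the 2D rigid parameterization and the sigma value are assumptions for illustration.

        import numpy as np

        def gaussian_fields_energy(P, Q, sigma=1.0):
            # Sum of Gaussian affinities between point-sets P (N x d) and Q (M x d).
            # Larger sigma widens the basin of convergence.
            d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
            return np.exp(-d2 / sigma**2).sum()

        def energy_of_pose(params, P, Q, sigma=1.0):
            # Apply a 2D rigid pose (tx, ty, angle) to P and evaluate the criterion;
            # the abstract notes it is differentiable and locally convex in these parameters.
            tx, ty, a = params
            R = np.array([[np.cos(a), -np.sin(a)],
                          [np.sin(a),  np.cos(a)]])
            return gaussian_fields_energy(P @ R.T + np.array([tx, ty]), Q, sigma)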

    Learning and Production of Movement Sequences: Behavioral, Neurophysiological, and Modeling Perspectives

    A growing wave of behavioral studies, using a wide variety of paradigms that were introduced or greatly refined in recent years, has generated a new wealth of parametric observations about serial-order behavior. What was a mere trickle of neurophysiological studies has grown into a steadier stream of probes of the neural sites and mechanisms underlying sequential behavior. Moreover, simulation models of serial behavior generation have begun to open a channel linking cellular dynamics with cognitive and behavioral dynamics. Here we summarize the major results from prominent sequence learning and performance tasks, namely immediate serial recall, typing, 2XN, discrete sequence production, and serial reaction time. These populate a continuum from higher to lower degrees of internal control of sequential organization. The main movement classes covered are speech and keypressing, both involving small-amplitude movements that are very amenable to parametric study. A brief synopsis of classes of serial-order models, vis-à-vis the detailing of major effects found in the behavioral data, leads to a focus on competitive queuing (CQ) models. Recently, the many behavioral predictive successes of CQ models have been joined by successful prediction of distinctively patterned electrophysiological recordings in prefrontal cortex, wherein the parallel activation dynamics of multiple neural ensembles strikingly matches the parallel dynamics predicted by CQ theory. An extended CQ simulation model, the N-STREAMS neural network model, is then examined to highlight issues in ongoing attempts to accommodate a broader range of behavioral and neurophysiological data within a CQ-consistent theory. Important contemporary issues, such as the nature of working memory representations for sequential behavior and the development and role of chunks in hierarchical control, are prominent throughout.
    Defense Advanced Research Projects Agency/Office of Naval Research (N00014-95-1-0409); National Institute of Mental Health (R01 DC02852)
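
    The core competitive queuing mechanism can be sketched in a few lines: plan-layer items are active in parallel under a primacy gradient, and a choice layer repeatedly selects the most active item, produces it, and suppresses it. The activation values below are illustrative, not fitted to any of the cited data.

        import numpy as np

        def cq_produce(plan_activations):
            # Winner-take-all readout of a parallel activation gradient:
            # pick the most active item, emit it, then self-inhibit it.
            act = np.array(plan_activations, dtype=float)
            order = []
            for _ in range(len(act)):
                winner = int(np.argmax(act))   # choice layer competition
                order.append(winner)
                act[winner] = -np.inf          # suppression after production
            return order

        # A primacy gradient over five items is produced in serial order 0..4.
        print(cq_produce([0.9, 0.7, 0.5, 0.3, 0.1]))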

    A Novel Feature Maps Covariance Minimization Approach for Advancing Convolutional Neural Network Performance

    We present a method for boosting the performance of a Convolutional Neural Network (CNN) by reducing the covariance between the feature maps of its convolutional layers. In a CNN, the units of a hidden layer are segmented into feature/activation maps. The units within a feature map share the same weight matrix (filter); in simple terms, they look for the same feature. A feature map is the output of one filter applied to the previous layer. A CNN searches for features such as straight lines, and as these features are spotted, they are reported to the feature map. During the learning process, the convolutional neural network defines what it perceives as important. Each feature map looks for something different: one feature map may look for horizontal lines while another looks for vertical lines or curves. Reducing the covariance between the feature maps of a convolutional layer maximizes the variance between the feature maps output by that layer. This reduces the redundancy of the feature maps and consequently maximizes the information they represent.
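
    One plausible reading of this idea is an auxiliary loss term that penalizes the off-diagonal entries of the channel covariance matrix of a convolutional layer's activations. The PyTorch sketch below is an assumption about how such a penalty could be implemented and weighted, not the paper's exact formulation.

        import torch

        def feature_map_covariance_penalty(fmaps):
            # fmaps: (batch, channels, H, W) activations of one conv layer.
            b, c, h, w = fmaps.shape
            x = fmaps.permute(1, 0, 2, 3).reshape(c, -1)   # one row per feature map
            x = x - x.mean(dim=1, keepdim=True)            # center each map
            cov = x @ x.t() / x.shape[1]                   # c x c covariance matrix
            off_diag = cov - torch.diag(torch.diag(cov))   # keep only cross-map terms
            return (off_diag ** 2).sum()

        # Added to the task loss with an assumed weight:
        # loss = task_loss + 1e-4 * feature_map_covariance_penalty(fmaps)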

    Face Detection and Recognition using Skin Segmentation and Elastic Bunch Graph Matching

    Face detection and recognition have recently attracted a lot of interest in areas such as network security, content indexing and retrieval, and video compression, because ‘people’ are the object of attention in many videos and images. To perform such real-time detection and recognition, novel algorithms are needed that improve upon current efficiency and speed. This project is aimed at developing an efficient algorithm for face detection and recognition. It is divided into two parts: the detection of a face in a complex environment, and the subsequent recognition by comparison. For the detection portion, we present an algorithm based on skin segmentation, morphological operators, and template matching. Skin segmentation isolates the face-like regions in a complex image, and the subsequent morphology and template-matching operations help reject false matches and extract faces from regions containing multiple faces. For recognition of the face, we have chosen the Elastic Bunch Graph Matching (EBGM) algorithm. For identifying faces, this system uses single images out of a database containing one image per person. The task is complex because of variation in position, size, expression, and pose. The system decreases this variance by extracting face descriptions in the form of image graphs, in which the node points (chosen as the eyes, nose, lips, and chin) are described by sets of wavelet components (called ‘jets’). Image graph extraction is based on an approach called the ‘bunch graph’, which is constructed from a set of sample image graphs. Recognition is based on directly comparing these graphs. The advantages of this method are its tolerance to lighting conditions and its requirement of fewer images per person in the database for comparison.
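
    A minimal sketch of the skin-segmentation front end, assuming fixed YCrCb thresholds (commonly cited defaults, not necessarily the ranges used in this project) followed by morphological cleanup:

        import cv2
        import numpy as np

        def skin_mask(bgr_image):
            # Threshold skin tones in YCrCb, then clean the mask morphologically.
            ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
            mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))  # (Y, Cr, Cb) bounds
            kernel = np.ones((5, 5), np.uint8)
            mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)     # remove specks
            mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)    # fill small holes
            return mask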

    FacialSCDnet: A deep learning approach for the estimation of subject-to-camera distance in facial photographs

    Facial biometrics play an essential role in the fields of law enforcement and forensic sciences. When comparing facial traits for human identification in photographs or videos, the analysis must account for several factors that impair the application of common identification techniques, such as illumination, pose, or expression. In particular, facial attributes can drastically change depending on the distance between the subject and the camera at the time of the picture. This effect is known as perspective distortion, which can severely affect the outcome of the comparative analysis. Hence, knowing the subject-to-camera distance of the original scene where the photograph was taken can help determine the degree of distortion, improve the accuracy of computer-aided recognition tools, and increase the reliability of human identification and further analyses. In this paper, we propose a deep learning approach to estimate the subject-to-camera distance of facial photographs: FacialSCDnet. Furthermore, we introduce a novel evaluation metric designed to guide the learning process, based on changes in facial distortion at different distances. To validate our proposal, we collected a novel dataset of facial photographs taken at several distances using both synthetic and real data. Our approach is fully automatic and can provide a numerical distance estimation for up to six meters, beyond which changes in facial distortion are not significant. The proposed method achieves an accurate estimation, with an average error below 6 cm of subject-to-camera distance for facial photographs in any frontal or lateral head pose, robust to facial hair, glasses, and partial occlusion.
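
    As a sketch of the overall setup, a subject-to-camera distance estimator can be framed as a CNN regressor trained on pairs of face images and measured distances. The backbone, head, and L1 objective below are assumptions for illustration; they do not reproduce FacialSCDnet's architecture or its distortion-based metric.

        import torch
        import torch.nn as nn
        from torchvision import models

        class DistanceRegressor(nn.Module):
            # A generic stand-in: an image backbone with a single-output head.
            def __init__(self):
                super().__init__()
                self.backbone = models.resnet18(weights=None)
                self.backbone.fc = nn.Linear(self.backbone.fc.in_features, 1)

            def forward(self, x):                    # x: (batch, 3, H, W) face crops
                return self.backbone(x).squeeze(1)   # predicted distance in metres

        model = DistanceRegressor()
        loss_fn = nn.L1Loss()  # mean absolute error, matching the cm-level error reported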

    Robust Specularity Removal from Hand-held Videos

    Specular reflection occurs when one records a photo or video through a transparent glass medium or off opaque surfaces such as plastics, ceramics, polyester, and human skin; the observed image can be well described as the superposition of a transmitted layer and a reflection layer. These specular reflections often confound algorithms developed for image analysis, computer vision, and pattern recognition. To obtain a pure diffuse reflection component, the specularity (highlights) needs to be removed. To handle this problem, a novel and robust algorithm is formulated. The contributions of this work are three-fold. First, the smoothness of the video, along with its temporal coherence and illumination changes, is preserved by reducing the flickering and jagged edges caused by hand-held video acquisition and homography transformation, respectively. Second, the algorithm improves upon state-of-the-art algorithms by automatically selecting the region of interest (ROI) for all frames, and reduces computational time and complexity by operating on the luminance (Y) channel and exploiting the Augmented Lagrange Multiplier (ALM) method with Alternating Direction Minimizing (ADM) to facilitate the derivation of solution algorithms. Third, a quantitative metric is devised that objectively quantifies the amount of specularity in each frame of a hand-held video. The proposed specularity removal algorithm is compared against existing state-of-the-art algorithms using the newly developed metric. Experimental results validate that the developed algorithm has superior performance in terms of computation time, quality, and accuracy.
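
    The ALM/ADM machinery mentioned above is of the kind used for low-rank plus sparse decompositions. The sketch below is a generic inexact-ALM robust PCA split of a stacked luminance matrix into a smooth layer and a sparse layer that could capture highlights; the parameter defaults follow common robust-PCA conventions and are assumptions, not the paper's settings.

        import numpy as np

        def rpca_inexact_alm(D, lam=None, mu=None, iters=100):
            # Split D ~ L (low-rank) + S (sparse) via an inexact ALM / ADM scheme.
            m, n = D.shape
            lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
            mu = mu if mu is not None else 0.25 / (np.abs(D).mean() + 1e-8)
            L = np.zeros_like(D); S = np.zeros_like(D); Y = np.zeros_like(D)
            shrink = lambda X, t: np.sign(X) * np.maximum(np.abs(X) - t, 0.0)
            for _ in range(iters):
                # L-step: singular value thresholding
                U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
                L = (U * shrink(sig, 1.0 / mu)) @ Vt
                # S-step: soft threshold captures sparse specular highlights
                S = shrink(D - L + Y / mu, lam / mu)
                Y += mu * (D - L - S)  # dual (Lagrange multiplier) update
            return L, S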

    Recognition capacity of biometric-based systems

    The performance of biometrics-based recognition systems depends on various factors: database quality, image preprocessing, encoding techniques, etc. Given a biometric database and a selected encoding method, the capability of a recognition system is limited by the relationship between the number of classes the system can encode and the length of the encoded data describing the template at a specific level of distortion. In this work, we evaluate the recognition capacity of biometric systems under the constraint of two global encoding techniques: Principal Component Analysis and Independent Component Analysis. The developed methodology is applied to predict the capacity of different recognition channels formed during the acquisition of several iris and face databases. The proposed approach relies on data modeling and involves classical detection and information theories. The major contribution is a guideline on how to evaluate the capabilities of large-scale biometric recognition systems in practice. Recognition capacity can also be promoted as a global quality measure of biometric databases.
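
    A toy version of such a capacity evaluation, assuming PCA encoding and a per-component Gaussian channel model (a standard simplification, not the paper's exact derivation), might look like:

        import numpy as np

        def pca_capacity_estimate(templates, labels, k=64):
            # PCA-encode templates, then sum per-component Gaussian channel
            # capacities 0.5 * log2(1 + between-class var / within-class var).
            labels = np.asarray(labels)
            X = templates - templates.mean(axis=0)
            _, _, Vt = np.linalg.svd(X, full_matrices=False)
            Z = X @ Vt[:k].T                              # k-dimensional PCA codes
            classes = np.unique(labels)
            means = np.array([Z[labels == c].mean(axis=0) for c in classes])
            within = np.array([Z[labels == c].var(axis=0) for c in classes]).mean(axis=0)
            between = means.var(axis=0)
            bits = 0.5 * np.log2(1.0 + between / (within + 1e-12)).sum()
            return 2.0 ** bits   # rough count of distinguishable classes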