55 research outputs found

    Representation Learning in Sensory Cortex: a theory

    Get PDF
    We review and apply a computational theory of the feedforward path of the ventral stream in visual cortex based on the hypothesis that its main function is the encoding of invariant representations of images. A key justification of the theory is provided by a theorem linking invariant representations to small sample complexity for recognition – that is, invariant representations allows learning from very few labeled examples. The theory characterizes how an algorithm that can be implemented by a set of ”simple” and ”complex” cells – a ”HW module” – provides invariant and selective representations. The invariance can be learned in an unsupervised way from observed transformations. Theorems show that invariance implies several properties of the ventral stream organization, including the eccentricity dependent lattice of units in the retina and in V1, and the tuning of its neurons. The theory requires two stages of processing: the first, consisting of retinotopic visual areas such as V1, V2 and V4 with generic neuronal tuning, leads to representations that are invariant to translation and scaling; the second, consisting of modules in IT, with class- and object-specific tuning, provides a representation for recognition with approximate invariance to class specific transformations, such as pose (of a body, of a face) and expression. In the theory the ventral stream main function is the unsupervised learning of ”good” representations that reduce the sample complexity of the final supervised learning stage.This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF - 1231216

    Improved Human Face Recognition by Introducing a New Cnn Arrangement and Hierarchical Method

    Get PDF
    Human face recognition has become one of the most attractive topics in the fields ‎of biometrics due to its wide applications. The face is a part of the body that carries ‎the most information regarding identification in human interactions. Features such ‎as the composition of facial components, skin tone, face\u27s central axis, distances ‎between eyes, and many more, alongside the other biometrics, are used ‎unconsciously by the brain to distinguish a person. Indeed, analyzing the facial ‎features could be the first method humans use to identify a person in their lives. ‎As one of the main biometric measures, human face recognition has been utilized in ‎various commercial applications over the past two decades. From banking to smart ‎advertisement and from border security to mobile applications. These are a few ‎examples that show us how far these methods have come. We can confidently say ‎that the techniques for face recognition have reached an acceptable level of ‎accuracy to be implemented in some real-life applications. However, there are other ‎applications that could benefit from improvement. Given the increasing demand ‎for the topic and the fact that nowadays, we have almost all the infrastructure that ‎we might need for our application, make face recognition an appealing topic. ‎ When we are evaluating the quality of a face recognition method, there are some ‎benchmarks that we should consider: accuracy, speed, and complexity are the main ‎parameters. Of course, we can measure other aspects of the algorithm, such as size, ‎precision, cost, etc. But eventually, every one of those parameters will contribute to ‎improving one or some of these three concepts of the method. Then again, although ‎we can see a significant level of accuracy in existing algorithms, there is still much ‎room for improvement in speed and complexity. In addition, the accuracy of the ‎mentioned methods highly depends on the properties of the face images. In other ‎words, uncontrolled situations and variables like head pose, occlusion, lighting, ‎image noise, etc., can affect the results dramatically. ‎ Human face recognition systems are used in either identification or verification. In ‎verification, the system\u27s main goal is to check if an input belongs to a pre-determined tag or a person\u27s ID. ‎Almost every face recognition system consists of four major steps. These steps are ‎pre-processing, face detection, feature extraction, and classification. Improvement ‎in each of these steps will lead to the overall enhancement of the system. In this ‎work, the main objective is to propose new, improved and enhanced methods in ‎each of those mentioned steps, evaluate the results by comparing them with other ‎existing techniques and investigate the outcome of the proposed system.

    Non-Decimated Wavelet based Multi-Band Ear Recognition using Principal Component Analysis

    Get PDF
    Principal Component Analysis (PCA) has been successfully applied to many applications, including ear recognition. This paper presents a 2D Wavelet based Multi-Band Principal Component Analysis (2D-WMBPCA) ear recognition method, inspired by PCA based techniques for multispectral and hyperspectral images. The proposed 2D-WMBPCA method performs a 2D non-decimated wavelet transform on the input image, dividing it into its wavelet subbands. Each resulting subband is then divided into a number of frames based on its coefficient’s values. The multi frame generation boundaries are calculated using either equal size or greedy hill climbing techniques. Conventional PCA is applied on each subband’s resulting frames, yielding its eigenvectors, which are used for matching. The intersection of the energy of the eigenvectors and the total number of features for each subband shows the number of bands which yield the highest matching performance. Experimental results on the images of two benchmark ear datasets, called IITD II and USTB I, demonstrated that the proposed 2D-WMBPCA technique significantly outperforms Single Image PCA by up to 56.79% and the eigenfaces technique by up to 20.37% with respect to matching accuracy. Furthermore, the proposed technique achieves very competitive results to those of learning based techniques at a fraction of their computational time and without needing to be trained

    Low-Shot Learning for the Semantic Segmentation of Remote Sensing Imagery

    Get PDF
    Deep-learning frameworks have made remarkable progress thanks to the creation of large annotated datasets such as ImageNet, which has over one million training images. Although this works well for color (RGB) imagery, labeled datasets for other sensor modalities (e.g., multispectral and hyperspectral) are minuscule in comparison. This is because annotated datasets are expensive and man-power intensive to complete; and since this would be impractical to accomplish for each type of sensor, current state-of-the-art approaches in computer vision are not ideal for remote sensing problems. The shortage of annotated remote sensing imagery beyond the visual spectrum has forced researchers to embrace unsupervised feature extracting frameworks. These features are learned on a per-image basis, so they tend to not generalize well across other datasets. In this dissertation, we propose three new strategies for learning feature extracting frameworks with only a small quantity of annotated image data; including 1) self-taught feature learning, 2) domain adaptation with synthetic imagery, and 3) semi-supervised classification. ``Self-taught\u27\u27 feature learning frameworks are trained with large quantities of unlabeled imagery, and then these networks extract spatial-spectral features from annotated data for supervised classification. Synthetic remote sensing imagery can be used to boot-strap a deep convolutional neural network, and then we can fine-tune the network with real imagery. Semi-supervised classifiers prevent overfitting by jointly optimizing the supervised classification task along side one or more unsupervised learning tasks (i.e., reconstruction). Although obtaining large quantities of annotated image data would be ideal, our work shows that we can make due with less cost-prohibitive methods which are more practical to the end-user

    Advances in Image Processing, Analysis and Recognition Technology

    Get PDF
    For many decades, researchers have been trying to make computers’ analysis of images as effective as the system of human vision is. For this purpose, many algorithms and systems have previously been created. The whole process covers various stages, including image processing, representation and recognition. The results of this work can be applied to many computer-assisted areas of everyday life. They improve particular activities and provide handy tools, which are sometimes only for entertainment, but quite often, they significantly increase our safety. In fact, the practical implementation of image processing algorithms is particularly wide. Moreover, the rapid growth of computational complexity and computer efficiency has allowed for the development of more sophisticated and effective algorithms and tools. Although significant progress has been made so far, many issues still remain, resulting in the need for the development of novel approaches

    High Performance Video Stream Analytics System for Object Detection and Classification

    Get PDF
    Due to the recent advances in cameras, cell phones and camcorders, particularly the resolution at which they can record an image/video, large amounts of data are generated daily. This video data is often so large that manually inspecting it for object detection and classification can be time consuming and error prone, thereby it requires automated analysis to extract useful information and meta-data. The automated analysis from video streams also comes with numerous challenges such as blur content and variation in illumination conditions and poses. We investigate an automated video analytics system in this thesis which takes into account the characteristics from both shallow and deep learning domains. We propose fusion of features from spatial frequency domain to perform highly accurate blur and illumination invariant object classification using deep learning networks. We also propose the tuning of hyper-parameters associated with the deep learning network through a mathematical model. The mathematical model used to support hyper-parameter tuning improved the performance of the proposed system during training. The outcomes of various hyper-parameters on system's performance are compared. The parameters that contribute towards the most optimal performance are selected for the video object classification. The proposed video analytics system has been demonstrated to process a large number of video streams and the underlying infrastructure is able to scale based on the number and size of the video stream(s) being processed. The extensive experimentation on publicly available image and video datasets reveal that the proposed system is significantly more accurate and scalable and can be used as a general purpose video analytics system.N/

    View-based models for visual tracking and recognition

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Automatic Segmentation and Classification of Red and White Blood cells in Thin Blood Smear Slides

    Get PDF
    In this work we develop a system for automatic detection and classification of cytological images which plays an increasing important role in medical diagnosis. A primary aim of this work is the accurate segmentation of cytological images of blood smears and subsequent feature extraction, along with studying related classification problems such as the identification and counting of peripheral blood smear particles, and classification of white blood cell into types five. Our proposed approach benefits from powerful image processing techniques to perform complete blood count (CBC) without human intervention. The general framework in this blood smear analysis research is as follows. Firstly, a digital blood smear image is de-noised using optimized Bayesian non-local means filter to design a dependable cell counting system that may be used under different image capture conditions. Then an edge preservation technique with Kuwahara filter is used to recover degraded and blurred white blood cell boundaries in blood smear images while reducing the residual negative effect of noise in images. After denoising and edge enhancement, the next step is binarization using combination of Otsu and Niblack to separate the cells and stained background. Cells separation and counting is achieved by granulometry, advanced active contours without edges, and morphological operators with watershed algorithm. Following this is the recognition of different types of white blood cells (WBCs), and also red blood cells (RBCs) segmentation. Using three main types of features: shape, intensity, and texture invariant features in combination with a variety of classifiers is next step. The following features are used in this work: intensity histogram features, invariant moments, the relative area, co-occurrence and run-length matrices, dual tree complex wavelet transform features, Haralick and Tamura features. Next, different statistical approaches involving correlation, distribution and redundancy are used to measure of the dependency between a set of features and to select feature variables on the white blood cell classification. A global sensitivity analysis with random sampling-high dimensional model representation (RS-HDMR) which can deal with independent and dependent input feature variables is used to assess dominate discriminatory power and the reliability of feature which leads to an efficient feature selection. These feature selection results are compared in experiments with branch and bound method and with sequential forward selection (SFS), respectively. This work examines support vector machine (SVM) and Convolutional Neural Networks (LeNet5) in connection with white blood cell classification. Finally, white blood cell classification system is validated in experiments conducted on cytological images of normal poor quality blood smears. These experimental results are also assessed with ground truth manually obtained from medical experts
    corecore