838 research outputs found

    Symbolic and Deep Learning Based Data Representation Methods for Activity Recognition and Image Understanding at Pixel Level

    Efficient representation of large amounts of data, particularly images and video, helps in the analysis, processing, and overall understanding of the data. In this work, we present two frameworks that encapsulate the information present in such data. First, we present an automated symbolic framework to recognize particular activities in real time from videos. The framework uses regular expressions to symbolically represent (possibly infinite) sets of motion characteristics obtained from a video. It is a uniform framework that handles trajectory-based and periodic articulated activities and provides polynomial-time graph algorithms for fast recognition. The regular expressions representing motion characteristics can either be provided manually or learnt automatically from positive and negative examples of strings (that describe dynamic behavior) using offline automata-learning frameworks. Confidence measures are associated with recognitions using the Levenshtein distance between a string representing a motion signature and the regular expression describing an activity. We have used our framework to recognize trajectory-based activities like vehicle turns (U-turns, left and right turns, and K-turns), vehicle starts and stops, and persons running and walking, as well as periodic articulated activities like digging, waving, boxing, and clapping, in videos from the VIRAT public dataset, the KTH dataset, and a set of videos obtained from YouTube. Next, we present a core sampling framework that is able to use activation maps from several layers of a Convolutional Neural Network (CNN) as features to another neural network, using transfer learning to provide an understanding of an input image. The intermediate map responses of a CNN contain information about an image that can be used to extract contextual knowledge about it.
Our framework creates a representation that combines features from the test data with the contextual knowledge gained from the responses of a pretrained network, processes it, and feeds it to a separate Deep Belief Network. We use this representation to extract more information from an image at the pixel level, hence gaining understanding of the whole image. We experimentally demonstrate the usefulness of our framework using a pretrained VGG-16 model to perform segmentation on the BAERI dataset of Synthetic Aperture Radar (SAR) imagery and on the CamVid dataset. Using this framework, we also reconstruct images by removing noise from noisy character images. The reconstructed images are encoded using quadtrees, which can be an efficient representation when learning from sparse features. Handwritten character images are quite susceptible to noise, so preprocessing stages that make the raw data cleaner can improve the efficacy of their use. We improve upon the efficiency of probabilistic quadtrees by using a pixel-level classifier to extract the character pixels and remove noise from the images. The pixel-level denoiser uses a CNN pretrained on a large image dataset and uses transfer learning to aid the reconstruction of characters. In this work, we primarily deal with the classification of noisy characters: we create noisy versions of the handwritten Bangla Numeral and Basic Character datasets and use them, together with the Noisy MNIST dataset, to demonstrate the usefulness of our approach.
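    The recognition step of the first framework can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the motion alphabet (S/L/R), the U-turn expression, and the exemplar string are invented for the example, and the confidence measure approximates the string-to-regular-expression Levenshtein distance by the distance to one representative accepted string.

```python
import re

# Invented motion alphabet: one symbol per heading change between
# consecutive trajectory samples (S = straight, L = left, R = right).
U_TURN = re.compile(r"S*L{3,}S*|S*R{3,}S*")  # sustained turning one way

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def confidence(signature: str, pattern: re.Pattern, exemplar: str) -> float:
    """An exact regex match gives confidence 1.0; otherwise fall back to a
    normalised edit distance against a representative accepted string."""
    if pattern.fullmatch(signature):
        return 1.0
    d = levenshtein(signature, exemplar)
    return max(0.0, 1.0 - d / max(len(signature), len(exemplar)))

clean = "SSLLLLSS"   # motion signature from a cleanly tracked vehicle
noisy = "SSLRLLSS"   # one spurious symbol from tracking jitter
```

    A noisy signature that no longer matches the expression exactly still receives a graded confidence rather than a hard reject.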

    Person re-Identification over distributed spaces and time

    PhD. Replicating the human visual system and the cognitive abilities that the brain uses to process the information it receives is an area of substantial scientific interest. With the prevalence of video surveillance cameras, a portion of this scientific drive has gone into providing useful automated counterparts to human operators. A prominent task in visual surveillance is that of matching people between disjoint camera views, or re-identification. This allows operators to locate people of interest and to track people across cameras, and it can be used as a precursory step to multi-camera activity analysis. However, due to the contrasting conditions between camera views and their effects on the appearance of people, re-identification is a non-trivial task. This thesis proposes solutions for reducing the visual ambiguity in observations of people between camera views. This thesis first looks at a method for mitigating the effects of differing lighting conditions between camera views on the appearance of people. It builds on work modelling inter-camera illumination based on known pairs of images. A Cumulative Brightness Transfer Function (CBTF) is proposed to estimate the mapping of colour brightness values based on limited training samples. Unlike previous methods that use a mean-based representation for a set of training samples, the cumulative nature of the CBTF retains colour information from under-represented samples in the training set. Additionally, the bi-directionality of the mapping function is explored to try to maximise re-identification accuracy by ensuring samples are accurately mapped between cameras. Secondly, an extension is proposed to the CBTF framework that addresses the issue of changing lighting conditions within a single camera. As the CBTF requires manually labelled training samples, it is limited to static lighting conditions and is less effective if the lighting changes.
This Adaptive CBTF (A-CBTF) differs from previous approaches that either do not consider lighting change over time or rely on camera transition time information to update. By utilising contextual information drawn from the background in each camera view, an estimation of the lighting change within a single camera can be made. This background lighting model allows the mapping of colour information back to the original training conditions and thus removes the need for retraining. Thirdly, a novel reformulation of re-identification as a ranking problem is proposed. Previous methods use a score based on a direct distance measure of set features to form a correct/incorrect match result. Rather than offering an operator a single outcome, the ranking paradigm is to give the operator a ranked list of possible matches and allow them to make the final decision. By utilising a Support Vector Machine (SVM) ranking method, a weighting on the appearance features can be learned that capitalises on the fact that not all image features are equally important to re-identification. Additionally, an Ensemble-RankSVM is proposed to address scalability issues by separating the training samples into smaller subsets and boosting the trained models. Finally, the thesis looks at a practical application of the ranking paradigm in a real-world setting. The system encompasses both the re-identification stage and the precursory extraction and tracking stages to form an aid for CCTV operators. Segmentation and detection are combined to extract relevant information from the video, while several matching techniques are combined with temporal priors to form a more comprehensive overall matching criterion. The effectiveness of the proposed approaches is tested on datasets obtained from a variety of challenging environments, including offices, apartment buildings, airports and outdoor public spaces.
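    A minimal sketch of the CBTF idea, with invented 8-level toy histograms: brightness histograms accumulated over all training pairs are converted to cumulative distributions, and each level of camera A is mapped to the level of camera B whose cumulative count first reaches it.

```python
def cbtf(hist_a, hist_b):
    """Cumulative Brightness Transfer Function sketch: accumulate brightness
    histograms over all training pairs, convert to cumulative distributions,
    and map each level of camera A to the level of camera B whose cumulative
    count first reaches it. The cumulative form keeps information from
    under-represented samples rather than averaging it away."""
    def cumulative(h):
        total, out, s = sum(h), [], 0
        for v in h:
            s += v
            out.append(s / total)
        return out
    ca, cb = cumulative(hist_a), cumulative(hist_b)
    mapping, j = [], 0
    for level in range(len(ca)):
        while j < len(cb) - 1 and cb[j] < ca[level]:
            j += 1
        mapping.append(j)  # mapping[a] = corresponding level in camera B
    return mapping

# Toy 8-level histograms summed over a training set; camera B is brighter.
hA = [4, 8, 12, 8, 4, 2, 1, 1]
hB = [1, 1, 2, 4, 8, 12, 8, 4]
f = cbtf(hA, hB)  # a monotone brightening map from A's levels to B's
```

    The resulting map is monotone non-decreasing, so relative brightness ordering within camera A is preserved after transfer to camera B.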

    Dynamic texture recognition using time-causal and time-recursive spatio-temporal receptive fields

    This work presents a first evaluation of using spatio-temporal receptive fields from a recently proposed time-causal spatio-temporal scale-space framework as primitives for video analysis. We propose a new family of video descriptors based on regional statistics of spatio-temporal receptive field responses and evaluate this approach on the problem of dynamic texture recognition. Our approach generalises a previously used method, based on joint histograms of receptive field responses, from the spatial to the spatio-temporal domain and from object recognition to dynamic texture recognition. The time-recursive formulation enables computationally efficient time-causal recognition. The experimental evaluation demonstrates competitive performance compared to the state of the art. In particular, it is shown that binary versions of our dynamic texture descriptors achieve improved performance compared to a large range of similar methods using different primitives, either handcrafted or learned from data. Further, our qualitative and quantitative investigation into parameter choices and the use of different sets of receptive fields highlights the robustness and flexibility of our approach. Together, these results support the descriptive power of this family of time-causal spatio-temporal receptive fields, validate our approach for dynamic texture recognition, and point towards the possibility of designing a range of video analysis methods based on these new time-causal spatio-temporal primitives. Comment: 29 pages, 16 figures.
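    The time-recursive aspect can be illustrated with a first-order recursive temporal smoothing step, a standard primitive of this kind. The parameter mu and the constant-signal input below are illustrative only; the actual framework builds cascades of such filters with spatial derivatives on top.

```python
def time_recursive_smoothing(signal, mu):
    """Time-causal, time-recursive first-order smoothing:
    L[t] = L[t-1] + (f[t] - L[t-1]) / (1 + mu).
    Only the previous output is stored, so memory use stays constant
    no matter how long the video is -- the property that makes
    time-causal recognition computationally efficient."""
    out, prev = [], 0.0
    for f in signal:
        prev = prev + (f - prev) / (1.0 + mu)
        out.append(prev)
    return out

# The response to a step input rises smoothly towards the new level.
step = time_recursive_smoothing([1.0] * 50, mu=1.0)
```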

    Segmentation of images by color features: a survey

    Image segmentation is an important stage for object recognition. Many methods have been proposed in the last few years for grayscale and color images. In this paper, we present a deep review of the state of the art on color image segmentation methods; through this paper, we explain techniques based on edge detection, thresholding, histogram-thresholding, region-based methods, feature clustering, and neural networks. Because color spaces play a key role in the methods reviewed, we also explain in detail the color spaces most commonly used to represent and process colors. In addition, we present some important applications that use the image segmentation methods reviewed. Finally, a set of metrics frequently used to quantitatively evaluate segmented images is presented.
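    As a minimal illustration of segmentation in a perceptual color space, here is a hue/saturation threshold in HSV. The hue window and saturation floor below are arbitrary choices for the toy pixels, not values from the survey.

```python
import colorsys

def segment_by_hue(pixels, hue_lo, hue_hi, min_sat=0.2):
    """Feature-based segmentation in HSV: a pixel joins the segment when its
    hue falls inside [hue_lo, hue_hi] and it is saturated enough. HSV is a
    popular choice here because it decouples chromaticity from intensity."""
    mask = []
    for r, g, b in pixels:
        h, s, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        mask.append(hue_lo <= h <= hue_hi and s >= min_sat)
    return mask

# Pure red, dark red, blue, grey -- the two reds share a hue near 0,
# and the grey is rejected by the saturation floor.
img = [(255, 0, 0), (120, 10, 10), (0, 0, 255), (128, 128, 128)]
red_mask = segment_by_hue(img, 0.0, 0.05)
```

    Because hue is separated from intensity, the dark red and the bright red land in the same segment, which a raw RGB threshold would not achieve as easily.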

    Image partial blur detection and classification.

    Liu, Renting. Thesis (M.Phil.)--Chinese University of Hong Kong, 2008. Includes bibliographical references (leaves 40-46). Abstracts in English and Chinese.
    Chapter 1 --- Introduction --- p.1
    Chapter 2 --- Related Work and System Overview --- p.6
    Chapter 2.1 --- Previous Work in Blur Analysis --- p.6
    Chapter 2.1.1 --- Blur detection and estimation --- p.6
    Chapter 2.1.2 --- Image deblurring --- p.8
    Chapter 2.1.3 --- Low DoF image auto-segmentation --- p.14
    Chapter 2.2 --- System Overview --- p.15
    Chapter 3 --- Blur Features and Classification --- p.18
    Chapter 3.1 --- Blur Features --- p.18
    Chapter 3.1.1 --- Local Power Spectrum Slope --- p.19
    Chapter 3.1.2 --- Gradient Histogram Span --- p.21
    Chapter 3.1.3 --- Maximum Saturation --- p.24
    Chapter 3.1.4 --- Local Autocorrelation Congruency --- p.25
    Chapter 3.2 --- Classification --- p.28
    Chapter 4 --- Experiments and Results --- p.29
    Chapter 4.1 --- Blur Patch Detection --- p.29
    Chapter 4.2 --- Blur degree --- p.33
    Chapter 4.3 --- Blur Region Segmentation --- p.34
    Chapter 5 --- Conclusion and Future Work --- p.38
    Bibliography --- p.40
    Chapter A --- Blurred Edge Analysis --- p.4
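    The Gradient Histogram Span feature listed in Chapter 3 exploits the fact that blur concentrates gradient magnitudes near zero. A crude stand-in, not the thesis's formulation: measure the share of strong horizontal gradients in a patch (the threshold and the toy patches are invented).

```python
def gradient_tail_ratio(patch, thresh=100):
    """Crude stand-in for a gradient-histogram blur feature: the fraction
    of strong horizontal gradients in a patch. Blur concentrates gradient
    magnitudes near zero, so this ratio drops on blurred patches."""
    strong = total = 0
    for row in patch:
        for x0, x1 in zip(row, row[1:]):
            total += 1
            strong += abs(x1 - x0) >= thresh
    return strong / total

sharp   = [[0, 255, 0, 255]] * 4     # hard edges -> many strong gradients
blurred = [[96, 128, 160, 128]] * 4  # smooth ramp -> weak gradients only
```

    A classifier over several such features (power spectrum slope, saturation, autocorrelation) then separates blurred from sharp patches.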

    Computer Vision Techniques for Ambient Intelligence Applications

    Ambient Intelligence (AmI) is a multidisciplinary area which refers to environments that are sensitive and responsive to the presence of people and objects. The rapid progress of technology and the simultaneous reduction of hardware costs characterizing recent years have enlarged the number of possible AmI applications, raising new research challenges at the same time. In particular, one important requirement in AmI is providing proactive support to people in their everyday working and free-time activities. To this aim, Computer Vision represents a core research track, since only through suitable vision devices and techniques is it possible to detect elements of interest and understand the occurring events. The goal of this thesis is to present and demonstrate the efficacy of novel machine vision research contributions for different AmI scenarios: object keypoint analysis for Augmented Reality purposes, segmentation of natural images for plant species recognition, and heterogeneous people identification in unconstrained environments.

    Variable illumination and invariant features for detecting and classifying varnish defects

    This work presents a method to detect and classify varnish defects on wood surfaces. Since these defects are only partially visible under certain illumination directions, a single image does not provide enough information for a recognition task. Classification requires inspecting the surface under different illumination directions, which results in image series. The information is distributed along this series and can be extracted by merging knowledge about the defect shape and the light direction.
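    A minimal sketch of this merging step, under the invented assumption that a defect produces a locally strong response under at least one light direction: keep the per-pixel maximum over the series, together with the index of the illumination that produced it.

```python
def fuse_illumination_series(series):
    """Per-pixel fusion of an image series taken under different light
    directions: keep the maximum response and remember which direction
    produced it, so defect shape and light direction stay linked."""
    h, w = len(series[0]), len(series[0][0])
    fused = [[0] * w for _ in range(h)]
    direction = [[0] * w for _ in range(h)]
    for k, img in enumerate(series):
        for i in range(h):
            for j in range(w):
                if img[i][j] > fused[i][j]:
                    fused[i][j] = img[i][j]
                    direction[i][j] = k
    return fused, direction

# Two toy 2x2 images of the same surface under two light directions.
series = [[[10, 200], [5, 5]],
          [[90, 20], [5, 250]]]
fused, direction = fuse_illumination_series(series)
```

    The direction map records which illumination revealed each pixel, which is exactly the kind of per-pixel light-direction knowledge the classification stage can exploit.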

    A contribution for single and multiple faces recognition using feature-based approaches

    Among biometric recognition systems, face biometrics plays an important role in research activities and security applications, since face images can be acquired without individuals' knowledge. Nowadays, huge amounts of digital images and video sequences are acquired, mainly under uncontrolled conditions, frequently including noise, blur, occlusion, and variations in scale and illumination. Because of these issues, face recognition (FR) remains an active research area and a complex, challenging task. In this context, the motivation comes from the fact that recognition of faces in digital images with complex backgrounds, and in databases of face images, has become one of the successful applications of Computer Vision. Hence, the main goal of this work is to recognize one or more faces from still images containing multiple faces and from a database of single faces obtained under different conditions. To work with multiple face images under varying conditions, a semi-supervised approach is proposed based on the invariant and discriminative power of local features. The extraction of local features is done using Speeded-Up Robust Features (SURF). The search for regions from which optimal features can be extracted is carried out by an improved ABC algorithm. To fully exploit the proposed approach, an extensive experimental analysis was performed. Results show that this approach is robust and efficient for face recognition applications, except for faces with non-uniform illumination. In the literature, a significant number of single-face FR studies are based on the extraction of only one feature and on machine learning approaches. Besides, existing feature extraction approaches broadly use either global or local features. To obtain relevant and complementary features from face images, a face recognition methodology should also consider heterogeneous and semi-global features.
Therefore, a novel hierarchical semi-supervised FR approach is proposed based on the extraction of global, semi-global, and local features. Global and semi-global features are extracted using Color Angles (CA) and Edge Histogram Descriptors (EHD), while local features are extracted using SURF. An extensive experimental analysis using the three feature extraction methods was first done individually, followed by a three-stage hierarchical scheme using face images obtained under two different lighting conditions with facial expression and slight scale variation. Furthermore, the performance of the approach was also analyzed using combinations of global, semi-global, and local features for CA and EHD. The proposed approach achieves high recognition rates under all image conditions tested in this work. In addition, the results emphasize the influence of local and semi-global features on recognition performance. In both the single-face and multiple-face approaches, the main achievement is the high performance obtained solely from the discriminative capacity of the extracted features, without any training schemes.
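    One way to picture the three-stage hierarchical scheme is as a cascade that re-ranks surviving candidates with progressively finer similarities. This is a hedged sketch: the 1-D "faces", the placeholder similarity function, and the acceptance threshold are all invented, standing in for the CA, EHD and SURF matching described above.

```python
def hierarchical_identify(probe, gallery, stages, accept=0.995):
    """Cascade sketch: each stage re-scores the surviving candidates with a
    finer (and typically costlier) similarity; a candidate is accepted early
    when its score clears the threshold, otherwise the top half survives."""
    candidates = list(gallery)
    for sim in stages:
        scored = sorted(((sim(probe, g), g) for g in candidates), reverse=True)
        best_score, best = scored[0]
        if best_score >= accept:
            return best
        candidates = [g for _, g in scored[: max(1, len(scored) // 2)]]
    return candidates[0]

# Toy 1-D "faces"; one placeholder similarity stands in for CA, EHD and SURF.
sim = lambda a, b: 1 - abs(a - b) / 100
stages = [sim, sim, sim]
match = hierarchical_identify(51, [10, 50, 52, 90], stages)
```

    Pruning the candidate set at each stage is what lets the expensive local-feature matching run only on the few remaining plausible identities.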

    De-Duplication of Person's Identity Using Multi-Modal Biometrics

    The objective of this work is to explore approaches to create unique identities through the de-duplication process using multi-modal biometrics. Various government sectors in the world provide different services and welfare schemes for the benefit of people in society using an identity number. A unique identity (UID) number assigned to every person would obviate the need for a person to produce multiple documentary proofs of his/her identity for availing any government/private services. In the process of creating the unique identity of a person, there is a possibility of duplicate identities, as the same person might want to obtain multiple identities in order to get extra benefits from the government. These duplicate identities can be eliminated by the de-duplication process using multi-modal biometrics, namely iris, fingerprint, face and signature. De-duplication is the process of removing instances of multiple enrollments of the same person using the person's biometric data. As the number of people enrolled into the biometric system runs into the billions, the time complexity of the de-duplication process increases. In this thesis, three different case studies are presented to address the performance issues of the de-duplication process in order to create the unique identity of a person.
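    The core matching step behind de-duplication can be sketched with binary template comparison. The 8-bit codes and the threshold below are toy values; real systems use far longer templates and indexing structures precisely to avoid the naive all-pairs scan shown here.

```python
def hamming(a: int, b: int) -> int:
    """Bit-level Hamming distance between two fixed-length binary codes."""
    return bin(a ^ b).count("1")

def find_duplicates(codes, threshold):
    """Flag enrolment pairs whose biometric codes (e.g. binarised iris or
    fingerprint templates) lie within the threshold: likely the same person
    enrolled twice. The all-pairs scan is O(n^2) in the number of enrolments,
    which is the cost that grows prohibitive at billion scale."""
    dupes = []
    for i in range(len(codes)):
        for j in range(i + 1, len(codes)):
            if hamming(codes[i], codes[j]) <= threshold:
                dupes.append((i, j))
    return dupes

codes = [0b10110010, 0b10110011, 0b01001100]  # toy 8-bit templates
dupes = find_duplicates(codes, threshold=1)
```

    Multi-modal fusion then combines such per-modality decisions (iris, fingerprint, face, signature) before declaring two enrolments duplicates.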