81,017 research outputs found

    Investigation on advanced image search techniques

    Get PDF
    Content-based image search for retrieval of images based on the similarity in their visual contents, such as color, texture, and shape, to a query image is an active research area due to its broad applications. Color, for example, provides powerful information for image search and classification. This dissertation investigates advanced image search techniques and presents new color descriptors for image search and classification and robust image enhancement and segmentation methods for iris recognition. First, several new color descriptors have been developed for color image search. Specifically, a new oRGB-SIFT descriptor, which integrates the oRGB color space and the Scale-Invariant Feature Transform (SIFT), is proposed for image search and classification. The oRGB-SIFT descriptor is further integrated with other color SIFT features to produce the novel Color SIFT Fusion (CSF), the Color Grayscale SIFT Fusion (CGSF), and the CGSF+PHOG descriptors for image category search with applications to biometrics. Image classification is implemented using a novel EFM-KNN classifier, which combines the Enhanced Fisher Model (EFM) and the K Nearest Neighbor (KNN) decision rule. Experimental results on four large scale, grand challenge datasets have shown that the proposed oRGB-SIFT descriptor improves recognition performance upon other color SIFT descriptors, and the CSF, the CGSF, and the CGSF+PHOG descriptors perform better than the other color SIFT descriptors. The fusion of both Color SIFT descriptors (CSF) and Color Grayscale SIFT descriptor (CGSF) shows significant improvement in the classification performance, which indicates that various color-SIFT descriptors and grayscale-SIFT descriptor are not redundant for image search. Second, four novel color Local Binary Pattern (LBP) descriptors are presented for scene image and image texture classification. Specifically, the oRGB-LBP descriptor is derived in the oRGB color space. The other three color LBP descriptors, namely, the Color LBP Fusion (CLF), the Color Grayscale LBP Fusion (CGLF), and the CGLF+PHOG descriptors, are obtained by integrating the oRGB-LBP descriptor with some additional image features. Experimental results on three large scale, grand challenge datasets have shown that the proposed descriptors can improve scene image and image texture classification performance. Finally, a new iris recognition method based on a robust iris segmentation approach is presented for improving iris recognition performance. The proposed robust iris segmentation approach applies power-law transformations for more accurate detection of the pupil region, which significantly reduces the candidate limbic boundary search space for increasing detection accuracy and efficiency. As the limbic circle, which has a center within a close range of the pupil center, is selectively detected, the eyelid detection approach leads to improved iris recognition performance. Experiments using the Iris Challenge Evaluation (ICE) database show the effectiveness of the proposed method

    Interactive training of advanced classifiers for mining remote sensing image archives

    Get PDF
    Advances in satellite technology and availability of down-loaded images constantly increase the sizes of remote sensing image archives. Automatic content extraction, classification and content-based retrieval have become highly desired goals for the development of intelligent remote sensing databases. The common approach for mining these databases uses rules created by analysts. However, incorporating GIS information and human expert knowledge with digital image processing improves remote sensing image analysis. We developed a system that uses decision tree classifiers for interactive learning of land cover models and mining of image archives. Decision trees provide a promising solution for this problem because they can operate on both numerical (continuous) and categorical (discrete) data sources, and they do not require any assumptions about neither the distributions nor the independence of attribute values. This is especially important for the fusion of measurements from different sources like spectral data, DEM data and other ancillary GIS data. Furthermore, using surrogate splits provides the capability of dealing with missing data during both training and classification, and enables handling instrument malfunctions or the cases where one or more measurements do not exist for some locations. Quantitative and qualitative performance evaluation showed that decision trees provide powerful tools for modeling both pixel and region contents of images and mining of remote sensing image archives

    Audio-coupled video content understanding of unconstrained video sequences

    Get PDF
    Unconstrained video understanding is a difficult task. The main aim of this thesis is to recognise the nature of objects, activities and environment in a given video clip using both audio and video information. Traditionally, audio and video information has not been applied together for solving such complex task, and for the first time we propose, develop, implement and test a new framework of multi-modal (audio and video) data analysis for context understanding and labelling of unconstrained videos. The framework relies on feature selection techniques and introduces a novel algorithm (PCFS) that is faster than the well-established SFFS algorithm. We use the framework for studying the benefits of combining audio and video information in a number of different problems. We begin by developing two independent content recognition modules. The first one is based on image sequence analysis alone, and uses a range of colour, shape, texture and statistical features from image regions with a trained classifier to recognise the identity of objects, activities and environment present. The second module uses audio information only, and recognises activities and environment. Both of these approaches are preceded by detailed pre-processing to ensure that correct video segments containing both audio and video content are present, and that the developed system can be made robust to changes in camera movement, illumination, random object behaviour etc. For both audio and video analysis, we use a hierarchical approach of multi-stage classification such that difficult classification tasks can be decomposed into simpler and smaller tasks. When combining both modalities, we compare fusion techniques at different levels of integration and propose a novel algorithm that combines advantages of both feature and decision-level fusion. The analysis is evaluated on a large amount of test data comprising unconstrained videos collected for this work. We finally, propose a decision correction algorithm which shows that further steps towards combining multi-modal classification information effectively with semantic knowledge generates the best possible results

    Fusing image representations for classification using support vector machines

    Full text link
    In order to improve classification accuracy different image representations are usually combined. This can be done by using two different fusing schemes. In feature level fusion schemes, image representations are combined before the classification process. In classifier fusion, the decisions taken separately based on individual representations are fused to make a decision. In this paper the main methods derived for both strategies are evaluated. Our experimental results show that classifier fusion performs better. Specifically Bayes belief integration is the best performing strategy for image classification task.Comment: Image and Vision Computing New Zealand, 2009. IVCNZ '09. 24th International Conference, Wellington : Nouvelle-Z\'elande (2009

    Relating visual and semantic image descriptors

    Get PDF
    This paper addresses the automatic analysis of visual content and extraction of metadata beyond pure visual descriptors. Two approaches are described: Automatic Image Annotation (AIA) and Confidence Clustering (CC). AIA attempts to automatically classify images based on two binary classifiers and is designed for the consumer electronics domain. Contrastingly, the CC approach does not attempt to assign a unique label to images but rather to organise the database based on concepts

    Fusion of Learned Multi-Modal Representations and Dense Trajectories for Emotional Analysis in Videos

    Get PDF
    When designing a video affective content analysis algorithm, one of the most important steps is the selection of discriminative features for the effective representation of video segments. The majority of existing affective content analysis methods either use low-level audio-visual features or generate handcrafted higher level representations based on these low-level features. We propose in this work to use deep learning methods, in particular convolutional neural networks (CNNs), in order to automatically learn and extract mid-level representations from raw data. To this end, we exploit the audio and visual modality of videos by employing Mel-Frequency Cepstral Coefficients (MFCC) and color values in the HSV color space. We also incorporate dense trajectory based motion features in order to further enhance the performance of the analysis. By means of multi-class support vector machines (SVMs) and fusion mechanisms, music video clips are classified into one of four affective categories representing the four quadrants of the Valence-Arousal (VA) space. Results obtained on a subset of the DEAP dataset show (1) that higher level representations perform better than low-level features, and (2) that incorporating motion information leads to a notable performance gain, independently from the chosen representation

    Transportation mode recognition fusing wearable motion, sound and vision sensors

    Get PDF
    We present the first work that investigates the potential of improving the performance of transportation mode recognition through fusing multimodal data from wearable sensors: motion, sound and vision. We first train three independent deep neural network (DNN) classifiers, which work with the three types of sensors, respectively. We then propose two schemes that fuse the classification results from the three mono-modal classifiers. The first scheme makes an ensemble decision with fixed rules including Sum, Product, Majority Voting, and Borda Count. The second scheme is an adaptive fuser built as another classifier (including Naive Bayes, Decision Tree, Random Forest and Neural Network) that learns enhanced predictions by combining the outputs from the three mono-modal classifiers. We verify the advantage of the proposed method with the state-of-the-art Sussex-Huawei Locomotion and Transportation (SHL) dataset recognizing the eight transportation activities: Still, Walk, Run, Bike, Bus, Car, Train and Subway. We achieve F1 scores of 79.4%, 82.1% and 72.8% with the mono-modal motion, sound and vision classifiers, respectively. The F1 score is remarkably improved to 94.5% and 95.5% by the two data fusion schemes, respectively. The recognition performance can be further improved with a post-processing scheme that exploits the temporal continuity of transportation. When assessing generalization of the model to unseen data, we show that while performance is reduced - as expected - for each individual classifier, the benefits of fusion are retained with performance improved by 15 percentage points. Besides the actual performance increase, this work, most importantly, opens up the possibility for dynamically fusing modalities to achieve distinct power-performance trade-off at run time
    corecore