78,807 research outputs found

    Data fusion by using machine learning and computational intelligence techniques for medical image analysis and classification

    Get PDF
    Data fusion is the process of integrating information from multiple sources to produce specific, comprehensive, unified data about an entity. Data fusion is categorized as low level, feature level and decision level. This research is focused on both investigating and developing feature- and decision-level data fusion for automated image analysis and classification. The common procedure for solving these problems can be described as: 1) process image for region of interest\u27 detection, 2) extract features from the region of interest and 3) create learning model based on the feature data. Image processing techniques were performed using edge detection, a histogram threshold and a color drop algorithm to determine the region of interest. The extracted features were low-level features, including textual, color and symmetrical features. For image analysis and classification, feature- and decision-level data fusion techniques are investigated for model learning using and integrating computational intelligence and machine learning techniques. These techniques include artificial neural networks, evolutionary algorithms, particle swarm optimization, decision tree, clustering algorithms, fuzzy logic inference, and voting algorithms. This work presents both the investigation and development of data fusion techniques for the application areas of dermoscopy skin lesion discrimination, content-based image retrieval, and graphic image type classification --Abstract, page v

    Interactive training of advanced classifiers for mining remote sensing image archives

    Get PDF
    Advances in satellite technology and availability of down-loaded images constantly increase the sizes of remote sensing image archives. Automatic content extraction, classification and content-based retrieval have become highly desired goals for the development of intelligent remote sensing databases. The common approach for mining these databases uses rules created by analysts. However, incorporating GIS information and human expert knowledge with digital image processing improves remote sensing image analysis. We developed a system that uses decision tree classifiers for interactive learning of land cover models and mining of image archives. Decision trees provide a promising solution for this problem because they can operate on both numerical (continuous) and categorical (discrete) data sources, and they do not require any assumptions about neither the distributions nor the independence of attribute values. This is especially important for the fusion of measurements from different sources like spectral data, DEM data and other ancillary GIS data. Furthermore, using surrogate splits provides the capability of dealing with missing data during both training and classification, and enables handling instrument malfunctions or the cases where one or more measurements do not exist for some locations. Quantitative and qualitative performance evaluation showed that decision trees provide powerful tools for modeling both pixel and region contents of images and mining of remote sensing image archives

    Audio-coupled video content understanding of unconstrained video sequences

    Get PDF
    Unconstrained video understanding is a difficult task. The main aim of this thesis is to recognise the nature of objects, activities and environment in a given video clip using both audio and video information. Traditionally, audio and video information has not been applied together for solving such complex task, and for the first time we propose, develop, implement and test a new framework of multi-modal (audio and video) data analysis for context understanding and labelling of unconstrained videos. The framework relies on feature selection techniques and introduces a novel algorithm (PCFS) that is faster than the well-established SFFS algorithm. We use the framework for studying the benefits of combining audio and video information in a number of different problems. We begin by developing two independent content recognition modules. The first one is based on image sequence analysis alone, and uses a range of colour, shape, texture and statistical features from image regions with a trained classifier to recognise the identity of objects, activities and environment present. The second module uses audio information only, and recognises activities and environment. Both of these approaches are preceded by detailed pre-processing to ensure that correct video segments containing both audio and video content are present, and that the developed system can be made robust to changes in camera movement, illumination, random object behaviour etc. For both audio and video analysis, we use a hierarchical approach of multi-stage classification such that difficult classification tasks can be decomposed into simpler and smaller tasks. When combining both modalities, we compare fusion techniques at different levels of integration and propose a novel algorithm that combines advantages of both feature and decision-level fusion. The analysis is evaluated on a large amount of test data comprising unconstrained videos collected for this work. We finally, propose a decision correction algorithm which shows that further steps towards combining multi-modal classification information effectively with semantic knowledge generates the best possible results

    Fusing image representations for classification using support vector machines

    Full text link
    In order to improve classification accuracy different image representations are usually combined. This can be done by using two different fusing schemes. In feature level fusion schemes, image representations are combined before the classification process. In classifier fusion, the decisions taken separately based on individual representations are fused to make a decision. In this paper the main methods derived for both strategies are evaluated. Our experimental results show that classifier fusion performs better. Specifically Bayes belief integration is the best performing strategy for image classification task.Comment: Image and Vision Computing New Zealand, 2009. IVCNZ '09. 24th International Conference, Wellington : Nouvelle-Z\'elande (2009

    Transportation mode recognition fusing wearable motion, sound and vision sensors

    Get PDF
    We present the first work that investigates the potential of improving the performance of transportation mode recognition through fusing multimodal data from wearable sensors: motion, sound and vision. We first train three independent deep neural network (DNN) classifiers, which work with the three types of sensors, respectively. We then propose two schemes that fuse the classification results from the three mono-modal classifiers. The first scheme makes an ensemble decision with fixed rules including Sum, Product, Majority Voting, and Borda Count. The second scheme is an adaptive fuser built as another classifier (including Naive Bayes, Decision Tree, Random Forest and Neural Network) that learns enhanced predictions by combining the outputs from the three mono-modal classifiers. We verify the advantage of the proposed method with the state-of-the-art Sussex-Huawei Locomotion and Transportation (SHL) dataset recognizing the eight transportation activities: Still, Walk, Run, Bike, Bus, Car, Train and Subway. We achieve F1 scores of 79.4%, 82.1% and 72.8% with the mono-modal motion, sound and vision classifiers, respectively. The F1 score is remarkably improved to 94.5% and 95.5% by the two data fusion schemes, respectively. The recognition performance can be further improved with a post-processing scheme that exploits the temporal continuity of transportation. When assessing generalization of the model to unseen data, we show that while performance is reduced - as expected - for each individual classifier, the benefits of fusion are retained with performance improved by 15 percentage points. Besides the actual performance increase, this work, most importantly, opens up the possibility for dynamically fusing modalities to achieve distinct power-performance trade-off at run time

    Fusion of Learned Multi-Modal Representations and Dense Trajectories for Emotional Analysis in Videos

    Get PDF
    When designing a video affective content analysis algorithm, one of the most important steps is the selection of discriminative features for the effective representation of video segments. The majority of existing affective content analysis methods either use low-level audio-visual features or generate handcrafted higher level representations based on these low-level features. We propose in this work to use deep learning methods, in particular convolutional neural networks (CNNs), in order to automatically learn and extract mid-level representations from raw data. To this end, we exploit the audio and visual modality of videos by employing Mel-Frequency Cepstral Coefficients (MFCC) and color values in the HSV color space. We also incorporate dense trajectory based motion features in order to further enhance the performance of the analysis. By means of multi-class support vector machines (SVMs) and fusion mechanisms, music video clips are classified into one of four affective categories representing the four quadrants of the Valence-Arousal (VA) space. Results obtained on a subset of the DEAP dataset show (1) that higher level representations perform better than low-level features, and (2) that incorporating motion information leads to a notable performance gain, independently from the chosen representation

    Biometric Authentication System on Mobile Personal Devices

    Get PDF
    We propose a secure, robust, and low-cost biometric authentication system on the mobile personal device for the personal network. The system consists of the following five key modules: 1) face detection; 2) face registration; 3) illumination normalization; 4) face verification; and 5) information fusion. For the complicated face authentication task on the devices with limited resources, the emphasis is largely on the reliability and applicability of the system. Both theoretical and practical considerations are taken. The final system is able to achieve an equal error rate of 2% under challenging testing protocols. The low hardware and software cost makes the system well adaptable to a large range of security applications
    corecore