
    Cultural Event Recognition with Visual ConvNets and Temporal Models

    This paper presents our contribution to the ChaLearn Challenge 2015 on Cultural Event Classification. The task is to automatically classify images from 50 different cultural events. Our solution combines visual features extracted from convolutional neural networks with temporal information in a hierarchical classifier scheme. We extract visual features from the last three fully connected layers of both CaffeNet (pretrained on ImageNet) and our version fine-tuned for the ChaLearn challenge. We propose a late fusion strategy that trains a separate low-level SVM on each of the extracted neural codes. The class predictions of the low-level SVMs form the input to a higher-level SVM, which gives the final event scores. We achieve our best result by adding a temporal refinement step to our classification scheme, applied directly to the output of each low-level SVM. Our approach penalizes high classification scores based on visual features when their time stamp does not match an event-specific temporal distribution learned from the training and validation data. Our system achieved the second-best result in the ChaLearn Challenge 2015 on Cultural Event Classification, with a mean average precision of 0.767 on the test set. Comment: Initial version of the paper accepted at the CVPR Workshop ChaLearn Looking at People 201
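    The two-level scheme described above can be sketched as follows. This is a minimal illustration only: the array shapes, score scales, Gaussian temporal model, and the averaging that stands in for the higher-level SVM are all assumptions, not the authors' exact implementation.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n_images, n_events, n_codes = 4, 50, 6   # 6 neural codes: 3 FC layers x 2 CNNs

    # Low-level stage: one SVM per neural code, each emitting per-event scores
    # (random placeholders here).
    low_level_scores = rng.random((n_codes, n_images, n_events))

    # Temporal refinement: down-weight a score when the image time stamp is
    # unlikely under the event's learned temporal distribution (Gaussian here).
    timestamps = rng.random(n_images)            # day-of-year, normalized to [0, 1]
    event_mean = rng.random(n_events)            # learned per-event temporal centers
    event_std = np.full(n_events, 0.1)

    def temporal_weight(t, mu, sigma):
        """Likelihood of each time stamp under each event's temporal model."""
        return np.exp(-0.5 * ((t[:, None] - mu[None, :]) / sigma[None, :]) ** 2)

    weights = temporal_weight(timestamps, event_mean, event_std)  # (n_images, n_events)
    refined = low_level_scores * weights[None, :, :]

    # Higher-level fusion: a simple average stands in for the second-level SVM.
    final_scores = refined.mean(axis=0)          # (n_images, n_events)
    predictions = final_scores.argmax(axis=1)
    ```

    The key design point is that the temporal penalty is applied per low-level SVM, before fusion, so the combiner sees already-refined scores.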

    Learning to Select Pre-Trained Deep Representations with Bayesian Evidence Framework

    We propose a Bayesian evidence framework to facilitate transfer learning from pre-trained deep convolutional neural networks (CNNs). Our framework is built on a least squares SVM (LS-SVM) classifier, which is simple and fast in both training and testing and achieves competitive performance in practice. The regularization parameters of the LS-SVM are estimated automatically, without grid search or cross-validation, by maximizing the evidence, which is also a useful measure for selecting the best-performing CNN out of multiple candidates for transfer learning. The evidence is optimized efficiently with Aitken's delta-squared process, which accelerates convergence of the fixed-point update. The proposed Bayesian evidence framework also identifies the best ensemble of heterogeneous CNNs through a greedy algorithm. Our Bayesian evidence framework for transfer learning is tested on 12 visual recognition datasets and consistently achieves state-of-the-art performance in terms of prediction accuracy and modeling efficiency. Comment: Appearing in CVPR-2016 (oral presentation)
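    Aitken's delta-squared process, the acceleration the paper applies to its evidence-maximizing fixed-point update, can be sketched in a few lines. The toy fixed point x = cos(x) below is purely illustrative; the LS-SVM evidence update itself is not shown.

    ```python
    import math

    def aitken(g, x0, tol=1e-12, max_iter=100):
        """Accelerate the fixed-point iteration x <- g(x) with Aitken's delta-squared."""
        x = x0
        for _ in range(max_iter):
            x1 = g(x)
            x2 = g(x1)
            denom = x2 - 2.0 * x1 + x          # second difference, Delta^2 x
            if abs(denom) < 1e-15:             # already converged to machine precision
                return x2
            # Aitken extrapolation: x - (Delta x)^2 / Delta^2 x
            x_acc = x - (x1 - x) ** 2 / denom
            if abs(x_acc - x) < tol:
                return x_acc
            x = x_acc
        return x

    root = aitken(math.cos, 1.0)   # converges to the fixed point of cos near 0.739
    ```

    The plain iteration x <- cos(x) converges only linearly; the extrapolated sequence reaches machine precision in a handful of steps, which is the point of using it inside a per-candidate model-selection loop.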

    Automated detection of block falls in the north polar region of Mars

    We developed a change detection method for the identification of ice block falls using NASA's HiRISE images of the north polar scarps on Mars. Our method is based on a Support Vector Machine (SVM), trained using Histograms of Oriented Gradients (HOG), and on blob detection. The SVM detects potential new blocks between a set of images; the blob detection then confirms the identification of a block inside the area indicated by the SVM and derives the shape of the block. The results from the automatic analysis were compared with block statistics from visual inspection. We tested our method on 6 areas of 1000x1000 pixels each, in which several hundred blocks were identified. The results for the given test areas produced a true positive rate of ~75% for blocks with sizes larger than 0.7 m (i.e., approx. 3 times the available ground pixel size) and a false discovery rate of ~8.5%. Using blob detection, we also recover the size of each block to within 3 pixels of its actual size.
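    The blob-detection stage can be sketched as connected-component labeling over a binary change mask inside a window flagged by the SVM. The 4-connectivity choice and the toy mask below are assumptions for illustration, not the authors' exact pipeline.

    ```python
    import numpy as np

    def label_blobs(mask):
        """4-connected component labeling by flood fill; returns (labels, sizes)."""
        labels = np.zeros(mask.shape, dtype=int)
        sizes = {}
        next_label = 0
        for seed in zip(*np.nonzero(mask)):
            if labels[seed]:
                continue                       # pixel already belongs to a blob
            next_label += 1
            stack = [seed]
            labels[seed] = next_label
            count = 0
            while stack:
                r, c = stack.pop()
                count += 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < mask.shape[0] and 0 <= cc < mask.shape[1]
                            and mask[rr, cc] and not labels[rr, cc]):
                        labels[rr, cc] = next_label
                        stack.append((rr, cc))
            sizes[next_label] = count          # blob area in pixels -> block size
        return labels, sizes

    # Toy change mask with two "blocks" of 4 and 2 pixels.
    mask = np.array([[1, 1, 0, 0],
                     [1, 1, 0, 1],
                     [0, 0, 0, 1]], dtype=bool)
    labels, sizes = label_blobs(mask)
    ```

    The per-blob pixel counts are what would be converted to metric block sizes via the ground pixel scale.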

    Visual Concepts and Compositional Voting

    It is very attractive to formulate vision in terms of pattern theory \cite{Mumford2010pattern}, where patterns are defined hierarchically by compositions of elementary building blocks. But applying pattern theory to real-world images is currently less successful than discriminative methods such as deep networks. Deep networks, however, are black boxes which are hard to interpret and can easily be fooled by adding occluding objects. It is natural to wonder whether, by better understanding deep networks, we can extract building blocks which can be used to develop pattern-theoretic models. This motivates us to study the internal representations of a deep network using vehicle images from the PASCAL3D+ dataset. We use clustering algorithms to study the population activities of the features and extract a set of visual concepts, which we show are visually tight and correspond to semantic parts of vehicles. To analyze this, we annotate these vehicles by their semantic parts to create a new dataset, VehicleSemanticParts, and evaluate visual concepts as unsupervised part detectors. We show that visual concepts perform fairly well but are outperformed by supervised discriminative methods such as Support Vector Machines (SVM). We next give a more detailed analysis of visual concepts and how they relate to semantic parts. Following this, we use the visual concepts as building blocks for a simple pattern-theoretic model, which we call compositional voting. In this model, several visual concepts combine to detect semantic parts. We show that this approach is significantly better than discriminative methods like SVM and deep networks trained specifically for semantic part detection. Finally, we return to studying occlusion by creating an annotated dataset with occlusion, called VehicleOcclusion, and show that compositional voting outperforms even deep networks when the amount of occlusion becomes large. Comment: It is accepted by Annals of Mathematical Sciences and Application
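    The core idea of compositional voting can be sketched as an accumulator: each visual-concept detection casts a spatially offset vote for a semantic part, and evidence is summed. The grid size, offsets, and scores below are illustrative assumptions, not the paper's learned model.

    ```python
    import numpy as np

    H, W = 8, 8
    accumulator = np.zeros((H, W))

    # Each detection: (row, col, score, learned offset from concept to part center).
    detections = [
        (2, 2, 0.9, (1, 1)),    # concept A fires near the part
        (4, 4, 0.8, (-1, -1)),  # concept B, from the other side of the part
        (6, 1, 0.3, (0, 0)),    # weak, off-part detection
    ]

    for r, c, score, (dr, dc) in detections:
        vr, vc = r + dr, c + dc
        if 0 <= vr < H and 0 <= vc < W:
            accumulator[vr, vc] += score   # concepts vote for the part location

    part_location = np.unravel_index(accumulator.argmax(), accumulator.shape)
    ```

    Because the part hypothesis is supported by several independent concepts, occluding any one of them removes only part of the evidence, which is consistent with the robustness to occlusion reported above.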

    Detection of the stroboscopic effect by young adults varying in sensitivity

    The advent of LED lighting has renewed concern about the possible visual, neurobiological, cognitive, and performance effects of cyclic variations in lighting system luminous flux (temporal light modulation). The stroboscopic visibility measure (SVM) characterises the temporal light modulation signal to predict the visibility of the stroboscopic effect, one of the visual perception effects of temporal light modulation. An SVM of 1 means that the average person would detect the phenomenon 50% of the time. There is little published data describing population sensitivity to the stroboscopic effect in relation to the SVM, and none focusing on people subject to visual stress. This experiment, conducted in parallel in Canada and France, examined stroboscopic detection for horizontally and vertically moving targets viewed under commercially available lamps varying in SVM (SVM: ∼0; ∼0.4; ∼0.9; ∼1.4; ∼3.0). As expected, stroboscopic detection scores increased with increasing SVM. For the horizontal task, average scores were lower than the expected 4/8 at ∼0.90, but increased non-linearly with higher SVMs. Stroboscopic detection scores did not differ between people low and high in pattern glare sensitivity, but people in the high pattern glare sensitivity group reported greater annoyance in the SVM ∼1.4 and ∼3.0 conditions.
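    Note that SVM here is the stroboscopic visibility measure, not a support vector machine. In its commonly published formulation (a Minkowski norm over the normalized Fourier components of the light waveform, per CIE TN 006:2016), it can be sketched as below; the component amplitudes and visibility thresholds used are illustrative, not measured data.

    ```python
    def svm_measure(amplitudes, thresholds, n=3.7):
        """SVM = (sum_m (C_m / T_m)^n)^(1/n), where C_m is the amplitude of the
        m-th Fourier component of the waveform and T_m its visibility threshold.
        SVM = 1 corresponds to the 50% detection point for the average observer."""
        return sum((c / t) ** n for c, t in zip(amplitudes, thresholds)) ** (1.0 / n)

    # A single dominant component exactly at its visibility threshold gives SVM = 1.
    value = svm_measure([0.4], [0.4])
    ```

    The high exponent (n = 3.7) means the measure is dominated by whichever frequency component is most visible relative to its threshold, with weaker components contributing little.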

    Application of support vector machine for classification of multispectral data

    In this paper, a support vector machine (SVM) is used to classify satellite remotely sensed multispectral data. The data were recorded by the Landsat-5 TM satellite at a resolution of 30x30 m. SVM finds the optimal separating hyperplane between classes by focusing on the training cases that lie closest to the class boundaries. The study area of Klang Valley has more than 10 land covers, and classification using SVM was completed successfully without any pixel being left unclassified. The training areas were determined carefully by visual interpretation and with the aid of the reference map of the study area. The result obtained is then analysed for accuracy and visual performance. Accuracy assessment is done by computing and discussing the Kappa coefficient and the overall and producer's accuracy for each class (in pixels and percentage). Visual analysis is done by comparing the classified data with the reference map. Overall, the study shows that SVM is able to classify the land covers within the study area with high accuracy.
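    The accuracy assessment described above (overall accuracy, producer's accuracy, and the Kappa coefficient) can be sketched from a confusion matrix. The 3-class matrix below is illustrative, not the paper's Klang Valley results.

    ```python
    import numpy as np

    def accuracy_metrics(cm):
        """Overall accuracy, per-class producer's accuracy, and Cohen's kappa
        from a confusion matrix (rows: classified; columns: reference)."""
        cm = np.asarray(cm, dtype=float)
        total = cm.sum()
        overall = np.trace(cm) / total
        producer = np.diag(cm) / cm.sum(axis=0)   # w.r.t. reference (column) totals
        # Chance agreement expected from the row/column marginals.
        expected = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / total ** 2
        kappa = (overall - expected) / (1.0 - expected)
        return overall, producer, kappa

    cm = [[50,  2,  3],
          [ 4, 40,  1],
          [ 1,  3, 46]]
    overall, producer, kappa = accuracy_metrics(cm)
    ```

    Kappa corrects the overall accuracy for agreement expected by chance, which is why it is reported alongside the raw percentages in remote sensing accuracy assessments.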