3,932 research outputs found

    Track Everything: Limiting Prior Knowledge in Online Multi-Object Recognition

    Full text link
    This paper addresses the problem of online tracking and classification of multiple objects in an image sequence. Our proposed solution is to first track all objects in the scene without relying on object-specific prior knowledge, which in other systems can take the form of hand-crafted features or user-based track initialization. We then classify the tracked objects with a fast-learning image classifier based on a shallow convolutional neural network architecture, and demonstrate that object recognition improves when this is combined with object state information from the tracking algorithm. We argue that by transferring the use of prior knowledge from the detection and tracking stages to the classification stage, we can design a robust, general-purpose object recognition system with the ability to detect and track a variety of object types. We describe our biologically inspired implementation, which adaptively learns the shape and motion of tracked objects, and apply it to the Neovision2 Tower benchmark data set, which contains multiple object types. An experimental evaluation demonstrates that our approach is competitive with state-of-the-art video object recognition systems that do make use of object-specific prior knowledge in detection and tracking, while providing additional practical advantages by virtue of its generality. Comment: 15 pages
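
    As a sketch of the idea of combining a fast-learning shallow classifier with tracker state, here is a minimal, hypothetical PyTorch model (not the authors' implementation): a shallow CNN encodes the tracked image patch, and the classification head additionally consumes a small object-state vector (e.g. size and velocity) supplied by the tracker.

    import torch
    import torch.nn as nn

    class ShallowClassifier(nn.Module):
        """Shallow CNN whose head fuses image features with tracker state."""
        def __init__(self, num_classes, state_dim=4):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4),
            )
            # Image features are concatenated with the object state vector.
            self.head = nn.Linear(32 * 4 * 4 + state_dim, num_classes)

        def forward(self, patch, state):
            f = self.features(patch).flatten(1)
            return self.head(torch.cat([f, state], dim=1))

    model = ShallowClassifier(num_classes=5)
    patch = torch.randn(8, 3, 64, 64)  # object patches cropped by the tracker
    state = torch.randn(8, 4)          # e.g. width, height, vx, vy per track
    logits = model(patch, state)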

    Person Search via A Mask-Guided Two-Stream CNN Model

    Full text link
    In this work, we tackle the problem of person search, a challenging task consisting of pedestrian detection and person re-identification (re-ID). Instead of sharing representations in a single joint model, we find that separating the detector and the re-ID feature extraction yields better performance. In order to extract more representative features for each identity, we segment out the foreground person from the original image patch. We propose a simple yet effective re-ID method that models the foreground person and the original image patch individually, and obtains enriched representations from two separate CNN streams. In experiments on the two standard person search benchmarks, CUHK-SYSU and PRW, we achieve mAPs of 83.0% and 32.6% respectively, surpassing the state of the art by a large margin (more than 5pp). Comment: accepted as a poster at ECCV 2018
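
    A minimal sketch of the two-stream idea, assuming a ResNet-18 backbone per stream (the paper's exact backbones and training losses are not reproduced here): one stream sees the original patch, the other sees the patch with the background zeroed out by the segmentation mask, and the two feature vectors are concatenated into one re-ID embedding.

    import torch
    import torch.nn as nn
    from torchvision import models

    class TwoStreamReID(nn.Module):
        def __init__(self, embed_dim=256):
            super().__init__()
            self.orig_stream = models.resnet18(weights=None)
            self.fg_stream = models.resnet18(weights=None)
            self.orig_stream.fc = nn.Identity()  # expose 512-d pooled features
            self.fg_stream.fc = nn.Identity()
            self.embed = nn.Linear(512 * 2, embed_dim)

        def forward(self, patch, mask):
            fg = patch * mask  # zero out background pixels with the person mask
            f = torch.cat([self.orig_stream(patch), self.fg_stream(fg)], dim=1)
            return self.embed(f)

    model = TwoStreamReID()
    patch = torch.randn(4, 3, 256, 128)
    mask = (torch.rand(4, 1, 256, 128) > 0.5).float()
    embedding = model(patch, mask)  # one 256-d descriptor per person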

    Detection, Recognition and Tracking of Moving Objects from Real-time Video via Visual Vocabulary Model and Species Inspired PSO

    Full text link
    In this paper, we address the basic problem of recognizing moving objects in video images using a Visual Vocabulary model and Bag of Words, and track our object of interest in the subsequent video frames using species-inspired PSO. Initially, shadow-free images are obtained by background modeling, followed by foreground modeling to extract the blobs of our object of interest. Subsequently, we train a cubic SVM with human body datasets in accordance with our domain of interest for recognition and tracking. During training, using the principle of Bag of Words, we extract the necessary features of certain domains and objects for classification. Subsequently, by matching these feature sets with those of the extracted object blobs, obtained by subtracting the shadow-free background from the foreground, we successfully detect our object of interest in the test domain. The performance of the cubic SVM classification is represented by a confusion matrix and ROC curve, reflecting the accuracy of each module. After classification, our object of interest is tracked in the test domain using species-inspired PSO. By combining adaptive learning tools with efficient classification, we achieve optimum accuracy in recognizing the moving objects. We evaluate our algorithm on the benchmark datasets iLIDS, VIVID, Walking2, and Woman. Comparative analysis of our algorithm against existing state-of-the-art trackers shows very satisfactory and competitive results.
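
    The Bag-of-Words plus cubic-SVM stage can be sketched as below with scikit-learn, using random arrays in place of real local descriptors; a "cubic SVM" is an SVC with a degree-3 polynomial kernel, and the visual vocabulary is a k-means codebook over all training descriptors.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    # Stand-ins for local descriptors (e.g. from the extracted blobs).
    descs = [rng.normal(size=(rng.integers(50, 100), 32)) for _ in range(20)]
    labels = rng.integers(0, 2, size=20)

    # 1. Visual vocabulary: cluster all descriptors into visual words.
    vocab = KMeans(n_clusters=16, n_init=10, random_state=0).fit(np.vstack(descs))

    # 2. Encode each image as a normalized histogram of visual words.
    def bow_histogram(d):
        hist = np.bincount(vocab.predict(d), minlength=16).astype(float)
        return hist / hist.sum()

    X = np.array([bow_histogram(d) for d in descs])

    # 3. "Cubic SVM" = SVC with a degree-3 polynomial kernel.
    clf = SVC(kernel="poly", degree=3).fit(X, labels)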

    Facial Expression Recognition in the Wild using Rich Deep Features

    Full text link
    Facial expression recognition is an active area of research in computer vision with a wide range of applications. Several approaches have been developed to solve this problem for different benchmark datasets. However, facial expression recognition in the wild remains an area where much work is still needed to serve real-world applications. To this end, in this paper we present a novel approach to facial expression recognition: we fuse rich deep features with domain knowledge by encoding discriminant facial patches. We conduct experiments on two of the most popular benchmark datasets, CK and TFE. Moreover, we present a novel dataset that, unlike its precedents, consists of natural (not acted) expression images. Experimental results show that our approach achieves state-of-the-art results over standard benchmarks and our own dataset. Comment: in International Conference on Image Processing, 201
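
    One plausible reading of "fusing rich deep features with encoded facial patches" is sketched below in PyTorch; the patch locations, sizes, and network widths are illustrative assumptions, not the paper's architecture: a global branch encodes the whole face while a shared branch encodes landmark-centered patches, and the concatenated features feed the expression classifier.

    import torch
    import torch.nn as nn

    class PatchFusionFER(nn.Module):
        def __init__(self, num_classes=7, n_patches=4):
            super().__init__()
            self.global_net = nn.Sequential(
                nn.Conv2d(1, 16, 3, 2, 1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
            self.patch_net = nn.Sequential(
                nn.Conv2d(1, 8, 3, 2, 1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
            self.head = nn.Linear(16 + 8 * n_patches, num_classes)

        def forward(self, face, patches):
            # face: (B, 1, 64, 64); patches: (B, P, 1, 16, 16) landmark crops
            g = self.global_net(face).flatten(1)
            b, p = patches.shape[:2]
            f = self.patch_net(patches.flatten(0, 1)).flatten(1).reshape(b, -1)
            return self.head(torch.cat([g, f], dim=1))

    model = PatchFusionFER()
    logits = model(torch.randn(2, 1, 64, 64), torch.randn(2, 4, 1, 16, 16))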

    A review on handwritten character and numeral recognition for Roman, Arabic, Chinese and Indian scripts

    Full text link
    There has been intensive research on handwritten character recognition (HCR) for almost the past four decades. The research has been done on some popular scripts such as Roman, Arabic, Chinese and Indian. In this paper we present a review of HCR work on these four popular scripts. We have summarized most of the papers published from 2005 to the present and also analyzed the various methods for creating a robust HCR system. We also suggest some future directions of research on HCR. Comment: 8 pages

    A Hajj And Umrah Location Classification System For Video Crowded Scenes

    Full text link
    In this paper, a new automatic system for classifying ritual locations in diverse Hajj and Umrah video scenes is investigated. This challenging subject has mostly been ignored in the past due to several problems, one of which is the lack of realistic annotated video datasets. The HUER Dataset is defined to model six different Hajj and Umrah ritual locations [26]. The proposed Hajj and Umrah ritual location classification system consists of four main phases: preprocessing, segmentation, feature extraction, and location classification. Shot boundary detection and background/foreground segmentation algorithms are applied to prepare the input video scenes for the KNN, ANN, and SVM classifiers. The system improves on state-of-the-art results for Hajj and Umrah location classification, and successfully recognizes the six Hajj rituals with more than 90% accuracy. The demonstrated experiments show promising results. Comment: 9 pages, 10 figures, 2 tables, 3 algorithms
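
    A toy version of the final classification phase might look like the following, with a made-up shot descriptor (a histogram of foreground pixel intensities pooled over the shot) standing in for the paper's features, and a KNN classifier over the six ritual locations.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)

    def scene_feature(gray_frames, fg_masks, bins=16):
        # Pool foreground pixel intensities over the shot into one histogram.
        fg = np.concatenate([f[m] for f, m in zip(gray_frames, fg_masks)])
        hist, _ = np.histogram(fg, bins=bins, range=(0, 256))
        return hist / max(hist.sum(), 1)

    X, y = [], rng.integers(0, 6, size=40)  # six ritual locations
    for _ in y:  # toy stand-in shots: 10 grayscale frames plus masks each
        frames = [rng.integers(0, 256, size=(32, 32)) for _ in range(10)]
        masks = [rng.random((32, 32)) > 0.5 for _ in range(10)]
        X.append(scene_feature(frames, masks))

    clf = KNeighborsClassifier(n_neighbors=3).fit(np.array(X), y)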

    Indic Handwritten Script Identification using Offline-Online Multimodal Deep Network

    Full text link
    In this paper, we propose a novel approach to word-level Indic script identification that uses only character-level data in the training stage; the advantages of using character-level data for training are outlined in Section I. Our method uses a multimodal deep network that takes both the offline and online modalities of the data as input, in order to exploit the information from both modalities jointly for the script identification task. We take handwritten data in either modality as input, and the opposite modality is generated through intermodality conversion. Thereafter, we feed this offline-online modality pair to our network. Hence, along with the advantage of utilizing information from both modalities, it can work as a single framework for both offline and online script identification simultaneously, which alleviates the need to design two separate script identification modules for the individual modalities. One more major contribution is that we propose a novel conditional multimodal fusion scheme to combine the information from the offline and online modalities, which takes into account the true origin of the data being fed to our network and thus combines them adaptively. An exhaustive experiment has been done on a dataset consisting of English and six Indic scripts. Our proposed framework outperforms, by a clear margin, frameworks based on traditional classifiers with handcrafted features as well as deep learning based methods. Extensive experiments show that using only character-level training data can achieve state-of-the-art performance similar to that obtained with traditional training using word-level data in our framework. Comment: Accepted in Information Fusion, Elsevier
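
    The conditional fusion idea can be illustrated with a small PyTorch sketch; the gating form and dimensions are assumptions for illustration, not the paper's exact scheme: a learned gate conditioned on the sample's true modality of origin decides, per feature, how much to trust the real input versus its converted counterpart.

    import torch
    import torch.nn as nn

    class ConditionalFusion(nn.Module):
        def __init__(self, dim=128, num_scripts=7):
            super().__init__()
            # Gate sees both features plus a flag for the sample's true origin.
            self.gate = nn.Sequential(nn.Linear(2 * dim + 1, dim), nn.Sigmoid())
            self.head = nn.Linear(dim, num_scripts)

        def forward(self, f_offline, f_online, is_online):
            g = self.gate(torch.cat([f_offline, f_online, is_online], dim=1))
            fused = g * f_online + (1 - g) * f_offline
            return self.head(fused)

    fusion = ConditionalFusion()
    f_off, f_on = torch.randn(8, 128), torch.randn(8, 128)
    is_online = torch.ones(8, 1)  # these samples truly came as online strokes
    logits = fusion(f_off, f_on, is_online)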

    Hierarchical Spatial-aware Siamese Network for Thermal Infrared Object Tracking

    Full text link
    Most thermal infrared (TIR) tracking methods are discriminative, treating the tracking problem as a classification task. However, the objective of the classifier (label prediction) is not coupled to the objective of the tracker (location estimation). The classification task focuses on between-class differences of arbitrary objects, while the tracking task mainly deals with within-class differences of the same object. In this paper, we cast the TIR tracking problem as a similarity verification task, which couples well with the objective of tracking. We propose a TIR tracker based on a Hierarchical Spatial-aware Siamese Convolutional Neural Network (CNN), named HSSNet. To obtain both spatial and semantic features of the TIR object, we design a Siamese CNN that coalesces multiple hierarchical convolutional layers. Then, we propose a spatial-aware network to enhance the discriminative ability of the coalesced hierarchical features. Subsequently, we train this network end-to-end on a large visible-spectrum video detection dataset to learn the similarity between paired objects before transferring the network to the TIR domain. Next, the pre-trained Siamese network is used to evaluate the similarity between the target template and the target candidates. Finally, we locate the candidate that is most similar to the tracked target. Extensive experimental results on the VOT-TIR 2015 and VOT-TIR 2016 benchmarks show that our proposed method achieves favourable performance compared with the state-of-the-art methods. Comment: 20 pages, 7 figures
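
    The core similarity-verification step is in the family of fully-convolutional Siamese trackers; a minimal sketch (a generic SiamFC-style cross-correlation, not HSSNet's hierarchical spatial-aware design) embeds the template and search region with a shared CNN and cross-correlates the embeddings to obtain a response map whose peak indicates the target location.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SiameseTracker(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Sequential(
                nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU())

        def forward(self, template, search):
            t = self.embed(template)  # (B, C, h, w)
            s = self.embed(search)    # (B, C, H, W) with H >= h, W >= w
            b, c, h, w = t.shape
            # Per-sample cross-correlation via grouped convolution.
            s = s.reshape(1, b * c, *s.shape[2:])
            r = F.conv2d(s, t.reshape(b * c, 1, h, w), groups=b * c)
            return r.reshape(b, c, *r.shape[2:]).sum(dim=1, keepdim=True)

    tracker = SiameseTracker()
    resp = tracker(torch.randn(2, 3, 32, 32), torch.randn(2, 3, 64, 64))
    # resp: (2, 1, 9, 9); the peak marks the most similar candidate location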

    RiFCN: Recurrent Network in Fully Convolutional Network for Semantic Segmentation of High Resolution Remote Sensing Images

    Full text link
    Semantic segmentation in high resolution remote sensing images is a fundamental and challenging task. Convolutional neural networks (CNNs), such as the fully convolutional network (FCN) and SegNet, have shown outstanding performance in many segmentation tasks. One key pillar of these successes is mining useful information from features in convolutional layers to produce high resolution segmentation maps. For example, FCN nonlinearly combines high-level features extracted from the last convolutional layers, whereas SegNet utilizes a deconvolutional network that takes as input only the coarse, high-level feature maps of the last convolutional layer. However, how to better fuse multi-level convolutional feature maps for semantic segmentation of remote sensing images remains underexplored. In this work, we propose a novel bidirectional network called recurrent network in fully convolutional network (RiFCN), which is end-to-end trainable. It has a forward stream and a backward stream. The former is a classification CNN architecture for feature extraction, which takes an input image and produces multi-level convolutional feature maps from shallow to deep; in the latter, to achieve accurate boundary inference and semantic segmentation, boundary-aware high resolution feature maps from shallower layers and high-level but low-resolution features are recursively embedded into the learning framework (from deep to shallow) to generate a fused feature representation that draws a holistic picture of not only high-level semantic information but also low-level fine-grained details. Experimental results on two widely used high resolution remote sensing datasets for semantic segmentation, ISPRS Potsdam and the Inria Aerial Image Labeling Dataset, demonstrate competitive performance of the proposed methodology compared to other studied approaches.
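
    The forward/backward structure can be condensed into a short PyTorch sketch; channel counts and depths are arbitrary here, and the recursion is shown as a simple upsample-concatenate-convolve pass from deep to shallow rather than the paper's exact formulation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RiFCNSketch(nn.Module):
        def __init__(self, num_classes=6):
            super().__init__()
            chs, in_ch = [16, 32, 64], 3
            self.stages = nn.ModuleList()
            for c in chs:  # forward stream: shallow -> deep features
                self.stages.append(nn.Sequential(
                    nn.Conv2d(in_ch, c, 3, 2, 1), nn.ReLU()))
                in_ch = c
            # Backward stream: fuse each upsampled deep feature with the
            # boundary-aware shallower one.
            self.fuse = nn.ModuleList(
                [nn.Conv2d(chs[i] + chs[i + 1], chs[i], 3, 1, 1)
                 for i in range(len(chs) - 1)])
            self.classify = nn.Conv2d(chs[0], num_classes, 1)

        def forward(self, x):
            feats = []
            for stage in self.stages:
                x = stage(x)
                feats.append(x)
            y = feats[-1]
            for i in range(len(feats) - 2, -1, -1):  # deep -> shallow
                y = F.interpolate(y, size=feats[i].shape[2:],
                                  mode="bilinear", align_corners=False)
                y = F.relu(self.fuse[i](torch.cat([feats[i], y], dim=1)))
            return self.classify(y)

    seg = RiFCNSketch()(torch.randn(1, 3, 64, 64))  # (1, 6, 32, 32) class map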

    Learning Semantic Correspondences in Technical Documentation

    Full text link
    We consider the problem of translating high-level textual descriptions to formal representations in technical documentation, as part of an effort to model the meaning of such documentation. We focus specifically on the problem of learning translational correspondences between text descriptions and grounded representations in the target documentation, such as formal representations of functions or code templates. Our approach exploits the parallel nature of such documentation, i.e. the tight coupling between the high-level text and the low-level representations we aim to learn. Data is collected by mining technical documents for such parallel text-representation pairs, which we use to train a simple semantic parsing model. We report new baseline results on sixteen novel datasets, including the standard library documentation for nine popular programming languages across seven natural languages, and a small collection of Unix utility manuals. Comment: accepted to ACL 2017
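
    A deliberately simple stand-in for such a text-to-representation mapper is a retrieval baseline over mined (description, signature) pairs; the pairs below are invented examples, and the paper's actual semantic parsing model is more than nearest-neighbour TF-IDF matching.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Hypothetical parallel pairs mined from API documentation.
    pairs = [
        ("return the absolute value of a number", "abs(x)"),
        ("sort the items of a list in place", "list.sort(key=None, reverse=False)"),
        ("return the number of items in a container", "len(s)"),
        ("open a file and return a file object", "open(file, mode='r')"),
    ]
    texts, sigs = zip(*pairs)
    vec = TfidfVectorizer().fit(texts)

    def translate(query):
        # Map a new description to the nearest known one and return its
        # grounded representation (the signature).
        sims = cosine_similarity(vec.transform([query]), vec.transform(texts))
        return sigs[int(sims.argmax())]

    print(translate("count the number of items"))  # -> len(s)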