13 research outputs found

    3D Human Activity Recognition with Reconfigurable Convolutional Neural Networks

    Human activity understanding with 3D/depth sensors has received increasing attention in multimedia processing and interaction. This work aims to develop a novel deep model for automatic activity recognition from RGB-D videos. We represent each human activity as an ensemble of cubic-like video segments, and learn to discover the temporal structures for a category of activities, i.e. how the activities are decomposed in terms of classification. Our model can be regarded as a structured deep architecture, as it extends convolutional neural networks (CNNs) by incorporating structure alternatives. Specifically, we build a network consisting of 3D convolutions and max-pooling operators over the video segments, and introduce latent variables in each convolutional layer that manipulate the activation of neurons. Our model thus advances existing approaches in two aspects: (i) it acts directly on the raw inputs (grayscale-depth data) to conduct recognition instead of relying on hand-crafted features, and (ii) the model structure can be dynamically adjusted to account for the temporal variations of human activities, i.e. the network configuration is allowed to be partially activated during inference. For model training, we propose an EM-type optimization method that iteratively (i) discovers the latent structure by determining the decomposed actions for each training example, and (ii) learns the network parameters using the back-propagation algorithm. Our approach is validated in challenging scenarios and outperforms state-of-the-art methods. A large human activity database of RGB-D videos is also presented.
    Comment: This manuscript has 10 pages with 9 figures; a preliminary version was published at the ACM MM'14 conference.
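As a rough illustration of the two building blocks the abstract names (3D convolution and max-pooling over a video segment), here is a minimal pure-Python sketch. It is not the authors' network: the function names `conv3d_valid` and `max_pool3d`, the toy 4x4x4 segment, and the 2x2x2 box kernel are all assumptions made for the example, and the latent variables and EM-type training are not modelled.

```python
from itertools import product


def conv3d_valid(volume, kernel):
    """Valid (no padding, stride 1) 3D cross-correlation.

    Both arguments are nested lists indexed as [time][row][column].
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # Sum the element-wise products over the kernel window.
                for dt, dy, dx in product(range(kt), range(kh), range(kw)):
                    acc += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def max_pool3d(volume, size):
    """Non-overlapping max pooling with a cubic window.

    Any trailing remainder that does not fill a whole window is dropped.
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    return [
        [
            [
                max(
                    volume[t + dt][y + dy][x + dx]
                    for dt, dy, dx in product(range(size), repeat=3)
                )
                for x in range(0, W - size + 1, size)
            ]
            for y in range(0, H - size + 1, size)
        ]
        for t in range(0, T - size + 1, size)
    ]


# Hypothetical toy segment: 4 frames of 4x4 pixels, value = t + y + x.
segment = [[[float(t + y + x) for x in range(4)] for y in range(4)] for t in range(4)]
# Hypothetical 2x2x2 box filter (all ones).
kernel = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]

features = conv3d_valid(segment, kernel)  # 3x3x3 feature volume
pooled = max_pool3d(features, 2)          # 1x1x1 after pooling
```

The convolution slides jointly over time and space, which is what lets a 3D filter respond to motion as well as appearance; the pooling step then keeps only the strongest response in each spatio-temporal block.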

    Background Filtering for Improving of Object Detection in Images

    Characterizing Compressibility of Disjoint Subgraphs with NLC Grammars

    We consider compression of a given set S of isomorphic and disjoint subgraphs of a graph G using node label controlled (NLC) graph grammars. Given S and G, we characterize whether or not there exists an NLC graph grammar consisting of exactly one rule such that (1) each of the subgraphs in S is compressed (i.e., replaced by a nonterminal) in the (unique) initial graph I, and (2) the set of generated terminal graphs is the singleton {G}.
    Acceptance rate: 39%. Status: published.

    A new paradigm based on agents applied to free-hand sketch recognition

    Important advances in natural calligraphic interfaces for CAD (Computer Aided Design) applications are being achieved, enabling the development of CAS (Computer Aided Sketching) devices that support the conceptual design phase of a product. Recognizers play an important role in this field, allowing the interpretation of the user's intention, but they still have significant shortcomings. This paper proposes a new recognition paradigm using an agent-based architecture that does not depend on the drawing sequence and takes context information into account to inform decisions. Another improvement is the absence of operation modes, that is, no button is needed to distinguish geometry from symbols or gestures, and "interspersing" and "overtracing" are also supported.
    The Spanish Ministry of Science and Education and the FEDER Funds, through the CUESKETCH project (Ref. DPI2007-66755-C02-01), partially supported this work.
    Fernández Pacheco, D.; Albert Gil, FE.; Aleixos Borrás, MN.; Conesa Pastor, J. (2012). A new paradigm based on agents applied to free-hand sketch recognition. Expert Systems with Applications. 39(8):7181-7195. https://doi.org/10.1016/j.eswa.2012.01.063

    Agent-based framework for person re-identification

    In computer-based human-object re-identification, a detected human is recognised to a level sufficient to re-identify a tracked person either in a different camera capturing the same individual, often at a different angle, or in the same camera at a different time and/or with the person approaching the camera at a different angle. Instead of relying on face recognition technology, such systems study the clothing of the individuals being monitored and/or the objects being carried to establish correspondence and hence re-identify the human object. Unfortunately, present human-object re-identification systems consider the entire human object as one connected region when making decisions about the similarity of two objects being matched. This assumption has a major drawback: when a person is partially occluded, a part of the occluding foreground will be picked up and used in matching. Our research revealed that when a human observer carries out a manual human-object re-identification task, attention is often drawn to some parts of the human figure/body more than others, e.g. the face, a brightly coloured shirt, or the presence of texture patterns in clothing, and occluding parts are ignored. In this thesis, a novel multi-agent based framework is proposed for the design of a human-object re-identification system. Initially, HOG-based feature extraction is used in an SVM-based classification of a human object as either full-body or half-body. Subsequently, the relative visual significance of the top and bottom parts of the human in re-identification is quantified by analysing Gray Level Co-occurrence based texture features and colour histograms obtained in the HSV colour space. Accordingly, different weights are assigned to the top and bottom of the human body using a novel probabilistic approach. The weights are then used to modify the adopted Hybrid Spatiogram and Covariance Descriptor (HSCD) feature-based re-identification algorithm.
    A significant novelty of the human-object re-identification systems proposed in this thesis is the agent-based design procedure, which separates the use of computer vision algorithms for feature extraction, comparison, etc. from the decision-making process of re-identification. Multiple agents are assigned to execute different algorithmic tasks, and the agents communicate to make the required logical decisions. Detailed experimental results are provided to show that the proposed multi-agent based framework for human-object re-identification performs significantly better than state-of-the-art algorithms. Further, it is shown that the design flexibility and scalability of the proposed system allow it to be used effectively in more complex computer vision based video analytic/forensic tasks, often conducted within distributed, multi-camera systems.
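To make the idea of weighting the top and bottom of the body concrete, the following is a minimal sketch that compares two people by hue histograms of each body half and combines the two scores with fixed weights. It is only an illustration under stated assumptions: the thesis derives its weights with a probabilistic approach and uses HSCD features, neither of which is reproduced here, and the names `hue_histogram`, `intersection`, `weighted_similarity`, and the default `w_top=0.7` are hypothetical.

```python
import colorsys


def hue_histogram(pixels, bins=8):
    """Normalised hue histogram of (r, g, b) pixels with channels in [0, 1]."""
    hist = [0.0] * bins
    for r, g, b in pixels:
        h, _, _ = colorsys.rgb_to_hsv(r, g, b)  # hue is returned in [0, 1]
        hist[min(int(h * bins), bins - 1)] += 1.0
    total = sum(hist)
    return [count / total for count in hist] if total else hist


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def weighted_similarity(top_a, bottom_a, top_b, bottom_b, w_top=0.7):
    """Combine per-half similarities; w_top is an assumed, hand-picked weight."""
    w_bottom = 1.0 - w_top
    s_top = intersection(hue_histogram(top_a), hue_histogram(top_b))
    s_bottom = intersection(hue_histogram(bottom_a), hue_histogram(bottom_b))
    return w_top * s_top + w_bottom * s_bottom


# Hypothetical toy "body halves" made of a few uniformly coloured pixels.
red = [(1.0, 0.0, 0.0)] * 4
blue = [(0.0, 0.0, 1.0)] * 4
green = [(0.0, 1.0, 0.0)] * 4

score_same = weighted_similarity(red, blue, red, blue)   # same top and bottom
score_diff = weighted_similarity(red, blue, red, green)  # bottom differs
```

With a larger `w_top`, a mismatch in the lower half (for example, legs hidden behind an occluding object) lowers the overall score less than a mismatch in the upper half would.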