2,362 research outputs found

    Object tracking in the presence of shaking motions

    Full text link
    © 2018, The Natural Computing Applications Forum. Visual tracking can be interpreted as a process of searching for targets and optimizing that search. In this paper, we present a novel tracking framework for shaking targets. We formulate the underlying geometric relationship between the search scope and the target displacement. Uniform sampling within the search scopes is implemented with sliding windows. To alleviate redundant matching, we propose a double-template structure comprising the initial and the previous tracking results. The element-wise similarities between a template and its candidates are computed jointly with kernel functions, which provide better outlier rejection. The STC algorithm is used to improve the tracking results by maximizing a confidence map that incorporates temporal and spatial context cues about the tracked targets. For better adaptation to appearance variations, we employ linear interpolation to update the context prior probability of the STC method. Both qualitative and quantitative evaluations are performed on all sequences from the challenging OTB-50 benchmark that contain shaking motions. The proposed approach outperforms 12 state-of-the-art tracking methods on the selected sequences while running in MATLAB without code optimization. We have also performed further experiments on the full OTB-50 and VOT 2015 datasets. Although most of the sequences in these two datasets do not contain the motion blur this paper focuses on, the results of our method remain favorable compared with all of the state-of-the-art approaches.
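    The two update rules named in the abstract are simple to sketch. Below is a minimal numpy illustration of a Gaussian-kernel element-wise similarity (one common choice of kernel with outlier-rejecting behavior) and a linear-interpolation update of a context prior; function names, `sigma`, and the learning rate `rho` are illustrative assumptions, not the authors' code.

```python
import numpy as np

def gaussian_kernel_similarity(template, candidate, sigma=1.0):
    """Element-wise Gaussian-kernel similarity between two patches.
    Large pixel-wise differences are exponentially suppressed,
    which gives some robustness to outliers."""
    diff = template.astype(float) - candidate.astype(float)
    return np.exp(-(diff ** 2) / (2.0 * sigma ** 2)).mean()

def update_context_prior(prior, current, rho=0.075):
    """Linear interpolation of the context prior so the model
    adapts gradually to appearance variations."""
    return (1.0 - rho) * prior + rho * current
```

With `rho` small, the prior drifts slowly toward the current observation, trading adaptation speed against robustness to transient occlusions.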

    Weakly Supervised Action Localization by Sparse Temporal Pooling Network

    Full text link
    We propose a weakly supervised temporal action localization algorithm for untrimmed videos using convolutional neural networks. Our algorithm learns from video-level class labels and predicts temporal intervals of human actions without requiring temporal localization annotations. We design our network to identify a sparse subset of key segments associated with target actions in a video using an attention module, and fuse the key segments through adaptive temporal pooling. Our loss function comprises two terms that minimize the video-level action classification error and enforce sparsity of the segment selection. At inference time, we extract and score temporal proposals using temporal class activations and class-agnostic attentions to estimate the time intervals that correspond to target actions. The proposed algorithm attains state-of-the-art results on the THUMOS14 dataset and outstanding performance on ActivityNet1.3 even with its weak supervision. Comment: Accepted to CVPR 201
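    The core idea — attention-weighted temporal pooling with a sparsity penalty on the attention weights — can be sketched in a few lines of numpy. This is a simplified illustration, not the paper's implementation; the sigmoid attention, the normalization, and `sparsity_weight` are assumptions.

```python
import numpy as np

def attention_pool(segment_feats, attn_logits, sparsity_weight=1e-4):
    """Fuse per-segment features (T, D) into one video-level feature
    using attention weights, and return an L1 sparsity penalty that
    encourages only a few key segments to be selected."""
    w = 1.0 / (1.0 + np.exp(-attn_logits))            # sigmoid attention in [0, 1]
    pooled = (w[:, None] * segment_feats).sum(0) / (w.sum() + 1e-8)
    sparsity_loss = sparsity_weight * np.abs(w).sum()  # L1 term on attention
    return pooled, sparsity_loss
```

In training, the sparsity term would be added to the video-level classification loss; segments with near-zero attention then contribute almost nothing to the pooled feature.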

    Velocity-Based LOD Reduction in Virtual Reality: A Psychometric Approach

    Full text link
    Virtual Reality headsets enable users to explore the environment by performing self-induced movements. The retinal velocity produced by such motion reduces the visual system's ability to resolve fine detail. We measured the impact of self-induced head rotations on the ability to detect quality changes of a realistic 3D model in an immersive virtual reality environment. We varied the Level-of-Detail (LOD) as a function of rotational head velocity with different degrees of severity. Using a psychophysical method, we asked 17 participants to identify which of the two presented intervals contained the higher quality model under two different maximum velocity conditions. After fitting psychometric functions to data relating the percentage of correct responses to the aggressiveness of LOD manipulations, we identified the threshold severity for which participants could reliably (75%) detect the lower LOD model. Participants accepted an approximately four-fold LOD reduction even in the low maximum velocity condition without a significant impact on perceived quality, which suggests that there is considerable potential for optimisation when users are moving (increased range of perceptual uncertainty). Moreover, LOD could be degraded significantly more in the maximum head velocity condition, suggesting these effects are indeed speed dependent.
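    For a two-interval forced-choice task like the one described, a standard psychometric model maps severity to probability correct between chance (50%) and 100%, and the 75% threshold is read off by inverting the fitted curve. The logistic form and parameter names below are a common convention, assumed here for illustration rather than taken from the paper.

```python
import numpy as np

def psychometric(x, alpha, beta):
    """2IFC psychometric function: probability correct rises from
    chance (0.5) toward 1.0 as LOD-reduction severity x grows.
    alpha locates the curve; beta sets its slope."""
    return 0.5 + 0.5 / (1.0 + np.exp(-(x - alpha) / beta))

def threshold(alpha, beta, p=0.75):
    """Severity at which observers reach probability p correct
    (analytic inverse of the logistic above)."""
    return alpha + beta * np.log((p - 0.5) / (1.0 - p))
```

With this parameterization the 75% point coincides with `alpha`, which makes the fitted location parameter directly interpretable as the detection threshold.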

    Medical image segmentation and analysis using statistical shape modelling and inter-landmark relationships

    Get PDF
    The study of anatomical morphology is of great importance to medical imaging, with applications varying from clinical diagnosis to computer-aided surgery. To this end, automated tools are required for accurate extraction of the anatomical boundaries from the image data and detailed interpretation of morphological information. This thesis introduces a novel approach to shape-based analysis of medical images based on Inter-Landmark Descriptors (ILDs). Unlike point coordinates that describe absolute position, these shape variables represent the relative configuration of landmarks in the shape. The proposed work is motivated by the inherent difficulties of methods based on landmark coordinates in challenging applications. Through explicit invariance to pose parameters and decomposition of the global shape constraints, this work permits anatomical shape analysis that is resistant to image inhomogeneities and geometrical inconsistencies. Several algorithms are presented to tackle specific image segmentation and analysis problems, including automatic initialisation, optimal feature point search, outlier handling and dynamic abnormality localisation. Detailed validation results are provided based on various cardiovascular magnetic resonance datasets, showing increased robustness and accuracy.
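    The simplest pose-invariant relative representation of a landmark configuration is the set of pairwise inter-landmark distances; the thesis's ILDs may well be richer, so treat the sketch below as a minimal stand-in that shows why such descriptors are invariant to translation and rotation.

```python
import numpy as np

def inter_landmark_descriptors(landmarks):
    """Pairwise Euclidean distances between N landmarks (N, dim):
    a relative shape representation that is unchanged by
    translation and rotation of the whole configuration."""
    diff = landmarks[:, None, :] - landmarks[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    iu = np.triu_indices(len(landmarks), k=1)   # upper triangle, no diagonal
    return dist[iu]                             # N*(N-1)/2 descriptors
```

Because the descriptors depend only on relative positions, no pose normalization step is needed before comparing shapes, which is one of the motivations the abstract cites.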

    Fuzzy-based Propagation of Prior Knowledge to Improve Large-Scale Image Analysis Pipelines

    Get PDF
    Many automatically analyzable scientific questions are well-posed and offer a variety of information about the expected outcome a priori. Although often neglected, this prior knowledge can be systematically exploited to make automated analysis operations sensitive to a desired phenomenon or to evaluate extracted content with respect to this prior knowledge. For instance, the performance of processing operators can be greatly enhanced by a more focused detection strategy and by direct information about the ambiguity inherent in the extracted data. We present a new concept for the estimation and propagation of uncertainty involved in image analysis operators. This allows using simple processing operators that are suitable for analyzing large-scale 3D+t microscopy images without compromising the result quality. On the foundation of fuzzy set theory, we transform available prior knowledge into a mathematical representation and extensively use it to enhance the result quality of various processing operators. All presented concepts are illustrated on a typical bioimage analysis pipeline comprising seed point detection, segmentation, multiview fusion and tracking. Furthermore, the functionality of the proposed approach is validated on a comprehensive simulated 3D+t benchmark data set that mimics embryonic development and on large-scale light-sheet microscopy data of a zebrafish embryo. The general concept introduced in this contribution represents a new approach to efficiently exploit prior knowledge to improve the result quality of image analysis pipelines. In particular, the automated analysis of terabyte-scale microscopy data will benefit from sophisticated and efficient algorithms that enable a quantitative and fast readout. The generality of the concept, however, makes it also applicable to practically any other field with processing strategies that are arranged as linear pipelines. Comment: 39 pages, 12 figures
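    In fuzzy set theory, prior knowledge such as "the expected object size is roughly between b and c" is encoded as a membership function, and several priors are fused with a t-norm. The trapezoidal membership and minimum-based fusion below are standard fuzzy-set building blocks, shown as a generic sketch rather than the paper's specific formulation.

```python
import numpy as np

def trapezoid_membership(x, a, b, c, d):
    """Trapezoidal fuzzy membership: 0 below a, ramps up to 1 on
    [b, c], ramps back to 0 above d. Encodes soft prior knowledge
    such as an expected size or intensity range."""
    x = np.asarray(x, dtype=float)
    up = np.clip((x - a) / max(b - a, 1e-12), 0.0, 1.0)
    down = np.clip((d - x) / max(d - c, 1e-12), 0.0, 1.0)
    return np.minimum(up, down)

def combine_priors(*memberships):
    """Conjunctive fusion (minimum t-norm) of several fuzzy priors
    into a single per-detection confidence score."""
    return np.minimum.reduce(memberships)
```

A detection then carries a graded confidence rather than a hard accept/reject decision, which is what lets uncertainty propagate through the downstream pipeline stages.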

    Discovering user mobility and activity in smart lighting environments

    Full text link
    "Smart lighting" environments seek to improve energy efficiency, human productivity and health by combining sensors, controls, and Internet-enabled lights with emerging "Internet-of-Things" technology. Interesting and potentially impactful applications involve adaptive lighting that responds to individual occupants' location, mobility and activity. In this dissertation, we focus on the recognition of user mobility and activity using sensing modalities and analytical techniques. This dissertation encompasses prior work using body-worn inertial sensors in one study, followed by smart-lighting inspired infrastructure sensors deployed with lights. The first approach employs wearable inertial sensors and body area networks that monitor human activities with a user's smart devices. Real-time algorithms are developed to (1) estimate angles of excess forward lean to prevent risk of falls, (2) identify functional activities, including postures, locomotion, and transitions, and (3) capture gait parameters. Two human activity datasets are collected from 10 healthy young adults and 297 elder subjects, respectively, for laboratory validation and real-world evaluation. Results show that these algorithms can identify all functional activities accurately with a sensitivity of 98.96% on the 10-subject dataset, and can detect walking activities and gait parameters consistently with high test-retest reliability (p-value < 0.001) on the 297-subject dataset. The second approach leverages pervasive "smart lighting" infrastructure to track human location and predict activities. A use case oriented design methodology is considered to guide the design of sensor operation parameters for localization performance metrics from a system perspective. Integrating a network of low-resolution time-of-flight sensors in ceiling fixtures, a recursive 3D location estimation formulation is established that links a physical indoor space to an analytical simulation framework.
Based on indoor location information, a label-free clustering-based method is developed to learn user behaviors and activity patterns. Location datasets are collected when users are performing unconstrained and uninstructed activities in the smart lighting testbed under different layout configurations. Results show that the activity recognition performance measured in terms of CCR ranges from approximately 90% to 100% throughout a wide range of spatio-temporal resolutions on these location datasets, insensitive to the reconfiguration of environment layout and the presence of multiple users.
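    A label-free clustering of occupancy locations into activity zones can be illustrated with a minimal k-means over (x, y) samples; the dissertation's method is not necessarily k-means, so this is only a generic sketch of the label-free idea.

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Minimal label-free k-means: group (x, y) location samples
    into k activity zones without any annotated labels."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each sample to its nearest center
        labels = np.argmin(((points[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # move each center to the mean of its assigned samples
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(0)
    return labels, centers
```

Zone occupancy over time can then be summarized per cluster to characterize activity patterns, with no ground-truth activity labels required.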