
    Simulating the Future of Concept-Based Video Retrieval under Improved Detector Performance

    In this paper we address two important questions for concept-based video retrieval: (1) what is the impact of detector performance on the performance of concept-based retrieval engines, and (2) will these engines be applicable to real-life search tasks if detector performance improves in the future? We use Monte Carlo simulations to answer these questions. To generate the simulation input, we propose a probabilistic model of two Gaussians for the confidence scores that concept detectors emit. Modifying the model's parameters affects both detector performance and search performance, and we study the relation between the two on two video collections. For detectors with similar discriminative power and a concept vocabulary of around 100 concepts, the simulation reveals that in order to achieve a search performance of 0.20 mean average precision (MAP), which is considered sufficient for real-life applications, one needs detectors with at least 0.60 MAP. We also find that, given our simulation model and low detector performance, MAP is not always a good evaluation measure for concept detectors, since it is not strongly correlated with search performance.
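
    As a rough illustration of the kind of simulation described above (function names and parameter values here are illustrative, not taken from the paper), one can draw confidence scores for relevant and non-relevant shots from two Gaussians and observe how the separation of their means drives the detector's average precision:

```python
import numpy as np

def average_precision(ranked_labels):
    """AP over a ranked list of binary relevance labels."""
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_labels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return float(np.mean(precisions)) if precisions else 0.0

def simulate_detector(n_pos=100, n_neg=9900, mu_pos=1.0, mu_neg=0.0,
                      sigma=1.0, seed=0):
    """Draw detector confidences from two Gaussians and score the ranking."""
    rng = np.random.default_rng(seed)
    scores = np.concatenate([rng.normal(mu_pos, sigma, n_pos),
                             rng.normal(mu_neg, sigma, n_neg)])
    labels = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])
    order = np.argsort(-scores)            # rank shots by confidence
    return average_precision(labels[order])

# Widening the gap between the two means is the knob that sweeps
# detector performance in a Monte Carlo study of this kind.
for gap in (0.5, 1.0, 2.0, 3.0):
    print(f"mean separation {gap:.1f}: AP = {simulate_detector(mu_pos=gap):.3f}")
```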

    The uncertain representation ranking framework for concept-based video retrieval

    Concept-based video retrieval often relies on imperfect and uncertain concept detectors. We propose a general ranking framework for defining effective and robust ranking functions by explicitly addressing detector uncertainty. It can cope with multiple concept-based representations per video segment and allows the re-use of effective text retrieval functions which are defined on similar representations. The final ranking status value is a weighted combination of two components: the expected score over the possible scores, which represents the risk-neutral choice, and the scores' standard deviation, which represents the risk or opportunity that the score for the actual representation is higher. The framework consistently improves search performance in the shot retrieval task and the segment retrieval task over several baselines on five TRECVid collections and two collections which use simulated detectors of varying performance.
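
    A minimal sketch of the mean-variance style combination the abstract describes, assuming uniform probabilities over a segment's possible representations and an illustrative weight b (all names below are mine, not the paper's):

```python
import numpy as np

def ranking_status_value(scores, probs=None, b=0.5):
    """Combine the expected score over a segment's possible representation
    scores with their standard deviation, weighted by b (b > 0 rewards
    uncertainty as opportunity, b < 0 penalises it as risk)."""
    scores = np.asarray(scores, dtype=float)
    if probs is None:                       # uniform over representations
        probs = np.full(len(scores), 1.0 / len(scores))
    mean = float(np.dot(probs, scores))     # risk-neutral component
    std = float(np.sqrt(np.dot(probs, (scores - mean) ** 2)))
    return mean + b * std

# Two segments with the same expected score but different uncertainty:
print(ranking_status_value([0.4, 0.4]))    # 0.40  (certain)
print(ranking_status_value([0.1, 0.7]))    # 0.55  (uncertain, rewarded)
```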

    What are the limits to time series based recognition of semantic concepts?

    Most concept recognition in visual multimedia is based on relatively simple concepts, things which are present in the image or video. These usually correspond to objects which can be identified in images or individual frames. Yet there is also a need to recognise semantic concepts which have a temporal aspect, corresponding to activities or complex events. These require some form of time series for recognition, and also require some individual concepts to be detected so that their time-varying features, such as co-occurrence and re-occurrence patterns, can be utilised. While results are reported in the literature of using concept detections which are relatively specific and static, some research questions remain unanswered. What concept detection accuracies are satisfactory for time series recognition? Can recognition methods perform equally well across various concept detection performances? What factors need to be taken into account when building concept-based high-level event/activity recognition? In this paper we conducted experiments to investigate these questions. Results show that although improving concept detection accuracies can enhance the recognition of time series based concepts, detectors do not need to be very accurate to characterise the dynamic evolution of time series if appropriate methods are used. Experimental results also point out the importance of concept selection for time series recognition, which is usually ignored in the current literature.
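
    As a hedged illustration of the kind of time-series features involved (the feature definitions and names below are simplified stand-ins, not the paper's), per-frame concept confidences can be binarised and summarised by co-occurrence and re-occurrence statistics:

```python
import numpy as np

def time_series_features(scores, threshold=0.5):
    """scores: (frames, concepts) array of detector confidences."""
    present = (scores >= threshold).astype(int)    # binarise detections
    cooccur = present.T @ present / len(present)   # concept-pair rates
    # re-occurrence: how often a concept switches off and comes back
    flips = np.abs(np.diff(present, axis=0)).sum(axis=0)
    reoccur = flips // 2
    return cooccur[np.triu_indices_from(cooccur, k=1)], reoccur

# Lowering the threshold on a weak detector trades missed concepts for
# noisy ones, which is the trade-off such experiments probe.
rng = np.random.default_rng(0)
co, re = time_series_features(rng.random((120, 5)))
print(co.shape, re)   # 10 pairwise co-occurrence rates, 5 re-occurrence counts
```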

    The THUMOS Challenge on Action Recognition for Videos "in the Wild"

    Automatically recognizing and localizing a wide range of human actions is of crucial importance for video understanding. Towards this goal, the THUMOS challenge was introduced in 2013 to serve as a benchmark for action recognition. Until then, video action recognition, including the THUMOS challenge, had focused primarily on the classification of pre-segmented (i.e., trimmed) videos, which is an artificial task. In THUMOS 2014, we elevated action recognition to a more practical level by introducing temporally untrimmed videos. These also include 'background videos' which share similar scenes and backgrounds with action videos but are devoid of the specific actions. The three editions of the challenge organized in 2013--2015 have made THUMOS a common benchmark for action classification and detection, and the annual challenge is widely attended by teams from around the world. In this paper we describe the THUMOS benchmark in detail and give an overview of the data collection and annotation procedures. We present the evaluation protocols used to quantify results in the two THUMOS tasks of action classification and temporal detection. We also present results of submissions to the THUMOS 2015 challenge and review the participating approaches. Additionally, we include a comprehensive empirical study evaluating the differences in action recognition between trimmed and untrimmed videos, and how well methods trained on trimmed videos generalize to untrimmed videos. We conclude by proposing several directions and improvements for future THUMOS challenges. Comment: Preprint submitted to Computer Vision and Image Understanding
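
    For the temporal detection task, predictions are matched to ground truth by temporal intersection-over-union (tIoU): a predicted segment counts as correct only if its tIoU with a ground-truth segment of the same class reaches a threshold (e.g. 0.5). A minimal computation:

```python
def temporal_iou(pred, gt):
    """pred, gt: (start, end) action segments in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

print(temporal_iou((10.0, 20.0), (15.0, 25.0)))  # 5/15 = 0.33: miss at 0.5
print(temporal_iou((10.0, 20.0), (12.0, 18.0)))  # 6/10 = 0.60: hit at 0.5
```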

    Towards training-free refinement for semantic indexing of visual media

    Indexing of visual media based on content analysis has now moved beyond using individual concept detectors, and the focus is now on combining concepts or post-processing the outputs of individual concept detection. Because the available training corpora are usually sparsely and imprecisely labeled, training-based refinement methods for semantic indexing of visual media struggle to correctly capture relationships between concepts, including co-occurrence and ontological relationships. In contrast to the training-dependent methods which dominate this field, this paper presents a training-free refinement (TFR) algorithm for enhancing semantic indexing of visual media based purely on concept detection results, making semantic refinement of the initial concept detections practical and flexible. This is achieved using global and temporal neighbourhood information inferred from the original concept detections, in terms of weighted non-negative matrix factorization and neighbourhood-based graph propagation, respectively. Any available ontological concept relationships can also be integrated into this model as an additional source of external a priori knowledge. Experiments on two datasets demonstrate the efficacy of the proposed TFR solution.
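
    A hedged sketch of the two ingredients named above, with plain (unweighted) NMF standing in for the paper's weighted variant and a simple temporal smoothing pass in place of graph propagation; rank, weights and iteration counts are illustrative:

```python
import numpy as np

def nmf_refine(V, rank=10, iters=200, eps=1e-9, seed=0):
    """Multiplicative-update NMF; the low-rank reconstruction of the
    shots-x-concepts score matrix pulls in global co-occurrence structure."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank))
    H = rng.random((rank, V.shape[1]))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W @ H

def temporal_smooth(V, alpha=0.7):
    """Blend each shot's scores with its temporal neighbours' scores."""
    pad = np.pad(V, ((1, 1), (0, 0)), mode="edge")
    return alpha * V + (1 - alpha) * (pad[:-2] + pad[2:]) / 2.0

scores = np.random.default_rng(1).random((500, 100))   # shots x concepts
refined = temporal_smooth(nmf_refine(scores))
print(refined.shape)                                   # (500, 100)
```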

    Improving the classification of quantified self activities and behaviour using a Fisher kernel

    Visual recording of everyday human activities and behaviour over the long term is now feasible, and with the widespread use of wearable devices embedded with cameras this offers the potential to gain real insights into wearers' activities and behaviour. To date we have concentrated on automatically detecting semantic concepts from within visual lifelogs, yet identifying human activities from such lifelogged images or videos is still a major challenge if we are to use lifelogs to maximum benefit. In this paper, we propose an activity classification method for visual lifelogs based on Fisher kernels, which extract discriminative embeddings from Hidden Markov Models (HMMs) of occurrences of semantic concepts. By using the gradients as features, the resulting classifiers can better distinguish different activities, and from that we can make inferences about human behaviour. Experiments show the effectiveness of this method in improving classification accuracy, especially when the semantic concepts are initially detected with low degrees of accuracy.
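
    The Fisher-kernel idea is that features are gradients of a sequence model's log-likelihood with respect to its parameters. The paper derives these from HMMs; as a self-contained stand-in, the sketch below uses a fully observed Markov chain over binary concept states, where the gradient has a closed form (observed minus expected transition counts under a softmax parametrisation):

```python
import numpy as np

def fisher_features(seq, trans):
    """seq: observed state indices over time; trans: (k, k) row-stochastic
    transition matrix. Returns the gradient of log P(seq) with respect to
    the transition logits: observed minus expected transition counts."""
    k = trans.shape[0]
    counts = np.zeros((k, k))
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1
    expected = counts.sum(axis=1, keepdims=True) * trans
    return (counts - expected).ravel()     # the Fisher score vector

trans = np.array([[0.8, 0.2],              # e.g. a 'walking' concept:
                  [0.3, 0.7]])             # state 0 = absent, 1 = present
active    = [0, 0, 1, 1, 1, 0, 1]
sedentary = [0, 0, 0, 0, 0, 0, 1]
# Sequences from different activities land in different regions of the
# gradient space, which is what the downstream classifier exploits.
print(fisher_features(active, trans))
print(fisher_features(sedentary, trans))
```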

    Free Space Optical Link Utilizing a Modulated Retro-Reflector Intended for Planetary Duplex Communication Links Between an Orbiter and Surface Unit

    Presented are simulation and experimental results for duplex free-space optical communication links with minimal power and pointing requirements, using a modulated retro-reflector (MRR) for planetary communications. In this design, the MRR resides on the surface of a planet or moon, where energy is scarce, while the source of the communication laser resides on an orbiter, achieving satellite-to-ground communications. A simulated scenario using the Mars Reconnaissance Orbiter (MRO) is also provided for real-world potential results. The information sent through this communication path can range from raw scientific data to multimedia files such as videos and pictures. Bidirectional communication is established with the MRR by using a nested pulse position modulation (PPM) structure. This modulation scheme is then evaluated for its validity in a proof-of-concept experiment. Initial results indicate a promising return-link performance of at least 300 kbps in the nested arrangement.
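
    As background, a minimal sketch of plain M-ary pulse position modulation (PPM), the base scheme the nested structure builds on: each symbol of log2(M) bits is sent as a single pulse in one of M time slots. The nesting itself (the MRR overlaying return-link data on the interrogating stream) is not modelled here:

```python
def ppm_encode(bits, M=4):
    """Group bits into log2(M)-bit symbols; emit one pulse per M-slot frame."""
    k = M.bit_length() - 1                 # bits per symbol (M a power of 2)
    frames = []
    for i in range(0, len(bits), k):
        slot = int("".join(map(str, bits[i:i + k])), 2)
        frames.append([1 if s == slot else 0 for s in range(M)])
    return frames

def ppm_decode(frames, M=4):
    k = M.bit_length() - 1
    bits = []
    for frame in frames:
        slot = frame.index(1)              # position of the pulse
        bits.extend(int(b) for b in format(slot, f"0{k}b"))
    return bits

data = [1, 0, 1, 1, 0, 0]
assert ppm_decode(ppm_encode(data)) == data    # round-trip check
print(ppm_encode(data))   # [[0,0,1,0], [0,0,0,1], [1,0,0,0]]
```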

    Using visual lifelogs to automatically characterise everyday activities

    Visual lifelogging is the term used to describe recording our everyday lives using wearable cameras, for applications which are personal to us and do not involve sharing our recorded data. Current applications of visual lifelogging are built around remembrance or searching for specific events from the past. The purpose of the work reported here is to extend this to allow us to characterise and measure the occurrence of everyday activities of the wearer, and in so doing to gain insights into the wearer's everyday behaviour. Our method is to capture everyday activities using a wearable camera called SenseCam, and to use an algorithm we have developed which indexes lifelog images by the occurrence of basic semantic concepts. We then use data reduction techniques to automatically generate a profile of the wearer's everyday behaviour and activities. Our algorithm has been evaluated on a large set of concepts from 13 users in a user experiment, and for a group of 16 popular everyday activities we achieve an average F-score of 0.90. We conclude that the technique we have presented for unobtrusively and ambiently characterising everyday behaviour and activities across individuals is of sufficient accuracy to be usable in a range of applications.
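
    A loose, illustrative sketch of the profiling step (structure and names are mine, not the paper's): index each image by which semantic concepts fire, then reduce the images to a per-activity mean concept-occurrence vector, a simple behaviour profile:

```python
import numpy as np

CONCEPTS = ["indoor", "screen", "food", "people", "vehicle"]

def activity_profile(image_concepts, activities):
    """image_concepts: (images, concepts) binary matrix of detections;
    activities: one activity label per image. Returns, per activity,
    the mean concept-occurrence vector."""
    labels = np.array(activities)
    return {act: image_concepts[labels == act].mean(axis=0)
            for act in sorted(set(activities))}

rng = np.random.default_rng(0)
imgs = (rng.random((8, len(CONCEPTS))) > 0.5).astype(int)
acts = ["working"] * 5 + ["eating"] * 3
for act, vec in activity_profile(imgs, acts).items():
    print(act, dict(zip(CONCEPTS, vec.round(2))))
```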

    Technology for the Future: In-Space Technology Experiments Program, part 2

    The purpose of the Office of Aeronautics and Space Technology (OAST) In-Space Technology Experiments Program (In-STEP) 1988 Workshop was to identify and prioritize technologies that are critical for future national space programs and require validation in the space environment, and to review current NASA (In-Reach) and industry/university (Out-Reach) experiments. A prioritized list of the critical technology needs was developed for the following eight disciplines: structures; environmental effects; power systems and thermal management; fluid management and propulsion systems; automation and robotics; sensors and information systems; in-space systems; and humans in space. This is part two of two parts and contains the critical technology presentations for the eight theme elements and a summary listing of critical space technology needs for each theme.