
    Refining Vision Videos

    [Context and motivation] Complex software-based systems involve several stakeholders, their activities, and their interactions with the system. Vision videos are used during the early phases of a project to complement textual representations: they visualize previously abstract visions of the product and its use. By creating, elaborating, and discussing vision videos, stakeholders and developers gain an improved shared understanding of how those abstract visions could translate into concrete scenarios and requirements to which individuals can relate. [Question/problem] In this paper, we investigate two aspects of refining vision videos: (1) refining the vision by providing alternative answers to previously open issues about the system to be built, and (2) refining the understanding of the camera perspective in vision videos, comparing the impact of a subjective (or "ego") perspective with the usual third-person perspective. [Methodology] We use shopping in rural areas as a real-world application domain for refining vision videos. Both aspects were investigated in an experiment with 20 participants. [Contribution] Subjects made a significantly larger number of additional contributions when they had received both video and text rather than only one of them, even when the texts and video clips were very short. Subjective video elements were rated positively; however, there was no significant overall preference for either subjective or non-subjective videos.
    Comment: 15 pages, 25th International Working Conference on Requirements Engineering: Foundation for Software Quality, 2019

    MoSculp: Interactive Visualization of Shape and Time

    We present a system that allows users to visualize complex human motion via 3D motion sculptures, a representation that conveys the 3D structure swept by a human body as it moves through space. Given an input video, our system computes the motion sculpture and provides a user interface for rendering it in different styles, including options to insert the sculpture back into the original video, render it in a synthetic scene, or physically print it. To provide this end-to-end workflow, we introduce an algorithm that estimates the human's 3D geometry over time from a set of 2D images, and we develop a 3D-aware image-based rendering approach that embeds the sculpture back into the scene. By automating the process, our system takes motion sculpture creation out of the realm of professional artists and makes it applicable to a wide range of existing video material. By providing viewers with 3D information, motion sculptures reveal space-time motion information that is difficult to perceive with the naked eye and allow viewers to interpret how different parts of the object interact over time. We validate the effectiveness of this approach with user studies, finding that our motion sculpture visualizations are significantly more informative about motion than existing stroboscopic and space-time visualization methods.
    Comment: UIST 2018. Project page: http://mosculp.csail.mit.edu
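
    The core representation described here is the volume swept by the body over time. As a rough illustration of that idea only (not the MoSculp pipeline), the sketch below accumulates per-frame 3D vertex sets into a single time-colored point cloud; the function name and the fake input data are assumptions made up for the example.

```python
# Hypothetical sketch of the "motion sculpture" idea: sweep per-frame 3D body
# geometry through time and merge it into one renderable object.
# build_motion_sculpture and the synthetic frames are illustrative, not MoSculp code.
import numpy as np

def build_motion_sculpture(per_frame_vertices):
    """Merge per-frame vertex arrays (each of shape (N_i, 3)) into one point
    cloud, coloring each point by its normalized timestamp."""
    points, colors = [], []
    num_frames = len(per_frame_vertices)
    for t, verts in enumerate(per_frame_vertices):
        points.append(verts)
        # Map time to a simple blue-to-red gradient so the sweep direction is visible.
        alpha = t / max(num_frames - 1, 1)
        color = np.array([alpha, 0.2, 1.0 - alpha])
        colors.append(np.tile(color, (verts.shape[0], 1)))
    return np.concatenate(points, axis=0), np.concatenate(colors, axis=0)

if __name__ == "__main__":
    # Fake data: a small "body" of random points translating along x over 30 frames.
    rng = np.random.default_rng(0)
    base = rng.normal(size=(200, 3)) * 0.1
    frames = [base + np.array([0.05 * t, 0.0, 0.0]) for t in range(30)]
    pts, cols = build_motion_sculpture(frames)
    print(pts.shape, cols.shape)  # (6000, 3) (6000, 3)
```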

    CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos

    Temporal action localization is an important yet challenging problem. Given a long, untrimmed video consisting of multiple action instances and complex background content, we need not only to recognize the action categories but also to localize the start and end time of each instance. Many state-of-the-art systems use segment-level classifiers to select and rank proposal segments with pre-determined boundaries. However, a desirable model should move beyond the segment level and make dense predictions at a fine temporal granularity to determine precise boundaries. To this end, we design a novel Convolutional-De-Convolutional (CDC) network that places CDC filters on top of 3D ConvNets, which have been shown to be effective for abstracting action semantics but reduce the temporal length of the input data. The proposed CDC filter performs the required temporal upsampling and spatial downsampling operations simultaneously to predict actions at frame-level granularity. It is unique in jointly modeling action semantics in space-time and fine-grained temporal dynamics. We train the CDC network efficiently in an end-to-end manner. Our model not only achieves superior performance in detecting actions in every frame, but also significantly boosts the precision of localizing temporal boundaries. Finally, the CDC network demonstrates very high efficiency, processing 500 frames per second on a single GPU server. We will update the camera-ready version and publish the source code online soon.
    Comment: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
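
    The key operation described in this abstract is a layer that upsamples in time while downsampling in space. The sketch below approximates that behavior in PyTorch by composing a spatially strided 3D convolution with a temporal transposed convolution; it is an illustrative stand-in under assumed layer sizes, not the authors' joint CDC filter or their released code.

```python
# Minimal CDC-style block sketch: temporal upsampling (x2) combined with spatial
# downsampling (x2), approximated by two separable operations rather than the
# paper's joint filter. CDCBlock and all sizes are assumptions for illustration.
import torch
import torch.nn as nn

class CDCBlock(nn.Module):
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # Spatial downsampling: stride 2 over H and W, time dimension unchanged.
        self.spatial_down = nn.Conv3d(
            in_channels, out_channels,
            kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1))
        # Temporal upsampling: transposed convolution with stride 2 along time.
        self.temporal_up = nn.ConvTranspose3d(
            out_channels, out_channels,
            kernel_size=(4, 1, 1), stride=(2, 1, 1), padding=(1, 0, 0))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, height, width)
        x = self.relu(self.spatial_down(x))
        x = self.relu(self.temporal_up(x))
        return x

if __name__ == "__main__":
    # Example: features from a 3D ConvNet whose temporal length was reduced.
    feats = torch.randn(1, 512, 8, 4, 4)   # (N, C, L/8, H, W)
    out = CDCBlock(512, 512)(feats)
    print(out.shape)  # torch.Size([1, 512, 16, 2, 2]): time doubled, space halved
```

    Stacking such blocks recovers frame-level temporal resolution while shrinking the spatial grid toward per-frame class scores, which is the role the abstract assigns to the CDC filters.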