
    Linking factual and procedural knowledge in solving science problems: A case study in a thermodynamics course

    Well-specified problems of the type presented boxed in the introduction to this article are extremely common in science courses. Unfortunately, this does not mean that students find them easy to solve, even when a teacher provides model answers to problems which differ only marginally (in the teacher's eyes) from those put before the students. The central difficulty with such courses is that they do not embody instructional principles that reflect students' need for “direction” in problem solving. In this article, we describe how the necessary heuristics and strategic knowledge were built into the redesign of a conventional thermodynamics course. In contrast to mainstream American work on learning problem solving, we chose to direct our curriculum reconstruction using Gal'perin's theory of stage-by-stage formation of mental actions and Landa's description of the “through” systematization of knowledge. As both indicate, we first developed an integrated system of instructional objectives: a programme of actions and methods (PAM) for solving problems in thermodynamics. Then the plan of instruction was designed. This plan indicates which instructional procedures and materials should be used to realize the instructional functions derived from the learning theory. The evaluation design contained two control and three experimental courses. In discussing our main findings, we consider the generalizability of the procedures we followed in constructing the PAM and the instructional plan.

    Spatial-Aware Object Embeddings for Zero-Shot Localization and Classification of Actions

    We aim for zero-shot localization and classification of human actions in video. Where traditional approaches rely on global attribute or object classification scores for their zero-shot knowledge transfer, our main contribution is a spatial-aware object embedding. To arrive at spatial awareness, we build our embedding on top of freely available actor and object detectors. Relevance of objects is determined in a word embedding space and further enforced with estimated spatial preferences. Besides local object awareness, we also embed global object awareness into our embedding to maximize actor and object interaction. Finally, we exploit the object positions and sizes in the spatial-aware embedding to demonstrate a new spatio-temporal action retrieval scenario with composite queries. Action localization and classification experiments on four contemporary action video datasets support our proposal. Apart from state-of-the-art results in the zero-shot localization and classification settings, our spatial-aware embedding is even competitive with recent supervised action localization alternatives. Comment: ICCV.
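
    The following is a minimal sketch of how such a spatial-aware score might be assembled for one action: object relevance comes from word-embedding similarity to the action name and is weighted by detector confidence and by the spatial overlap between object and actor boxes. The detection format, the word_vec lookup, and the function names are illustrative assumptions, not the authors' released code.

    import numpy as np

    def cosine(u, v):
        # Cosine similarity between two word-embedding vectors.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

    def overlap(actor_box, object_box):
        # Intersection-over-union between actor and object boxes (x1, y1, x2, y2).
        ax1, ay1, ax2, ay2 = actor_box
        ox1, oy1, ox2, oy2 = object_box
        iw = max(0.0, min(ax2, ox2) - max(ax1, ox1))
        ih = max(0.0, min(ay2, oy2) - max(ay1, oy1))
        inter = iw * ih
        union = (ax2 - ax1) * (ay2 - ay1) + (ox2 - ox1) * (oy2 - oy1) - inter
        return inter / union if union > 0 else 0.0

    def action_score(action_vec, actor_boxes, object_dets, word_vec):
        # object_dets: list of (class_name, box, detector_score) tuples for one frame.
        # word_vec:    mapping from class name to its embedding vector.
        score = 0.0
        for actor_box in actor_boxes:
            for cls, obj_box, det_score in object_dets:
                relevance = cosine(action_vec, word_vec[cls])   # semantic relevance to the action
                spatial = overlap(actor_box, obj_box)           # estimated spatial preference
                score += relevance * det_score * spatial
        return score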

    Action Search: Spotting Actions in Videos and Its Application to Temporal Action Localization

    State-of-the-art temporal action detectors inefficiently search the entire video for specific actions. Despite the encouraging progress these methods achieve, it is crucial to design automated approaches that only explore the parts of the video most relevant to the actions being searched for. To address this need, we propose the new problem of action spotting in video, which we define as finding a specific action in a video while observing a small portion of that video. Inspired by the observation that humans are extremely efficient and accurate in spotting and finding action instances in video, we propose Action Search, a novel Recurrent Neural Network approach that mimics the way humans spot actions. Moreover, to address the absence of data recording the behavior of human annotators, we put forward the Human Searches dataset, which compiles the search sequences employed by human annotators spotting actions in the AVA and THUMOS14 datasets. We consider temporal action localization as an application of the action spotting problem. Experiments on the THUMOS14 dataset reveal that our model is not only able to explore the video efficiently (observing on average 17.3% of the video) but also accurately finds human activities with 30.8% mAP. Comment: accepted to ECCV 2018.
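
    As a rough illustration of the search mechanism, the sketch below shows a recurrent module that observes the feature at its current temporal position and regresses where to look next, so only a handful of frames are ever visited. The module layout, feature dimensions, and class name are assumptions made for illustration rather than the published architecture.

    import torch
    import torch.nn as nn

    class ActionSearcher(nn.Module):
        # Toy recurrent searcher: observe a frame feature, predict the next position to visit.
        def __init__(self, feat_dim=512, hidden_dim=256):
            super().__init__()
            self.rnn = nn.LSTMCell(feat_dim + 1, hidden_dim)   # input: frame feature + current position
            self.next_pos = nn.Linear(hidden_dim, 1)           # regress the next (normalized) position

        def search(self, frame_features, max_steps=10):
            # frame_features: (num_frames, feat_dim) tensor of precomputed per-frame features.
            num_frames = frame_features.size(0)
            h = torch.zeros(1, self.rnn.hidden_size)
            c = torch.zeros(1, self.rnn.hidden_size)
            pos = torch.tensor([[0.5]])                        # start searching in the middle
            visited = []
            for _ in range(max_steps):
                idx = int(pos.item() * (num_frames - 1))
                visited.append(idx)
                obs = torch.cat([frame_features[idx].unsqueeze(0), pos], dim=1)
                h, c = self.rnn(obs, (h, c))
                pos = torch.sigmoid(self.next_pos(h))          # next temporal position in [0, 1]
            return visited                                     # frame indices actually observed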

    Counting with Focus for Free

    This paper aims to count arbitrary objects in images. The leading counting approaches start from point annotations per object, from which they construct density maps. Their training objective then transforms input images into density maps through deep convolutional networks. We posit that the point annotations serve more supervision purposes than just constructing density maps. We introduce ways to repurpose the points for free. First, we propose supervised focus from segmentation, where points are converted into binary maps. The binary maps are combined with a network branch and an accompanying loss function to focus on areas of interest. Second, we propose supervised focus from global density, where the ratio of point annotations to image pixels is used in another branch to regularize the overall density estimation. To assist both the density estimation and the focus from segmentation, we also introduce an improved kernel size estimator for the point annotations. Experiments on six datasets show that all our contributions reduce the counting error, regardless of the base network, resulting in state-of-the-art accuracy using only a single network. Finally, we are the first to count on WIDER FACE, allowing us to show the benefits of our approach in handling varying object scales and crowding levels. Code is available at https://github.com/shizenglin/Counting-with-Focus-for-Free. Comment: ICCV 2019.
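
    A minimal sketch of how point annotations can be repurposed into the three supervision signals described above: a Gaussian density map, a binary focus map, and a global density ratio. A fixed kernel size stands in for the paper's kernel size estimator, and the function name and radius are illustrative.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def targets_from_points(points, height, width, sigma=4.0, focus_radius=3):
        # points: list of (row, col) object centers annotated in the image.
        density = np.zeros((height, width), dtype=np.float32)
        focus = np.zeros((height, width), dtype=np.float32)
        for r, c in points:
            density[r, c] += 1.0
            r0, r1 = max(0, r - focus_radius), min(height, r + focus_radius + 1)
            c0, c1 = max(0, c - focus_radius), min(width, c + focus_radius + 1)
            focus[r0:r1, c0:c1] = 1.0                          # binary map marking areas of interest
        density = gaussian_filter(density, sigma)              # smooth points into a density map
        global_density = len(points) / float(height * width)   # annotation-to-pixel ratio
        return density, focus, global_density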

    Localizing Actions from Video Labels and Pseudo-Annotations

    The goal of this paper is to determine the spatio-temporal location of actions in video. Where training from hard-to-obtain box annotations is the norm, we propose an intuitive and effective algorithm that localizes actions from their class label only. We are inspired by recent work showing that unsupervised action proposals selected with human point-supervision perform as well as using expensive box annotations. Rather than asking users to provide point supervision, we propose fully automatic visual cues that replace manual point annotations. We call the cues pseudo-annotations, introduce five of them, and propose a correlation metric for automatically selecting and combining them. Thorough evaluation on challenging action localization datasets shows that we reach results comparable to those obtained with full box supervision. We also show that pseudo-annotations can be leveraged during testing to improve weakly- and strongly-supervised localizers. Comment: BMVC.
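
    As a rough illustration of selecting and combining cues, the sketch below scores each pseudo-annotation by how well it agrees with the others over a set of action proposals and averages the cues that pass a threshold. The cue names, the rank-correlation choice, and the threshold are illustrative assumptions, not the paper's exact metric.

    import numpy as np
    from scipy.stats import spearmanr

    def combine_pseudo_annotations(cue_scores, min_corr=0.3):
        # cue_scores: dict mapping a cue name (e.g. 'motion', 'objectness') to an array
        #             of scores, one entry per action proposal.
        names = list(cue_scores)
        agreement = {}
        for a in names:
            corrs = []
            for b in names:
                if b == a:
                    continue
                rho, _ = spearmanr(cue_scores[a], cue_scores[b])  # rank agreement between cues
                corrs.append(rho)
            agreement[a] = float(np.mean(corrs)) if corrs else 1.0
        kept = [n for n in names if agreement[n] >= min_corr] or names
        normalized = [
            (cue_scores[n] - cue_scores[n].min()) / (np.ptp(cue_scores[n]) + 1e-8)
            for n in kept
        ]
        return kept, np.mean(normalized, axis=0)   # combined score per proposal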