
    Online, Supervised and Unsupervised Action Localization in Videos

    Action recognition classifies a given video among a set of action labels, whereas action localization determines the location of an action in addition to its class. The overall aim of this dissertation is action localization. Many existing action localization approaches exhaustively search (spatially and temporally) for an action in a video. However, as the search space grows with high-resolution and longer-duration videos, such sliding-window techniques become impractical. The first part of this dissertation presents an efficient approach for localizing actions by learning contextual relations between different video regions during training. In testing, we use the context information to estimate the probability of each supervoxel belonging to the foreground action and use a Conditional Random Field (CRF) to localize actions. In the above method, as in typical approaches to this problem, localization is performed in an offline manner where all the video frames are processed together. This prevents timely localization and prediction of actions/interactions - an important consideration for many tasks including surveillance and human-machine interaction. Therefore, in the second part of this dissertation we propose an online approach to the challenging problem of localization and prediction of actions/interactions in videos. In this approach, we use human poses and superpixels in each frame to train discriminative appearance models and perform online prediction of actions/interactions with a Structural SVM. The above two approaches rely on human supervision in the form of action class labels for videos and actor bounding-box annotations in each frame of the training videos. Therefore, in the third part of this dissertation we address the problem of unsupervised action localization. 
Given unlabeled videos without annotations, this approach aims at: 1) discovering action classes using a discriminative clustering approach, and 2) localizing actions using a variant of the Knapsack problem.
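The CRF-based localization step can be illustrated with a minimal sketch. The function below is our own illustration, not the dissertation's implementation: it labels supervoxels as foreground/background by greedy energy minimization (iterated conditional modes) over hypothetical per-supervoxel foreground probabilities and a neighborhood graph.

```python
import numpy as np

def icm_label(fg_prob, edges, smooth=0.5, iters=10):
    """Foreground/background labeling of supervoxels via ICM.

    fg_prob: per-supervoxel foreground probability (hypothetical context output)
    edges:   (i, j) pairs connecting spatio-temporally neighboring supervoxels
    smooth:  pairwise penalty for neighbors that disagree
    """
    fg_prob = np.asarray(fg_prob, dtype=float)
    n = len(fg_prob)
    # Unary costs: row 0 = cost of background label, row 1 = cost of foreground.
    unary = np.stack([-np.log(1.0 - fg_prob + 1e-9), -np.log(fg_prob + 1e-9)])
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    labels = (fg_prob > 0.5).astype(int)  # initialize by thresholding
    for _ in range(iters):
        changed = False
        for v in range(n):
            costs = [unary[l, v] + smooth * sum(labels[u] != l for u in nbrs[v])
                     for l in (0, 1)]
            best = int(np.argmin(costs))
            if best != labels[v]:
                labels[v], changed = best, True
        if not changed:
            break
    return labels

# A weak middle supervoxel flanked by confident foreground gets smoothed in:
# icm_label([0.9, 0.45, 0.9], [(0, 1), (1, 2)], smooth=1.0) -> [1, 1, 1]
```

The smoothness term plays the role of the CRF pairwise potential: isolated low-confidence regions inside a confident action volume are absorbed into the foreground.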

    Effects of a high-dose 24-h infusion of tranexamic acid on death and thromboembolic events in patients with acute gastrointestinal bleeding (HALT-IT): an international randomised, double-blind, placebo-controlled trial

    Background: Tranexamic acid reduces surgical bleeding and reduces death due to bleeding in patients with trauma. Meta-analyses of small trials show that tranexamic acid might decrease deaths from gastrointestinal bleeding. We aimed to assess the effects of tranexamic acid in patients with gastrointestinal bleeding. Methods: We did an international, multicentre, randomised, placebo-controlled trial in 164 hospitals in 15 countries. Patients were enrolled if the responsible clinician was uncertain whether to use tranexamic acid, if they were aged above the minimum age considered an adult in their country (either 16 years and older or 18 years and older), and if they had significant (defined as at risk of bleeding to death) upper or lower gastrointestinal bleeding. Patients were randomly assigned by selection of a numbered treatment pack from a box containing eight packs that were identical apart from the pack number. Patients received either a loading dose of 1 g tranexamic acid, which was added to a 100 mL infusion bag of 0·9% sodium chloride and infused by slow intravenous injection over 10 min, followed by a maintenance dose of 3 g tranexamic acid added to 1 L of any isotonic intravenous solution and infused at 125 mg/h for 24 h, or placebo (sodium chloride 0·9%). Patients, caregivers, and those assessing outcomes were masked to allocation. The primary outcome was death due to bleeding within 5 days of randomisation; the analysis excluded patients who received neither dose of the allocated treatment and those for whom outcome data on death were unavailable. This trial was registered with Current Controlled Trials, ISRCTN11225767, and ClinicalTrials.gov, NCT01658124. Findings: Between July 4, 2013, and June 21, 2019, we randomly allocated 12 009 patients to receive tranexamic acid (5994, 49·9%) or matching placebo (6015, 50·1%), of whom 11 952 (99·5%) received the first dose of the allocated treatment. 
Death due to bleeding within 5 days of randomisation occurred in 222 (4%) of 5956 patients in the tranexamic acid group and in 226 (4%) of 5981 patients in the placebo group (risk ratio [RR] 0·99, 95% CI 0·82–1·18). Arterial thromboembolic events (myocardial infarction or stroke) were similar in the tranexamic acid and placebo groups (42 [0·7%] of 5952 vs 46 [0·8%] of 5977; RR 0·92, 95% CI 0·60–1·39). Venous thromboembolic events (deep vein thrombosis or pulmonary embolism) were higher in the tranexamic acid group than in the placebo group (48 [0·8%] of 5952 vs 26 [0·4%] of 5977; RR 1·85, 95% CI 1·15–2·98). Interpretation: We found that tranexamic acid did not reduce death from gastrointestinal bleeding. On the basis of our results, tranexamic acid should not be used for the treatment of gastrointestinal bleeding outside the context of a randomised trial.
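The reported effect estimates can be reproduced from the raw counts. The sketch below uses the standard Wald approximation for a 95% CI on the log risk ratio; the function is our own illustration, not the trial's analysis code.

```python
import math

def risk_ratio(a, n1, b, n2, z=1.96):
    """Risk ratio for event counts a/n1 vs b/n2, with a Wald CI on the log scale."""
    rr = (a / n1) / (b / n2)
    se = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)  # SE of log(RR)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return rr, lo, hi

# Death due to bleeding within 5 days: 222/5956 (tranexamic acid) vs 226/5981 (placebo)
print(risk_ratio(222, 5956, 226, 5981))  # ≈ (0.99, 0.82, 1.18), as reported
# Venous thromboembolic events: 48/5952 vs 26/5977
print(risk_ratio(48, 5952, 26, 5977))    # ≈ (1.85, 1.15, 2.98), as reported
```

Both the null primary result and the elevated venous thromboembolism risk fall out directly from these counts.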

    Unsupervised Action Discovery And Localization In Videos

    This paper is the first to address the problem of unsupervised action localization in videos. Given unlabeled data without bounding box annotations, we propose a novel approach that: 1) discovers action class labels and 2) spatio-temporally localizes actions in videos. It begins by computing local video features and applying spectral clustering to a set of unlabeled training videos. For each cluster of videos, an undirected graph is constructed to extract a dominant set, which is characterized by high internal homogeneity and high heterogeneity with respect to vertices outside it. Next, a discriminative clustering approach is applied, by training a classifier for each cluster, to iteratively select videos from the non-dominant set and obtain complete video action classes. Once classes are discovered, training videos within each cluster are selected for automatic spatio-temporal annotation: videos in each discovered class are first over-segmented into supervoxels, and a directed graph is constructed on which a variant of the knapsack problem with temporal constraints is solved. The knapsack optimization jointly collects a subset of supervoxels, enforcing that the annotated action is spatio-temporally connected and that its volume matches the size of an actor. These annotations are used to train SVM action classifiers. During testing, actions are localized using a similar knapsack approach, where supervoxels are grouped together and an SVM, learned using videos from the discovered action classes, is used to recognize these actions. We evaluate our approach on the UCF-Sports, Sub-JHMDB, JHMDB, THUMOS13 and UCF101 datasets. Our experiments suggest that, despite using no action class labels and no bounding box annotations, we obtain results competitive with state-of-the-art supervised methods.
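The budgeted supervoxel selection at the heart of the annotation step can be sketched as a plain 0/1 knapsack. This is our own simplification: the paper's variant additionally enforces spatio-temporal connectivity and temporal constraints on a directed graph, which this sketch omits.

```python
def knapsack_select(scores, volumes, capacity):
    """0/1 knapsack over supervoxels: choose a subset maximizing total
    foreground score subject to a total-volume budget (roughly the expected
    size of an actor). dp[c] = (best score, chosen index set) at budget c."""
    dp = [(0.0, frozenset())] * (capacity + 1)
    for i, (s, v) in enumerate(zip(scores, volumes)):
        new = dp[:]
        for c in range(v, capacity + 1):
            cand_score = dp[c - v][0] + s
            if cand_score > new[c][0]:
                new[c] = (cand_score, dp[c - v][1] | {i})
        dp = new
    return max(dp, key=lambda t: t[0])[1]

# Two medium-score supervoxels fit the budget and beat one high-score outlier:
# knapsack_select([3.0, 4.0, 5.0], [2, 3, 4], capacity=5) -> {0, 1}
```

The volume budget is what ties the selected region to an actor-sized, rather than frame-sized, annotation.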

    Predicting The Where And What Of Actors And Actions Through Online Action Localization

    This paper proposes a novel approach to tackle the challenging problem of 'online action localization', which entails predicting actions and their locations as they happen in a video. Typically, action localization or recognition is performed in an offline manner, where all the frames in the video are processed together and action labels are not predicted for the future. This prevents timely localization of actions - an important consideration for surveillance tasks. In our approach, given a batch of frames from the immediate past in a video, we estimate pose and over-segment the current frame into superpixels. Next, we discriminatively train an actor foreground model on the superpixels using the pose bounding boxes. A Conditional Random Field with superpixels as nodes, and edges connecting spatio-temporal neighbors, is used to obtain action segments. The action confidence is predicted using dynamic programming on SVM scores obtained on short segments of the video, thereby capturing sequential information about the actions. The issue of visual drift is handled by updating the appearance model and refining the pose in an online manner. Lastly, we introduce a new measure to quantify the performance of action prediction (i.e. online action localization), which analyzes how the prediction accuracy varies as a function of the observed portion of the video. Our experiments suggest that, despite using only a few frames to localize actions at each time instant, we are able to predict the action and obtain results competitive with state-of-the-art offline methods.
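The dynamic-programming step over per-segment SVM scores can be sketched as a Viterbi-style recurrence. The switch penalty and the exact recurrence below are illustrative assumptions, not the paper's formulation; they show only how DP accumulates sequential evidence for the ongoing action.

```python
import numpy as np

def action_confidence(seg_scores, switch_cost=1.0):
    """Viterbi-style DP over per-segment SVM scores (shape: segments x classes).
    Tracks the best-scoring label sequence ending in each class, penalizing
    class switches, so early segments inform the current prediction."""
    seg_scores = np.asarray(seg_scores, dtype=float)
    conf = seg_scores[0].copy()
    for t in range(1, len(seg_scores)):
        stay = conf                          # continue the same action class
        switch = conf.max() - switch_cost    # switch from the best class so far
        conf = np.maximum(stay, switch) + seg_scores[t]
    return conf

# Two early segments favoring class 0, one late segment favoring class 1:
# action_confidence([[1, 0], [1, 0], [0, 1]], switch_cost=0.5) -> [2.0, 2.5]
```

A higher `switch_cost` makes the prediction stickier, damping single-segment SVM noise.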

    Detecting Humans In Dense Crowds Using Locally-Consistent Scale Prior And Global Occlusion Reasoning

    Human detection in dense crowds is an important problem, as it is a prerequisite to many other visual tasks, such as tracking, counting, action recognition or anomaly detection in behaviors exhibited by individuals in a dense crowd. The problem is challenging due to the large number of individuals, their small apparent size, severe occlusions and perspective distortion. However, crowded scenes also offer contextual constraints that can be used to tackle these challenges. In this paper, we explore context for human detection in dense crowds in the form of a locally-consistent scale prior, which captures the similarity of scale in local neighborhoods and its smooth variation over the image. Using the scale and confidence of detections obtained from an underlying human detector, we infer scale and confidence priors using a Markov Random Field. In an iterative mechanism, the confidences of detection hypotheses are modified to reflect consistency with the inferred priors, and the priors are updated based on the new detections. The final set of detections is then reasoned for occlusion using Binary Integer Programming, where overlaps and relations between parts of individuals are encoded as linear constraints. Both human detection and occlusion reasoning in the proposed approach are solved with local neighbor-dependent constraints, thereby respecting the inter-dependence between individuals that is characteristic of dense crowd analysis. In addition, we propose a mechanism to detect different combinations of body parts without requiring annotations for individual combinations. We performed experiments on a new and extremely challenging dataset of dense crowd images, showing marked improvement over the underlying human detector.
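The locally-consistent scale prior can be sketched with a simple iterative smoothing scheme. The neighborhood radius, the confidence weighting, and the averaging update below are our own lightweight stand-in for the MRF inference described above, meant only to show how an outlier detection scale gets pulled toward its neighbors.

```python
import numpy as np

def scale_prior(positions, scales, confs, radius=50.0, iters=5):
    """Infer a locally-consistent scale prior: each detection's prior scale is
    the confidence-weighted mean of the scales of detections within `radius`
    pixels, iterated so that scales vary smoothly over the image."""
    positions = np.asarray(positions, dtype=float)
    prior = np.asarray(scales, dtype=float).copy()
    confs = np.asarray(confs, dtype=float)
    d = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    w = (d <= radius) * confs[None, :]        # neighbors weighted by confidence
    for _ in range(iters):
        prior = (w @ prior) / w.sum(axis=1)
    return prior

# A low-confidence detection with an implausibly large scale (30 among 10s)
# is pulled toward the scale of its confident neighbors:
# scale_prior([[0, 0], [10, 0], [20, 0]], [10, 30, 10], [1, 0.1, 1], radius=15)
```

Re-scoring detections by their agreement with such a prior is what suppresses scale-inconsistent false positives in the iterative mechanism.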

    Tracking When The Camera Looks Away

    Tracking players in sports videos presents numerous challenges due to weak distinguishing features and unpredictable motion. Considerable work has been done on tracking players in such videos using a combination of appearance and motion modeling, mostly in continuous streams of video. However, in a broadcast sports video, with advertisements, replays and intermittent changes of camera view, keeping track of players over an entire game becomes a challenging task. In this work, we solve the novel problem of tracking over a sequence of temporally disjoint soccer videos, without the use of appearance cues, using a graph-based optimization approach. Each team is represented by a graph, in which the nodes correspond to player positions and the edge weights depend on spatial inter-player distance. We use team formation to associate tracks between clips and provide an end-to-end system that is able to perform statistical and tactical analysis of the game. We also introduce a new challenging dataset of an international soccer game.
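Associating tracks across a clip boundary without appearance cues can be sketched as position matching. The greedy closest-pair strategy below is our own simplification of the formation-graph optimization described above.

```python
import math

def associate_tracks(prev_pos, curr_pos):
    """Associate player positions across a clip boundary by spatial proximity:
    repeatedly commit the globally closest unmatched (current, previous) pair.
    Returns a mapping from current index to previous index."""
    pairs = sorted((math.dist(c, p), ci, pi)
                   for ci, c in enumerate(curr_pos)
                   for pi, p in enumerate(prev_pos))
    mapping, used_prev = {}, set()
    for _, ci, pi in pairs:
        if ci not in mapping and pi not in used_prev:
            mapping[ci] = pi
            used_prev.add(pi)
    return mapping

# Players detected in a different order after the cut are still matched
# to their closest pre-cut positions:
# associate_tracks([(0, 0), (10, 10)], [(11, 10), (1, 0)]) -> {0: 1, 1: 0}
```

Matching on team formation rather than raw positions, as the paper does, makes the association robust to camera motion between clips; this sketch assumes positions are already in a common ground-plane coordinate frame.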

    Action Localization In Videos Through Context Walk

    This paper presents an efficient approach for localizing actions by learning contextual relations, in the form of relative locations between different video regions. We begin by over-segmenting the videos into supervoxels, which preserve action boundaries and also reduce the complexity of the problem. Context relations are learned during training; they capture the displacements from all the supervoxels in a video to those belonging to foreground actions. Then, given a testing video, we select a supervoxel at random and use the context information acquired during training to estimate the probability of each supervoxel belonging to the foreground action. The walk proceeds to a new supervoxel and the process is repeated for a few steps. This context walk generates a conditional distribution of an action over all the supervoxels. A Conditional Random Field is then used to find action proposals in the video, whose confidences are obtained using SVMs. We validate the proposed approach on several datasets and show that context, in the form of relative displacements between supervoxels, can be extremely useful for action localization. This also results in significantly fewer evaluations of the classifier, in sharp contrast to alternative sliding-window approaches.
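The context walk itself can be sketched in a few lines. The stored displacement vectors and the exponential voting kernel below are hypothetical stand-ins for the learned context model; only the walk structure (vote, then step to the most confident supervoxel) follows the description above.

```python
import numpy as np

def context_walk(centers, displacements, steps=5, seed=0):
    """One context walk over supervoxels. Starting from a random supervoxel,
    cast the displacement vectors stored for it during training, softly vote
    for supervoxels near each predicted foreground location, then step to the
    most confident supervoxel. Returns a distribution over supervoxels."""
    rng = np.random.default_rng(seed)
    centers = np.asarray(centers, dtype=float)
    conf = np.zeros(len(centers))
    cur = int(rng.integers(len(centers)))
    for _ in range(steps):
        for d in displacements[cur]:
            pred = centers[cur] + np.asarray(d, dtype=float)
            dist = np.linalg.norm(centers - pred, axis=1)
            conf += np.exp(-dist)        # soft vote for supervoxels near pred
        cur = int(conf.argmax())         # walk to the most confident supervoxel
    return conf / conf.sum()
```

Because each step evaluates only the displacements of the current supervoxel, a few steps suffice to concentrate the distribution, which is the source of the reduced classifier evaluations compared with sliding windows.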

    Online Localization and Prediction of Actions and Interactions
