
453 research outputs found

Fully-Coupled Two-Stream Spatiotemporal Networks for Extremely Low Resolution Action Recognition

Author: Chen Xin
Crandall David J
Sharghi Aidean
Xu Mingze
Publication date: 11/01/2018

A major emerging challenge is how to protect people's privacy as cameras and computer vision are increasingly integrated into our daily lives, including in smart devices inside homes. A potential solution is to capture and record just the minimum amount of information needed to perform a task of interest. In this paper, we propose a fully-coupled two-stream spatiotemporal architecture for reliable human action recognition on extremely low resolution (e.g., 12x16 pixel) videos. We provide an efficient method to extract spatial and temporal features and to aggregate them into a robust feature representation for an entire action video sequence. We also consider how to incorporate high resolution videos during training in order to build better low resolution action recognition models. We evaluate on two publicly available datasets, showing significant improvements over the state of the art. Comment: 9 pages, 5 figures, published in WACV 201

arXiv.org e-Print Archive

Crossref

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Temporal Recurrent Networks for Online Action Detection

Author: Chen Yi-Ting
Crandall David J.
Davis Larry S.
Gao Mingfei
Xu Mingze
Publication date: 17/11/2018

Most work on temporal action detection is formulated as an offline problem, in which the start and end times of actions are determined after the entire video is fully observed. However, important real-time applications, including surveillance and driver assistance systems, require identifying actions as soon as each video frame arrives, based only on current and historical observations. In this paper, we propose a novel framework, Temporal Recurrent Network (TRN), to model greater temporal context of a video frame by simultaneously performing online action detection and anticipation of the immediate future. At each moment in time, our approach makes use of both accumulated historical evidence and predicted future information to better recognize the action that is currently occurring, and integrates both of these into a unified end-to-end architecture. We evaluate our approach on two popular online action detection datasets, HDD and TVSeries, as well as another widely used dataset, THUMOS'14. The results show that TRN significantly outperforms the state of the art.

arXiv.org e-Print Archive

Crossref

Scipedia

Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems

Author: Atkins Ella M.
Choi Chiho
Crandall David J.
Dariush Behzad
Xu Mingze
Yao Yu
Publication date: 18/09/2018

Predicting the future location of vehicles is essential for safety-critical applications such as advanced driver assistance systems (ADAS) and autonomous driving. This paper introduces a novel approach to simultaneously predict both the location and scale of target vehicles in the first-person (egocentric) view of an ego-vehicle. We present a multi-stream recurrent neural network (RNN) encoder-decoder model that separately captures object location and scale, as well as pixel-level observations, for future vehicle localization. We show that incorporating dense optical flow improves prediction results significantly, since it captures information about motion as well as appearance change. We also find that explicitly modeling the future motion of the ego-vehicle improves prediction accuracy, which could be especially beneficial in intelligent and automated vehicles that have motion planning capability. To evaluate the performance of our approach, we present a new dataset of first-person videos collected from a variety of scenarios at road intersections, which are particularly challenging moments for prediction because vehicle trajectories are diverse and dynamic. Comment: To appear in ICRA 201

arXiv.org e-Print Archive

Crossref

Scipedia

Recurrent violent injury: magnitude, risk factors, and opportunities for intervention from a statewide analysis.

Author: Crandall Marie L.
Delgado M. Kit
Ebler David J.
Kaufman Elinore
Rising MD, MS, Kristin L.
Wiebe Douglas J.
Publication venue: Jefferson Digital Commons
Publication date: 01/09/2016

INTRODUCTION: Although preventing recurrent violent injury is an important component of a public health approach to interpersonal violence and a common focus of violence intervention programs, the true incidence of recurrent violent injury is unknown. Prior studies have reported recurrence rates from 0.8% to 44%, and risk factors for recurrence are not well established. METHODS: We used a statewide, all-payer database to perform a retrospective cohort study of emergency department visits for injury due to interpersonal violence in Florida, following up patients injured in 2010 for recurrence through 2012. We assessed risk factors for recurrence with multivariable logistic regression and estimated time to recurrence with the Kaplan-Meier method. We tabulated hospital charges and costs for index and recurrent visits. RESULTS: Of 53 908 patients presenting for violent injury in 2010, 11.1% had a recurrent violent injury during the study period. Trauma centers treated 31.8%, including 55.9% of severe injuries. Among recurrers, 58.9% went to a different hospital for their second injury. Low income, homelessness, Medicaid or uninsurance, and black race were associated with increased odds of recurrence. Patients with visits for mental and behavioral health and unintentional injury also had increased odds of recurrence. Index injuries accounted for $105 million in costs, and recurrent injuries accounted for another $25.3 million. CONCLUSIONS: Recurrent violent injury is a common and costly phenomenon, and effective violence prevention programs are needed. Prevention must include the nontrauma centers where many patients seek care.
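
The abstract mentions estimating time to recurrence with the Kaplan-Meier method. As a hedged illustration only, the sketch below implements the textbook product-limit estimator, S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of recurrences observed at t_i and n_i is the number of patients still at risk just before t_i. The function name and the toy follow-up data are hypothetical and are not taken from the study.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, S(time))] at each distinct time where at least one event occurs.

    durations: follow-up time per patient (hypothetical units, e.g. days)
    events:    1 if a recurrence was observed at that time, 0 if censored
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    # Observed events, and total exits (events + censorings), at each time.
    event_counts = Counter(t for t, e in zip(durations, events) if e)
    exit_counts = Counter(durations)
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts[t]  # Counter returns 0 for missing keys
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Everyone whose follow-up ends at t (event or censored) leaves the risk set.
        at_risk -= exit_counts[t]
    return curve


if __name__ == "__main__":
    # Toy data, not from the study: five patients, two of them censored.
    for t, s in kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]):
        print(f"t={t}: S(t)={s:.2f}")
```

With this toy input the script prints S(t) = 0.80, 0.60 and 0.30 at t = 2, 3 and 5. Patients censored at time t are kept in the risk set at t, which is the usual convention for this estimator.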

Crossref

PubMed Central

ScholarlyCommons@Penn

Jefferson Digital Commons

Predicting Geo-informative Attributes in Large-Scale Image Collections Using Convolutional Neural Networks

Author: David J. Crandall
Haipeng Zhang
Stefan Lee
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Geographic location is a powerful property for organizing large-scale photo collections, but only a small fraction of online photos are geo-tagged. Most work in automatically estimating geo-tags from image content is based on comparison against models of buildings or landmarks, or on matching to large reference collections of geo-tagged images. These approaches work well for frequently photographed places like major cities and tourist destinations, but fail for photos taken in sparsely photographed places where few reference photos exist. Here we consider how to recognize general geo-informative attributes of a photo, e.g., the elevation gradient, population density, demographics, etc. of where it was taken, instead of trying to estimate a precise geo-tag. We learn models for these attributes using a large (noisy) set of geo-tagged images from Flickr by training deep convolutional neural networks (CNNs). We evaluate on over a dozen attributes, showing that while automatically recognizing some attributes is very difficult, others can be automatically estimated with about the same accuracy as a human.

CiteSeerX

Crossref

Identifying First-person Camera Wearers in Third-person Videos

Author: Crandall David J.
Fan Chenyou
Lee Jangwon
Lee Yong Jae
Ryoo Michael S.
Singh Krishna Kumar
Xu Mingze
Publication date: 20/04/2017

We consider scenarios in which we wish to perform joint scene understanding, object tracking, activity recognition, and other tasks in environments in which multiple people are wearing body-worn cameras while a third-person static camera also captures the scene. To do this, we need to establish person-level correspondences across first- and third-person videos, which is challenging because the camera wearer is not visible in his/her own egocentric video, preventing the use of direct feature matching. In this paper, we propose a new semi-Siamese Convolutional Neural Network architecture to address this novel challenge. We formulate the problem as learning a joint embedding space for first- and third-person videos that considers both spatial- and motion-domain cues. A new triplet loss function is designed to minimize the distance between correct first- and third-person matches while maximizing the distance between incorrect ones. This end-to-end approach performs significantly better than several baselines, in part by learning the first- and third-person features optimized for matching jointly with the distance measure itself.
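
The abstract describes a triplet loss that pulls correct first- and third-person matches together and pushes incorrect ones apart, but it does not give the loss's exact form. As an assumption, the sketch below uses the common hinge-style triplet loss over Euclidean distances, max(0, d(a, p) - d(a, n) + margin). The function name, the margin value, and the 2-D toy embeddings are hypothetical, and this is not the authors' loss.

```python
import math
from typing import Sequence


def triplet_margin_loss(anchor: Sequence[float],
                        positive: Sequence[float],
                        negative: Sequence[float],
                        margin: float = 1.0) -> float:
    """Generic hinge triplet loss: max(0, d(a, p) - d(a, n) + margin).

    Assumed textbook formulation for illustration; not the paper's specific loss.
    anchor:   embedding of a first-person clip (hypothetical)
    positive: embedding of the matching third-person clip (hypothetical)
    negative: embedding of a non-matching third-person clip (hypothetical)
    """
    d_pos = math.dist(anchor, positive)  # distance to the correct match
    d_neg = math.dist(anchor, negative)  # distance to an incorrect match
    # Loss is zero once the wrong match is at least `margin` farther away.
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    first_person = [0.0, 0.0]        # hypothetical first-person embedding
    third_person_match = [0.0, 1.0]  # hypothetical correct third-person match
    third_person_other = [3.0, 0.0]  # hypothetical incorrect match
    print(triplet_margin_loss(first_person, third_person_match, third_person_other))
```

Here the incorrect match is already more than the margin farther away than the correct one, so the script prints 0.0. Moving the incorrect embedding to [1.0, 0.0] makes both distances equal, and the loss becomes the margin, 1.0, which is the signal that pushes the two embeddings apart during training.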

arXiv.org e-Print Archive

Crossref

ECCV (17) - Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-Identification

Author: Crandall David J.
Luo Jiebo
Shao Ling
Shen Jianbing
Ye Mang
Publication venue: ZU Scholars
Publication date: 18/07/2020

ZU Scholars (Zayed University)