4,757 research outputs found

    Large-Scale Mapping of Human Activity using Geo-Tagged Videos

    Full text link
    This paper is the first work to perform spatio-temporal mapping of human activity using the visual content of geo-tagged videos. We utilize a recent deep-learning based video analysis framework, termed hidden two-stream networks, to recognize a range of activities in YouTube videos. This framework is efficient and can run in real time or faster which is important for recognizing events as they occur in streaming video or for reducing latency in analyzing already captured video. This is, in turn, important for using video in smart-city applications. We perform a series of experiments to show our approach is able to accurately map activities both spatially and temporally. We also demonstrate the advantages of using the visual content over the tags/titles.Comment: Accepted at ACM SIGSPATIAL 201

    FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras

    Full text link
    In this paper, we develop deep spatio-temporal neural networks to sequentially count vehicles from low quality videos captured by city cameras (citycams). Citycam videos have low resolution, low frame rate, high occlusion and large perspective, making most existing methods lose their efficacy. To overcome limitations of existing methods and incorporate the temporal information of traffic video, we design a novel FCN-rLSTM network to jointly estimate vehicle density and vehicle count by connecting fully convolutional neural networks (FCN) with long short term memory networks (LSTM) in a residual learning fashion. Such design leverages the strengths of FCN for pixel-level prediction and the strengths of LSTM for learning complex temporal dynamics. The residual learning connection reformulates the vehicle count regression as learning residual functions with reference to the sum of densities in each frame, which significantly accelerates the training of networks. To preserve feature map resolution, we propose a Hyper-Atrous combination to integrate atrous convolution in FCN and combine feature maps of different convolution layers. FCN-rLSTM enables refined feature representation and a novel end-to-end trainable mapping from pixels to vehicle count. We extensively evaluated the proposed method on different counting tasks with three datasets, with experimental results demonstrating their effectiveness and robustness. In particular, FCN-rLSTM reduces the mean absolute error (MAE) from 5.31 to 4.21 on TRANCOS, and reduces the MAE from 2.74 to 1.53 on WebCamT. Training process is accelerated by 5 times on average.Comment: Accepted by International Conference on Computer Vision (ICCV), 201

    A Mobile Robot Localization using External Surveillance Cameras at Indoor

    Get PDF
    AbstractLocalization is a technique that is needed for the service robot to drive at indoors, and it has been studied in various ways. Most localization techniques let the robot measure environmental information to gain location information, but those require high costs as it use many equipment, and also complicate the robot development. But if an external device could calculate the location of the robot and transmit it to the robot, it will reduce the extra cost for the internal equipment needed to recognize the location, and it will also simplify the robot development. Therefore this study suggests an effective way to control the robot by using the location information of the robot included in a map made by visual information from the surveillance cameras installed at indoors. The object in a single image is difficult to tell its size because of the shadow components and occlusion. Therefore, combination of shadow removal technique using HSV image from indoors and images from different perspective using homography to create two- dimensional map with accurate object information is suggested. In the experiment, the effectiveness of the suggested method is shown by analyzing the movement result of the robot which applied the location information from the two-dimensional map that is based on the multi cameras, which its accuracy is measured in advance
    • …
    corecore