Co-occurrence Feature Learning for Skeleton based Action Recognition using Regularized Deep LSTM Networks

Author: Lan Cuiling
Li Yanghao
Shen Li
Xie Xiaohui
Xing Junliang
Zeng Wenjun
Zhu Wentao
Publication date: 05/03/2016

Skeleton based action recognition distinguishes human actions using the trajectories of skeleton joints, which provide a very good representation for describing actions. Considering that recurrent neural networks (RNNs) with Long Short-Term Memory (LSTM) can learn feature representations and model long-term temporal dependencies automatically, we propose an end-to-end fully connected deep LSTM network for skeleton based action recognition. Inspired by the observation that the co-occurrences of the joints intrinsically characterize human actions, we take the skeleton as the input at each time slot and introduce a novel regularization scheme to learn the co-occurrence features of skeleton joints. To train the deep LSTM network effectively, we propose a new dropout algorithm which simultaneously operates on the gates, cells, and output responses of the LSTM neurons. Experimental results on three human action recognition datasets consistently demonstrate the effectiveness of the proposed model. Comment: AAAI 2016 conferenc

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Using Deep Learning and Google Street View to Estimate the Demographic Makeup of the US

Author: Aiden Erez Lieberman
Chen Duyun
Deng Jia
Fei-Fei Li
Gebru Timnit
Krause Jonathan
Wang Yilun
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 02/03/2017

The United States spends more than $1B each year on initiatives such as the American Community Survey (ACS), a labor-intensive door-to-door study that measures statistics relating to race, gender, education, occupation, unemployment, and other demographic factors. Although a comprehensive source of data, the lag between demographic changes and their appearance in the ACS can exceed half a decade. As digital imagery becomes ubiquitous and machine vision techniques improve, automated data analysis may provide a cheaper and faster alternative. Here, we present a method that determines socioeconomic trends from 50 million images of street scenes, gathered in 200 American cities by Google Street View cars. Using deep learning-based computer vision techniques, we determined the make, model, and year of all motor vehicles encountered in particular neighborhoods. Data from this census of motor vehicles, which enumerated 22M automobiles in total (8% of all automobiles in the US), was used to accurately estimate income, race, education, and voting patterns, with single-precinct resolution. (The average US precinct contains approximately 1000 people.) The resulting associations are surprisingly simple and powerful. For instance, if the number of sedans encountered during a 15-minute drive through a city is higher than the number of pickup trucks, the city is likely to vote for a Democrat during the next Presidential election (88% chance); otherwise, it is likely to vote Republican (82%). Our results suggest that automated systems for monitoring demographic trends may effectively complement labor-intensive approaches, with the potential to detect trends with fine spatial resolution, in close to real time. Comment: 41 pages including supplementary material. Under review at PNA
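The sedan versus pickup rule quoted in this abstract can be written as a one-line decision function. The following is a minimal sketch only, not the authors' model: the function name `predict_city_vote` and its parameters are hypothetical, and the two likelihoods are simply the 88% and 82% figures stated above.

```python
def predict_city_vote(sedan_count: int, pickup_count: int) -> tuple[str, float]:
    """Apply the sedan/pickup rule from the abstract (hypothetical helper).

    Returns the predicted party and the likelihood the abstract reports
    for that branch of the rule.
    """
    # "Higher than" is strict in the abstract, so a tie falls to "otherwise".
    if sedan_count > pickup_count:
        return ("Democrat", 0.88)
    return ("Republican", 0.82)


if __name__ == "__main__":
    # Illustrative counts only; these are not data from the study.
    print(predict_city_vote(sedan_count=120, pickup_count=45))
```

The counts would come from vehicles detected during a 15-minute drive through the city; how those detections are produced is outside this sketch.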

arXiv.org e-Print Archive

A Neutral Network Based Vehicle Classification System for Pervasive Smart Road Security

Author: Cooley Donald
He Jing (Selena)
Park Jong Hyuk
Xiong Naixue
Publication venue: DigitalCommons@Kennesaw State University
Publication date: 02/03/2009

Pervasive smart computing environments accustom people to convenient and secure services. The overall goal of this research is to classify vehicles along the I215 freeway in Salt Lake City, USA. This information will be used to predict future roadway needs and the expected life of a roadway. The classification of vehicles will be performed by a synthesis of multiple sets of features. Not all feature sets have been determined yet; however, one such set will be the reduced wavelet transform of the image of a vehicle. In order to use such a feature, the image must be normalized with respect to size, position, and so on. For example, a car in the rightmost lane in an image will appear smaller than one in the leftmost lane, because the rightmost lane is closest to the camera. Likewise, a vehicle’s size will vary depending on where in a lane its image is captured. In our case, the image capture area for each lane is approximately 100 feet of roadway. A goal of this paper is to normalize the image of a vehicle so that, regardless of its lane or position in a lane, the features will be approximately the same. The wavelet transform itself will not be used directly for recognition. Instead, it will be input to a neural network, and the output of the neural network will be one element of the feature set used for recognition.
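The normalize-then-transform idea in this abstract can be illustrated with a small sketch. It makes several assumptions the abstract does not confirm: size normalization is done by nearest-neighbor resampling to a fixed square grid, the "reduced" wavelet transform is taken to be the low-pass band of a Haar transform (computed here as 2x2 block averages, which equals that band up to a scale factor), and all function names are hypothetical.

```python
def resize_nearest(image, out_h, out_w):
    """Resample a 2D list of pixel values to out_h x out_w (nearest neighbor)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def haar_approximation(image):
    """One Haar level, low-pass band only: average each 2x2 block."""
    h, w = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, w - 1, 2)
        ]
        for r in range(0, h - 1, 2)
    ]


def reduced_wavelet_features(image, size=8, levels=2):
    """Normalize image size, then keep only the coarse Haar band as features."""
    band = resize_nearest(image, size, size)
    for _ in range(levels):
        band = haar_approximation(band)
    # Flatten to a feature vector that could be fed to a neural network.
    return [value for row in band for value in row]


if __name__ == "__main__":
    # The same pattern captured at two sizes yields the same feature vector.
    small = [[0, 255], [255, 0]]
    large = [[0, 0, 255, 255], [0, 0, 255, 255], [255, 255, 0, 0], [255, 255, 0, 0]]
    print(reduced_wavelet_features(small) == reduced_wavelet_features(large))
```

Because both inputs are resampled to the same grid before the transform, a vehicle that appears smaller in a far lane and larger in a near lane would map to approximately the same features, which is the property the abstract is after. Position normalization (cropping to the vehicle) is not covered here.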

DigitalCommons@Kennesaw State University

Automatic target recognition with convolutional neural networks.

Author: Baili Nada
Publication venue: ThinkIR: The University of Louisville's Institutional Repository
Publication date: 01/12/2020

Automatic Target Recognition (ATR) is the ability of an algorithm or device to identify targets or other objects based on data obtained from sensors, most commonly thermal ones. ATR is an important technology for both civilian and military computer vision applications. However, the currently available level of performance falls well short of the requirements. This is mainly due to the difficulty of acquiring targets in realistic environments, and also to limitations on the distribution of classified data to the academic community for research purposes. This thesis proposes to solve the ATR task using Convolutional Neural Networks (CNN). We present three learning approaches using WideResNet-28-2 as a backbone CNN. The first method uses random initialization of the network weights. The second method explores transfer learning. Finally, the third approach relies on spatial transformer networks to enhance the geometric invariance of the model. To validate, analyze, and compare our three proposed models, we use a large-scale real benchmark dataset that includes civilian and military vehicles. These targets are captured at different viewing angles, different resolutions, and different times of the day. We evaluate the effectiveness of our methods by studying their robustness to realistic case scenarios where no ground truth data is available and targets are automatically detected. We show that the method that uses spatial transformer networks achieves the best results and demonstrates the most robustness to various perturbations that can be encountered in real applications.

University of Louisville