Group Action Recognition Using Space-Time Interest Points
Abstract. Group action recognition is a challenging task in computer vision due to the large complexity induced by multiple motion patterns. This paper aims at analyzing group actions in video clips containing several activities. We combine the probability summation framework with the space-time (ST) interest points for this task. First, ST interest points are extracted from video clips to form the feature space. Then we use k-means for feature clustering and build a compact representation, which is then used for group action classification. The proposed approach has been applied to classification tasks including four classes: badminton, tennis, basketball, and soccer videos. The experimental results demonstrate the advantages of the proposed approach.
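The codebook-building step described above — cluster local descriptors with k-means, then represent each clip as a histogram over cluster centres — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the descriptor dimensionality, the choice of k = 20, and the random data are all assumptions for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

def bag_of_features(descriptors, kmeans):
    """Quantize local descriptors against a learned codebook and
    return a normalized visual-word histogram."""
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / hist.sum()

# Toy stand-in for ST interest-point descriptors: 200 random 10-D vectors.
rng = np.random.default_rng(0)
train_descriptors = rng.normal(size=(200, 10))

# Build a compact codebook with k-means (k = 20 is an illustrative choice).
kmeans = KMeans(n_clusters=20, n_init=10, random_state=0).fit(train_descriptors)

# Represent a new clip (50 detected interest points) as a 20-bin histogram.
clip_hist = bag_of_features(rng.normal(size=(50, 10)), kmeans)
```

The resulting fixed-length histogram is what makes clips of different lengths and interest-point counts comparable in a standard classifier.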
Sign Language Recognition: Working with Limited Corpora
The availability of video-format sign language corpora is limited. This leads to a desire for techniques which do not rely on large, fully-labelled datasets. This paper covers various methods for learning sign either from small data sets or from those without ground-truth labels. To avoid non-trivial tracking issues, sign detection is investigated using volumetric spatio-temporal features. Following this, the advantages of recognising the component parts of signs rather than the signs themselves are demonstrated, and finally the idea of using a weakly labelled data set is considered, with results shown for work in this area.
Scale Invariant Action Recognition Using Compound Features Mined from Dense Spatio-temporal Corners
The use of sparse invariant features to recognise classes of actions or objects has become common in the literature. However, features are often "engineered" to be both sparse and invariant to transformation, and it is assumed that they provide the greatest discriminative information. To tackle activity recognition, we propose learning compound features that are assembled from simple 2D corners in both space and time. Each corner is encoded in relation to its neighbours, and from an overcomplete set (in excess of 1 million possible features), compound features are extracted using data mining. The final classifier, consisting of sets of compound features, can then be applied to recognise and localise an activity in real-time while providing superior performance to other state-of-the-art approaches (including those based upon sparse feature detectors). Furthermore, the approach requires only weak supervision in the form of class labels for each training sequence. No ground truth position or temporal alignment is required during training.
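A minimal sketch of the two ideas above — encoding a corner relative to its neighbours, then mining frequent compound patterns from those encodings. The quantized-direction encoding, bin count, and support threshold here are illustrative assumptions; the paper's actual encoding and mining procedure are not specified in this abstract.

```python
import math
from collections import Counter
from itertools import combinations

def encode(corner, neighbours, n_bins=8):
    """Encode a corner by the quantized direction to each neighbour
    (an assumed, simplified neighbourhood encoding)."""
    codes = []
    for nx, ny in neighbours:
        ang = math.atan2(ny - corner[1], nx - corner[0]) % (2 * math.pi)
        codes.append(int(ang / (2 * math.pi) * n_bins))
    return tuple(sorted(set(codes)))

def mine_compounds(transactions, min_support=2, size=2):
    """Count co-occurring pairs of neighbour codes across all corners
    and keep the frequent ones as 'compound features'."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(t, size):
            counts[pair] += 1
    return {p for p, c in counts.items() if c >= min_support}

# Toy example: three corners whose neighbour-direction codes are given.
transactions = [(0, 1, 2), (0, 2), (1, 2)]
frequent = mine_compounds(transactions, min_support=2)
```

Real data mining over millions of candidate features would use an a-priori-style frequent-itemset algorithm rather than exhaustive pair counting, but the principle — keep only patterns that recur across training corners — is the same.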
Decentralized Sensor Fusion for Ubiquitous Networking Robotics in Urban Areas
In this article we explain the architecture for the environment and sensors that has been built for the European project URUS (Ubiquitous Networking Robotics in Urban Sites), whose objective is to develop an adaptable network robot architecture for cooperation between network robots and human beings and/or the environment in urban areas. The project goal is to deploy a team of robots in an urban area to provide a set of services to a user community. This paper addresses the sensor architecture devised for URUS and the types of robots and sensors used, including environment sensors and sensors onboard the robots. Furthermore, we also explain how sensor fusion takes place to achieve urban outdoor execution of robotic services. Finally, some results of the project related to the sensor network are highlighted.
New human action recognition scheme with geometrical feature representation and invariant discretization for video surveillance
Human action recognition is an active research area in computer vision because of its immense application in video surveillance, video retrieval, security systems, video indexing and human-computer interaction. Action recognition operates on time-varying feature data generated by humans under different viewpoints, and aims to build a mapping between dynamic image information and semantic understanding. Although a great deal of progress has been made in the recognition of human actions during the last two decades, few effective approaches have been reported in the literature. This calls for further research into the ongoing challenges, leading to more efficient approaches to human action recognition. Feature extraction is the main task in action recognition and represents the core of any recognition procedure. It involves transforming the input data that describe the shape of a segmented silhouette of a moving person into a set of features representing action poses. In video surveillance, global moment invariants based on Geometrical Moment Invariants (GMI) are widely used in human action recognition. However, GMI has several drawbacks, such as its lack of a granular interpretation of the invariants relative to the shape. Consequently, the representation of features has not been standardized. Hence, this study proposes a new human action recognition (HAR) scheme with geometrical moment invariants for feature extraction and supervised invariant discretization for identifying the uniqueness of actions in video sequences. The proposed scheme is tested on the IXMAS dataset, whose video sequences contain non-rigid human poses resulting from drastic illumination changes, changes in pose and erratic motion patterns. The invariance of the proposed scheme is validated through intra-class and inter-class analysis.
The proposed scheme yields better performance in action recognition than the conventional scheme, with an average accuracy of more than 99%, while preserving the shape of the human actions in video images.
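To make the moment-invariant idea concrete: the classical geometrical moment invariants (Hu's invariants) are built from normalized central moments of the silhouette, so they are unchanged by translation and scaling. The sketch below computes only the first Hu invariant from scratch as an illustration — the abstract's specific GMI formulation and discretization step are not reproduced here.

```python
import numpy as np

def raw_moment(img, p, q):
    """Raw image moment m_pq of a binary silhouette."""
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    return float((img * xs**p * ys**q).sum())

def central_moment(img, p, q):
    """Central moment mu_pq, taken about the silhouette centroid."""
    m00 = raw_moment(img, 0, 0)
    cx, cy = raw_moment(img, 1, 0) / m00, raw_moment(img, 0, 1) / m00
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    return float((img * (xs - cx)**p * (ys - cy)**q).sum())

def hu_first(img):
    """First Hu invariant: eta_20 + eta_02 (translation/scale invariant)."""
    m00 = raw_moment(img, 0, 0)
    eta = lambda p, q: central_moment(img, p, q) / m00**(1 + (p + q) / 2)
    return eta(2, 0) + eta(0, 2)

# A square silhouette and a translated, 2x-scaled copy give (nearly)
# the same invariant, up to pixel discretization effects.
img = np.zeros((64, 64)); img[10:20, 10:20] = 1.0
img2 = np.zeros((64, 64)); img2[30:50, 30:50] = 1.0
```

In practice `cv2.moments` / `cv2.HuMoments` compute all seven invariants directly; the point of the sketch is that the same shape yields the same feature vector regardless of where and at what scale the person appears in the frame.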
Recognition of Everyday Actions
The proposed method consists of three parts: feature extraction, the use of bag of words, and classification. In the first stage, we use the STIP descriptor for the intensity channel, the HOG descriptor for the depth channel, and MFCC and spectrogram features for the audio channel. In the next stage, the bag-of-words approach is applied to each type of information separately, using the K-means algorithm to generate the dictionary. Finally, an SVM classifier labels the visual-word histograms. For the experiments, we manually segmented the videos into clips containing a single action, achieving a recognition rate of 94.4% on the Kitchen-UCSP dataset (our own dataset) and a recognition rate of 88% on HMA videos.
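The final classification stage — an SVM over visual-word histograms — can be sketched as follows. The two synthetic "action classes", the 4-word vocabulary, and the linear kernel are illustrative assumptions; the real histograms would come from quantized STIP/HOG/MFCC descriptors as described above.

```python
import numpy as np
from sklearn.svm import SVC

# Toy visual-word histograms for two hypothetical action classes:
# class 0 concentrates mass on word 0, class 1 on word 3.
rng = np.random.default_rng(1)
class_a = rng.dirichlet(np.array([8, 1, 1, 1]), size=30)
class_b = rng.dirichlet(np.array([1, 1, 1, 8]), size=30)
X = np.vstack([class_a, class_b])
y = np.array([0] * 30 + [1] * 30)

# Train a linear SVM on the histograms and label five new clips.
clf = SVC(kernel="linear").fit(X, y)
pred = clf.predict(rng.dirichlet(np.array([8, 1, 1, 1]), size=5))
```

Training one such classifier per channel (intensity, depth, audio) and combining their decisions is one natural way to exploit the separate per-channel dictionaries the method builds.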