85 research outputs found
Video-to-Video Pose and Expression Invariant Face Recognition using Volumetric Directional Pattern
Face recognition in video has attracted attention as a covert method of human identification in surveillance systems. In this paper, we propose an end-to-end video face recognition system that addresses the difficulty of identifying human faces in video caused by large variations in facial pose and expression and by poor video resolution. The proposed descriptor, named Volumetric Directional Pattern (VDP), is an oriented, multi-scale volumetric descriptor that extracts and fuses information from multiple frames, temporal (dynamic) information, and multiple poses and expressions of faces in the input video to produce feature vectors, which are then matched against all videos in the database. To keep the approach computationally simple and easy to extend, a key-frame extraction method is employed.
Thus, only the frames that carry the important information of the video are processed, rather than analyzing every frame. The performance of the proposed VDP algorithm was evaluated on a publicly available database (the YouTube Celebrities dataset), where it achieved promising recognition rates.
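As a rough illustration of the key-frame idea (the abstract does not specify the selection criterion), a minimal sketch might retain only frames whose grayscale histogram differs noticeably from the last retained frame; the OpenCV-based pipeline and the threshold below are assumptions, not the paper's method.

```python
# Hypothetical key-frame selection by histogram change between frames.
import cv2


def extract_key_frames(video_path, diff_threshold=0.3):
    """Keep frames whose grayscale histogram differs enough from the last key frame."""
    cap = cv2.VideoCapture(video_path)
    key_frames, prev_hist = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if prev_hist is None or cv2.compareHist(prev_hist, hist,
                                                cv2.HISTCMP_BHATTACHARYYA) > diff_threshold:
            key_frames.append(frame)   # frame is sufficiently different: keep it
            prev_hist = hist
    cap.release()
    return key_frames
```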
Dense Point-Cloud Representation of a Scene using Monocular Vision
We present a three-dimensional (3-D) reconstruction system designed to support various autonomous navigation applications. The system presented focuses on the 3-D reconstruction of a scene using only a single moving camera. Utilizing video frames captured at different points in time allows us to determine the depths of a scene. In this way, the system can be used to construct a point-cloud model of its unknown surroundings.
We present the step-by-step methodology and analysis used in developing the 3-D reconstruction technique.
We present a reconstruction framework that generates a primitive point cloud computed from feature matching and depth triangulation analysis. To densify the reconstruction, we use optical-flow features to create an extremely dense representation model. As a third algorithmic modification, we introduce a preprocessing step of nonlinear single-image super resolution, which significantly improves the depth accuracy of the point cloud, since that accuracy relies on precise disparity measurement.
Our final contribution is a postprocessing step that filters noise points and mismatched features, completing the dense point-cloud representation (DPR) technique. We measure the success of DPR by evaluating visual appeal, density, accuracy, and computational expense, and compare it with two state-of-the-art techniques.
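The core of the primitive point-cloud stage, feature matching followed by depth triangulation between two frames of the moving camera, can be sketched with standard OpenCV calls; the ORB detector, RANSAC settings, and the assumption of known intrinsics K are illustrative choices, and the dense optical-flow, super-resolution, and filtering stages described above are omitted.

```python
# Minimal two-frame monocular triangulation sketch (assumes known intrinsics K).
import cv2
import numpy as np


def triangulate_pair(img1, img2, K):
    g1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(g1, None)
    k2, d2 = orb.detectAndCompute(g2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    p1 = np.float32([k1[m.queryIdx].pt for m in matches])
    p2 = np.float32([k2[m.trainIdx].pt for m in matches])
    # Relative camera motion from the essential matrix.
    E, mask = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC)
    _, R, t, mask = cv2.recoverPose(E, p1, p2, K, mask=mask)
    # Triangulate matched features into 3-D points (up to scale).
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4d = cv2.triangulatePoints(P1, P2, p1.T, p2.T)
    return (pts4d[:3] / pts4d[3]).T   # N x 3 primitive point cloud
```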
Histogram of Oriented Phase and Gradient (HOPG) Descriptor for Improved Pedestrian Detection
This paper presents a new pedestrian detection descriptor named Histogram of Oriented Phase and Gradient (HOPG), based on a combination of Histogram of Oriented Phase (HOP) features and Histogram of Oriented Gradient (HOG) features.
The proposed descriptor extracts image information using both the gradient and the phase congruency concepts. Although HOG-based methods have been widely used in human detection systems, they fail to deal effectively with images affected by illumination variations and cluttered backgrounds. By fusing HOP and HOG features, more structural information can be identified and localized, yielding descriptors that are more robust and less sensitive to lighting variations. The phase congruency and the gradient of each pixel in the image are extracted with respect to its neighborhood. Histograms of the phase congruency values and the gradients of local segments in the image are computed with respect to their orientations, and these histograms are concatenated to construct the HOPG descriptor.
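A hedged sketch of this construction is given below: HOG features come from scikit-image, and cell-wise orientation histograms of phase congruency are concatenated to them. The per-pixel phase-congruency magnitude and orientation maps (`pc_mag`, `pc_ori`) are assumed to come from any available phase-congruency implementation, and the cell size, bin count, and HOG parameters are illustrative.

```python
import numpy as np
from skimage.feature import hog


def hop_histogram(pc_mag, pc_ori, cell=8, bins=9):
    """Cell-wise orientation histograms of phase congruency, flattened into one vector."""
    h, w = pc_mag.shape
    feats = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            m = pc_mag[y:y + cell, x:x + cell].ravel()
            o = pc_ori[y:y + cell, x:x + cell].ravel()
            # Orientation histogram weighted by phase-congruency magnitude.
            hist, _ = np.histogram(o, bins=bins, range=(0, np.pi), weights=m)
            feats.append(hist)
    return np.concatenate(feats)


def hopg_descriptor(gray_image, pc_mag, pc_ori):
    """Concatenate HOG features with the phase-congruency (HOP) histograms."""
    hog_feat = hog(gray_image, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2), feature_vector=True)
    return np.concatenate([hog_feat, hop_histogram(pc_mag, pc_ori)])
```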
The performance of the proposed descriptor was evaluated on the INRIA and DaimlerChrysler datasets, with a linear support vector machine (SVM) classifier trained to detect pedestrians. The experimental results show that the human detection system based on the proposed features achieves lower error rates and better detection performance than a set of state-of-the-art feature extraction methodologies.
Histogram of Oriented Phase (HOP): A New Descriptor Based on Phase Congruency
In this paper we present a low-level image descriptor called Histogram of Oriented Phase, based on the phase congruency concept and Principal Component Analysis (PCA). Since the phase of a signal conveys more information about signal structure than the magnitude, the proposed descriptor can identify and localize image features more precisely than gradient-based techniques, especially in regions affected by illumination changes. The proposed features are formed by extracting the phase congruency of each pixel in the image with respect to its neighborhood. Histograms of the phase congruency values of local regions in the image are computed with respect to their orientations, and these histograms are concatenated to construct the Histogram of Oriented Phase (HOP) features. The dimensionality of the HOP features is then reduced using the PCA algorithm to form the HOP-PCA descriptor. The dimensionless quantity of phase congruency makes the HOP-PCA descriptor more robust to image scale variations as well as to contrast and illumination changes. Several experiments were performed on the INRIA and DaimlerChrysler datasets to evaluate the performance of the HOP-PCA descriptor. The experimental results show that the proposed descriptor has better detection performance and lower error rates than a set of state-of-the-art feature extraction methodologies.
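The PCA reduction step can be sketched with scikit-learn; the number of retained components is an illustrative choice, and the HOP feature matrix is assumed to have been computed as described above.

```python
import numpy as np
from sklearn.decomposition import PCA


def hop_pca(hop_features, n_components=128):
    """Reduce HOP descriptors (n_samples x n_hop_dims) to HOP-PCA descriptors."""
    pca = PCA(n_components=n_components)
    reduced = pca.fit_transform(hop_features)
    return reduced, pca   # keep the fitted PCA to project unseen samples


# Usage: train, then project a new descriptor with pca.transform(new_hop[None, :]).
```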
Person Identification from Streaming Surveillance Video using Mid-Level Features from Joint Action-Pose Distribution
We propose a real-time person identification algorithm for surveillance scenarios with low-resolution streaming video, based on mid-level features extracted from the joint distribution of human actions and poses.
The proposed algorithm combines an auto-encoder-based action association framework, which produces per-frame probability estimates of the action being performed, with a pose recognition framework, which gives per-frame body-part locations.
The main focus of this manuscript is to effectively combine these per-frame action probability estimates and pose trajectories over a short temporal window to obtain mid-level features. We demonstrate that these mid-level features capture how an individual performs an action and can be used to distinguish one person from the next. Preliminary analysis on the KTH action dataset, in which each sequence is annotated with a specific person and a specific action, shows encouraging results that verify this concept.
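One plausible way to fuse per-frame action probabilities and pose trajectories over a short window is to pool simple statistics per window; the statistics chosen below (mean action probabilities plus per-joint motion magnitudes) are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np


def mid_level_features(action_probs, joint_positions, window=15):
    """
    action_probs:    (T, A) per-frame action probability estimates
    joint_positions: (T, J, 2) per-frame body-part locations
    Returns one feature vector per non-overlapping temporal window.
    """
    feats = []
    T = action_probs.shape[0]
    for start in range(0, T - window + 1, window):
        p = action_probs[start:start + window]           # (window, A)
        q = joint_positions[start:start + window]        # (window, J, 2)
        motion = np.diff(q, axis=0)                      # frame-to-frame joint displacement
        feat = np.concatenate([
            p.mean(axis=0),                              # average action distribution
            np.abs(motion).mean(axis=(0, 2)),            # mean motion magnitude per joint
            np.abs(motion).std(axis=(0, 2)),             # motion variability per joint
        ])
        feats.append(feat)
    return np.stack(feats)
```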
Scene Projection by Non-Linear Transforms to a Geo-Referenced Map for Situational Awareness
Many transportation and surveillance cameras currently in use in major cities are mounted close to the ground and show scenes from a perspective point of view. Because these cameras have different orientations, it can be difficult to follow an object of interest across multiple cameras in the same area, especially when compared with wide-area aerial surveillance (WAAS).
To address this problem, this research provides a method to non-linearly transform camera perspective views into real-world coordinates that can be placed on a map. Using a perspective transformation, perspective views are converted into approximate WAAS views and placed on the map. All images then lie on the same plane, allowing a user to follow an object of interest across several camera views on a map. While these transformed images will not fit every feature of the map as WAAS images would, the most important aspects of a scene (e.g. roads, cars, people, and sidewalks) are accurate enough to give the user situational awareness.
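A minimal sketch of such a perspective (homography) mapping with OpenCV is shown below; the four ground-plane correspondences, the file names, and the output size are illustrative values, not calibration data from the paper.

```python
import cv2
import numpy as np

# Four ground-plane points in the camera image (pixels)...
src = np.float32([[420, 710], [880, 695], [940, 410], [380, 420]])
# ...and the same four points in map / world coordinates (illustrative units).
dst = np.float32([[0, 0], [40, 0], [40, 60], [0, 60]])

H = cv2.getPerspectiveTransform(src, dst)        # 3x3 homography

camera_frame = cv2.imread("camera_view.png")     # hypothetical input frame
map_view = cv2.warpPerspective(camera_frame, H, (200, 200))  # top-down view on the map plane

# An individual detection (e.g. a tracked car at pixel (650, 560)) maps the same way:
pt_on_map = cv2.perspectiveTransform(np.float32([[[650, 560]]]), H)
```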
The algorithm proved successful when tested on cameras in the downtown area of Dayton, Ohio.
Volume Component Analysis for Classification of LiDAR Data
One of the most difficult challenges of working with LiDAR data is the large number of data points produced; analysing these large data sets is an extremely time-consuming process. For this reason, automatic perception of LiDAR scenes is a growing area of research. Currently, most LiDAR feature extraction relies on geometrical features specific to the point cloud of interest. These geometrical features are scene-specific and often rely on the scale and orientation of the object for classification. This paper proposes a robust method for reduced-dimensionality feature extraction of 3-D objects using a volume component analysis (VCA) approach.
The VCA approach is based on principal component analysis (PCA). PCA is a dimensionality-reduction method that computes a covariance matrix from the original input vectors; the eigenvectors corresponding to the largest eigenvalues of the covariance matrix are used to describe an image. Block-based PCA is an adaptation developed for feature extraction from facial images, because PCA applied to local areas of an image can extract more significant features than PCA applied to the entire image. The image space is split into several blocks, and PCA is computed individually for each block.
VCA represents a LiDAR point cloud as a set of voxels whose values correspond to the point density at each location. From this voxelized space, block-based PCA is used to analyze sections of the space that, when combined, represent features of the entire 3-D object. These features are then used as input to a support vector machine trained to identify four classes of objects (vegetation, vehicles, buildings, and barriers), achieving an overall accuracy of 93.8%.
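The voxelization and block-based PCA idea can be sketched as follows; the grid size, block size, number of components, and SVM settings are illustrative choices, and in practice the PCA bases would be fitted on training data only rather than on the whole set as in this simplified sketch.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC


def voxelize(points, grid=(16, 16, 16)):
    """Point density per voxel over the object's bounding box (points: N x 3)."""
    mins, maxs = points.min(axis=0), points.max(axis=0)
    idx = np.floor((points - mins) / (maxs - mins + 1e-9) * (np.array(grid) - 1)).astype(int)
    vol = np.zeros(grid)
    np.add.at(vol, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
    return vol / len(points)


def block_features(volumes, block=4, n_components=8):
    """Block-based PCA: run PCA on each block of the voxel grids and concatenate."""
    feats = []
    g = volumes.shape[1]
    for x in range(0, g, block):
        for y in range(0, g, block):
            for z in range(0, g, block):
                blk = volumes[:, x:x + block, y:y + block, z:z + block].reshape(len(volumes), -1)
                feats.append(PCA(n_components=n_components).fit_transform(blk))
    return np.hstack(feats)


# Usage (hypothetical data):
#   volumes = np.stack([voxelize(p) for p in point_clouds])
#   X = block_features(volumes)
#   clf = SVC().fit(X, labels)   # labels: vegetation / vehicle / building / barrier
```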