354 research outputs found

    Cyclist Detection, Tracking, and Trajectory Analysis in Urban Traffic Video Data

    Full text link
    The major objective of this thesis work is examining computer vision and machine learning detection methods, tracking algorithms and trajectory analysis for cyclists in traffic video data and developing an efficient system for cyclist counting. Due to the growing number of cyclist accidents on urban roads, methods for collecting information on cyclists are of significant importance to the Department of Transportation. The collected information provides insights into solving critical problems related to transportation planning, implementing safety countermeasures, and managing traffic flow efficiently. Intelligent Transportation System (ITS) employs automated tools to collect traffic information from traffic video data. In comparison to other road users, such as cars and pedestrians, the automated cyclist data collection is relatively a new research area. In this work, a vision-based method for gathering cyclist count data at intersections and road segments is developed. First, we develop methodology for an efficient detection and tracking of cyclists. The combination of classification features along with motion based properties are evaluated to detect cyclists in the test video data. A Convolutional Neural Network (CNN) based detector called You Only Look Once (YOLO) is implemented to increase the detection accuracy. In the next step, the detection results are fed into a tracker which is implemented based on the Kernelized Correlation Filters (KCF) which in cooperation with the bipartite graph matching algorithm allows to track multiple cyclists, concurrently. Then, a trajectory rebuilding method and a trajectory comparison model are applied to refine the accuracy of tracking and counting. The trajectory comparison is performed based on semantic similarity approach. The proposed counting method is the first cyclist counting method that has the ability to count cyclists under different movement patterns. The trajectory data obtained can be further utilized for cyclist behavioral modeling and safety analysis

    Learning Representations for Face Recognition: A Review from Holistic to Deep Learning

    Get PDF
    For decades, researchers have investigated how to recognize facial images. This study reviews the development of different face recognition (FR) methods, namely, holistic learning, handcrafted local feature learning, shallow learning, and deep learning (DL). With the development of methods, the accuracy of recognizing faces in the labeled faces in the wild (LFW) database has been increased. The accuracy of holistic learning is 60%, that of handcrafted local feature learning increases to 70%, and that of shallow learning is 86%. Finally, DL achieves human-level performance (97% accuracy). This enhanced accuracy is caused by large datasets and graphics processing units (GPUs) with massively parallel processing capabilities. Furthermore, FR challenges and current research studies are discussed to understand future research directions. The results of this study show that presently the database of labeled faces in the wild has reached 99.85% accuracy

    Detecting pedestrians in surveillance videos based on Convolutional Neural Network and Motion

    Get PDF

    A Taxonomy of Deep Convolutional Neural Nets for Computer Vision

    Get PDF
    Traditional architectures for solving computer vision problems and the degree of success they enjoyed have been heavily reliant on hand-crafted features. However, of late, deep learning techniques have offered a compelling alternative -- that of automatically learning problem-specific features. With this new paradigm, every problem in computer vision is now being re-examined from a deep learning perspective. Therefore, it has become important to understand what kind of deep networks are suitable for a given problem. Although general surveys of this fast-moving paradigm (i.e. deep-networks) exist, a survey specific to computer vision is missing. We specifically consider one form of deep networks widely used in computer vision - convolutional neural networks (CNNs). We start with "AlexNet" as our base CNN and then examine the broad variations proposed over time to suit different applications. We hope that our recipe-style survey will serve as a guide, particularly for novice practitioners intending to use deep-learning techniques for computer vision.Comment: Published in Frontiers in Robotics and AI (http://goo.gl/6691Bm

    On the 3D point cloud for human-pose estimation

    Get PDF
    This thesis aims at investigating methodologies for estimating a human pose from a 3D point cloud that is captured by a static depth sensor. Human-pose estimation (HPE) is important for a range of applications, such as human-robot interaction, healthcare, surveillance, and so forth. Yet, HPE is challenging because of the uncertainty in sensor measurements and the complexity of human poses. In this research, we focus on addressing challenges related to two crucial components in the estimation process, namely, human-pose feature extraction and human-pose modeling. In feature extraction, the main challenge involves reducing feature ambiguity. We propose a 3D-point-cloud feature called viewpoint and shape feature histogram (VISH) to reduce feature ambiguity by capturing geometric properties of the 3D point cloud of a human. The feature extraction consists of three steps: 3D-point-cloud pre-processing, hierarchical structuring, and feature extraction. In the pre-processing step, 3D points corresponding to a human are extracted and outliers from the environment are removed to retain the 3D points of interest. This step is important because it allows us to reduce the number of 3D points by keeping only those points that correspond to the human body for further processing. In the hierarchical structuring, the pre-processed 3D point cloud is partitioned and replicated into a tree structure as nodes. Viewpoint feature histogram (VFH) and shape features are extracted from each node in the tree to provide a descriptor to represent each node. As the features are obtained based on histograms, coarse-level details are highlighted in large regions and fine-level details are highlighted in small regions. Therefore, the features from the point cloud in the tree can capture coarse level to fine level information to reduce feature ambiguity. In human-pose modeling, the main challenges involve reducing the dimensionality of human-pose space and designing appropriate factors that represent the underlying probability distributions for estimating human poses. To reduce the dimensionality, we propose a non-parametric action-mixture model (AMM). It represents high-dimensional human-pose space using low-dimensional manifolds in searching human poses. In each manifold, a probability distribution is estimated based on feature similarity. The distributions in the manifolds are then redistributed according to the stationary distribution of a Markov chain that models the frequency of human actions. After the redistribution, the manifolds are combined according to a probability distribution determined by action classification. Experiments were conducted using VISH features as input to the AMM. The results showed that the overall error and standard deviation of the AMM were reduced by about 7.9% and 7.1%, respectively, compared with a model without action classification. To design appropriate factors, we consider the AMM as a Bayesian network and propose a mapping that converts the Bayesian network to a neural network called NN-AMM. The proposed mapping consists of two steps: structure identification and parameter learning. In structure identification, we have developed a bottom-up approach to build a neural network while preserving the Bayesian-network structure. In parameter learning, we have created a part-based approach to learn synaptic weights by decomposing a neural network into parts. Based on the concept of distributed representation, the NN-AMM is further modified into a scalable neural network called NND-AMM. A neural-network-based system is then built by using VISH features to represent 3D-point-cloud input and the NND-AMM to estimate 3D human poses. The results showed that the proposed mapping can be utilized to design AMM factors automatically. The NND-AMM can provide more accurate human-pose estimates with fewer hidden neurons than both the AMM and NN-AMM can. Both the NN-AMM and NND-AMM can adapt to different types of input, showing the advantage of using neural networks to design factors
    • …
    corecore