
    Artificial intelligence enabled automatic traffic monitoring system

    The rapid advancement of machine learning and high-performance computing has greatly expanded the scope of video-based traffic monitoring systems. In this study, an automatic traffic monitoring system is proposed that deploys several state-of-the-art deep learning algorithms based on the nature of traffic operation. Taking advantage of a large database of annotated video surveillance data, deep learning-based models are trained to track congestion, detect traffic anomalies and tabulate vehicle counts. To monitor traffic queues, this study implements a Mask region-based convolutional neural network (Mask R-CNN) that predicts congestion using pixel-level segmentation masks on classified regions of interest. The model was likewise used to accurately extract traffic queue-related information from infrastructure-mounted video cameras. The use of infrastructure-mounted CCTV cameras for traffic anomaly detection and verification is further explored. Initially, a convolutional neural network model based on you only look once (YOLO), a popular deep learning framework for object detection and classification, is deployed. This detection model, together with a multi-object tracking system based on intersection over union (IOU), is used to search and scrutinize various traffic scenes for possible anomalies. Several experiments were conducted to fine-tune the system's robustness under different environmental and traffic conditions. Techniques such as bounding box suppression and adaptive thresholding were used to reduce false alarm rates and refine the robustness of the methodology developed. At each stage of development, a comparative analysis is conducted to evaluate the strengths and limitations of the proposed approach. The IOU tracker coupled with YOLO was also used to automatically count vehicles, and its accuracy was later compared with a manual counting technique applied to the CCTV video feeds. Overall, the proposed system is evaluated using the F1 and S3 performance metrics. The outcome of this study could be seamlessly integrated into traffic systems such as smart traffic surveillance, traffic volume estimation, and smart work zone management.
    by Vishal Mandal. Includes bibliographical references.
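    The vehicle-counting step described above pairs a YOLO detector with an IOU-based multi-object tracker. The sketch below shows one way such a greedy IOU tracker can turn per-frame detections into a vehicle count; the function names, the IOU threshold, and the minimum-track-length rule are illustrative assumptions, not the exact implementation used in the thesis.

```python
# Minimal greedy IOU tracker for counting vehicles from per-frame detections.
# Assumption: `frames` is a list of frames, each a list of (x1, y1, x2, y2)
# boxes produced by a detector such as YOLO.

def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def track_and_count(frames, iou_threshold=0.5, min_track_length=3):
    """Extend each active track with its best-overlapping detection in the next
    frame; every track that survives at least `min_track_length` frames is
    counted as one vehicle."""
    active, finished = [], []                 # each track is a list of boxes
    for detections in frames:
        unmatched, next_active = list(detections), []
        for track in active:
            best, best_iou = None, iou_threshold
            for det in unmatched:
                score = iou(track[-1], det)
                if score >= best_iou:
                    best, best_iou = det, score
            if best is None:
                finished.append(track)        # track lost, close it
            else:
                unmatched.remove(best)
                track.append(best)
                next_active.append(track)
        next_active.extend([det] for det in unmatched)   # start new tracks
        active = next_active
    finished.extend(active)
    return sum(1 for t in finished if len(t) >= min_track_length)
```

    A real pipeline would add the bounding box suppression and adaptive thresholding mentioned above before tracking, but the counting logic itself can stay this simple.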

    Deep Learning Detection in the Visible and Radio Spectrums

    Deep learning models with convolutional neural networks are being used to solve some of the most difficult problems in computing today. Complicating factors for the use and development of deep learning models include the lack of large volumes of data, the lack of problem-specific samples, and the lack of variation in the samples that are available. The costs of collecting this data and of computing the models for the task of detection remain prohibitive for all but the most well-funded organizations. This thesis approaches deep learning from a cost-reduction and hybrid perspective, incorporating techniques of transfer learning, training augmentation, synthetic data generation, morphological computations, and statistical and thresholding model fusion, for the task of detection in two domains: visible-spectrum detection of target spacecraft, and radio-spectrum detection of radio frequency interference in 2D astronomical time-frequency data. The effects of training augmentation on object detection performance are studied in the visible spectrum, as is the effect of image degradation on detection performance. Supplementing training with degraded images significantly improves the detection results, and in scenarios with low levels of degradation the baseline results are exceeded. Morphological operations on degraded data show promise for reducing computational requirements in some detection tasks. The proposed Mask R-CNN model is able to detect and localize properly on spacecraft images degraded by high levels of pixel loss. Deep learning models such as U-Net have been leveraged for the task of radio frequency interference labeling (flagging). Variations on the U-Net architecture, such as layer size and composition, continue to be explored; however, the examination of deep learning models combined with statistical tests and thresholding techniques for radio frequency interference mitigation is in its infancy. For the radio-spectrum domain, a U-Net model combined with various statistical tests and the SumThreshold technique in an output fusion model is tested against a baseline of SumThreshold alone for the detection of radio frequency interference. This thesis also contributes an improved dataset for spacecraft detection and a simple technique for generating synthetic channelized voltage data that simulates radio astronomy spectrum recordings as 2D time-frequency plots.
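    The radio-spectrum baseline mentioned above is the SumThreshold technique. The snippet below is a simplified one-dimensional SumThreshold pass in the spirit of Offringa et al.; the parameter names, window schedule, and damping of already-flagged samples are assumptions made for illustration, not the exact configuration evaluated in the thesis.

```python
import numpy as np

def sumthreshold_1d(values, chi_1, rho=1.5, max_log2_window=5):
    """Simplified SumThreshold pass over one spectrum or time series: windows of
    growing length M = 1, 2, 4, ... are flagged whenever their sum exceeds
    M * chi_M, with chi_M = chi_1 / rho**log2(M).  Already-flagged samples are
    replaced by chi_M so strong RFI does not dominate later, wider windows."""
    values = np.asarray(values, dtype=float)
    flags = np.zeros(values.shape, dtype=bool)
    for p in range(max_log2_window + 1):
        m = 2 ** p
        chi_m = chi_1 / (rho ** p)
        work = np.where(flags, chi_m, values)        # damp flagged samples
        window_flags = np.zeros_like(flags)
        for start in range(len(values) - m + 1):
            if work[start:start + m].sum() > m * chi_m:
                window_flags[start:start + m] = True
        flags |= window_flags
    return flags
```

    In the fusion model described, flags of this kind would be combined with U-Net predictions and statistical tests; the usual extension to the 2D time-frequency plot applies the same pass along both the time and frequency axes.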

    Cooperative object classification for driving applications

    3D object classification can be realised by rendering views of the same object from different angles and aggregating all the views to build a classifier. Although this approach has previously been proposed for general object classification, most existing works do not consider visual impairments. In contrast, this paper considers the problem of 3D object classification for driving applications under impairments (e.g. occlusion and sensor noise) by generating an application-specific dataset. We present a cooperative object classification method in which multiple images of the same object, seen from different perspectives (agents), are exploited to produce a more accurate classification. We consider the model's generalisation capability and its resilience to impairments. We introduce an occlusion model with higher resemblance to real-world occlusion and use a simplified sensor noise model. The experimental results show that the cooperative model, relying on multiple views, significantly outperforms single-view methods and is effective in mitigating the effects of occlusion and sensor noise.
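    The cooperative model aggregates evidence from several views of the same object. One simple late-fusion baseline, averaging per-view class probabilities, is sketched below; the per-view classifier, the function name, and the averaging rule are assumptions for illustration rather than the paper's actual fusion scheme.

```python
import numpy as np

def cooperative_classify(per_view_logits):
    """Late fusion over views: each agent contributes a logit vector over the
    same set of classes; softmax each vector, average the probabilities, and
    return the winning class together with the fused distribution."""
    probs = []
    for logits in per_view_logits:
        z = np.asarray(logits, dtype=float)
        z = z - z.max()                       # numerical stability
        p = np.exp(z) / np.exp(z).sum()
        probs.append(p)
    fused = np.mean(probs, axis=0)
    return int(np.argmax(fused)), fused

# Example: three agents observing the same partially occluded object.
label, fused = cooperative_classify([[2.0, 0.5, 0.1],
                                     [1.2, 1.1, 0.3],
                                     [0.4, 2.2, 0.2]])
```

    Averaging probabilities is only one possible aggregation rule; a heavily occluded or noisy view contributes a flatter distribution and therefore sways the fused decision less, which is the intuition behind multi-view robustness.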

    A Survey on Computer Vision based Human Analysis in the COVID-19 Era

    The emergence of COVID-19 has had a global and profound impact, not only on society as a whole, but also on the lives of individuals. Various prevention measures were introduced around the world to limit the transmission of the disease, including face masks, mandates for social distancing and regular disinfection in public spaces, and the use of screening applications. These developments also triggered the need for novel and improved computer vision techniques capable of (i) providing support to the prevention measures through an automated analysis of visual data, on the one hand, and (ii) facilitating normal operation of existing vision-based services, such as biometric authentication schemes, on the other. Especially important here are computer vision techniques that focus on the analysis of people and faces in visual data and that have been affected the most by the partial occlusions introduced by the mandates for facial masks. Such computer vision based human analysis techniques include face and face-mask detection approaches, face recognition techniques, crowd counting solutions, age and expression estimation procedures, models for detecting face-hand interactions, and many others, and have seen considerable attention over recent years. The goal of this survey is to provide an introduction to the problems induced by COVID-19 into such research and to present a comprehensive review of the work done in the computer vision based human analysis field. Particular attention is paid to the impact of facial masks on the performance of various methods and to recent solutions for mitigating this problem. Additionally, a detailed review of existing datasets useful for the development and evaluation of methods for COVID-19 related applications is provided. Finally, to help advance the field further, a discussion of the main open challenges and future research directions is given.
    Comment: Submitted to Image and Vision Computing, 44 pages, 7 figures.

    Cross-domain self-supervised complete geometric representation learning for real-scanned point cloud based pathological gait analysis

    Accurate lower-limb pose estimation is a prerequisite of skeleton-based pathological gait analysis. To achieve this goal in free-living environments for long-term monitoring, a single depth sensor has been proposed in research. However, the depth map acquired from a single viewpoint encodes only partial geometric information of the lower limbs and exhibits large variations across different viewpoints. Existing off-the-shelf three-dimensional (3D) pose tracking algorithms and public datasets for depth-based human pose estimation are mainly targeted at activity recognition applications. They are relatively insensitive to skeleton estimation accuracy, especially at the foot segments. Furthermore, acquiring ground truth skeleton data for detailed biomechanics analysis also requires considerable effort. To address these issues, we propose a novel cross-domain self-supervised complete geometric representation learning framework, with knowledge transfer from unlabelled synthetic point clouds of full lower-limb surfaces. The proposed method can significantly reduce the number of ground truth skeletons required in the training phase (to only 1%), while ensuring accurate and precise pose estimation and capturing discriminative features across different pathological gait patterns compared to other methods.
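    A common self-supervised objective for learning geometric representations from unlabelled point clouds is surface reconstruction scored with a Chamfer-style distance. The helper below shows that generic building block only; it is an illustrative assumption and not the paper's actual cross-domain training objective.

```python
import numpy as np

def chamfer_distance(pc_a, pc_b):
    """Symmetric Chamfer distance between two point clouds of shape (N, 3) and
    (M, 3): for each point, take the squared distance to its nearest neighbour
    in the other cloud, then average both directions."""
    a = np.asarray(pc_a, dtype=float)[:, None, :]   # (N, 1, 3)
    b = np.asarray(pc_b, dtype=float)[None, :, :]   # (1, M, 3)
    d2 = ((a - b) ** 2).sum(axis=-1)                # (N, M) pairwise distances
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```

    In a few-label setting like the one described, an encoder pretrained with such an unsupervised geometric loss on synthetic scans can then be fine-tuned on the small labelled fraction of real scans for the pose regression task.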

    State of the Art on Neural Rendering

    Efficient rendering of photo-realistic virtual worlds is a long-standing effort of computer graphics. Modern graphics techniques have succeeded in synthesizing photo-realistic images from hand-crafted scene representations. However, the automatic generation of shape, materials, lighting, and other aspects of scenes remains a challenging problem that, if solved, would make photo-realistic computer graphics more widely accessible. Concurrently, progress in computer vision and machine learning has given rise to a new approach to image synthesis and editing, namely deep generative models. Neural rendering is a new and rapidly emerging field that combines generative machine learning techniques with physical knowledge from computer graphics, e.g., by the integration of differentiable rendering into network training. With a plethora of applications in computer graphics and vision, neural rendering is poised to become a new area in the graphics community, yet no survey of this emerging field exists. This state-of-the-art report summarizes the recent trends and applications of neural rendering. We focus on approaches that combine classic computer graphics techniques with deep generative models to obtain controllable and photo-realistic outputs. Starting with an overview of the underlying computer graphics and machine learning concepts, we discuss critical aspects of neural rendering approaches. The report focuses on the many important use cases for the described algorithms, such as novel view synthesis, semantic photo manipulation, facial and body reenactment, relighting, free-viewpoint video, and the creation of photo-realistic avatars for virtual and augmented reality telepresence. Finally, we conclude with a discussion of the social implications of such technology and investigate open research problems.

    Explorative Study on Asymmetric Sketch Interactions for Object Retrieval in Virtual Reality

    Drawing tools for Virtual Reality (VR) enable users to model 3D designs from within the virtual environment itself. These tools employ sketching and sculpting techniques known from desktop-based interfaces and apply them to hand-based controller interaction. While these techniques allow for mid-air sketching of basic shapes, it remains difficult for users to create detailed and comprehensive 3D models. Our work focuses on supporting the user in designing the virtual environment around them by enhancing sketch-based interfaces with a supporting system for interactive model retrieval. Through sketching, an immersed user can query a database of detailed 3D models and place the retrieved models within the virtual environment. To understand supportive sketching within a virtual environment, we made an explorative comparison between asymmetric methods of sketch interaction, i.e., 3D mid-air sketching, 2D sketching on a virtual tablet, 2D sketching on a fixed virtual whiteboard, and 2D sketching on a real tablet. Our work shows that different patterns emerge when users interact with 3D sketches rather than 2D sketches to compensate for different results from the retrieval system. In particular, users adopt different strategies when drawing on canvases of different sizes or when using a physical device instead of a virtual canvas. While we pose our work as a retrieval problem for 3D models of chairs, our results can be extrapolated to other sketching tasks for virtual environments.

    On robustness of cloud speech APIs: An early characterization

    The robustness and consistency of sensory inference models under changing environmental conditions and hardware are crucial requirements for the generalizability of recent innovative work, particularly in the field of deep learning, from the lab to the real world. We measure the extent to which current cloud speech recognition models are robust to background noise, and show that hardware variability is still a problem for the real-world applicability of state-of-the-art speech recognition models.
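    One way such a noise-robustness characterization can be set up is to mix clean utterances with background noise at controlled signal-to-noise ratios before sending them to each cloud API. The helper below covers only the mixing step; the function name, array conventions, and SNR definition are illustrative assumptions, and the API calls and scoring are deliberately left out.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the resulting speech-to-noise power ratio is
    snr_db.  Both signals are 1-D float arrays at the same sample rate; the
    noise is looped or trimmed to the speech length."""
    speech = np.asarray(speech, dtype=float)
    noise = np.resize(np.asarray(noise, dtype=float), speech.shape)
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12            # avoid divide-by-zero
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + scale * noise
```

    Each noisy clip would then be transcribed by the cloud services under test and scored, for example by word error rate, against the reference transcript of the clean recording.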

    Thermal Image Based Navigation System for Skid-Steering Mobile Robots in Sugarcane Crops

    This work proposes a new strategy for autonomous navigation of mobile robots in sugarcane plantations based on thermal imaging. Unlike ordinary agricultural fields, sugarcane farms are generally vast and accommodate numerous arrangements of row-crop tunnels, which are very tall, dense and hard to access. Moreover, sugarcane crops lie in harsh regions, which hinders the logistics of employing staff and heavy machinery for mapping, monitoring, and sampling. One solution to this problem is TIBA (Tankette for Intelligent BioEnergy Agriculture), a low-cost skid-steering mobile robot capable of infiltrating the crop tunnels with several sensing/sampling systems. The project concept is to keep the product cost low enough to make deployment of a robot swarm over a larger area feasible. A prototype was built and tested on a bioenergy farm in order to improve the understanding of the environment and bring to light the challenges for the next development steps. The major problem is navigation through the crop tunnels, since most existing systems are suited to open-field operations and employ laser scanners and/or GPS/IMU, which are in general expensive technologies. In this context, we propose a low-cost solution based on infrared (IR) thermal imaging. IR cameras are simple and inexpensive devices that, unlike laser-based sensors, do not pose risks to user health. This idea was strongly motivated by the data collected in the field, which show a significant temperature difference between the ground and the crop. From the image analysis it is possible to clearly visualize a distinguishable corridor and, consequently, generate a straight path for the robot to follow using computationally efficient approaches. A rigorous analysis of the collected thermal data, numerical simulations, and preliminary experiments in the real environment are included to illustrate the efficiency and feasibility of the proposed navigation methodology.
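    The navigation idea rests on the reported temperature contrast between the cooler ground corridor and the warmer crop walls. A toy version of that image-to-steering step is sketched below; the global mean threshold, the use of only the lower half of the frame, and the normalized offset output are simplifying assumptions, not the strategy validated in the field experiments.

```python
import numpy as np

def corridor_steering_offset(thermal_frame, ground_is_cooler=True):
    """Segment the corridor by thresholding the thermal frame at its mean
    temperature, keep the ground pixels in the lower half of the image (the
    region nearest the robot), and return the horizontal centroid offset from
    the image centre, normalised to [-1, 1].  Positive suggests steering right."""
    img = np.asarray(thermal_frame, dtype=float)
    ground = img < img.mean() if ground_is_cooler else img > img.mean()
    lower = ground[img.shape[0] // 2:, :]            # rows closest to the robot
    ys, xs = np.nonzero(lower)
    if xs.size == 0:
        return 0.0                                   # no corridor found: keep straight
    half_width = img.shape[1] / 2.0
    return float((xs.mean() - half_width) / half_width)
```

    In practice the offset would be smoothed over time and fed to the skid-steering controller, with filtering and path fitting handling the noise that a single-frame centroid cannot.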