2,750 research outputs found
Video Stream Adaptation In Computer Vision Systems
Computer Vision (CV) has been deployed recently in a wide range of applications, including surveillance and automotive industries. According to a recent report, the market for CV technologies will grow to $33.3 billion by 2019. Surveillance and automotive industries share over 20% of this market. This dissertation considers the design of real-time CV systems with live video streaming, especially those over wireless and mobile networks. Such systems include video cameras/sensors and monitoring stations. The cameras should adapt their captured videos based on the events and/or available resources and time requirement. The monitoring station receives video streams from all cameras and run CV algorithms for decisions, warnings, control, and/or other actions. Real-time CV systems have constraints in power, computational, and communicational resources. Most video adaptation techniques considered the video distortion as the primary metric. In CV systems, however, the main objective is enhancing the event/object detection/recognition/tracking accuracy. The accuracy can essentially be thought of as the quality perceived by machines, as opposed to the human perceptual quality. High-Efficiency Video Coding (HEVC) is a recent encoding standard that seeks to address the limited communication bandwidth problem as a result of the popularity of High Definition (HD) videos. Unfortunately, HEVC adopts algorithms that greatly slow down the encoding process, and thus results in complications in real-time systems.
This dissertation presents a method for adapting live video streams to limited and varying network bandwidth and energy resources. It analyzes and compares the rate-accuracy and rate-energy characteristics of various video streams adaptation techniques in CV systems. We model the video capturing, encoding, and transmission aspects and then provide an overall model of the power consumed by the video cameras and/or sensors. In addition to modeling the power consumption, we model the achieved bitrate of video encoding. We validate and analyze the power consumption models of each phase as well as the aggregate power consumption model through extensive experiments. The analysis includes examining individual parameters separately and examining the impacts of changing more than one parameter at a time. For HEVC, we develop an algorithm that predicts the size of the block without iterating through the exhaustive Rate Distortion Optimization (RDO) method. We demonstrate the effectiveness of the proposed algorithm in comparison with existing algorithms. The proposed algorithm achieves approximately 5 times the encoding speed of the RDO algorithm and 1.42 times the encoding speed of the fastest analyzed algorithm
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in challenging scenarios for traditional cameras, such as
low-latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the actual sensors that are
available and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world
End-to-End Multiview Gesture Recognition for Autonomous Car Parking System
The use of hand gestures can be the most intuitive human-machine interaction medium.
The early approaches for hand gesture recognition used device-based methods. These
methods use mechanical or optical sensors attached to a glove or markers, which hinders
the natural human-machine communication. On the other hand, vision-based methods are
not restrictive and allow for a more spontaneous communication without the need of an
intermediary between human and machine. Therefore, vision gesture recognition has been
a popular area of research for the past thirty years.
Hand gesture recognition finds its application in many areas, particularly the automotive
industry where advanced automotive human-machine interface (HMI) designers are
using gesture recognition to improve driver and vehicle safety. However, technology advances
go beyond active/passive safety and into convenience and comfort. In this context,
one of America’s big three automakers has partnered with the Centre of Pattern Analysis
and Machine Intelligence (CPAMI) at the University of Waterloo to investigate expanding
their product segment through machine learning to provide an increased driver convenience
and comfort with the particular application of hand gesture recognition for autonomous
car parking.
In this thesis, we leverage the state-of-the-art deep learning and optimization techniques
to develop a vision-based multiview dynamic hand gesture recognizer for self-parking system.
We propose a 3DCNN gesture model architecture that we train on a publicly available
hand gesture database. We apply transfer learning methods to fine-tune the pre-trained
gesture model on a custom-made data, which significantly improved the proposed system
performance in real world environment. We adapt the architecture of the end-to-end solution
to expand the state of the art video classifier from a single image as input (fed by
monocular camera) to a multiview 360 feed, offered by a six cameras module. Finally, we
optimize the proposed solution to work on a limited resources embedded platform (Nvidia
Jetson TX2) that is used by automakers for vehicle-based features, without sacrificing the
accuracy robustness and real time functionality of the system
A Review and Analysis of Eye-Gaze Estimation Systems, Algorithms and Performance Evaluation Methods in Consumer Platforms
In this paper a review is presented of the research on eye gaze estimation
techniques and applications, that has progressed in diverse ways over the past
two decades. Several generic eye gaze use-cases are identified: desktop, TV,
head-mounted, automotive and handheld devices. Analysis of the literature leads
to the identification of several platform specific factors that influence gaze
tracking accuracy. A key outcome from this review is the realization of a need
to develop standardized methodologies for performance evaluation of gaze
tracking systems and achieve consistency in their specification and comparative
evaluation. To address this need, the concept of a methodological framework for
practical evaluation of different gaze tracking systems is proposed.Comment: 25 pages, 13 figures, Accepted for publication in IEEE Access in July
201
A Survey on Human-aware Robot Navigation
Intelligent systems are increasingly part of our everyday lives and have been
integrated seamlessly to the point where it is difficult to imagine a world
without them. Physical manifestations of those systems on the other hand, in
the form of embodied agents or robots, have so far been used only for specific
applications and are often limited to functional roles (e.g. in the industry,
entertainment and military fields). Given the current growth and innovation in
the research communities concerned with the topics of robot navigation,
human-robot-interaction and human activity recognition, it seems like this
might soon change. Robots are increasingly easy to obtain and use and the
acceptance of them in general is growing. However, the design of a socially
compliant robot that can function as a companion needs to take various areas of
research into account. This paper is concerned with the navigation aspect of a
socially-compliant robot and provides a survey of existing solutions for the
relevant areas of research as well as an outlook on possible future directions.Comment: Robotics and Autonomous Systems, 202
MultiIoT: Towards Large-scale Multisensory Learning for the Internet of Things
The Internet of Things (IoT), the network integrating billions of smart
physical devices embedded with sensors, software, and communication
technologies for the purpose of connecting and exchanging data with other
devices and systems, is a critical and rapidly expanding component of our
modern world. The IoT ecosystem provides a rich source of real-world modalities
such as motion, thermal, geolocation, imaging, depth, sensors, video, and audio
for prediction tasks involving the pose, gaze, activities, and gestures of
humans as well as the touch, contact, pose, 3D of physical objects. Machine
learning presents a rich opportunity to automatically process IoT data at
scale, enabling efficient inference for impact in understanding human
wellbeing, controlling physical devices, and interconnecting smart cities. To
develop machine learning technologies for IoT, this paper proposes MultiIoT,
the most expansive IoT benchmark to date, encompassing over 1.15 million
samples from 12 modalities and 8 tasks. MultiIoT introduces unique challenges
involving (1) learning from many sensory modalities, (2) fine-grained
interactions across long temporal ranges, and (3) extreme heterogeneity due to
unique structure and noise topologies in real-world sensors. We also release a
set of strong modeling baselines, spanning modality and task-specific methods
to multisensory and multitask models to encourage future research in
multisensory representation learning for IoT
Motorcycles that see: Multifocal stereo vision sensor for advanced safety systems in tilting vehicles
Advanced driver assistance systems, ADAS, have shown the possibility to anticipate crash accidents and effectively assist road users in critical traffic situations. This is not the case for motorcyclists, in fact ADAS for motorcycles are still barely developed. Our aim was to study a camera-based sensor for the application of preventive safety in tilting vehicles. We identified two road conflict situations for which automotive remote sensors installed in a tilting vehicle are likely to fail in the identification of critical obstacles. Accordingly, we set two experiments conducted in real traffic conditions to test our stereo vision sensor. Our promising results support the application of this type of sensors for advanced motorcycle safety applications
- …