An evaluation framework for stereo-based driver assistance
This is the post-print version of the article. Copyright © 2012 Springer Verlag. The accuracy of stereo algorithms or optical flow methods is commonly assessed by comparing the results against the Middlebury
database. However, equivalent data for automotive or robotics applications
rarely exist as they are difficult to obtain. As our main contribution, we introduce an evaluation framework tailored for stereo-based driver assistance able to deliver excellent performance measures while
circumventing manual label effort. Within this framework one can combine several ways of ground-truthing, different comparison metrics, and use large image databases.
Using our framework, we show examples of several types of ground-truthing techniques: implicit ground truthing (e.g., a sequence recorded without a crash occurring), robotic vehicles with high-precision sensors, and, to a small extent, manual labeling. To show the effectiveness of our evaluation framework, we compare three different stereo algorithms on
pixel and object level. In more detail we evaluate an intermediate representation
called the Stixel World. Besides evaluating the accuracy of the Stixels, we investigate the completeness (equivalent to the detection rate) of the Stixel World vs. the number of phantom Stixels. Among many findings, using this framework enables us to reduce the number of phantom Stixels by a factor of three compared to the base parametrization. This base parametrization has already been optimized by test-driving vehicles for distances exceeding 10,000 km.
Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of the cold weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed up techniques, and the recent state of the art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc., and makes an in-depth
analysis of their challenges as well as technical improvements in recent years.
Comment: This work has been submitted to the IEEE TPAMI for possible publication.
Big Data for Traffic Estimation and Prediction: A Survey of Data and Tools
Big data has been used widely in many areas including the transportation
industry. Using various data sources, traffic states can be well estimated and
further predicted for improving the overall operation efficiency. Combined with
this trend, this study presents an up-to-date survey of open data and big data
tools used for traffic estimation and prediction. Different data types are
categorized and the off-the-shelf tools are introduced. To further promote the
use of big data for traffic estimation and prediction tasks, challenges and
future directions are outlined.
Workflow-based Context-aware Control of Surgical Robots
Surgical assistance systems such as medical robots have enhanced the capabilities of medical procedures in recent decades. This work presents a new perspective on the use of workflows with surgical robots in order to improve the technical capabilities and the ease of use of such systems. This is accomplished by a 3D perception system for the supervision of the surgical operating room and a workflow-based controller that makes it possible to monitor the surgical process using workflow-tracking techniques.
AI and IoT Meet Mobile Machines: Towards a Smart Working Site
Infrastructure construction is a cornerstone of society and a catalyst for the economy. Improving the efficiency of mobile machinery and reducing its cost of use therefore brings enormous economic benefits in the vast and growing construction market. In this thesis, I envision a novel concept, the smart working site, to increase productivity through fleet management from multiple aspects, using Artificial Intelligence (AI) and the Internet of Things (IoT).
ModDrop: adaptive multi-modal gesture recognition
We present a method for gesture detection and localisation based on
multi-scale and multi-modal deep learning. Each visual modality captures
spatial information at a particular spatial scale (such as motion of the upper
body or a hand), and the whole system operates at three temporal scales. Key to
our technique is a training strategy which exploits: i) careful initialization
of individual modalities; and ii) gradual fusion involving random dropping of
separate channels (dubbed ModDrop) for learning cross-modality correlations
while preserving uniqueness of each modality-specific representation. We
present experiments on the ChaLearn 2014 Looking at People Challenge gesture
recognition track, in which we placed first out of 17 teams. Fusing multiple
modalities at several spatial and temporal scales leads to a significant
increase in recognition rates, allowing the model to compensate for errors of
the individual classifiers as well as noise in the separate channels.
Furthermore, the proposed ModDrop training technique ensures robustness of the
classifier to missing signals in one or several channels to produce meaningful
predictions from any number of available modalities. In addition, we
demonstrate the applicability of the proposed fusion scheme to modalities of
arbitrary nature by experiments on the same dataset augmented with audio.
Comment: 14 pages, 7 figures
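The random dropping of separate channels described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `moddrop`, the `drop_prob` parameter, zeroing as the way a channel is dropped, and the rule that at least one modality is always kept are all assumptions made here for illustration, since the abstract specifies none of them.

```python
import random


def moddrop(modalities, drop_prob=0.5, rng=None):
    """Randomly drop whole modalities from one training sample.

    modalities: dict mapping a modality name to its feature list.
    drop_prob:  hypothetical per-modality drop probability (not from the abstract).
    A dropped modality is replaced by zeros of the same length (an assumption).
    At least one modality is always kept (also an assumption of this sketch).
    """
    rng = rng or random.Random()
    names = list(modalities)
    # Decide independently for each modality whether it survives.
    keep = {name: rng.random() >= drop_prob for name in names}
    # If every modality was dropped, restore one chosen at random.
    if not any(keep.values()):
        keep[rng.choice(names)] = True
    return {
        name: list(vec) if keep[name] else [0.0] * len(vec)
        for name, vec in modalities.items()
    }
```

Applied to each sample during the fusion stage of training, a scheme like this would expose the fused classifier to inputs with missing channels, which is the robustness property the abstract claims for ModDrop.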
Facial analysis with depth maps and deep learning
Doctoral thesis in Web Science and Technology, in association with the Universidade de Trás-os-Montes e Alto Douro, presented to the Universidade Aberta. Collecting and analyzing multimodal sensor data of a human face in real time is an important problem in computer vision, with applications in medical analysis and monitoring, entertainment, and security. However, due to the demanding nature of the problem, there is a lack of affordable, easy-to-use systems with real-time annotation capability, 3D analysis, replay capability, and a frame rate capable of detecting facial patterns in working environments. In the context of an ongoing effort to develop tools that support the monitoring and evaluation of human affective state in working environments, this research investigates the applicability of a facial analysis approach to map and evaluate human facial patterns. Our objective is to investigate a set of systems and techniques that make it possible to answer the question of how to use multimodal sensor data to obtain a classification system that identifies facial patterns. With that in mind, tools will be developed to implement a real-time system able to recognize facial patterns from 3D data. The challenge is to interpret this multimodal sensor data, classify it with deep learning algorithms, and fulfill the following requirements: annotation capability, 3D analysis, and replay capability. In addition, the system must be able to continuously improve the output of the classification model through a training process, in order to improve and evaluate different patterns of the human face. FACE ANALYSYS, a tool developed in the context of this doctoral thesis, will be complemented by several applications to investigate the relations between various sensor data and human facial affective states. This work is useful for developing an appropriate visualization system that gives better insight into large amounts of behavioral data.