Search CORE

2,130 research outputs found

Recommended from our members

An evaluation framework for stereo-based driver assistance

Author: Banitsas KA
Gehrig S
Pfeiffer D
Schneider N
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012
Field of study

This is the post-print version of the Article - Copyright @ 2012 Springer VerlagThe accuracy of stereo algorithms or optical flow methods is commonly assessed by comparing the results against the Middlebury database. However, equivalent data for automotive or robotics applications rarely exist as they are difficult to obtain. As our main contribution, we introduce an evaluation framework tailored for stereo-based driver assistance able to deliver excellent performance measures while circumventing manual label effort. Within this framework one can combine several ways of ground-truthing, different comparison metrics, and use large image databases. Using our framework we show examples on several types of ground truthing techniques: implicit ground truthing (e.g. sequence recorded without a crash occurred), robotic vehicles with high precision sensors, and to a small extent, manual labeling. To show the effectiveness of our evaluation framework we compare three different stereo algorithms on pixel and object level. In more detail we evaluate an intermediate representation called the Stixel World. Besides evaluating the accuracy of the Stixels, we investigate the completeness (equivalent to the detection rate) of the StixelWorld vs. the number of phantom Stixels. Among many findings, using this framework enables us to reduce the number of phantom Stixels by a factor of three compared to the base parametrization. This base parametrization has already been optimized by test driving vehicles for distances exceeding 10000 km

Brunel University Research Archive

Object Detection in 20 Years: A Survey

Author: Guo Yuhong
Shi Zhenwei
Ye Jieping
Zou Zhengxia
Publication venue
Publication date: 15/05/2019
Field of study

Object detection, as of one the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as a technical aesthetics under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of cold weapon era. This paper extensively reviews 400+ papers of object detection in the light of its technical evolution, spanning over a quarter-century's time (from the 1990s to 2019). A number of topics have been covered in this paper, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed up techniques, and the recent state of the art detection methods. This paper also reviews some important detection applications, such as pedestrian detection, face detection, text detection, etc, and makes an in-deep analysis of their challenges as well as technical improvements in recent years.Comment: This work has been submitted to the IEEE TPAMI for possible publicatio

arXiv.org e-Print Archive

Big Data for Traffic Estimation and Prediction: A Survey of Data and Tools

Author: Jiang Weiwei
Luo Jiayun
Publication venue
Publication date: 17/03/2021
Field of study

Big data has been used widely in many areas including the transportation industry. Using various data sources, traffic states can be well estimated and further predicted for improving the overall operation efficiency. Combined with this trend, this study presents an up-to-date survey of open data and big data tools used for traffic estimation and prediction. Different data types are categorized and the off-the-shelf tools are introduced. To further promote the use of big data for traffic estimation and prediction tasks, challenges and future directions are given for future studies

arXiv.org e-Print Archive

Directory of Open Access Journals

DR-NTU (Digital Repository of NTU)

Workflow-based Context-aware Control of Surgical Robots

Author: Beyl Tim
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2015
Field of study

Surgical assistance system such as medical robots enhanced the capabilities of medical procedures in the last decades. This work presents a new perspective on the use of workflows with surgical robots in order to improve the technical capabilities and the ease of use of such systems. This is accomplished by a 3D perception system for the supervision of the surgical operating room and a workflow-based controller, that allows to monitor the surgical process using workflow-tracking techniques

KITopen

AI and IoT Meet Mobile Machines: Towards a Smart Working Site

Author: Xiang Yusheng
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 04/10/2021
Field of study

Infrastructure construction is society's cornerstone and economics' catalyst. Therefore, improving mobile machinery's efficiency and reducing their cost of use have enormous economic benefits in the vast and growing construction market. In this thesis, I envision a novel concept smart working site to increase productivity through fleet management from multiple aspects and with Artificial Intelligence (AI) and Internet of Things (IoT)

KITopen

Directory of Open Access Books (DOAB)

ModDrop: adaptive multi-modal gesture recognition

Author: Nebout Florian
Neverova Natalia
Taylor Graham W.
Wolf Christian
Publication venue
Publication date: 06/06/2015
Field of study

We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy which exploits: i) careful initialization of individual modalities; and ii) gradual fusion involving random dropping of separate channels (dubbed ModDrop) for learning cross-modality correlations while preserving uniqueness of each modality-specific representation. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams. Fusing multiple modalities at several spatial and temporal scales leads to a significant increase in recognition rates, allowing the model to compensate for errors of the individual classifiers as well as noise in the separate channels. Futhermore, the proposed ModDrop training technique ensures robustness of the classifier to missing signals in one or several channels to produce meaningful predictions from any number of available modalities. In addition, we demonstrate the applicability of the proposed fusion scheme to modalities of arbitrary nature by experiments on the same dataset augmented with audio.Comment: 14 pages, 7 figure

arXiv.org e-Print Archive

HAL

Hal-Diderot

Facial analysis with depth maps and deep learning

Author: Brito Paulo
Publication venue
Publication date: 20/12/2018
Field of study

Tese de Doutoramento em Ciência e Tecnologia Web em associação com a Universidade de Trás-os-Montes e Alto Douro, apresentada à Universidade AbertaA recolha e análise sequencial de dados multimodais do rosto humano é um problema importante em visão por computador, com aplicações variadas na análise e monitorização médica, entretenimento e segurança. No entanto, devido à natureza do problema, há uma falta de sistemas acessíveis e fáceis de usar, em tempo real, com capacidade de anotações, análise 3d, capacidade de reanalisar e com uma velocidade capaz de detetar padrões faciais em ambientes de trabalho. No âmbito de um esforço contínuo, para desenvolver ferramentas de apoio à monitorização e avaliação de emoções/sinais em ambiente de trabalho, será realizada uma investigação relativa à aplicabilidade de uma abordagem de análise facial para mapear e avaliar os padrões faciais humanos. O objetivo consiste em investigar um conjunto de sistemas e técnicas que possibilitem responder à questão de como usar dados de sensores multimodais para obter um sistema de classificação para identificar padrões faciais. Com isso em mente, foi planeado desenvolver ferramentas para implementar um sistema em tempo real de forma a reconhecer padrões faciais. O desafio é interpretar esses dados de sensores multimodais para classificá-los com algoritmos de aprendizagem profunda e cumprir os seguintes requisitos: capacidade de anotações, análise 3d e capacidade de reanalisar. Além disso, o sistema tem que ser capaze de melhorar continuamente o resultado do modelo de classificação para melhorar e avaliar diferentes padrões do rosto humano. A FACE ANALYSYS, uma ferramenta desenvolvida no contexto desta tese de doutoramento, será complementada por várias aplicações para investigar as relações de vários dados de sensores com estados emocionais/sinais. Este trabalho é útil para desenvolver um sistema de análise adequado para a perceção de grandes quantidades de dados comportamentais.Collecting and analyzing in real time multimodal sensor data of a human face is an important problem in computer vision, with applications in medical and monitoring analysis, entertainment, and security. However, due to the exigent nature of the problem, there is a lack of affordable and easy to use systems, with real time annotations capability, 3d analysis, replay capability and with a frame speed capable of detecting facial patterns in working behavior environments. In the context of an ongoing effort to develop tools to support the monitoring and evaluation of human affective state in working environments, this research will investigate the applicability of a facial analysis approach to map and evaluate human facial patterns. Our objective consists in investigating a set of systems and techniques that make it possible to answer the question regarding how to use multimodal sensor data to obtain a classification system in order to identify facial patterns. With that in mind, it will be developed tools to implement a real-time system in a way that it will be able to recognize facial patterns from 3d data. The challenge is to interpret this multi-modal sensor data to classify it with deep learning algorithms and fulfill the follow requirements: annotations capability, 3d analysis and replay capability. In addition, the system will be able to enhance continuously the output result of the system with a training process in order to improve and evaluate different patterns of the human face. FACE ANALYSYS is a tool developed in the context of this doctoral thesis, in order to research the relations of various sensor data with human facial affective state. This work is useful to develop an appropriate visualization system for better insight of a large amount of behavioral data.N/

Repositório Aberto da Universidade Aberta