Fireground location understanding by semantic linking of visual objects and building information models
This paper presents an outline for improved localization and situational awareness in fire emergency situations based on semantic technology and computer vision techniques. The novelty of our methodology lies in the semantic linking of video object recognition results from visual and thermal cameras with Building Information Models (BIM). The paper addresses the current limitations and possibilities of certain building information streams in the context of fire safety and fire incident management. Furthermore, our data management tools match the higher-level semantic metadata descriptors of BIM with those of deep-learning-based visual object recognition and classification networks. Based on these matches, the positions of cameras, objects and events can be estimated in the BIM model, transforming it from a static source of information into a rich, dynamic data provider. Previous work has already investigated the possibilities of linking BIM and low-cost point sensors for fireground understanding, but these approaches did not take into account the benefits of video analysis and recent developments in semantics and feature learning research. Finally, the strengths of the proposed approach compared to the state of the art are its (semi-)automatic workflow, its generic and modular setup, and its multi-modal strategy, which make it possible to automatically create situational awareness, to improve localization and to facilitate the overall understanding of the fire.
Multi-modal video analysis for early fire detection
This dissertation investigates several aspects of an intelligent video-based fire detection system. The first part focuses on the multi-modal processing of visual, infrared and time-of-flight video images, which improves on purely visual detection. To keep the processing cost as low as possible, with a view to real-time detection, a set of 'low-cost' fire characteristics that uniquely describe fire and flames was selected for each type of sensor. Combining the different types of information reduces the number of missed detections and false alarms, which results in a significant improvement in video-based fire detection. To combine the multi-modal detection results, however, the multi-modal images must be registered (aligned). The second part of this dissertation focuses mainly on this merging of multi-modal data and presents a new silhouette-based registration method. The third and final part proposes methods to perform video-based fire analysis and, at a later stage, fire modelling. Each of the proposed techniques for multi-modal detection and multi-view localization has been tested extensively in practice. Among other things, successful tests were carried out for the early detection of car fires in underground parking garages.
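As a rough illustration of why combining modalities can reduce both missed detections and false alarms, the sketch below fuses per-modality fire scores with a weighted average. This is an assumption made for illustration only, not the fusion rule used in the dissertation; the function names, the modality keys and the 0.5 threshold are all hypothetical, and the scores are assumed to come from already registered images.

```python
def fuse_scores(scores, weights=None):
    """Weighted average of per-modality fire scores in [0, 1].

    `scores` maps a modality name (hypothetical keys such as "visual",
    "infrared", "tof") to the score reported for the same registered region.
    """
    if not scores:
        raise ValueError("at least one modality score is required")
    if weights is None:
        # Assumption: equal trust in every modality.
        weights = {modality: 1.0 for modality in scores}
    total_weight = sum(weights[modality] for modality in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(weights[m] * scores[m] for m in scores)
    return weighted_sum / total_weight


def is_fire(scores, threshold=0.5, weights=None):
    """Raise an alarm only when the fused score reaches the threshold."""
    return fuse_scores(scores, weights) >= threshold


if __name__ == "__main__":
    # A bright but cold object: high visual score, low infrared score.
    print(is_fire({"visual": 0.75, "infrared": 0.0, "tof": 0.0}))
    # All modalities agree on a flame-like region.
    print(is_fire({"visual": 0.75, "infrared": 0.75, "tof": 0.5}))
```

In this toy setup a single over-confident modality is outvoted by the others, which is the intuition behind the reduced false-alarm rate; a real system would tune the weights and threshold on labelled data.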
Automatic detection, tracking and counting of birds in marine video content
Robust automatic detection of moving objects in a marine context is a multi-faceted problem due to the complexity of the observed scene. The dynamic nature of the sea caused by waves, boat wakes, and weather conditions poses huge challenges for the development of a stable background model. Moreover, camera motion, reflections, lightning and illumination changes may contribute to false detections. Dynamic background subtraction (DBGS) is widely considered a solution to this issue in the scope of vessel detection for maritime traffic analysis. In this paper, the DBGS techniques suggested for ships are investigated and optimized for the monitoring and tracking of birds in marine video content. In addition to background subtraction, foreground candidates are filtered by a classifier based on their feature descriptors in order to remove non-bird objects. Different types of classifiers have been evaluated, and results on a ground-truth-labeled dataset of challenging video fragments show similar levels of precision and recall, of about 95%, for the best performing classifier. The remaining foreground items are counted, and birds are tracked along the video sequence using spatio-temporal motion prediction. This allows marine scientists to study the presence and behavior of birds.
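To make the idea of a dynamic background model concrete, here is a minimal sketch of one of the simplest variants, a running-average background with a fixed difference threshold. It is not the DBGS technique evaluated in the paper; the function names, the learning rate `alpha` and the threshold value are assumptions chosen for illustration, and frames are represented as plain nested lists of grayscale values.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background model (running average).

    A small `alpha` adapts slowly, so slow changes such as gradual
    illumination drift are absorbed while fast-moving objects are not.
    """
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=25.0):
    """Mark pixels whose absolute difference to the background exceeds the threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


if __name__ == "__main__":
    background = [[10.0, 10.0], [10.0, 10.0]]
    frame = [[10.0, 200.0], [12.0, 10.0]]  # one bright candidate pixel
    print(foreground_mask(background, frame))
    print(update_background(background, frame, alpha=0.5))
```

In a pipeline like the one described, the mask would only yield foreground candidates; a classifier would still be needed to discard non-bird objects such as wave crests.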
Spott : on-the-spot e-commerce for television using deep learning-based video analysis techniques
Spott is an innovative second screen mobile multimedia application which offers viewers relevant information on objects (e.g., clothing, furniture, food) they see and like on their television screens. The application enables interaction between TV audiences and brands, so producers and advertisers can offer potential consumers tailored promotions, e-shop items, and/or free samples. In line with the current views on innovation management, the technological excellence of the Spott application is coupled with iterative user involvement throughout the entire development process. This article discusses both of these aspects and how they impact each other. First, we focus on the technological building blocks that facilitate the (semi-)automatic interactive tagging process of objects in the video streams. The majority of these building blocks make extensive use of novel and state-of-the-art deep learning concepts and methodologies. We show how these deep-learning-based video analysis techniques facilitate video summarization, semantic keyframe clustering, and (similar) object retrieval. Secondly, we provide insights into the user tests that have been performed to evaluate and optimize the application's user experience. The lessons learned from these open field tests have already been an essential input to the technology development and will further shape future modifications to the Spott application.
SmarterRoutes : data-driven road complexity estimation for level-of-detail adaptation of navigation services
SmarterRoutes aims to improve navigational services and make them more dynamic and personalised by data-driven and environmentally-aware road scene complexity estimation. SmarterRoutes divides complexity into two subtypes: perceived and descriptive complexity. In the SmarterRoutes architecture, the overall road scene complexity is indicated by combining parameters from both types of complexity. Descriptive complexity is derived from geospatial data sources, traffic data and sensor analysis. The architecture currently uses OpenStreetMap (OSM) tag analysis, Meten-In-Vlaanderen (MIV) derived traffic info and the Alaro weather model of the Royal Meteorological Institute of Belgium (RMI) as descriptive complexity indicators. For the perceived complexity, an image-based complexity estimation mechanism is presented. This image-based DenseNet Convolutional Neural Network (CNN) uses Street View images as input and was pretrained on buildings with Bag-of-Words and Structure-from-Motion features. The model calculates an image descriptor, which allows images to be compared by calculating the Euclidean distances between descriptors. SmarterRoutes extends this model with additional hand-labelled rankings of road scene images to predict visual road complexity. The reuse of an existing pretrained model with an additional ranking mechanism produces results that correspond with the subjective assessments of end-users. Finally, the global complexity mechanism combines the aforementioned sub-mechanisms and produces a service which should facilitate user-centred, context-aware navigation by intelligent data selection and/or omission based on SmarterRoutes’ complexity input.
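The descriptor comparison mentioned in this abstract can be sketched as follows. The example only shows the Euclidean distance step on toy vectors; the descriptor values, the image names and the helper `rank_by_similarity` are hypothetical, and in the described system the descriptors would come from the pretrained CNN rather than being typed in by hand.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two descriptors of equal length."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same dimensionality")
    return math.dist(a, b)  # available since Python 3.8


def rank_by_similarity(query, candidates):
    """Return candidate names ordered from most to least similar to the query."""
    return sorted(
        candidates,
        key=lambda name: euclidean_distance(query, candidates[name]),
    )


if __name__ == "__main__":
    # Toy 3-dimensional descriptors; real descriptors are much longer.
    query = [0.0, 0.0, 0.0]
    candidates = {
        "quiet_street": [0.1, 0.0, 0.1],
        "busy_junction": [0.9, 0.8, 0.7],
        "roundabout": [0.4, 0.5, 0.3],
    }
    print(rank_by_similarity(query, candidates))
```

A smaller distance means the two images are closer in descriptor space, which is what makes it possible to relate an unseen road scene to hand-labelled reference images.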