Fast sky and road detection for video context analysis
Correct interpretation of the events occurring in video plays a key role in improving video surveillance systems and the desired automatic decision making. Accurate analysis of the context of the scenes in a video can contribute to the semantic understanding of the video. In this paper, we present our research on context analysis within video sequences, focusing on fast automatic detection of sky and road. Regarding road detection, the goal of the present study is to develop a motion-based context analysis to annotate roads and to restrict the computationally heavy search for moving objects to the areas where motion is detected. Our sky detection approach is adapted from Zafarifar et al. [1]. To evaluate the results, the average Coverability Rate (CR) is used. The road detection algorithm yields a CR of 0.97 on a single highway video sequence. Regarding sky detection, we show that our algorithm performs well compared with [2], achieving a CR of 0.98.
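A minimal sketch of the motion-based road annotation idea described in this abstract, assuming OpenCV's Farneback optical flow; the motion threshold and persistence fraction are illustrative choices, not the paper's parameters.

```python
# Sketch (not the authors' exact method): accumulate per-pixel motion
# evidence over a highway clip and keep persistently moving areas as the
# road mask, so the expensive moving-object search can be restricted to it.
import cv2
import numpy as np

def annotate_road(video_path, motion_thresh=1.0, persistence=0.2):
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    votes = np.zeros(prev.shape, np.float32)
    n = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag = np.linalg.norm(flow, axis=2)
        votes += (mag > motion_thresh)        # vote where motion occurs
        prev, n = gray, n + 1
    cap.release()
    # Pixels that moved in a sufficient fraction of frames become "road".
    return ((votes / max(n, 1)) > persistence).astype(np.uint8)
```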
Context analysis: sky, water and motion
Interpreting the events present in a video is a complex task, and the same gesture or motion can be understood in several ways depending on the context of the event and/or the scene. Therefore, the context of the scene can contribute to the semantic understanding of the video. In this paper, we present our research on context analysis of video sequences. By context analysis we mean not only determining general conditions such as daytime or nighttime and indoor or outdoor environments, but also region labeling [1] and motion analysis of the scene. This paper reports our research results on sky and water labeling and on motion analysis for determining the context. Later, this can be extended with regions such as roads, greenery, buildings, etc. Experiments based on the above detection techniques show that we achieve results comparable with other state-of-the-art techniques for sky and water detection, although in our case the color information is poor. To evaluate the results, we use the Coverability Rate (CR), which measures how much of the true sky or water is detected by the algorithm. The obtained average CR for water detection is about 96.6% and for sky detection about 98%.
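Taken at face value, the Coverability Rate described here reads as a recall over binary masks: the fraction of ground-truth sky or water pixels that the detector also marks. A sketch under that assumption (the exact definition in the paper may differ):

```python
# Coverability Rate as described above: fraction of true sky/water
# pixels that the detector covers, computed over binary masks.
import numpy as np

def coverability_rate(detected: np.ndarray, truth: np.ndarray) -> float:
    detected = detected.astype(bool)
    truth = truth.astype(bool)
    if not truth.any():
        return 1.0  # nothing to cover, trivially satisfied
    return np.logical_and(detected, truth).sum() / truth.sum()
```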
Fast scene analysis for surveillance & video databases
In both professional and consumer domains, video databases are broadly applied; quick searching is facilitated by fast region analysis, which provides an indication of the video contents. For real-time and cost-efficient implementations, it is important to develop algorithms with high accuracy and low computational complexity. In this paper, we analyze the accuracy and computational complexity of newly developed approaches for semantic region labeling and salient region detection, which aim at extracting spatial contextual information from a video. Both algorithms are analyzed in terms of their native DSP computations and memory usage to prove their practical feasibility. In the analyzed semantic region labeling approach, color and texture features are combined with their related vertical image position to label the key regions. In the salient region detection approach, a discrete cosine transform (DCT) is employed, since it provides a compact representation of the signal energy and can be computed at low cost. The techniques are applied to two complex surveillance use cases, moving ships in a harbor region and moving cars in traffic surveillance videos, to improve scene understanding in surveillance videos. Results show that our spatial contextual information methods quantitatively and qualitatively outperform other approaches with up to 22% gain in accuracy, while operating at several times lower complexity.
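One way a DCT can serve salient region detection cheaply, as this abstract suggests, is to score each block by its AC energy. This is a hedged illustration of that general idea, not the paper's exact formulation:

```python
# Sketch of a block-DCT saliency proxy: energy outside the DC term
# measures local signal activity; the paper's variant may differ.
import cv2
import numpy as np

def dct_saliency(gray: np.ndarray, block: int = 8) -> np.ndarray:
    h, w = gray.shape
    h, w = h - h % block, w - w % block      # crop to whole blocks
    g = np.float32(gray[:h, :w])
    sal = np.zeros((h // block, w // block), np.float32)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            c = cv2.dct(g[by:by + block, bx:bx + block])
            c[0, 0] = 0.0                    # drop DC (mean brightness)
            sal[by // block, bx // block] = np.sum(c * c)  # AC energy
    return sal / (sal.max() + 1e-9)          # normalize to [0, 1]
```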
Fast semantic region analysis for surveillance & video databases
Video databases are broadly applied in both consumer and professional domains. The importance of real-time surveillance video monitoring has increased for security reasons, while consumer video databases are also growing rapidly. Quick searching in databases is facilitated by region analysis, as it provides an indication of the contents of the video. For real-time and cost-efficient implementations, it is important to develop algorithms with low computational complexity. In this paper, we analyze the complexity of a newly developed semantic region labeling approach [2] (e.g. road, sky, etc.), which aims at extracting spatial contextual information from a video. In the analyzed semantic region labeling approach, color and texture features are combined with the vertical position to label the key regions. The algorithm is analyzed in terms of its native DSP computations and memory usage to prove its practical feasibility. The analysis results show that the system has low complexity while offering high-accuracy region labeling. A comparison with a state-of-the-art algorithm convincingly reveals that our system outperforms it with fewer computations.
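The feature combination named here (color, texture, vertical position) can be made concrete with a simple per-block extractor. This is an illustrative sketch; the choice of a Sobel-based texture measure and of any classifier on top of these features is an assumption, not the paper's design:

```python
# Per-block features in the spirit of the approach above: mean color,
# a gradient-magnitude texture cue, and the normalized vertical position.
import cv2
import numpy as np

def block_features(bgr: np.ndarray, block: int = 16) -> np.ndarray:
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    texture = np.hypot(gx, gy)               # edge strength per pixel
    h, w = gray.shape
    feats = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            color = bgr[y:y + block, x:x + block].reshape(-1, 3).mean(axis=0)
            tex = texture[y:y + block, x:x + block].mean()
            vpos = y / h                      # 0 at top, ~1 at bottom
            feats.append(np.hstack([color, tex, vpos]))
    return np.array(feats, np.float32)        # feed to any classifier
```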
Fast abnormal event detection from video surveillance
Video surveillance systems are becoming increasingly important in both private and public environments to monitor activity. In this context, this paper presents a novel block-based approach to detect abnormal situations by analyzing the pixel-wise motion context, as an alternative to the conventional object-based approach. We proceed directly with event characterization at the pixel level, based on motion estimation techniques. Optical flow is used to extract information such as the density and velocity of motion. The proposed approach identifies abnormal motion variations in regions of motion activity based on the entropy of Discrete Cosine Transform coefficients. We aim at a simple block-based approach to support a real-time implementation. We report successful results on the detection of abnormal events in surveillance videos captured at an airport.
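A block-level sketch of the idea this abstract outlines: the entropy of DCT coefficients of the optical-flow magnitude field flags blocks whose motion pattern is unusually disordered. Block size and entropy threshold are illustrative assumptions:

```python
# Sketch: per-block DCT entropy of the flow magnitude as an abnormality
# cue (illustrative parameters, not the authors' exact pipeline).
import cv2
import numpy as np

def abnormal_blocks(prev_gray, gray, block=16, entropy_thresh=3.5):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2).astype(np.float32)
    h, w = mag.shape
    flagged = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            c = np.abs(cv2.dct(mag[y:y + block, x:x + block]))
            p = c / (c.sum() + 1e-9)             # coefficient distribution
            ent = -np.sum(p * np.log2(p + 1e-12))  # DCT entropy
            if ent > entropy_thresh:
                flagged.append((y, x))           # candidate abnormal block
    return flagged
```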
Context-based region labeling for event detection in surveillance video
Automatic natural scene understanding and annotating regions with semantically meaningful labels, such as road or sky, are key aspects of image and video analysis. The annotation of regions is considered helpful for improving object-of-interest detection, because the object position in the scene is also exploited. For a reliable model of a scene and its associated context information, the labeling task involves image analysis at multiple scene levels, both global and local. In this paper, we develop a general framework for performing automatic semantic labeling of video scenes by combining local features and spatial contextual cues. While maintaining high accuracy, we pursue an algorithm with low computational complexity, so that it is suitable for real-time implementation in embedded video surveillance. We apply our approach to a complex surveillance use case and to three different datasets: WaterVisie [1], LabelMe [2] and our own dataset. We show that our method quantitatively and qualitatively outperforms two state-of-the-art approaches [3][4].
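One hypothetical form the fusion of local features with spatial contextual cues could take is a weighted blend of a per-pixel label likelihood with a vertical-position prior (e.g. sky favored near the top of the frame). Both the prior shape and the weighting below are assumptions for illustration only:

```python
# Hypothetical fusion step: blend a local per-pixel score for a label
# (e.g. "sky") with a simple vertical-position prior.
import numpy as np

def fuse_with_vertical_prior(local_score: np.ndarray,
                             weight: float = 0.5) -> np.ndarray:
    h, w = local_score.shape
    rows = np.linspace(1.0, 0.0, h)          # top rows favored by prior
    prior = np.repeat(rows[:, None], w, axis=1)
    return (1 - weight) * local_score + weight * prior
```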