STV-based Video Feature Processing for Action Recognition
In comparison with still-image-based processing, video features can provide rich and intuitive information about dynamic events that occur over a period of time, such as human actions, crowd behaviours, and other changes in subject patterns. Although substantial progress has been made in image processing over the last decade, with successful applications in face matching and object recognition, video-based event detection remains one of the most difficult challenges in computer vision research because of its complex continuous or discrete input signals, arbitrary dynamic feature definitions, and often ambiguous analytical methods. In this paper, a 3D shape-matching method based on Spatio-Temporal Volumes (STV) and region intersection (RI) is proposed to facilitate the definition and recognition of human actions recorded in videos. The distinctive characteristics and the performance gain of the devised approach stem from a coefficient-factor-boosted 3D region intersection and matching mechanism developed in this research. This paper also reports an investigation into techniques for efficient STV data filtering that reduce the number of voxels (volumetric pixels) that need to be processed in each operational cycle of the implemented system. The encouraging features and the operational performance improvements observed in the experiments are discussed at the end.
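The core volume-matching idea can be illustrated with a minimal Python sketch, assuming each action is represented as a boolean spatio-temporal volume (time × height × width) and that query and template volumes are already aligned to the same shape. The function names, the frame-difference threshold used to prune static voxels, and the plain intersection-over-union score are assumptions for illustration only; the abstract does not describe the coefficient-factor boosting or the paper's actual STV filtering technique, so neither is reproduced here.

```python
import numpy as np


def build_stv(frames, threshold=0.1):
    """Turn a (T, H, W) stack of grayscale frames into a boolean STV of shape (T-1, H, W).

    Hypothetical filter: a voxel is kept only where the absolute difference
    between consecutive frames exceeds `threshold`, which prunes static voxels.
    """
    frames = np.asarray(frames, dtype=float)
    return np.abs(np.diff(frames, axis=0)) > threshold


def region_intersection(stv_a, stv_b):
    """Intersection-over-union of two equally shaped boolean volumes (0.0 if both are empty)."""
    inter = np.logical_and(stv_a, stv_b).sum()
    union = np.logical_or(stv_a, stv_b).sum()
    return float(inter / union) if union else 0.0


def classify(query, templates):
    """Return the label of the template volume that overlaps the query the most."""
    return max(templates, key=lambda label: region_intersection(query, templates[label]))


# Example: a single pixel switches on in frame 1 and off again in frame 2.
frames = np.zeros((3, 4, 4))
frames[1, 1, 1] = 1.0
query = build_stv(frames)  # two active voxels, at (0, 1, 1) and (1, 1, 1)
templates = {"match": query.copy(), "other": np.zeros_like(query)}
print(classify(query, templates))  # prints match
```

Recognition then reduces to picking the template whose volume overlaps the query most. A boosted variant would reweight the intersection term, but how the paper does this is not stated in the abstract.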
Fully automated segmentation and tracking of the intima media thickness in ultrasound video sequences of the common carotid artery
The robust identification and measurement of the intima media thickness (IMT) has high clinical relevance because it is one of the most precise predictors used in the assessment of potential future cardiovascular events. To facilitate the analysis of arterial wall thickening in serial clinical investigations, we have developed a novel, fully automatic algorithm for the segmentation, measurement, and tracking of the intima media complex (IMC) in B-mode ultrasound video sequences. The proposed algorithm entails a two-stage image analysis process: the first stage segments the IMC in the first frame of the ultrasound video sequence using a model-based approach, and in the second stage a novel customized tracking procedure is applied to robustly detect the IMC in the subsequent frames. For the video tracking procedure, we introduce a spatially coherent algorithm called adaptive normalized correlation that prevents the tracking process from converging to wrong arterial interfaces. This is the main contribution of this paper and was developed to deal with inconsistencies in the appearance of the IMC over the cardiac cycle. The quantitative evaluation was carried out on 40 ultrasound video sequences of the common carotid artery (CCA) by comparing the results returned by the developed algorithm with ground truth data manually annotated by clinical experts. The IMT (mean ± standard deviation) measured by the proposed algorithm is 0.60 ± 0.10 mm, with a mean coefficient of variation (CV) of 2.05%, whereas the corresponding result for the manually annotated ground truth data is 0.60 ± 0.11 mm, with a mean CV of 5.60%.
The numerical results reported in this paper indicate that the proposed algorithm is able to correctly segment and track the IMC in ultrasound CCA video sequences, and we were encouraged by the stability of our technique when applied to data captured under different imaging conditions. Future clinical studies will focus on the evaluation of patients affected by advanced cardiovascular conditions such as focal thickening and arterial plaques.
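Correlation-based tracking with a spatial constraint can be sketched in a few lines of Python. This is a minimal sketch and not the paper's adaptive normalized correlation: it assumes the IMC appearance is summarized as a 1D intensity profile (for example, along one image column), uses standard zero-mean normalized cross-correlation, and imposes a hypothetical search window (`max_shift`) around the previous position as a simple stand-in for spatial coherence. All function and parameter names are illustrative.

```python
import numpy as np


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length 1D signals, in [-1, 1]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0


def track_interface(template, profile, prev_pos, max_shift=3):
    """Return (best_start, best_score) for `template` within `profile`.

    Only start positions within `max_shift` of `prev_pos` are searched
    (a hypothetical spatial-coherence constraint), so a similar-looking
    interface far from the previous position cannot capture the tracker.
    """
    n = len(template)
    lo = max(0, prev_pos - max_shift)
    hi = min(len(profile) - n, prev_pos + max_shift)
    best_pos, best_score = prev_pos, -np.inf
    for pos in range(lo, hi + 1):
        score = ncc(template, profile[pos:pos + n])
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos, best_score


# Example: the same edge pattern appears at index 3 and again at index 9.
template = [0, 0, 1, 1]
profile = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]
print(track_interface(template, profile, prev_pos=2))               # (3, 1.0)
print(track_interface(template, profile, prev_pos=9, max_shift=1))  # (9, 1.0)
```

The search window is what keeps the tracker on the interface it followed in the previous frame, even though an equally good match exists farther along the profile.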
Multi-stream CNN based Video Semantic Segmentation for Automated Driving
The majority of semantic segmentation algorithms operate on a single frame,
even in the case of videos. In this work, the goal is to exploit temporal
information within the algorithm model to leverage motion cues and temporal
consistency. We propose two simple high-level architectures based on Recurrent
FCN (RFCN) and Multi-Stream FCN (MSFCN) networks. In the case of RFCN, a
recurrent network, namely an LSTM, is inserted between the encoder and the
decoder. MSFCN combines the encoders of different frames into a fused encoder
via a 1x1 channel-wise convolution. We use a ResNet50 network as the baseline
encoder and construct three networks, namely MSFCN of orders 2 and 3 and RFCN
of order 2. MSFCN-3 produces the best results, with accuracy improvements of
9% and 15% for the Highway and New York-like city scenarios of the
SYNTHIA-CVPR'16 dataset using the mean IoU metric. MSFCN-3 also produced
improvements of 11% and 6% on the SegTrack V2 and DAVIS datasets over the
baseline FCN network. We also designed an efficient version of MSFCN-2 and
RFCN-2 using weight sharing between the two encoders. The efficient MSFCN-2
provided improvements of 11% and 5% for KITTI and SYNTHIA with a negligible
increase in computational complexity compared to the baseline version.
Comment: Accepted for Oral Presentation at VISAPP 201
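A 1x1 convolution over concatenated feature maps amounts to applying one channel-mixing matrix independently at every pixel, which can be sketched with NumPy. The sketch assumes the per-frame encoder outputs are concatenated along the channel axis before fusion; the shapes, the function name `fuse_encoders`, and the averaging weights in the example are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def fuse_encoders(features, weight, bias):
    """Fuse per-frame encoder feature maps with a 1x1 convolution.

    features: list of k arrays of shape (C, H, W), one per input frame.
    weight:   (C_out, k * C) matrix; a 1x1 convolution applies this same
              channel-mixing matrix independently at every pixel.
    bias:     (C_out,) vector added at every pixel.
    Returns an array of shape (C_out, H, W).
    """
    stacked = np.concatenate(features, axis=0)          # (k * C, H, W)
    fused = np.einsum("oc,chw->ohw", weight, stacked)   # per-pixel channel mixing
    return fused + bias[:, None, None]


# Example: three frames (order 3), C = 4 channels, 2 x 2 spatial grid.
rng = np.random.default_rng(0)
f = rng.standard_normal((4, 2, 2))
feats = [f, 2 * f, 3 * f]
# Hypothetical weights that simply average the three frames channel by channel.
weight = np.hstack([np.eye(4) / 3] * 3)  # shape (4, 12)
bias = np.zeros(4)
out = fuse_encoders(feats, weight, bias)
print(out.shape)                # (4, 2, 2)
print(np.allclose(out, 2 * f))  # True
```

In a weight-sharing variant, each frame would pass through one and the same encoder function before this fusion step, so adding a frame adds computation but no extra encoder parameters.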
Generalized Boundaries from Multiple Image Interpretations
Boundary detection is essential for a variety of computer vision tasks such
as segmentation and recognition. In this paper we propose a unified formulation
and a novel algorithm that are applicable to the detection of different types
of boundaries, such as intensity edges, occlusion boundaries or object category
specific boundaries. Our formulation leads to a simple method with
state-of-the-art performance and significantly lower computational cost than
existing methods. We evaluate our algorithm on different types of boundaries,
from low-level boundaries extracted in natural images, to occlusion boundaries
obtained using motion cues and RGB-D cameras, to boundaries from
soft-segmentation. We also propose a novel method for figure/ground
soft-segmentation that can be used in conjunction with our boundary detection
method and improve its accuracy at almost no extra computational cost.
Egocentric Hand Detection Via Dynamic Region Growing
Egocentric videos, which mainly record the activities carried out by the
users of wearable cameras, have drawn much research attention in recent
years. Due to their lengthy content, a large number of ego-related applications
have been developed to summarize the captured videos. Because users are
accustomed to interacting with target objects using their own hands, and their
hands usually appear within their visual fields during the interaction, an
egocentric hand detection step is involved in tasks such as gesture
recognition, action recognition and social interaction understanding. In this
work, we propose a dynamic region growing approach for hand region detection in
egocentric videos, by jointly considering hand-related motion and egocentric
cues. We first determine seed regions that most likely belong to the hand, by
analyzing the motion patterns across successive frames. The hand regions can
then be located by extending from the seed regions, according to the scores
computed for the adjacent superpixels. These scores are derived from four
egocentric cues: contrast, location, position consistency and appearance
continuity. We discuss how to apply the proposed method in real-life scenarios,
where multiple hands appear in and disappear from the videos irregularly.
Experimental results on public datasets show that the proposed method achieves
superior performance compared with state-of-the-art methods, especially in
complicated scenarios.
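The seed-and-grow step can be sketched as a breadth-first expansion over a superpixel adjacency graph. This minimal Python sketch assumes each superpixel already has the four cue values (contrast, location, position consistency, appearance continuity) scaled to [0, 1], and combines them with a hypothetical equal-weight sum and acceptance threshold; the abstract does not state how the paper combines the cues, so `weights`, `threshold`, and all names here are assumptions.

```python
from collections import deque


def grow_hand_region(adjacency, seeds, cue_scores,
                     weights=(0.25, 0.25, 0.25, 0.25), threshold=0.5):
    """Grow a hand region outward from seed superpixels.

    adjacency:  dict mapping superpixel id -> iterable of adjacent ids.
    seeds:      ids judged (e.g. from motion) most likely to belong to the hand.
    cue_scores: dict mapping id -> 4-tuple of cue values in [0, 1].
    weights, threshold: hypothetical values, not taken from the paper.
    """
    region = set(seeds)
    frontier = deque(seeds)
    while frontier:
        current = frontier.popleft()
        for neighbour in adjacency.get(current, ()):
            if neighbour in region:
                continue
            # Weighted sum of the four egocentric cues for this adjacent superpixel.
            score = sum(w * c for w, c in zip(weights, cue_scores[neighbour]))
            if score >= threshold:
                region.add(neighbour)
                frontier.append(neighbour)
    return region


# Example: superpixels 0-1-2-3 form a chain; 0 is the motion-derived seed.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cue_scores = {
    1: (0.9, 0.8, 0.9, 0.7),  # hand-like
    2: (0.1, 0.2, 0.1, 0.3),  # background between two hand-like regions
    3: (0.9, 0.9, 0.9, 0.9),  # hand-like, but only reachable through 2
}
print(grow_hand_region(adjacency, seeds=[0], cue_scores=cue_scores))  # {0, 1}
```

Because growth only proceeds through accepted neighbours, a low-scoring superpixel between the seed and another hand-like area stops the expansion, which keeps every detected superpixel connected to a seed.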