833 research outputs found

    Down-Sampling coupled to Elastic Kernel Machines for Efficient Recognition of Isolated Gestures

    Get PDF
    In the field of gestural action recognition, many studies have focused on dimensionality reduction along the spatial axis, to reduce both the variability of gestural sequences expressed in the reduced space, and the computational complexity of their processing. It is noticeable that very few of these methods have explicitly addressed the dimensionality reduction along the time axis. This is however a major issue with regard to the use of elastic distances characterized by a quadratic complexity. To partially fill this apparent gap, we present in this paper an approach based on temporal down-sampling associated to elastic kernel machine learning. We experimentally show, on two data sets that are widely referenced in the domain of human gesture recognition, and very different in terms of quality of motion capture, that it is possible to significantly reduce the number of skeleton frames while maintaining a good recognition rate. The method proves to give satisfactory results at a level currently reached by state-of-the-art methods on these data sets. The computational complexity reduction makes this approach eligible for real-time applications.Comment: ICPR 2014, International Conference on Pattern Recognition, Stockholm : Sweden (2014

    Adaptive Temporal Encoding Network for Video Instance-level Human Parsing

    Full text link
    Beyond the existing single-person and multiple-person human parsing tasks in static images, this paper makes the first attempt to investigate a more realistic video instance-level human parsing that simultaneously segments out each person instance and parses each instance into more fine-grained parts (e.g., head, leg, dress). We introduce a novel Adaptive Temporal Encoding Network (ATEN) that alternatively performs temporal encoding among key frames and flow-guided feature propagation from other consecutive frames between two key frames. Specifically, ATEN first incorporates a Parsing-RCNN to produce the instance-level parsing result for each key frame, which integrates both the global human parsing and instance-level human segmentation into a unified model. To balance between accuracy and efficiency, the flow-guided feature propagation is used to directly parse consecutive frames according to their identified temporal consistency with key frames. On the other hand, ATEN leverages the convolution gated recurrent units (convGRU) to exploit temporal changes over a series of key frames, which are further used to facilitate the frame-level instance-level parsing. By alternatively performing direct feature propagation between consistent frames and temporal encoding network among key frames, our ATEN achieves a good balance between frame-level accuracy and time efficiency, which is a common crucial problem in video object segmentation research. To demonstrate the superiority of our ATEN, extensive experiments are conducted on the most popular video segmentation benchmark (DAVIS) and a newly collected Video Instance-level Parsing (VIP) dataset, which is the first video instance-level human parsing dataset comprised of 404 sequences and over 20k frames with instance-level and pixel-wise annotations.Comment: To appear in ACM MM 2018. Code link: https://github.com/HCPLab-SYSU/ATEN. Dataset link: http://sysu-hcp.net/li

    Cascade Residual Learning: A Two-stage Convolutional Neural Network for Stereo Matching

    Full text link
    Leveraging on the recent developments in convolutional neural networks (CNNs), matching dense correspondence from a stereo pair has been cast as a learning problem, with performance exceeding traditional approaches. However, it remains challenging to generate high-quality disparities for the inherently ill-posed regions. To tackle this problem, we propose a novel cascade CNN architecture composing of two stages. The first stage advances the recently proposed DispNet by equipping it with extra up-convolution modules, leading to disparity images with more details. The second stage explicitly rectifies the disparity initialized by the first stage; it couples with the first-stage and generates residual signals across multiple scales. The summation of the outputs from the two stages gives the final disparity. As opposed to directly learning the disparity at the second stage, we show that residual learning provides more effective refinement. Moreover, it also benefits the training of the overall cascade network. Experimentation shows that our cascade residual learning scheme provides state-of-the-art performance for matching stereo correspondence. By the time of the submission of this paper, our method ranks first in the KITTI 2015 stereo benchmark, surpassing the prior works by a noteworthy margin.Comment: Accepted at ICCVW 2017. The first two authors contributed equally to this pape

    Super resolution and dynamic range enhancement of image sequences

    Get PDF
    Camera producers try to increase the spatial resolution of a camera by reducing size of sites on sensor array. However, shot noise causes the signal to noise ratio drop as sensor sites get smaller. This fact motivates resolution enhancement to be performed through software. Super resolution (SR) image reconstruction aims to combine degraded images of a scene in order to form an image which has higher resolution than all observations. There is a demand for high resolution images in biomedical imaging, surveillance, aerial/satellite imaging and high-definition TV (HDTV) technology. Although extensive research has been conducted in SR, attention has not been given to increase the resolution of images under illumination changes. In this study, a unique framework is proposed to increase the spatial resolution and dynamic range of a video sequence using Bayesian and Projection onto Convex Sets (POCS) methods. Incorporating camera response function estimation into image reconstruction allows dynamic range enhancement along with spatial resolution improvement. Photometrically varying input images complicate process of projecting observations onto common grid by violating brightness constancy. A contrast invariant feature transform is proposed in this thesis to register input images with high illumination variation. Proposed algorithm increases the repeatability rate of detected features among frames of a video. Repeatability rate is increased by computing the autocorrelation matrix using the gradients of contrast stretched input images. Presented contrast invariant feature detection improves repeatability rate of Harris corner detector around %25 on average. Joint multi-frame demosaicking and resolution enhancement is also investigated in this thesis. Color constancy constraint set is devised and incorporated into POCS framework for increasing resolution of color-filter array sampled images. Proposed method provides fewer demosaicking artifacts compared to existing POCS method and a higher visual quality in final image

    Neural Video Recovery for Cloud Gaming

    Full text link
    Cloud gaming is a multi-billion dollar industry. A client in cloud gaming sends its movement to the game server on the Internet, which renders and transmits the resulting video back. In order to provide a good gaming experience, a latency below 80 ms is required. This means that video rendering, encoding, transmission, decoding, and display have to finish within that time frame, which is especially challenging to achieve due to server overload, network congestion, and losses. In this paper, we propose a new method for recovering lost or corrupted video frames in cloud gaming. Unlike traditional video frame recovery, our approach uses game states to significantly enhance recovery accuracy and utilizes partially decoded frames to recover lost portions. We develop a holistic system that consists of (i) efficiently extracting game states, (ii) modifying H.264 video decoder to generate a mask to indicate which portions of video frames need recovery, and (iii) designing a novel neural network to recover either complete or partial video frames. Our approach is extensively evaluated using iPhone 12 and laptop implementations, and we demonstrate the utility of game states in the game video recovery and the effectiveness of our overall design

    Robust Time Series Dissimilarity Measure for Outlier Detection and Periodicity Detection

    Full text link
    Dynamic time warping (DTW) is an effective dissimilarity measure in many time series applications. Despite its popularity, it is prone to noises and outliers, which leads to singularity problem and bias in the measurement. The time complexity of DTW is quadratic to the length of time series, making it inapplicable in real-time applications. In this paper, we propose a novel time series dissimilarity measure named RobustDTW to reduce the effects of noises and outliers. Specifically, the RobustDTW estimates the trend and optimizes the time warp in an alternating manner by utilizing our designed temporal graph trend filtering. To improve efficiency, we propose a multi-level framework that estimates the trend and the warp function at a lower resolution, and then repeatedly refines them at a higher resolution. Based on the proposed RobustDTW, we further extend it to periodicity detection and outlier time series detection. Experiments on real-world datasets demonstrate the superior performance of RobustDTW compared to DTW variants in both outlier time series detection and periodicity detection

    Condition Monitoring and Management from Acoustic Emissions

    Get PDF
    • …
    corecore