7,019 research outputs found
Video enhancement using adaptive spatio-temporal connective filter and piecewise mapping
This paper presents a novel video enhancement system based on an adaptive spatio-temporal connective (ASTC) noise filter and an adaptive piecewise mapping function (APMF). For ill-exposed videos or those with much noise, we first introduce a novel local image statistic to identify impulse noise pixels, and then incorporate it into the classical bilateral filter to form ASTC, aiming to reduce the mixture of the most two common types of noises - Gaussian and impulse noises in spatial and temporal directions. After noise removal, we enhance the video contrast with APMF based on the statistical information of frame segmentation results. The experiment results demonstrate that, for diverse low-quality videos corrupted by mixed noise, underexposure, overexposure, or any mixture of the above, the proposed system can automatically produce satisfactory results
Focusing and Compression of Ultrashort Pulses through Scattering Media
Light scattering in inhomogeneous media induces wavefront distortions which
pose an inherent limitation in many optical applications. Examples range from
microscopy and nanosurgery to astronomy. In recent years, ongoing efforts have
made the correction of spatial distortions possible by wavefront shaping
techniques. However, when ultrashort pulses are employed scattering induces
temporal distortions which hinder their use in nonlinear processes such as in
multiphoton microscopy and quantum control experiments. Here we show that
correction of both spatial and temporal distortions can be attained by
manipulating only the spatial degrees of freedom of the incident wavefront.
Moreover, by optimizing a nonlinear signal the refocused pulse can be shorter
than the input pulse. We demonstrate focusing of 100fs pulses through a 1mm
thick brain tissue, and 1000-fold enhancement of a localized two-photon
fluorescence signal. Our results open up new possibilities for optical
manipulation and nonlinear imaging in scattering media
Online real-time crowd behavior detection in video sequences
Automatically detecting events in crowded scenes is a challenging task in Computer Vision. A number of offline approaches have been proposed for solving the problem of crowd behavior detection, however the offline assumption limits their application in real-world video surveillance systems. In this paper, we propose an online and real-time method for detecting events in crowded video sequences. The proposed approach is based on the combination of visual feature extraction and image segmentation and it works without the need of a training phase. A quantitative experimental evaluation has been carried out on multiple publicly available video sequences, containing data from various crowd scenarios and different types of events, to demonstrate the effectiveness of the approach
Locally-adapted convolution-based super-resolution of irregularly-sampled ocean remote sensing data
Super-resolution is a classical problem in image processing, with numerous
applications to remote sensing image enhancement. Here, we address the
super-resolution of irregularly-sampled remote sensing images. Using an optimal
interpolation as the low-resolution reconstruction, we explore locally-adapted
multimodal convolutional models and investigate different dictionary-based
decompositions, namely based on principal component analysis (PCA), sparse
priors and non-negativity constraints. We consider an application to the
reconstruction of sea surface height (SSH) fields from two information sources,
along-track altimeter data and sea surface temperature (SST) data. The reported
experiments demonstrate the relevance of the proposed model, especially
locally-adapted parametrizations with non-negativity constraints, to outperform
optimally-interpolated reconstructions.Comment: 4 pages, 3 figure
Detecting and removing visual distractors for video aesthetic enhancement
Personal videos often contain visual distractors, which are objects that are accidentally captured that can distract viewers from focusing on the main subjects. We propose a method to automatically detect and localize these distractors through learning from a manually labeled dataset. To achieve spatially and temporally coherent detection, we propose extracting features at the Temporal-Superpixel (TSP) level using a traditional SVM-based learning framework. We also experiment with end-to-end learning using Convolutional Neural Networks (CNNs), which achieves slightly higher performance than other methods. The classification result is further refined in a post-processing step based on graph-cut optimization. Experimental results show that our method achieves an accuracy of 81% and a recall of 86%. We demonstrate several ways of removing the detected distractors to improve the video quality, including video hole filling; video frame replacement; and camera path re-planning. The user study results show that our method can significantly improve the aesthetic quality of videos
Spatio-temporal action localization with Deep Learning
Dissertação de mestrado em Engenharia InformáticaThe system that detects and identifies human activities are named human action recognition.
On the video approach, human activity is classified into four different categories, depending
on the complexity of the steps and the number of body parts involved in the action, namely
gestures, actions, interactions, and activities, which is challenging for video Human action
recognition to capture valuable and discriminative features because of the human body’s
variations. So, deep learning techniques have provided practical applications in multiple fields
of signal processing, usually surpassing traditional signal processing on a large scale.
Recently, several applications, namely surveillance, human-computer interaction, and video
recovery based on its content, have studied violence’s detection and recognition. In recent
years there has been a rapid growth in the production and consumption of a wide variety of
video data due to the popularization of high quality and relatively low-price video devices.
Smartphones and digital cameras contributed a lot to this factor. At the same time, there are
about 300 hours of video data updates every minute on YouTube. Along with the growing
production of video data, new technologies such as video captioning, answering video surveys,
and video-based activity/event detection are emerging every day. From the video input data,
the detection of human activity indicates which activity is contained in the video and locates
the regions in the video where the activity occurs.
This dissertation has conducted an experiment to identify and detect violence with spatial action localization, adapting a public dataset for effect. The idea was used an annotated
dataset of general action recognition and adapted only for violence detection.O sistema que deteta e identifica as atividades humanas é denominado reconhecimento da
ação humana. Na abordagem por vídeo, a atividade humana é classificada em quatro
categorias diferentes, dependendo da complexidade das etapas e do número de partes do
corpo envolvidas na ação, a saber, gestos, ações, interações e atividades, o que é desafiador
para o reconhecimento da ação humana do vídeo para capturar características valiosas e
discriminativas devido às variações do corpo humano. Portanto, as técnicas de deep learning
forneceram aplicações práticas em vários campos de processamento de sinal, geralmente
superando o processamento de sinal tradicional em grande escala.
Recentemente, várias aplicações, nomeadamente na vigilância, interação humano computador e recuperação de vídeo com base no seu conteúdo, estudaram a deteção e o
reconhecimento da violência. Nos últimos anos, tem havido um rápido crescimento na
produção e consumo de uma ampla variedade de dados de vídeo devido à popularização de
dispositivos de vídeo de alta qualidade e preços relativamente baixos. Smartphones e cameras
digitais contribuíram muito para esse fator. Ao mesmo tempo, há cerca de 300 horas de
atualizações de dados de vídeo a cada minuto no YouTube. Junto com a produção crescente
de dados de vídeo, novas tecnologias, como legendagem de vídeo, respostas a pesquisas de
vídeo e deteção de eventos / atividades baseadas em vídeo estão surgindo todos os dias. A
partir dos dados de entrada de vídeo, a deteção de atividade humana indica qual atividade
está contida no vídeo e localiza as regiões no vídeo onde a atividade ocorre.
Esta dissertação conduziu uma experiência para identificar e detetar violência com localização
espacial, adaptando um dataset público para efeito. A ideia foi usada um conjunto de dados
anotado de reconhecimento de ações gerais e adaptá-la apenas para deteção de violência
Motion Scalability for Video Coding with Flexible Spatio-Temporal Decompositions
PhDThe research presented in this thesis aims to extend the scalability range of the
wavelet-based video coding systems in order to achieve fully scalable coding with a
wide range of available decoding points. Since the temporal redundancy regularly
comprises the main portion of the global video sequence redundancy, the techniques
that can be generally termed motion decorrelation techniques have a central role in
the overall compression performance. For this reason the scalable motion modelling
and coding are of utmost importance, and specifically, in this thesis possible
solutions are identified and analysed.
The main contributions of the presented research are grouped into two
interrelated and complementary topics. Firstly a flexible motion model with rateoptimised
estimation technique is introduced. The proposed motion model is based
on tree structures and allows high adaptability needed for layered motion coding. The
flexible structure for motion compensation allows for optimisation at different stages
of the adaptive spatio-temporal decomposition, which is crucial for scalable coding
that targets decoding on different resolutions. By utilising an adaptive choice of
wavelet filterbank, the model enables high compression based on efficient mode
selection. Secondly, solutions for scalable motion modelling and coding are
developed. These solutions are based on precision limiting of motion vectors and
creation of a layered motion structure that describes hierarchically coded motion.
The solution based on precision limiting relies on layered bit-plane coding of motion
vector values. The second solution builds on recently established techniques that
impose scalability on a motion structure. The new approach is based on two major
improvements: the evaluation of distortion in temporal Subbands and motion search
in temporal subbands that finds the optimal motion vectors for layered motion
structure.
Exhaustive tests on the rate-distortion performance in demanding scalable video
coding scenarios show benefits of application of both developed flexible motion
model and various solutions for scalable motion coding
- …