You Don't See What I See: Individual Differences in the Perception of Meaning from Visual Stimuli
Everyone has their own unique version of the visual world, and there has been growing interest in understanding how personality shapes one's perception. Here, we investigated meaningful visual experiences in relation to the personality dimension of schizotypy. In a novel approach to this issue, a non-clinical sample of subjects (total n = 197) was presented with calibrated images of scenes, cartoons and faces of varying visibility embedded in noise; the spatial properties of the images were constructed to mimic the natural statistics of the environment. In two experiments, subjects were required to indicate what they saw in a large number of unique images, both with and without actual meaningful structure. The first experiment employed an open-ended response paradigm and used a variety of different images in noise; the second experiment presented only a series of faces embedded in noise and required a forced-choice response from the subjects. The results in all conditions indicated that a high positive-schizotypy score was associated with an increased tendency to perceive complex meaning in images composed purely of random visual noise. Individuals high in positive schizotypy appeared to employ a looser criterion (response bias) for what constituted a 'meaningful' image, while also being significantly less sensitive at the task than those low in positive schizotypy. Our results suggest that differences in perceptual performance for individuals high in positive schizotypy are not related to increased suggestibility or susceptibility to instruction, as had previously been suggested. Instead, the observed reductions in sensitivity, together with the increased response bias toward seeing something that is not there, indirectly implicate subtle neurophysiological differences associated with the personality dimension of schizotypy that are theoretically pertinent to the continuum of schizophrenia and hallucination-proneness.
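The sensitivity and response-bias measures discussed above come from signal detection theory. As a minimal sketch (using made-up hit and false-alarm rates, not values from the study), d' and the criterion c can be computed from z-transformed rates:

```python
from statistics import NormalDist

def dprime_and_criterion(hit_rate, fa_rate):
    """Standard SDT formulas: d' = z(H) - z(FA), c = -(z(H) + z(FA)) / 2.
    Rates of exactly 0 or 1 must be corrected before calling (inv_cdf
    is undefined there)."""
    z = NormalDist().inv_cdf
    d = z(hit_rate) - z(fa_rate)
    c = -0.5 * (z(hit_rate) + z(fa_rate))
    return d, c

# Illustrative observers: a lower c means a looser ("yes"-prone)
# criterion; a lower d' means poorer sensitivity.
d_low, c_low = dprime_and_criterion(0.8, 0.1)    # low-schizotypy example
d_high, c_high = dprime_and_criterion(0.7, 0.3)  # high-schizotypy example
```

On these illustrative numbers the second observer has both a smaller d' and a smaller c, i.e. reduced sensitivity together with a more liberal bias toward reporting meaning, which is the pattern the abstract describes.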
Anomaly detection in moving-camera videos with sparse and low-rank matrix decompositions
This work presents two methods based on sparse decompositions that can detect anomalies in video sequences obtained from moving cameras. The first method starts by computing the union of subspaces (UoS) that best represents all the frames from a reference (anomaly-free) video as a low-rank projection plus a sparse residue. It then performs a low-rank representation of the target (possibly anomalous) video by taking advantage of both the UoS and the sparse residue computed from the reference video. The anomalies are extracted by post-processing this residual data. This algorithm provides good detection results while also obviating the need for prior video synchronization. However, the technique loses detection efficiency when the target and reference videos present more severe misalignments. This may happen due to small uncontrolled camera movement and shaking during the acquisition phase, which is common in real-world situations. To extend its applicability, a second contribution is proposed to cope with these possible pose misalignments. This is done by modeling the target-reference pose discrepancy as geometric transformations acting on the domain of the frames of the target video. A complete matrix-decomposition algorithm is presented that computes a sparse representation of the target video as a sparse combination of the reference video plus a sparse residue, while taking into account the transformations acting on it. Our method is then verified and compared against state-of-the-art techniques on a challenging video dataset comprising recordings that present the described misalignments. Under the evaluation metrics used, the second proposed method exhibits an improvement of at least 16% over the first proposed method, and 22% over the next-best-rated method.
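The low-rank-plus-sparse split underlying both methods can be illustrated with a generic RPCA-style alternation. This is a toy sketch, not the authors' UoS algorithm: frames are stacked as columns of a matrix M, which is split into a low-rank part L (shared structure) and a sparse residue S (candidate anomalies).

```python
import numpy as np

def lowrank_sparse_split(M, rank=2, sparse_thresh=1.0, iters=30):
    """Crude alternating projection for M ≈ L + S:
    L is the truncated SVD of M - S; S is the hard-thresholded
    residual M - L (entries with large magnitude are kept as sparse)."""
    S = np.zeros_like(M)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        R = M - L
        S = np.where(np.abs(R) > sparse_thresh, R, 0.0)
    return L, S

# Toy usage: a rank-1 "video" with one large anomalous entry.
M = np.outer(np.arange(1.0, 6.0), np.arange(1.0, 7.0))
M[2, 3] += 10.0
L, S = lowrank_sparse_split(M, rank=1, sparse_thresh=3.0)
```

In this toy run the spike ends up concentrated in S while L recovers the rank-1 background; real RPCA solvers use soft-thresholding and nuclear-norm machinery, but the decomposition they compute has the same shape.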
Cascaded regression with sparsified feature covariance matrix for facial landmark detection
This paper explores the use of context in regression-based methods for facial landmarking. Regression-based methods have revolutionised facial landmarking solutions; in particular, those that implicitly infer the whole shape of a structured object have quickly become the state-of-the-art. The most notable exemplar is the Supervised Descent Method (SDM). Its main characteristics are the use of the cascaded-regression approach, the use of the full appearance as the inference input, and the aforementioned aim to directly predict the full shape. In this article we argue that the key aspects responsible for the success of SDM are the use of cascaded regression and the avoidance of the constrained optimisation problem that characterised most of the previous approaches. We show that, surprisingly, it is possible to achieve comparable or superior performance using only landmark-specific predictors, which are linearly combined. We reason that augmenting the input with too much context (of which using the full appearance is the extreme case) can be harmful. In fact, we experimentally found that there is a relation between the data variance and the benefits of adding context to the input. We finally devise a simple greedy procedure that makes use of this fact to obtain performance superior to the SDM, while maintaining the simplicity of the algorithm. We show extensive results both for intermediate stages, devised to support the main line of argument, and to validate the overall performance of two models constructed based on these considerations.
Single Frame Image Super Resolution Using Learned Directionlets
In this paper, a new directionally adaptive, learning-based, single-image super-resolution method using the multiple-direction wavelet transform, called Directionlets, is presented. The method uses directionlets to effectively capture directional features and to extract edge information along different directions from a set of available high-resolution images. This information is used as the training set for super-resolving a low-resolution input image: the directionlet coefficients at finer scales of its high-resolution counterpart are learned locally from the training set, and the inverse directionlet transform recovers the super-resolved high-resolution image. Simulation results show that the proposed approach outperforms standard interpolation techniques such as cubic-spline interpolation, as well as standard wavelet-based learning, both visually and in terms of mean squared error (MSE). The method also gives good results with aliased images.
Comment: 14 pages, 6 figures
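The learn-from-examples step can be illustrated with a generic patch-based scheme. This sketch uses raw pixel patches in place of directionlet coefficients, so it is not the paper's method, but the pairing-and-lookup logic is the same idea: each low-resolution patch is matched to its nearest neighbour in a training database, and the corresponding high-resolution patch is pasted in, averaging overlaps.

```python
import numpy as np

def downsample(img):
    """Toy 2x decimation (keeps every other pixel)."""
    return img[::2, ::2]

def build_patch_db(hr_imgs, p=2):
    """Pair each low-res p x p patch with its corresponding
    high-res 2p x 2p patch, over all training images."""
    keys, vals = [], []
    for hr in hr_imgs:
        lr = downsample(hr)
        for i in range(lr.shape[0] - p + 1):
            for j in range(lr.shape[1] - p + 1):
                keys.append(lr[i:i+p, j:j+p].ravel())
                vals.append(hr[2*i:2*i+2*p, 2*j:2*j+2*p])
    return np.array(keys), vals

def super_resolve(lr, keys, vals, p=2):
    """Nearest-neighbour patch lookup; overlapping pastes are averaged."""
    out = np.zeros((2 * lr.shape[0], 2 * lr.shape[1]))
    cnt = np.zeros_like(out)
    for i in range(lr.shape[0] - p + 1):
        for j in range(lr.shape[1] - p + 1):
            q = lr[i:i+p, j:j+p].ravel()
            k = np.argmin(((keys - q) ** 2).sum(axis=1))
            out[2*i:2*i+2*p, 2*j:2*j+2*p] += vals[k]
            cnt[2*i:2*i+2*p, 2*j:2*j+2*p] += 1
    return out / np.maximum(cnt, 1)
```

The directionlet method replaces the raw-patch keys and values with directionlet coefficients at coarse and fine scales, which is what lets it recover oriented edges that isotropic wavelets and interpolation miss.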
Word contexts enhance the neural representation of individual letters in early visual cortex
Visual context facilitates perception, but how this is neurally implemented remains unclear. One example of contextual facilitation is found in reading, where letters are more easily identified when embedded in a word. Bottom-up models explain this word advantage as a post-perceptual decision bias, while top-down models propose that word contexts enhance perception itself. Here, we arbitrate between these accounts by presenting words and nonwords and probing the representational fidelity of individual letters using functional magnetic resonance imaging. In line with top-down models, we find that word contexts enhance letter representations in early visual cortex. Moreover, we observe increased coupling between letter information in visual cortex and brain activity in key areas of the reading network, suggesting these areas may be the source of the enhancement. Our results provide evidence for top-down representational enhancement in word recognition, demonstrating that word contexts can modulate perceptual processing already in the earliest visual regions.
H3WB: Human3.6M 3D WholeBody Dataset and Benchmark
We present a benchmark for 3D human whole-body pose estimation, which involves identifying accurate 3D keypoints on the entire human body, including face, hands, body, and feet. Currently, the lack of a fully annotated and accurate 3D whole-body dataset means that deep networks are either trained separately on specific body parts, which are then combined during inference, or rely on pseudo-ground-truth provided by parametric body models, which is not as accurate as detection-based methods. To overcome these issues, we introduce the Human3.6M 3D WholeBody (H3WB) dataset, which provides whole-body annotations for the Human3.6M dataset using the COCO Wholebody layout. H3WB comprises 133 whole-body keypoint annotations on 100K images, made possible by our new multi-view pipeline. We also propose three tasks: i) 3D whole-body pose lifting from complete 2D whole-body pose, ii) 3D whole-body pose lifting from incomplete 2D whole-body pose, and iii) 3D whole-body pose estimation from a single RGB image. Additionally, we report several baselines from popular methods for these tasks. Furthermore, we provide automated 3D whole-body annotations of TotalCapture and experimentally show that, when used with H3WB, they help to improve performance. Code and dataset are available at https://github.com/wholebody3d/wholebody3d
Comment: Accepted by ICCV 202
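As an illustration of task i), 2D-to-3D pose lifting can be sketched with a hypothetical linear least-squares lifter. This is not one of the paper's reported baselines; it simply shows the input/output shapes of the task: flattened 2D keypoints in, flattened 3D keypoints out.

```python
import numpy as np

def fit_linear_lifter(P2, P3):
    """Least-squares linear map from flattened 2D keypoints to 3D.
    P2: (n, K*2) 2D poses; P3: (n, K*3) matching 3D poses.
    A bias column is appended so the map is affine."""
    A = np.hstack([P2, np.ones((len(P2), 1))])
    W, *_ = np.linalg.lstsq(A, P3, rcond=None)
    return W

def lift(p2, W):
    """Lift a single flattened 2D pose to 3D."""
    return np.append(p2, 1.0) @ W
```

Real lifting baselines replace this single affine map with a deep network, but the interface is the same; for H3WB, K would be the 133 COCO-Wholebody keypoints.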
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article provides a review of state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side by side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, and more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e., audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, and generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified.
Comment: 15 pages, 2 PDF figures
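As a minimal sketch of the log-mel feature representation mentioned above (a generic NumPy implementation with common default parameters, not tied to any particular system): the signal is windowed, a power spectrum is taken per frame, a triangular mel filterbank is applied, and the result is log-compressed.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(x, sr=16000, n_fft=512, hop=160, n_mels=40):
    """Return a (frames, n_mels) log-mel spectrogram of 1-D signal x."""
    # Frame the signal, apply a Hann window, take the power spectrum.
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i*hop : i*hop + n_fft] * win
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Build a triangular mel filterbank (mel-spaced centre frequencies).
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    # Apply the filterbank and log-compress (epsilon avoids log(0)).
    return np.log(power @ fb.T + 1e-10)
```

Production code would typically use librosa or torchaudio for this; the sketch just makes the pipeline of the review's "dominant feature representation" concrete.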