Unobtrusive and pervasive video-based eye-gaze tracking
Eye-gaze tracking has long been considered a desktop technology that finds its use inside the traditional office setting, where the operating conditions may be controlled. Nonetheless, recent advancements in mobile technology and a growing interest in capturing natural human behaviour have motivated an emerging interest in tracking eye movements within unconstrained real-life conditions, referred to as pervasive eye-gaze tracking. This critical review focuses on emerging passive and unobtrusive video-based eye-gaze tracking methods in recent literature, with the aim of identifying the different research avenues that are being followed in response to the challenges of pervasive eye-gaze tracking. Different eye-gaze tracking approaches are discussed in order to bring out their strengths and weaknesses, and to identify any limitations, within the context of pervasive eye-gaze tracking, that have yet to be considered by the computer vision community.
Attention Correctness in Neural Image Captioning
Attention mechanisms have recently been introduced in deep learning for
various tasks in natural language processing and computer vision. But despite
their popularity, the "correctness" of the implicitly-learned attention maps
has only been assessed qualitatively by visualization of several examples. In
this paper we focus on evaluating and improving the correctness of attention in
neural image captioning models. Specifically, we propose a quantitative
evaluation metric for the consistency between the generated attention maps and
human annotations, using recently released datasets with alignment between
regions in images and entities in captions. We then propose novel models with
different levels of explicit supervision for learning attention maps during
training. The supervision can be strong when alignment between regions and
caption entities is available, or weak when only object segments and
categories are provided. We show on the popular Flickr30k and COCO datasets
that introducing supervision of attention maps during training solidly improves
both attention correctness and caption quality, showing the promise of making
machine perception more human-like.
Comment: To appear in AAAI-17. See http://www.cs.jhu.edu/~cxliu/ for
supplementary material
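The abstract above does not spell out its quantitative metric. As a minimal sketch of one way to measure consistency between an attention map and a human-annotated region, the snippet below computes the fraction of attention mass that falls inside the annotated region. The function name `attention_correctness`, the grid layout, and the binary-mask representation are illustrative assumptions, not the authors' published definition.

```python
def attention_correctness(attention, region_mask):
    """Return the fraction of attention mass that lies inside the annotated region.

    attention   -- 2D list of non-negative attention weights (hypothetical layout)
    region_mask -- 2D list of 0/1 flags with the same shape; 1 marks the
                   human-annotated region for the entity being described
    """
    total = sum(sum(row) for row in attention)
    if total <= 0:
        raise ValueError("attention map must have positive total mass")
    inside = sum(
        weight
        for att_row, mask_row in zip(attention, region_mask)
        for weight, flag in zip(att_row, mask_row)
        if flag
    )
    # Dividing by the total makes the score independent of the map's scale.
    return inside / total


# Toy 2x2 example: the annotated region is the right-hand column.
attention = [[1, 2],
             [3, 4]]
region_mask = [[0, 1],
               [0, 1]]
score = attention_correctness(attention, region_mask)
```

A score near 1 means the model attends almost entirely to the annotated region; a score near the region's share of the image area means attention is no better than uniform.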
Consciousness in Cognitive Architectures. A Principled Analysis of RCS, Soar and ACT-R
This report analyses the applicability of the principles of consciousness developed in the ASys project to three of the most relevant cognitive architectures. The analysis considers their applicability to building integrated control systems and their support for general mechanisms of real-time consciousness.
To analyse these architectures the ASys Framework is employed. This is a conceptual framework based on an extension of the General Systems Theory (GST) for cognitive autonomous systems.
General qualitative evaluation criteria for cognitive architectures are established based upon: a) requirements for a cognitive architecture, b) the theoretical framework based on the GST and c) core design principles for integrated cognitive conscious control systems.
A Connectionist Theory of Phenomenal Experience
When cognitive scientists apply computational theory to the problem of phenomenal consciousness, as
many of them have been doing recently, there are two fundamentally distinct approaches available. Either
consciousness is to be explained in terms of the nature of the representational vehicles the brain deploys; or
it is to be explained in terms of the computational processes defined over these vehicles. We call versions of
these two approaches vehicle and process theories of consciousness, respectively. However, while there may
be space for vehicle theories of consciousness in cognitive science, they are relatively rare. This is because
of the influence exerted, on the one hand, by a large body of research which purports to show that the
explicit representation of information in the brain and conscious experience are dissociable, and on the
other, by the classical computational theory of mind – the theory that takes human cognition to be a species
of symbol manipulation. But two recent developments in cognitive science combine to suggest that a
reappraisal of this situation is in order. First, a number of theorists have recently been highly critical of the
experimental methodologies employed in the dissociation studies – so critical, in fact, it’s no longer
reasonable to assume that the dissociability of conscious experience and explicit representation has been
adequately demonstrated. Second, classicism, as a theory of human cognition, is no longer as dominant in
cognitive science as it once was. It now has a lively competitor in the form of connectionism; and
connectionism, unlike classicism, does have the computational resources to support a robust vehicle theory
of consciousness. In this paper we develop and defend this connectionist vehicle theory of consciousness. It
takes the form of the following simple empirical hypothesis: phenomenal experience consists in the explicit
representation of information in neurally realized PDP networks. This hypothesis leads us to re-assess some
common wisdom about consciousness, but, we will argue, in fruitful and ultimately plausible ways.
Neural Face Editing with Intrinsic Image Disentangling
Traditional face editing methods often require a number of sophisticated and
task-specific algorithms to be applied one after the other, a process that
is tedious, fragile, and computationally intensive. In this paper, we propose
an end-to-end generative adversarial network that infers a face-specific
disentangled representation of intrinsic face properties, including shape (i.e.
normals), albedo, and lighting, and an alpha matte. We show that this network
can be trained on "in-the-wild" images by incorporating an in-network
physically-based image formation module and appropriate loss functions. Our
disentangling latent representation allows for semantically relevant edits,
where one aspect of facial appearance can be manipulated while keeping
orthogonal properties fixed, and we demonstrate its use for a number of facial
editing applications.
Comment: CVPR 2017 oral
Emergence of Invariance and Disentanglement in Deep Representations
Using established principles from Statistics and Information Theory, we show
that invariance to nuisance factors in a deep neural network is equivalent to
information minimality of the learned representation, and that stacking layers
and injecting noise during training naturally bias the network towards learning
invariant representations. We then decompose the cross-entropy loss used during
training and highlight the presence of an inherent overfitting term. We propose
regularizing the loss by bounding such a term in two equivalent ways: one with
a Kullback-Leibler term, which relates to a PAC-Bayes perspective; the other
using the information in the weights as a measure of complexity of a learned
model, yielding a novel Information Bottleneck for the weights. Finally, we
show that invariance and independence of the components of the representation
learned by the network are bounded above and below by the information in the
weights, and therefore are implicitly optimized during training. The theory
enables us to quantify and predict sharp phase transitions between underfitting
and overfitting of random labels when using our regularized loss, which we
verify in experiments, and sheds light on the relation between the geometry of
the loss function, invariance properties of the learned representation, and
generalization error.
Comment: Deep learning, neural network, representation, flat minima,
information bottleneck, overfitting, generalization, sufficiency, minimality,
sensitivity, information complexity, stochastic gradient descent,
regularization, total correlation, PAC-Bayes
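As a hedged illustration of the regularization described in this abstract, a cross-entropy loss with a Kullback-Leibler penalty on the weights can be written as follows. The symbols are assumptions introduced here for illustration: $q(w \mid \mathcal{D})$ is a posterior over weights given the training set $\mathcal{D}$, $p(w)$ is a prior, $H_{p,q}(y \mid x, w)$ is the cross-entropy term, and $\beta$ is a trade-off coefficient. The abstract itself does not fix this notation.

```latex
\mathcal{L}\bigl(q(w \mid \mathcal{D})\bigr)
  = H_{p,q}(y \mid x, w)
  + \beta \, \mathrm{KL}\bigl( q(w \mid \mathcal{D}) \,\|\, p(w) \bigr)
```

In this form the KL term limits how much information the weights store about the training set, which is the sense in which it acts as a complexity penalty.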