84,972 research outputs found
Learning long-range spatial dependencies with horizontal gated-recurrent units
Progress in deep learning has spawned great successes in many engineering
applications. As a prime example, convolutional neural networks, a type of
feedforward neural networks, are now approaching -- and sometimes even
surpassing -- human accuracy on a variety of visual recognition tasks. Here,
however, we show that these neural networks and their recent extensions
struggle in recognition tasks where co-dependent visual features must be
detected over long spatial ranges. We introduce the horizontal gated-recurrent
unit (hGRU) to learn intrinsic horizontal connections -- both within and across
feature columns. We demonstrate that a single hGRU layer matches or outperforms
all tested feedforward hierarchical baselines including state-of-the-art
architectures which have orders of magnitude more free parameters. We further
discuss the biological plausibility of the hGRU in comparison to anatomical
data from the visual cortex as well as human behavioral data on a classic
contour detection task.Comment: Published at NeurIPS 2018
https://papers.nips.cc/paper/7300-learning-long-range-spatial-dependencies-with-horizontal-gated-recurrent-unit
Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions
Neural networks rely on convolutions to aggregate spatial information.
However, spatial convolutions are expensive in terms of model size and
computation, both of which grow quadratically with respect to kernel size. In
this paper, we present a parameter-free, FLOP-free "shift" operation as an
alternative to spatial convolutions. We fuse shifts and point-wise convolutions
to construct end-to-end trainable shift-based modules, with a hyperparameter
characterizing the tradeoff between accuracy and efficiency. To demonstrate the
operation's efficacy, we replace ResNet's 3x3 convolutions with shift-based
modules for improved CIFAR10 and CIFAR100 accuracy using 60% fewer parameters;
we additionally demonstrate the operation's resilience to parameter reduction
on ImageNet, outperforming ResNet family members. We finally show the shift
operation's applicability across domains, achieving strong performance with
fewer parameters on classification, face verification and style transfer.Comment: Source code will be released afterward
Geoscience after IT: Part J. Human requirements that shape the evolving geoscience information system
The geoscience record is constrained by the limitations of human thought and of the technology for handling information. IT can lead us away from the tyranny of older technology, but to find the right path, we need to understand our own limitations. Language, images, data and mathematical models, are tools for expressing and recording our ideas. Backed by intuition, they enable us to think in various modes, to build knowledge from information and create models as artificial views of a real world. Markup languages may accommodate more flexible and better connected records, and the object-oriented approach may help to match IT more closely to our thought processes
Analysis of Hand Segmentation in the Wild
A large number of works in egocentric vision have concentrated on action and
object recognition. Detection and segmentation of hands in first-person videos,
however, has less been explored. For many applications in this domain, it is
necessary to accurately segment not only hands of the camera wearer but also
the hands of others with whom he is interacting. Here, we take an in-depth look
at the hand segmentation problem. In the quest for robust hand segmentation
methods, we evaluated the performance of the state of the art semantic
segmentation methods, off the shelf and fine-tuned, on existing datasets. We
fine-tune RefineNet, a leading semantic segmentation method, for hand
segmentation and find that it does much better than the best contenders.
Existing hand segmentation datasets are collected in the laboratory settings.
To overcome this limitation, we contribute by collecting two new datasets: a)
EgoYouTubeHands including egocentric videos containing hands in the wild, and
b) HandOverFace to analyze the performance of our models in presence of similar
appearance occlusions. We further explore whether conditional random fields can
help refine generated hand segmentations. To demonstrate the benefit of
accurate hand maps, we train a CNN for hand-based activity recognition and
achieve higher accuracy when a CNN was trained using hand maps produced by the
fine-tuned RefineNet. Finally, we annotate a subset of the EgoHands dataset for
fine-grained action recognition and show that an accuracy of 58.6% can be
achieved by just looking at a single hand pose which is much better than the
chance level (12.5%).Comment: Accepted at CVPR 201
- …