Straight to Shapes: Real-time Detection of Encoded Shapes
Current object detection approaches predict bounding boxes, but these provide
little instance-specific information beyond location, scale and aspect ratio.
In this work, we propose to directly regress to objects' shapes in addition to
their bounding boxes and categories. It is crucial to find an appropriate shape
representation that is compact and decodable, and in which objects can be
compared for higher-order concepts such as view similarity, pose variation and
occlusion. To achieve this, we use a denoising convolutional auto-encoder to
establish an embedding space, and place the decoder after a fast end-to-end
network trained to regress directly to the encoded shape vectors. This yields
what to the best of our knowledge is the first real-time shape prediction
network, running at ~35 FPS on a high-end desktop. With higher-order shape
reasoning well-integrated into the network pipeline, the network shows the
useful practical quality of generalising to unseen categories similar to the
ones in the training set, something that most existing approaches fail to
handle. Comment: 16 pages including appendix; Published at CVPR 201
Deep Learning for Detecting Multiple Space-Time Action Tubes in Videos
In this work, we propose an approach to the spatiotemporal localisation
(detection) and classification of multiple concurrent actions within temporally
untrimmed videos. Our framework is composed of three stages. In stage 1,
appearance and motion detection networks are employed to localise and score
actions from colour images and optical flow. In stage 2, the appearance network
detections are boosted by combining them with the motion detection scores, in
proportion to their respective spatial overlap. In stage 3, sequences of
detection boxes most likely to be associated with a single action instance,
called action tubes, are constructed by solving two energy maximisation
problems via dynamic programming. While in the first pass, action paths
spanning the whole video are built by linking detection boxes over time using
their class-specific scores and their spatial overlap, in the second pass,
temporal trimming is performed by ensuring label consistency for all
constituting detection boxes. We demonstrate the performance of our algorithm
on the challenging UCF101, J-HMDB-21 and LIRIS-HARL datasets, achieving new
state-of-the-art results across the board and significantly increasing
detection speed at test time. We achieve a huge leap forward in action
detection performance and report a 20% and 11% gain in mAP (mean average
precision) on UCF-101 and J-HMDB-21 datasets respectively when compared to the
state-of-the-art. Comment: Accepted by British Machine Vision Conference 201
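The first pass of stage 3 (linking detection boxes over time by class-specific score and spatial overlap) can be illustrated with a generic Viterbi-style dynamic programme. This is a minimal sketch and not the authors' implementation: the energy form (sum of scores plus a weighted sum of IoU between consecutive boxes), the `overlap_weight` parameter and the function names are assumptions made for illustration, and the second pass (temporal trimming) is not shown.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_action_path(frames, overlap_weight=1.0):
    """Pick one detection per frame so that the assumed energy
    sum(scores) + overlap_weight * sum(IoU of consecutive boxes) is maximal.

    frames: list over time; each entry is a non-empty list of (box, score)
    for a single action class. Returns the chosen detection index per frame.
    """
    # best[t][j]: highest energy of any path ending at detection j of frame t
    best = [[score for _, score in frames[0]]]
    back = []  # back[t-1][j]: index in frame t-1 that precedes j on that path
    for t in range(1, len(frames)):
        row, ptr = [], []
        for box, score in frames[t]:
            cands = [best[t - 1][i] + overlap_weight * iou(prev_box, box)
                     for i, (prev_box, _) in enumerate(frames[t - 1])]
            i_best = max(range(len(cands)), key=cands.__getitem__)
            row.append(cands[i_best] + score)
            ptr.append(i_best)
        best.append(row)
        back.append(ptr)

    # Backtrack from the best final detection to recover the whole path.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return path


# Hypothetical toy input: a slowly moving box (index 0) and a static one (index 1).
frames = [
    [((0, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.5)],
    [((1, 0, 11, 10), 0.6), ((50, 50, 60, 60), 0.7)],
    [((2, 0, 12, 10), 0.9), ((50, 50, 60, 60), 0.4)],
]
path_with_overlap = link_action_path(frames)
path_scores_only = link_action_path(frames, overlap_weight=0.0)
```

With the overlap term active, the path stays on the spatially consistent box even though a competing detection scores higher in the middle frame; with `overlap_weight=0.0` the path simply follows the highest score in each frame.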
InfiniTAM v3: A Framework for Large-Scale 3D Reconstruction with Loop Closure
Volumetric models have become a popular representation for 3D scenes in
recent years. One breakthrough leading to their popularity was KinectFusion,
which focuses on 3D reconstruction using RGB-D sensors. However, monocular SLAM
has since also been tackled with very similar approaches. Representing the
reconstruction volumetrically as a TSDF leads to most of the simplicity and
efficiency that can be achieved with GPU implementations of these systems.
However, this representation is memory-intensive and limits applicability to
small-scale reconstructions. Several avenues have been explored to overcome
this. With the aim of summarizing them and providing for a fast, flexible 3D
reconstruction pipeline, we propose a new, unifying framework called InfiniTAM.
The idea is that steps like camera tracking, scene representation and
integration of new data can easily be replaced and adapted to the user's needs.
This report describes the technical implementation details of InfiniTAM v3,
the third version of our InfiniTAM system. We have added various new features,
as well as making numerous enhancements to the low-level code that
significantly improve our camera tracking performance. The new features that we
expect to be of most interest are (i) a robust camera tracking module; (ii) an
implementation of Glocker et al.'s keyframe-based random ferns camera
relocaliser; (iii) a novel approach to globally-consistent TSDF-based
reconstruction, based on dividing the scene into rigid submaps and optimising
the relative poses between them; and (iv) an implementation of Keller et al.'s
surfel-based reconstruction approach. Comment: This article largely supersedes arxiv:1410.0925 (it describes version
3 of the InfiniTAM framework)
Venture capitalists in Asia: a comparison with the U.S. and Europe.
This research utilizes an institutional perspective to examine the behavior of venture capital professionals in three distinct regions of the world (Asia, U.S., Europe). Based upon a mail survey, we find reasonably consistent views around the world on the relative importance of various venture capitalist roles. However, we find that how those roles are implemented is shaped by cognitive institutional influences in the given region. We find that a model developed in the U.S. to predict the amount of venture capitalist/CEO interaction is not valid in Asia. Further, Asian boards have much greater insider representation than do U.S. or European boards. We attribute these differences to the greater emphasis in Asia on the importance of collective action.
TraMNet - Transition Matrix Network for Efficient Action Tube Proposals
Current state-of-the-art methods solve spatiotemporal action localisation by
extending 2D anchors to 3D-cuboid proposals on stacks of frames, to generate
sets of temporally connected bounding boxes called \textit{action micro-tubes}.
However, they fail to consider that the underlying anchor proposal hypotheses
should also move (transition) from frame to frame, as the actor or the camera
does. Assuming we evaluate $n$ 2D anchors in each frame, then the number of
possible transitions from each 2D anchor to the next, for a sequence of $f$
consecutive frames, is in the order of $O(n^f)$, expensive even for small
values of $f$. To avoid this problem, we introduce a Transition-Matrix-based
Network (TraMNet) which relies on computing transition probabilities between
anchor proposals while maximising their overlap with ground truth bounding
boxes across frames, and enforcing sparsity via a transition threshold. As the
resulting transition matrix is sparse and stochastic, this reduces the proposal
hypothesis search space from $O(n^f)$ to the cardinality of the thresholded
matrix. At training time, transitions are specific to cell locations of the
feature maps, so that a sparse (efficient) transition matrix is used to train
the network. At test time, a denser transition matrix can be obtained either by
decreasing the threshold or by adding to it all the relative transitions
originating from any cell location, allowing the network to handle transitions
in the test data that might not have been present in the training data, and
making detection translation-invariant. Finally, we show that our network can
handle sparse annotations such as those available in the DALY dataset. We
report extensive experiments on the DALY, UCF101-24 and Transformed-UCF101-24
datasets to support our claims. Comment: 15 pages
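The idea of a thresholded anchor-to-anchor transition matrix can be sketched as follows. This is an illustrative reading of the abstract only, not the TraMNet implementation: matching each ground-truth box to its best-IoU anchor, counting transitions between consecutive frames, row-normalising and zeroing entries below a threshold are all assumptions, as are the function names and the `threshold` parameter.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def transition_matrix(anchors, gt_pairs, threshold=0.1):
    """Estimate anchor-to-anchor transition probabilities (assumed scheme).

    anchors:  list of anchor boxes.
    gt_pairs: list of (box_t, box_t_plus_1) ground-truth boxes of the same
              instance in consecutive frames.
    Entries below `threshold` are set to zero to enforce sparsity; rows are
    not renormalised afterwards in this sketch.
    """
    n = len(anchors)

    def best_anchor(box):
        # Index of the anchor with maximal overlap with the ground-truth box.
        return max(range(n), key=lambda k: iou(anchors[k], box))

    counts = [[0] * n for _ in range(n)]
    for box_t, box_next in gt_pairs:
        counts[best_anchor(box_t)][best_anchor(box_next)] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        probs = [c / total if total else 0.0 for c in row]
        matrix.append([p if p >= threshold else 0.0 for p in probs])
    return matrix


def num_transitions(matrix):
    """Cardinality of the thresholded matrix: number of non-zero entries."""
    return sum(1 for row in matrix for p in row if p > 0)


# Hypothetical toy data: three anchors side by side, mostly static ground truth.
anchors = [(0, 0, 10, 10), (10, 0, 20, 10), (20, 0, 30, 10)]
gt_pairs = (
    [((0, 0, 10, 10), (0, 0, 10, 10))] * 8
    + [((0, 0, 10, 10), (10, 0, 20, 10))] * 2
    + [((10, 0, 20, 10), (20, 0, 30, 10))]
)
matrix = transition_matrix(anchors, gt_pairs, threshold=0.25)
kept = num_transitions(matrix)
```

In this toy case only 2 of the 9 possible anchor pairs survive the threshold, which is the kind of reduction in hypotheses that the abstract attributes to sparsity.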
Cavity Quantum Electrodynamics with Anderson-localized Modes
A major challenge in quantum optics and quantum information technology is to
enhance the interaction between single photons and single quantum emitters.
Highly engineered optical cavities are generally implemented requiring
nanoscale fabrication precision. We demonstrate a fundamentally different
approach in which disorder is used as a resource rather than a nuisance. We
generate strongly confined Anderson-localized cavity modes by deliberately
adding disorder to photonic crystal waveguides. The emission rate of a
semiconductor quantum dot embedded in the waveguide is enhanced by a factor of
15 on resonance with the Anderson-localized mode and 94 % of the emitted
single-photons couple to the mode. Disordered photonic media thus provide an
efficient platform for quantum electrodynamics offering an approach to
inherently disorder-robust quantum information devices.