Fast and Efficient Model for Real-Time Tiger Detection In The Wild
The highest-accuracy object detectors to date are based either on a two-stage
approach such as Fast R-CNN or on one-stage detectors such as RetinaNet or SSD
with deep and complex backbones. In this paper we present TigerNet, a simple
yet efficient FPN-based network architecture for Amur tiger detection in the
wild. The model has 600k parameters, requires 0.071 GFLOPs per image, and can
run on edge devices (smart cameras) in near real time. In addition, we
introduce a two-stage semi-supervised learning approach based on
pseudo-labelling to distill knowledge from larger networks. On the ATRW-ICCV
2019 tiger detection sub-challenge, our approach shows superior performance in
comparison to other methods, based on the public leaderboard score.
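The pseudo-labelling stage described above can be sketched roughly as follows. This is our illustration, not the authors' code: the teacher function, the confidence threshold, and the data representation are all hypothetical placeholders for whatever the actual TigerNet pipeline uses.

```python
def pseudo_label(teacher_predict, unlabeled_images, conf_threshold=0.5):
    """Stage 1: let a large teacher detector label unlabeled images.

    Keeps only detections above a confidence threshold; the resulting
    (image, boxes) pairs become pseudo-ground-truth on which the small
    student network is then trained in stage 2.
    """
    pseudo_dataset = []
    for image in unlabeled_images:
        detections = teacher_predict(image)  # [(box, score), ...]
        boxes = [box for box, score in detections if score >= conf_threshold]
        if boxes:
            pseudo_dataset.append((image, boxes))
    return pseudo_dataset

# Toy usage: a fake teacher that returns one confident and one weak box.
fake_teacher = lambda img: [((0, 0, 10, 10), 0.9), ((5, 5, 8, 8), 0.2)]
data = pseudo_label(fake_teacher, ["img1", "img2"])
```

The thresholding step is what makes the scheme robust: low-confidence teacher detections are discarded rather than propagated to the student as noisy labels.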
Geometric Wavelet Scattering Networks on Compact Riemannian Manifolds
The Euclidean scattering transform was introduced nearly a decade ago to
improve the mathematical understanding of convolutional neural networks.
Inspired by recent interest in geometric deep learning, which aims to
generalize convolutional neural networks to manifold and graph-structured
domains, we define a geometric scattering transform on manifolds. Similar to
the Euclidean scattering transform, the geometric scattering transform is based
on a cascade of wavelet filters and pointwise nonlinearities. It is invariant
to local isometries and stable to certain types of diffeomorphisms. Empirical
results demonstrate its utility on several geometric learning tasks. Our
results generalize the deformation stability and local translation invariance
of Euclidean scattering, and demonstrate the importance of linking the chosen
filter structures to the underlying geometry of the data.
Co-training for Demographic Classification Using Deep Learning from Label Proportions
Deep learning algorithms have recently produced state-of-the-art accuracy in
many classification tasks, but this success is typically dependent on access to
many annotated training examples. For domains without such data, an attractive
alternative is to train models with light or distant supervision. In this
paper, we introduce a deep neural network for the Learning from Label
Proportions (LLP) setting, in which the training data consist of bags of
unlabeled instances with associated label distributions for each bag. We
introduce a new regularization layer, Batch Averager, that can be appended to
the last layer of any deep neural network to convert it from supervised
learning to LLP. This layer can be implemented readily with existing deep
learning packages. To further support domains in which the data consist of two
conditionally independent feature views (e.g. image and text), we propose a
co-training algorithm that iteratively generates pseudo bags and refits the
deep LLP model to improve classification accuracy. We demonstrate our models on
demographic attribute classification (gender and race/ethnicity), which has
many applications in social media analysis, public health, and marketing. We
conduct experiments to predict demographics of Twitter users based on their
tweets and profile image, without requiring any user-level annotations for
training. We find that the deep LLP approach outperforms baselines for both
text and image features separately. Additionally, we find that the co-training
algorithm improves image and text classification by 4% and 8% absolute F1,
respectively. Finally, an ensemble of text and image classifiers further
improves the absolute F1 measure by 4% on average.
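The core idea behind the Batch Averager layer can be sketched as a bag-level loss: average the per-instance class probabilities over a bag and compare that average to the bag's known label proportions. This is our reconstruction of the idea, not the authors' implementation; the function name and the cross-entropy comparison are assumptions.

```python
import numpy as np

def batch_averager_loss(instance_probs, bag_proportions, eps=1e-8):
    """Bag-level LLP loss in the spirit of the Batch Averager.

    instance_probs  : (n_instances, n_classes) softmax outputs for one bag
    bag_proportions : (n_classes,) known label distribution of the bag

    Averaging over the bag turns instance-level predictions into an
    estimated label distribution, which is matched to the given
    proportions with a cross-entropy term; no instance labels needed.
    """
    bag_estimate = instance_probs.mean(axis=0)   # the "averaging" step
    return -np.sum(bag_proportions * np.log(bag_estimate + eps))

# Toy usage: two instances whose average exactly matches the proportions.
probs = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
loss = batch_averager_loss(probs, np.array([0.5, 0.5]))
```

Because the averaging step is differentiable, such a layer can indeed be appended after the softmax of any network and trained with standard backpropagation, which is what makes the approach easy to implement in existing deep learning packages.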
Visual odometry with depth-wise separable convolution and quaternion neural networks
Monocular visual odometry is a fundamental problem in computer vision and has been extensively studied in the literature. The vast majority of visual odometry algorithms are based on a standard pipeline consisting of feature detection, feature matching, motion estimation and local optimization. Only recently have deep learning approaches shown cutting-edge performance, replacing the standard pipeline with an end-to-end solution. One of the main advantages of deep learning approaches over standard methods is the reduced inference time, which is an important requirement for applying visual odometry in real time. Less emphasis, however, has been placed on memory requirements and training efficiency. The memory footprint, in particular, is important for real-world applications such as robot navigation or autonomous driving, where devices have limited memory resources. In this paper we tackle both aspects by introducing novel architectures based on a Depth-Wise Separable Convolutional Neural Network and a deep Quaternion Recurrent Convolutional Neural Network. In particular, we obtain equal or better accuracy with respect to other state-of-the-art methods on the KITTI VO dataset, with a reduction in the number of parameters and a speed-up in inference time.
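The parameter reduction from depthwise separable convolutions is easy to quantify: a standard convolution is factored into a per-channel spatial (depthwise) step and a 1x1 channel-mixing (pointwise) step. The back-of-the-envelope count below is our illustration (the layer sizes are examples, not taken from the paper):

```python
def conv_params(in_ch, out_ch, k):
    """Parameters of a standard k x k convolution (biases omitted)."""
    return in_ch * out_ch * k * k

def depthwise_separable_params(in_ch, out_ch, k):
    """Depthwise (one k x k filter per input channel) followed by a
    pointwise 1 x 1 convolution mixing channels."""
    return in_ch * k * k + in_ch * out_ch

# Example: a 3x3 layer taking 128 channels to 256 channels.
standard = conv_params(128, 256, 3)                  # 294,912 parameters
separable = depthwise_separable_params(128, 256, 3)  #  33,920 parameters
```

For this layer the factorized form needs roughly 8.7x fewer parameters, which is the kind of saving that makes the memory footprint viable on resource-limited robotic and automotive hardware.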