Supervised Hashing with End-to-End Binary Deep Neural Network
Image hashing is a popular technique applied to large scale content-based
visual retrieval due to its compact and efficient binary codes. Our work
proposes a new end-to-end deep network architecture for supervised hashing
which directly learns binary codes from input images and maintains good
properties over binary codes such as similarity preservation, independence, and
balancing. Furthermore, we also propose a new learning scheme that can cope
with the binary-constrained loss function. The proposed algorithm is not only
scalable for learning on large-scale datasets but also outperforms
state-of-the-art supervised hashing methods, as demonstrated by extensive
experiments on various image retrieval benchmarks.

Comment: Accepted to IEEE ICIP 201
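The desirable code properties the abstract lists (similarity preservation, bit independence, and balance) can be written as simple penalty terms. The following is a minimal numpy sketch under assumptions of ours, not the authors' implementation: the function name, the `tanh` relaxation of the non-differentiable `sign(.)`, and the specific penalty forms are illustrative choices.

```python
import numpy as np

def hashing_losses(features, similarity):
    """Illustrative penalties over relaxed binary codes (assumed forms).

    features:   (n, k) real-valued network outputs
    similarity: (n, n) pairwise labels in {-1, +1}
    """
    B = np.tanh(features)  # smooth relaxation of sign(.), values in (-1, 1)
    n, k = B.shape
    # Similarity preservation: scaled code inner products should
    # match the pairwise similarity labels.
    sim_loss = np.mean((B @ B.T / k - similarity) ** 2)
    # Independence: the bit correlation matrix should be near identity.
    indep_loss = np.linalg.norm(B.T @ B / n - np.eye(k)) ** 2
    # Balance: each bit should average to zero (half +1, half -1).
    balance_loss = np.sum(B.mean(axis=0) ** 2)
    return sim_loss, indep_loss, balance_loss
```

A training objective would combine these terms with weights and minimize them jointly with a classification loss; at inference time the binary code is simply `sign(features)`.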
Relaxing the Forget Constraints in Open World Recognition
In the last few years deep neural networks have significantly improved the state of the art in robotic vision. However, they are mainly trained to recognize only the categories provided in the training set (the closed world assumption), making them ill-equipped to operate in the real world, where new unknown objects may appear over time. In this work, we investigate the open world recognition (OWR) problem, which presents two challenges: (i) learning new concepts over time (incremental learning) and (ii) discerning between known and unknown categories (open set recognition). Current state-of-the-art OWR methods address incremental learning by employing a knowledge distillation loss, which forces the model to keep the same predictions across training steps in order to maintain the acquired knowledge. This behaviour may induce the model to mimic uncertain predictions, preventing it from reaching an optimal representation of the new classes. To overcome this limitation, we propose the Poly loss, which penalizes changes in the predictions of uncertain samples less, while forcing the same output on confident ones. Moreover, we introduce a forget constraint relaxation strategy that allows the model to obtain a better representation of new classes by randomly zeroing the contribution of some old classes in the distillation loss. Finally, while current methods rely on metric learning to detect unknown samples, we propose a new rejection strategy that sidesteps it and directly uses the model classifier to estimate whether a sample is known or not. Experiments on three datasets demonstrate that our method outperforms the state of the art.
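The forget constraint relaxation idea (randomly zeroing the contribution of some old classes in the distillation loss) can be sketched on top of a standard soft-target distillation term. This is a minimal numpy illustration under assumptions of ours: the function names, the plain cross-entropy form of the distillation term, and the per-class drop mask are illustrative, not the paper's exact Poly loss.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def relaxed_distillation_loss(old_logits, new_logits, drop_prob=0.3, rng=None):
    """Distillation term with forget-constraint relaxation (assumed form).

    old_logits: (batch, n_old) logits of the frozen previous model
    new_logits: (batch, n_old) logits of the current model on old classes
    drop_prob:  probability of zeroing an old class's contribution
    """
    if rng is None:
        rng = np.random.default_rng(0)
    p_old = softmax(old_logits)        # soft targets from the old model
    log_p_new = log_softmax(new_logits)
    # Randomly keep a subset of old classes; dropped classes contribute
    # nothing, freeing capacity to represent the new classes.
    keep = rng.random(old_logits.shape[1]) >= drop_prob
    per_class = -(p_old * log_p_new) * keep
    return per_class.sum(axis=1).mean()
```

With `drop_prob=0` this reduces to ordinary cross-entropy distillation against the old model's soft targets; larger values relax the forget constraint more aggressively.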
Task-Oriented Communication for Edge Video Analytics
With the development of artificial intelligence (AI) techniques and the
increasing popularity of camera-equipped devices, many edge video analytics
applications are emerging, calling for the deployment of computation-intensive
AI models at the network edge. Edge inference is a promising solution to move
the computation-intensive workloads from low-end devices to a powerful edge
server for video analytics, but the device-server communications will remain a
bottleneck due to the limited bandwidth. This paper proposes a task-oriented
communication framework for edge video analytics, where multiple devices
collect the visual sensory data and transmit the informative features to an
edge server for processing. To enable low-latency inference, this framework
removes video redundancy in spatial and temporal domains and transmits minimal
information that is essential for the downstream task, rather than
reconstructing the videos at the edge server. Specifically, it extracts compact
task-relevant features based on the deterministic information bottleneck (IB)
principle, which characterizes a tradeoff between the informativeness of the
features and the communication cost. As the features of consecutive frames are
temporally correlated, we propose a temporal entropy model (TEM) to reduce the
bitrate by taking the previous features as side information in feature
encoding. To further improve the inference performance, we build a
spatial-temporal fusion module at the server to integrate features of the
current and previous frames for joint inference. Extensive experiments on video
analytics tasks evidence that the proposed framework effectively encodes
task-relevant information of video data and achieves a better rate-performance
tradeoff than existing methods.
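The temporal entropy model idea, using the previous frame's features as side information to lower the bitrate of the current features, can be illustrated with a toy entropy model. This is a minimal numpy sketch under assumptions of ours: the function name and the fixed-scale Gaussian model centered on the previous features are illustrative stand-ins for the learned conditional entropy model described above.

```python
import numpy as np

def temporal_rate(curr, prev, scale=1.0):
    """Proxy bitrate for encoding `curr` given `prev` as side information.

    Models each feature element with a Gaussian centered on the previous
    frame's feature (an assumed, simplified entropy model): the negative
    log-likelihood in bits approximates the code length.
    """
    var = scale ** 2
    # Per-element negative log-likelihood of N(curr; prev, var), in nats.
    nll = 0.5 * ((curr - prev) ** 2 / var + np.log(2 * np.pi * var))
    return nll.sum() / np.log(2)  # convert nats to bits
```

Because consecutive frames are temporally correlated, `curr` tends to stay close to `prev`, so this conditional model assigns it fewer bits than an unconditional one; a task-oriented objective would then trade this rate term against the downstream inference loss.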