
    Supervised Hashing with End-to-End Binary Deep Neural Network

    Image hashing is a popular technique for large-scale content-based visual retrieval because of its compact and efficient binary codes. Our work proposes a new end-to-end deep network architecture for supervised hashing that directly learns binary codes from input images while maintaining desirable properties of the codes, such as similarity preservation, independence, and balance. Furthermore, we propose a new learning scheme that can cope with the binary-constrained loss function. The proposed algorithm not only scales to large datasets but also outperforms state-of-the-art supervised hashing methods, as demonstrated through extensive experiments on various image retrieval benchmarks.
    Comment: Accepted to IEEE ICIP 201
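    The code properties named in the abstract (similarity preservation, independence, balance) can each be written as a simple penalty on relaxed binary codes. The sketch below is an illustrative NumPy formulation under assumed definitions, not the paper's actual loss or architecture:

    ```python
    import numpy as np

    def hashing_losses(features, labels):
        """Illustrative penalties for the three code properties.

        features: (n, k) real-valued network outputs before binarization
        labels:   (n,) integer class labels
        """
        b = np.tanh(features)              # relaxed binary codes in (-1, 1)
        n, k = b.shape
        # similarity preservation: same-label pairs should have similar codes;
        # push normalized inner products toward +1 (similar) or -1 (dissimilar)
        sim = (labels[:, None] == labels[None, :]).astype(float)
        inner = b @ b.T / k
        sim_loss = np.mean((inner - (2.0 * sim - 1.0)) ** 2)
        # independence: bits should be decorrelated, i.e. B^T B / n ≈ I
        indep_loss = float(np.linalg.norm(b.T @ b / n - np.eye(k)) ** 2)
        # balance: each bit should be +1/-1 roughly equally often (zero mean)
        balance_loss = float(np.sum(b.mean(axis=0) ** 2))
        return sim_loss, indep_loss, balance_loss
    ```

    In training, such penalties would be combined with the classification objective and minimized jointly; the paper's contribution is a scheme that handles the hard binary constraint directly rather than through this kind of continuous relaxation alone.
    
    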

    Relaxing the Forget Constraints in Open World Recognition

    In the last few years, deep neural networks have significantly improved the state of the art in robotic vision. However, they are mainly trained to recognize only the categories provided in the training set (closed world assumption), leaving them ill-equipped to operate in the real world, where new unknown objects may appear over time. In this work, we investigate the open world recognition (OWR) problem, which presents two challenges: (i) learning new concepts over time (incremental learning) and (ii) discerning between known and unknown categories (open set recognition). Current state-of-the-art OWR methods address incremental learning with a knowledge distillation loss, which forces the model to keep the same predictions across training steps in order to retain the acquired knowledge. This behaviour may induce the model to mimic uncertain predictions, preventing it from reaching an optimal representation of the new classes. To overcome this limitation, we propose the Poly loss, which penalizes changes in the predictions less for uncertain samples while forcing the same output on confident ones. Moreover, we introduce a forget-constraint relaxation strategy that allows the model to obtain a better representation of new classes by randomly zeroing the contribution of some old classes in the distillation loss. Finally, while current methods rely on metric learning to detect unknown samples, we propose a new rejection strategy that sidesteps it and directly uses the model's classifier to estimate whether a sample is known. Experiments on three datasets demonstrate that our method outperforms the state of the art.
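    The relaxation idea, randomly zeroing some old classes' contribution to the distillation loss, can be sketched in a few lines. This is a minimal NumPy illustration under assumed shapes and a plain cross-entropy distillation term; the function name, `keep_prob` parameter, and exact form are assumptions, not the paper's implementation:

    ```python
    import numpy as np

    def relaxed_distillation_loss(new_logits, old_logits, keep_prob=0.7, rng=None):
        """Distillation over old classes with forget-constraint relaxation.

        new_logits: (batch, n_old) student (current model) logits on old classes
        old_logits: (batch, n_old) teacher (previous model) logits on old classes
        keep_prob:  probability that an old class contributes to the loss
        """
        rng = np.random.default_rng() if rng is None else rng

        def softmax(x):
            e = np.exp(x - x.max(axis=1, keepdims=True))
            return e / e.sum(axis=1, keepdims=True)

        p_old = softmax(old_logits)
        p_new = softmax(new_logits)
        # randomly zero the distillation term for a subset of old classes,
        # loosening the constraint that all old predictions be preserved
        mask = rng.random(old_logits.shape[1]) < keep_prob
        ce = -(p_old * np.log(p_new + 1e-12))   # per-class cross-entropy terms
        return float((ce * mask).sum(axis=1).mean())
    ```

    Dropping classes from the distillation target frees capacity for the new classes while still anchoring the model to most of its previous predictions.
    
    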

    Task-Oriented Communication for Edge Video Analytics

    With the development of artificial intelligence (AI) techniques and the increasing popularity of camera-equipped devices, many edge video analytics applications are emerging, calling for the deployment of computation-intensive AI models at the network edge. Edge inference is a promising solution that moves computation-intensive workloads from low-end devices to a powerful edge server for video analytics, but device-server communication remains a bottleneck due to limited bandwidth. This paper proposes a task-oriented communication framework for edge video analytics, where multiple devices collect visual sensory data and transmit informative features to an edge server for processing. To enable low-latency inference, the framework removes video redundancy in the spatial and temporal domains and transmits only the minimal information essential for the downstream task, rather than reconstructing the videos at the edge server. Specifically, it extracts compact task-relevant features based on the deterministic information bottleneck (IB) principle, which characterizes a tradeoff between the informativeness of the features and the communication cost. As the features of consecutive frames are temporally correlated, we propose a temporal entropy model (TEM) that reduces the bitrate by using the previous features as side information in feature encoding. To further improve inference performance, we build a spatial-temporal fusion module at the server that integrates features of the current and previous frames for joint inference. Extensive experiments on video analytics tasks show that the proposed framework effectively encodes task-relevant information of video data and achieves a better rate-performance tradeoff than existing methods.
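    The intuition behind the temporal entropy model, that conditioning on the previous frame's features shrinks the code length when features are temporally correlated, can be illustrated with an idealized bitrate estimate. The Gaussian conditional model below is a stand-in assumption for the paper's learned TEM, not its actual architecture:

    ```python
    import numpy as np

    def temporal_bitrate_estimate(curr_feat, prev_feat, scale=1.0):
        """Estimated bits to encode curr_feat given prev_feat as side information.

        Assumes a simple Gaussian conditional model centered at the previous
        feature; -log2 of the likelihood gives the ideal code length in bits.
        """
        residual = curr_feat - prev_feat
        log2_prob = (-0.5 * (residual / scale) ** 2 / np.log(2)
                     - 0.5 * np.log2(2 * np.pi * scale ** 2))
        return float(-log2_prob.sum())
    ```

    When consecutive features are close, the residual is small and the estimated bitrate drops, which is exactly the gain side information provides in feature encoding.
    
    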