Task-Oriented Communication for Edge Video Analytics
With the development of artificial intelligence (AI) techniques and the
increasing popularity of camera-equipped devices, many edge video analytics
applications are emerging, calling for the deployment of computation-intensive
AI models at the network edge. Edge inference is a promising solution to move
the computation-intensive workloads from low-end devices to a powerful edge
server for video analytics, but the device-server communications will remain a
bottleneck due to the limited bandwidth. This paper proposes a task-oriented
communication framework for edge video analytics, where multiple devices
collect the visual sensory data and transmit the informative features to an
edge server for processing. To enable low-latency inference, this framework
removes video redundancy in spatial and temporal domains and transmits minimal
information that is essential for the downstream task, rather than
reconstructing the videos at the edge server. Specifically, it extracts compact
task-relevant features based on the deterministic information bottleneck (IB)
principle, which characterizes a tradeoff between the informativeness of the
features and the communication cost. As the features of consecutive frames are
temporally correlated, we propose a temporal entropy model (TEM) to reduce the
bitrate by taking the previous features as side information in feature
encoding. To further improve the inference performance, we build a
spatial-temporal fusion module at the server to integrate features of the
current and previous frames for joint inference. Extensive experiments on video analytics tasks demonstrate that the proposed framework effectively encodes task-relevant information from video data and achieves a better rate-performance tradeoff than existing methods.
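As a rough illustration of the idea (not the authors' implementation), the sketch below shows how a conditional entropy model over consecutive frame features might be paired with an IB-style rate-performance objective. The module structure, feature shapes, and the `beta` weight are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class TemporalEntropyModel(nn.Module):
    """Hypothetical sketch: predict a Gaussian over the current frame's
    features conditioned on the previous frame's features, so that
    temporally redundant features cost few bits to transmit."""
    def __init__(self, dim):
        super().__init__()
        self.prior = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 2 * dim)
        )

    def rate(self, feat_t, feat_prev):
        # Previous features act as side information for the prior.
        mean, log_scale = self.prior(feat_prev).chunk(2, dim=-1)
        dist = torch.distributions.Normal(mean, log_scale.exp())
        # Bits needed to entropy-code feat_t under the conditional prior.
        return -dist.log_prob(feat_t).sum(dim=-1) / torch.log(torch.tensor(2.0))

def ib_loss(task_loss, rate_bits, beta=0.01):
    """IB-style objective: task performance traded off against
    communication cost via the weight beta (an assumed hyperparameter)."""
    return task_loss + beta * rate_bits.mean()
```

A deployed system would additionally quantize the features and drive an actual entropy coder with the learned prior; this toy version only computes the bitrate term that would appear in training.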
Multisensory causal inference in the brain
At any given moment, our brain processes multiple inputs from its different sensory modalities (vision, hearing, touch, etc.). In deciphering this array of sensory information, the brain has to solve two problems: (1) which of the inputs originate from the same object and should be integrated and (2) for the sensations originating from the same object, how best to integrate them. Recent behavioural studies suggest that the human brain solves these problems using optimal probabilistic inference, known as Bayesian causal inference. However, how and where the underlying computations are carried out in the brain have remained unknown. By combining neuroimaging-based decoding techniques and computational modelling of behavioural data, a new study now sheds light on how multisensory causal inference maps onto specific brain areas. The results suggest that the complexity of neural computations increases along the visual hierarchy and link specific components of the causal inference process with specific visual and parietal regions.
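To make the underlying computation concrete, here is a minimal numeric sketch of Bayesian causal inference for one visual and one auditory cue, using the standard Gaussian generative model (in the style of Körding et al., 2007). The noise and prior parameters are illustrative assumptions, not values from the study discussed above.

```python
import numpy as np

def causal_inference_estimate(x_v, x_a, sigma_v=2.0, sigma_a=8.0,
                              sigma_p=15.0, mu_p=0.0, p_common=0.5):
    """Posterior probability of a common cause, plus the model-averaged
    visual location estimate. All parameters are assumed for illustration."""
    var_v, var_a, var_p = sigma_v**2, sigma_a**2, sigma_p**2
    # Likelihood of both cues under a single common cause (closed form).
    denom1 = var_v*var_a + var_v*var_p + var_a*var_p
    like_c1 = np.exp(-0.5 * ((x_v - x_a)**2 * var_p
                             + (x_v - mu_p)**2 * var_a
                             + (x_a - mu_p)**2 * var_v) / denom1) \
              / (2 * np.pi * np.sqrt(denom1))
    # Likelihood under two independent causes.
    like_c2 = (np.exp(-0.5 * (x_v - mu_p)**2 / (var_v + var_p))
               * np.exp(-0.5 * (x_a - mu_p)**2 / (var_a + var_p))
               / (2 * np.pi * np.sqrt((var_v + var_p) * (var_a + var_p))))
    post_c1 = like_c1 * p_common / (like_c1 * p_common
                                    + like_c2 * (1 - p_common))
    # Reliability-weighted fused estimate vs. vision-only estimate,
    # combined by model averaging.
    s_fused = (x_v/var_v + x_a/var_a + mu_p/var_p) / (1/var_v + 1/var_a + 1/var_p)
    s_vis = (x_v/var_v + mu_p/var_p) / (1/var_v + 1/var_p)
    return post_c1, post_c1 * s_fused + (1 - post_c1) * s_vis

print(causal_inference_estimate(x_v=0.0, x_a=5.0))   # nearby cues: mostly fused
print(causal_inference_estimate(x_v=0.0, x_a=40.0))  # discrepant cues: segregated
```

The qualitative behaviour is the point: small cue conflicts favour integration, large conflicts favour segregation, and the final estimate interpolates between the two according to the posterior over causal structures.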
Language-Based Image Editing with Recurrent Attentive Models
We investigate the problem of Language-Based Image Editing (LBIE). Given a
source image and a natural language description, we want to generate a target
image by editing the source image based on the description. We propose a
generic modeling framework for two sub-tasks of LBIE: language-based image
segmentation and image colorization. The framework uses recurrent attentive
models to fuse image and language features. Instead of using a fixed step size,
we introduce for each region of the image a termination gate to dynamically
determine after each inference step whether to continue extrapolating
additional information from the textual description. The effectiveness of the
framework is validated on three datasets. First, we introduce a synthetic
dataset, called CoSaL, to evaluate the end-to-end performance of our LBIE
system. Second, we show that the framework leads to state-of-the-art
performance on image segmentation on the ReferIt dataset. Third, we present the
first language-based colorization result on the Oxford-102 Flowers dataset. (Comment: Accepted to CVPR 2018 as a Spotlight.)
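A minimal sketch of the termination-gate idea described above, assuming a GRU-style recurrent fusion cell over per-region image features; the module names, shapes, and the soft-gating formulation are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class GatedRecurrentFusion(nn.Module):
    """Hypothetical sketch: each spatial region keeps a hidden state, and
    a per-region termination gate decides whether that region continues
    to absorb information from the text encoding at the next step."""
    def __init__(self, dim, max_steps=4):
        super().__init__()
        self.cell = nn.GRUCell(dim, dim)
        self.gate = nn.Linear(dim, 1)  # per-region continue/stop logit
        self.max_steps = max_steps

    def forward(self, img_feat, txt_feat):
        # img_feat: (regions, dim) region features; txt_feat: (dim,) text code
        h = img_feat
        for _ in range(self.max_steps):
            h_new = self.cell(txt_feat.expand_as(h), h)
            p_cont = torch.sigmoid(self.gate(h))  # termination gate per region
            # Regions whose gate closes simply retain their current state.
            h = p_cont * h_new + (1 - p_cont) * h
        return h
```

In a full system the gate could be thresholded to actually skip computation for finished regions; the soft version here keeps every step differentiable for end-to-end training.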
ICNet for Real-Time Semantic Segmentation on High-Resolution Images
We focus on the challenging task of real-time semantic segmentation in this paper. It has many practical applications, yet poses the fundamental difficulty of reducing a large portion of the computation required for pixel-wise label inference. We propose an image cascade network (ICNet) that incorporates
multi-resolution branches under proper label guidance to address this
challenge. We provide in-depth analysis of our framework and introduce the
cascade feature fusion unit to quickly achieve high-quality segmentation. Our
system yields real-time inference on a single GPU card with decent quality
results evaluated on challenging datasets like Cityscapes, CamVid and
COCO-Stuff. (Comment: ECCV 2018.)
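The cascade feature fusion unit merges a coarse, low-resolution branch with a finer one; the following sketch captures that pattern, with the channel counts, dilation, and normalization choices as illustrative assumptions rather than the exact published configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadeFeatureFusion(nn.Module):
    """Sketch of a cascade-feature-fusion-style unit: upsample the coarse
    branch to the fine branch's resolution, align channel counts, then
    fuse by elementwise addition."""
    def __init__(self, c_low, c_high, c_out):
        super().__init__()
        # Dilated 3x3 conv on the upsampled coarse branch (assumed design).
        self.conv_low = nn.Conv2d(c_low, c_out, 3, padding=2,
                                  dilation=2, bias=False)
        self.conv_high = nn.Conv2d(c_high, c_out, 1, bias=False)
        self.bn_low = nn.BatchNorm2d(c_out)
        self.bn_high = nn.BatchNorm2d(c_out)

    def forward(self, x_low, x_high):
        # Bring the low-resolution features up to the fine branch's size.
        x_low = F.interpolate(x_low, size=x_high.shape[2:],
                              mode='bilinear', align_corners=False)
        x_low = self.bn_low(self.conv_low(x_low))
        x_high = self.bn_high(self.conv_high(x_high))
        return F.relu(x_low + x_high)
```

Fusing by addition rather than concatenation keeps the unit cheap, which matters when the whole point of the cascade is real-time inference on high-resolution inputs.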