Large-scale interactive exploratory visual search
Large-scale visual search has been one of the challenging problems of the big-data era. It demands techniques that are not only highly effective and efficient but also allow users to conveniently express their information needs and refine their intents. In this thesis, we focus on developing an exploratory framework for large-scale visual search. We also develop a number of enabling techniques, including compact visual content representation for scalable search, near-duplicate video shot detection, and action-based event detection. We propose a novel scheme for extremely low bit-rate visual search, which transmits compressed visual words, consisting of a vocabulary tree histogram and descriptor orientations, rather than raw descriptors. Compact representation of video data is achieved by identifying keyframes of a video, which also helps users comprehend visual content efficiently. We propose a novel Bag-of-Importance model for static video summarization. Near-duplicate detection is one of the key issues for large-scale visual search, since there exist a large number of nearly identical images and videos. We propose an improved near-duplicate video shot detection approach for more effective shot representation. Event detection has been one of the solutions for bridging the semantic gap in visual search. We particularly focus on human-action-centred event detection and propose an enhanced sparse coding scheme to model human actions. Our approach significantly reduces computational cost while achieving recognition accuracy highly comparable to state-of-the-art methods. Finally, we propose an integrated solution addressing the prime challenges raised by large-scale interactive visual search. The proposed system is also one of the first attempts at exploratory visual search, providing users with more robust results to support their exploration.
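The compact representation described above quantizes local descriptors against a visual vocabulary and transmits word statistics instead of raw descriptors. A minimal sketch of that idea, using a flat vocabulary with synthetic data as a hypothetical stand-in for the vocabulary-tree histogram in the abstract:

```python
import numpy as np

def quantize_to_histogram(descriptors, vocabulary):
    """Assign each local descriptor to its nearest visual word and
    accumulate a normalized word-frequency histogram (a simplified,
    flat stand-in for a vocabulary-tree histogram)."""
    # Pairwise squared distances between descriptors and visual words
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # nearest-word index per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()   # normalize to a distribution

rng = np.random.default_rng(0)
vocab = rng.normal(size=(8, 16))    # 8 visual words, 16-D descriptors (toy sizes)
descs = rng.normal(size=(100, 16))  # 100 local descriptors from one image
h = quantize_to_histogram(descs, vocab)
print(h.shape, round(float(h.sum()), 6))  # (8,) 1.0
```

Transmitting only this short histogram (plus coarse descriptor orientations, omitted here) is what keeps the query bit rate extremely low relative to sending the descriptors themselves.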
SatCom Today in Canada: Significant Research: Broadband Satellite Communications List of CITR related Publications (1998-2003)
Journal Papers
Conference Papers
Contributions to Standards
Canadian Space Agency Recent Publication
Robust Transmission of H.264/AVC Streams Using Adaptive Group Slicing and Unequal Error Protection
We present a novel scheme for the transmission of H.264/AVC video streams over lossy packet networks. The proposed scheme exploits the error-resilient features of the H.264/AVC codec and employs Reed-Solomon codes to protect the streams effectively. A novel technique for the adaptive classification of macroblocks into three slice groups is also proposed. The optimal classification of macroblocks and the optimal channel rate allocation are achieved by iterating two interdependent steps. Dynamic programming techniques are used in the channel rate allocation process to reduce complexity. Simulations clearly demonstrate the superiority of the proposed method over other recent algorithms for the transmission of H.264/AVC streams.
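The channel rate allocation step above can be illustrated with a small dynamic program that splits a fixed parity budget across slice groups of unequal importance. The gain table below is invented for illustration; the paper's actual distortion model and Reed-Solomon parameters are not reproduced here:

```python
def allocate_parity(gain, budget):
    """Dynamic program: distribute `budget` parity symbols over slice
    groups to maximize total protection gain. gain[i][p] is the
    (hypothetical) benefit of giving group i exactly p parity symbols."""
    best = {0: (0.0, [])}  # symbols used -> (total gain, per-group allocation)
    for group_gain in gain:
        new = {}
        for used, (g, alloc) in best.items():
            for p in range(budget - used + 1):
                cand = (g + group_gain[p], alloc + [p])
                key = used + p
                if key not in new or cand[0] > new[key][0]:
                    new[key] = cand
        best = new
    return max(best.values())  # best (gain, allocation) over all budgets used

# Toy diminishing-returns gains for 3 slice groups of decreasing importance
gains = [[0, 5, 8, 9, 9.5], [0, 3, 5, 6, 6.5], [0, 1, 2, 2.5, 2.7]]
total, alloc = allocate_parity(gains, 4)
print(alloc, total)  # [2, 2, 0] 13.0
```

The diminishing-returns shape of each row is what makes unequal protection pay off: the most important slice group absorbs parity first, and the least important group may receive none.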
Spatial-Temporal Autoencoder with Attention Network for Video Compression
Deep learning-based approaches are now state of the art in numerous tasks, including video compression, and are having a revolutionary influence on video processing. Recently, learned video compression methods have developed rapidly with promising results. In this paper, taking advantage of the powerful non-linear representation ability of neural networks, we replace each standard component of video compression with a neural network. We propose a spatial-temporal video compression network (STVC) using spatial-temporal priors with an attention module (STPA). On the one hand, joint spatial-temporal priors are used for generating latent representations and reconstructing compressed outputs, because efficient temporal and spatial information representation plays a crucial role in video coding. On the other hand, we add an efficient and effective attention module so that the model devotes more effort to restoring artifact-rich areas. Moreover, we formalize the rate-distortion optimization into a single loss function, in which the network learns to leverage the spatial-temporal redundancy present in the frames and to decrease the bit rate while maintaining visual quality in the decoded frames. The experimental results show that our approach delivers state-of-the-art learned video compression performance in terms of MS-SSIM and PSNR.
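The single rate-distortion loss mentioned above is conventionally written as L = R + λ·D. A minimal NumPy sketch of that objective, with MSE as the distortion term and a given bit count standing in for the network's learned entropy model (the paper's actual estimator and λ value are not specified here):

```python
import numpy as np

def rate_distortion_loss(original, reconstructed, bits, lam=0.01):
    """Single-objective R-D loss: rate (bits per pixel) plus lambda
    times distortion (MSE). `bits` would come from a learned entropy
    model during training; here it is simply supplied."""
    mse = float(np.mean((original - reconstructed) ** 2))
    bpp = bits / original.size
    return bpp + lam * mse  # minimized jointly over rate and distortion

orig = np.zeros((4, 4))
recon = np.full((4, 4), 0.5)          # toy reconstruction, MSE = 0.25
loss = rate_distortion_loss(orig, recon, bits=16)  # 1 bpp
print(round(loss, 6))  # 1.0025
```

Tuning λ trades quality against bit rate: a larger λ penalizes distortion more heavily and yields higher-rate, higher-fidelity reconstructions, which is how a single trained model family spans the R-D curve.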
- …