814 research outputs found



    From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning

    Video captioning is in essence a complex natural process affected by various uncertainties stemming from video content, subjective judgment, etc. In this paper we build on recent progress in using the encoder-decoder framework for video captioning and address what we find to be a critical deficiency of existing methods: most decoders propagate deterministic hidden states. Such complex uncertainty cannot be modeled efficiently by deterministic models. We propose a generative approach, referred to as the multi-modal stochastic RNN network (MS-RNN), which models the uncertainty observed in the data using latent stochastic variables. MS-RNN can therefore improve the performance of video captioning and generate multiple sentences to describe a video under different random factors. Specifically, a multi-modal LSTM (M-LSTM) is first proposed to interact with both visual and textual features to capture a high-level representation. A backward stochastic LSTM (S-LSTM) is then proposed to support uncertainty propagation by introducing latent variables. Experimental results on the challenging MSVD and MSR-VTT datasets show that the proposed MS-RNN approach outperforms state-of-the-art video captioning benchmarks.
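    The latent-variable decoding idea in the abstract above can be sketched as a reparameterized Gaussian sample conditioned on the decoder's hidden state. The NumPy sketch below is an illustration only, not the authors' implementation; all names (`stochastic_step`, `W_mu`, `W_logvar`) are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_step(h, W_mu, W_logvar):
    """One latent-variable step: infer a Gaussian over z from the
    hidden state h, then sample via the reparameterization trick."""
    mu = h @ W_mu
    logvar = h @ W_logvar
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps   # stochastic, differentiable sample
    return z, mu, logvar

hidden, latent = 8, 4
h = rng.standard_normal((1, hidden))
W_mu = rng.standard_normal((hidden, latent)) * 0.1
W_lv = rng.standard_normal((hidden, latent)) * 0.1
z1, mu, _ = stochastic_step(h, W_mu, W_lv)
z2, _, _ = stochastic_step(h, W_mu, W_lv)
# Two samples from the same state differ, which is what lets a
# stochastic decoder emit multiple candidate captions for one video.
```
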

    Spatio-temporal rich model-based video steganalysis on cross sections of motion vector planes.

    A rich-model-based motion vector (MV) steganalysis that benefits from both temporal and spatial correlations of MVs is proposed in this paper. The proposed steganalysis method achieves substantially higher detection accuracy than previous methods, even the targeted ones. The improvement stems from several novel approaches introduced in this paper. First, it is shown that there is a strong correlation among neighbouring MVs over longer distances, not only spatially but also temporally. Therefore, temporal MV dependency is utilized alongside spatial dependency for rigorous MV steganalysis. Second, unlike previously used filters, which were heuristically designed against a specific MV steganography, a diverse set of filters that can capture aberrations introduced by various MV steganography methods is used. Both the variety and the number of filter kernels are substantially greater than in previous work. In addition, filters up to fifth order are employed, whereas previous methods use at most second-order filters. As a result, the proposed system captures various decorrelations over a wide spatio-temporal range and provides a better cover model. The proposed method is tested against the most prominent MV steganalysis and steganography methods. To the best of the authors' knowledge, the experiments section contains the most comprehensive tests in the MV steganalysis field, covering five steganography and seven steganalysis methods. Test results show that the proposed method yields around a 20% increase in detection accuracy at low payloads and 5% at higher payloads. This work was supported by the Engineering and Physical Sciences Research Council through the CSIT 2 Project under Grant EP/N508664/1.
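    Higher-order difference filters of the kind described above are commonly built from signed binomial coefficients. The sketch below is a simplified, hypothetical version of the residual-extraction step (the function names are our own), assuming each MV component is laid out as a 2D plane and filtered row-wise:

```python
import numpy as np

def diff_kernel(order):
    """n-th order finite-difference kernel: repeated convolution of
    [1, -1] yields the signed binomial coefficients."""
    k = np.array([1.0])
    for _ in range(order):
        k = np.convolve(k, [1.0, -1.0])
    return k

def mv_residuals(plane, max_order=5):
    """Residuals of one MV-component plane under horizontal
    difference filters of order 1..max_order; smooth cover content
    is suppressed, leaving embedding artifacts more visible."""
    out = {}
    for n in range(1, max_order + 1):
        k = diff_kernel(n)
        out[n] = np.apply_along_axis(
            lambda row: np.convolve(row, k, mode="valid"), 1, plane)
    return out
```

A second-order (or higher) filter annihilates a linear ramp, so residuals of smoothly varying motion fields are near zero and deviations stand out.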

    3D Wavelet-Based Video Codec with Human Perceptual Model

    This thesis explores the use of a human perceptual model in video compression, channel coding, error concealment and subjective image quality measurement. The just-noticeable-distortion (JND) perceptual distortion model is investigated. A video encoding/decoding scheme based on 3D wavelet decomposition and the human perceptual model is implemented. It provides a priori compression quality control, which is distinct from conventional video coding systems. JND is applied in quantizer design to improve the subjective quality of compressed video. The 3D wavelet decomposition helps to remove spatial and temporal redundancy and provides scalability of video quality. In order to conceal the errors that may occur under bad wireless channel conditions, a slicing method and a joint source-channel coding scenario that combines RCPC with CRC and uses the distortion information to allocate convolutional coding rates are proposed. A new subjective quality index based on JND is proposed and used to evaluate the overall performance at different signal-to-noise ratios (SNR) and at different compression ratios. Due to the wide use of arithmetic coding (AC) in data compression, we consider it as a readily available unit in the video codec system for broadcasting. A new scheme for the conditional access (CA) sub-system is designed based on the cryptographic properties of arithmetic coding. Its performance is analyzed along with its application in a multi-resolution video compression system. This scheme simplifies the conditional access sub-system and provides satisfactory system reliability.
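    JND-driven quantizer design along these lines can be sketched as scaling the quantization step by a local JND threshold: coarser where errors are imperceptible, finer where they would be seen. This is a generic illustration under our own assumptions (a uniform mid-tread quantizer; `jnd_quantize` is a hypothetical name), not the thesis implementation:

```python
import numpy as np

def jnd_quantize(coeffs, jnd, base_step):
    """Quantize wavelet coefficients with a per-coefficient step
    proportional to the local just-noticeable-distortion threshold."""
    step = base_step * np.maximum(jnd, 1e-6)  # guard against zero step
    q = np.round(coeffs / step)               # quantization indices
    return q, q * step                        # indices and reconstruction
```

By construction, the reconstruction error of each coefficient stays within half of its perceptually scaled step, i.e. below the assumed visibility threshold.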

    Scale-Space Splatting: Reforming Spacetime for the Cross-Scale Exploration of Integral Measures in Molecular Dynamics

    Understanding large amounts of spatiotemporal data from particle-based simulations, such as molecular dynamics, often relies on the computation and analysis of aggregate measures. These, however, by virtue of aggregation, hide structural information about the spatial and temporal localization of the studied phenomena, leading to degenerate cases where the measures fail to capture distinct behaviour. In order to drill into these aggregate values, we propose a multi-scale visual exploration technique. Our novel representation, based on partial domain aggregation, enables the construction of a continuous scale-space for discrete datasets and the simultaneous exploration of scales in both space and time. We link these two scale-spaces in a scale-space space-time cube and model linked views as orthogonal slices through this cube, thus enabling the rapid identification of spatio-temporal patterns at multiple scales. To demonstrate the effectiveness of our approach, we showcase an advanced exploration of a protein-ligand simulation. Comment: 11 pages, 9 figures, IEEE SciVis 201
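    Partial temporal aggregation of this kind can be sketched by evaluating an aggregate measure over sliding windows of increasing width and stacking one row per scale. The sketch below is our own simplification with hypothetical names, not the paper's representation:

```python
import numpy as np

def temporal_scale_space(series, scales):
    """Aggregate a per-frame measure over sliding windows of several
    widths; each row of the result is one scale of the scale-space,
    from fine (small window) to coarse (large window)."""
    rows = []
    for w in scales:
        kernel = np.ones(w) / w                          # box average
        rows.append(np.convolve(series, kernel, mode="same"))
    return np.vstack(rows)
```

Slicing the resulting array along one axis fixes a scale and shows the time course; slicing along the other fixes a time point and shows how the measure changes with aggregation width.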

    Automatic visual detection of human behavior: a review from 2000 to 2014

    Due to advances in information technology (e.g., digital video cameras, ubiquitous sensors), the automatic detection of human behaviors from video has become an active research topic. In this paper, we perform a systematic literature review on this topic covering the period from 2000 to 2014 and a selection of 193 papers retrieved from six major scientific publishers. The selected papers were classified into three main subjects: detection techniques, datasets and applications. The detection techniques were divided into four categories (initialization, tracking, pose estimation and recognition). The list of datasets includes eight examples (e.g., Hollywood action). Finally, several application areas were identified, including human detection, abnormal activity detection, action recognition, player modeling and pedestrian detection. Our analysis provides a road map to guide future research on designing automatic visual human behavior detection systems. This work is funded by the Portuguese Foundation for Science and Technology (FCT - Fundacao para a Ciencia e a Tecnologia) under research Grant SFRH/BD/84939/2012.

    Machine Learning-based Orchestration Solutions for Future Slicing-Enabled Mobile Networks

    The fifth generation of mobile networks (5G) will incorporate novel technologies such as network programmability and virtualization, enabled by the Software-Defined Networking (SDN) and Network Function Virtualization (NFV) paradigms, which have recently attracted major interest from both academic and industrial stakeholders. Building on these concepts, Network Slicing has emerged as the main driver of a novel business model in which mobile operators may open, i.e., "slice", their infrastructure to new business players and offer independent, isolated and self-contained sets of network functions and physical/virtual resources tailored to specific service requirements. While Network Slicing has the potential to increase the revenue sources of service providers, it involves a number of technical challenges that must be carefully addressed. End-to-end (E2E) network slices encompass time and spectrum resources in the radio access network (RAN), transport resources on the fronthaul/backhaul links, and computing and storage resources at core and edge data centers. Additionally, the heterogeneity of vertical service requirements (e.g., high throughput, low latency, high reliability) exacerbates the need for novel orchestration solutions able to manage end-to-end network slice resources across different domains while satisfying stringent service level agreements and specific traffic requirements. An end-to-end network slicing orchestration solution shall i) admit network slice requests such that the overall system revenue is maximized, ii) provide the required resources across different network domains to fulfill the Service Level Agreements (SLAs), and iii) dynamically adapt the resource allocation based on the real-time traffic load, end-users' mobility and instantaneous wireless channel statistics.
    A mobile network is a fast-changing scenario characterized by complex spatio-temporal relationships connecting end-users' traffic demand with social activities and the economy. Legacy models that provide dynamic resource allocation based on traditional traffic demand forecasting techniques fail to capture these important aspects. To close this gap, machine-learning-aided solutions are quickly arising as promising technologies to sustain, in a scalable manner, the set of operations required by the network slicing context. How to implement such resource allocation schemes among slices, while making the most efficient use of the networking resources composing the mobile infrastructure, is a key problem underlying the network slicing paradigm and is addressed in this thesis.
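    As one deliberately simplified illustration of the admission-control requirement i) above, revenue-maximizing slice admission can be approximated with a greedy knapsack heuristic. The function and its inputs below are hypothetical, not a scheme from the thesis:

```python
def admit_slices(requests, capacity):
    """Greedy admission control: sort slice requests by revenue per
    unit of demanded resource and admit while capacity remains.

    requests: list of (slice_id, resource_demand, revenue) tuples.
    Returns (admitted slice ids, total revenue).
    """
    admitted, used, revenue = [], 0, 0
    for sid, demand, price in sorted(
            requests, key=lambda r: r[2] / r[1], reverse=True):
        if used + demand <= capacity:   # only admit if SLA resources fit
            admitted.append(sid)
            used += demand
            revenue += price
    return admitted, revenue
```

A production orchestrator would extend this across RAN, transport and compute domains and re-run it as traffic forecasts change; the greedy density ordering is only a baseline against which learned policies can be compared.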

    Detection of Sign Language in Picture-in-Picture Video

    The internet enables almost anyone to locate content on almost any topic. This ability, however, is not easily available to those who sign. In order to provide resources to those whose primary language is sign language, a digital library called SLaDL has been created. To ensure maximum efficiency of the video processor that detects sign language, it is important to verify that the program works across video resolutions. Picture-in-picture videos pose a challenge, as they contain fewer pixels and possess different characteristics than standard webcam sign language videos. However, these videos are important to test, as they are less likely to be retrieved otherwise through tags or other metadata. This project aims to detect and identify sign language in picture-in-picture videos through polar motion profiles, working to expand the corpus of videos on which the processor is successful.
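    A polar motion profile of the kind mentioned above can be sketched as a magnitude-weighted histogram of motion-vector directions. This is a minimal illustration under our own assumptions (8 angular bins, per-frame normalization), not the project's actual feature extractor:

```python
import numpy as np

def polar_motion_profile(dx, dy, n_bins=8):
    """Histogram of motion-vector directions, weighted by magnitude:
    a compact summary of where movement points within a frame."""
    angles = np.arctan2(dy, dx)                       # in [-pi, pi]
    mags = np.hypot(dx, dy)
    bins = np.linspace(-np.pi, np.pi, n_bins + 1)
    hist, _ = np.histogram(angles, bins=bins, weights=mags)
    total = hist.sum()
    # Normalize so low-resolution picture-in-picture regions, with
    # fewer vectors overall, remain comparable to full-frame video.
    return hist / total if total > 0 else hist
```
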