450,312 research outputs found

    Multi-Stream Management for Supporting Multi-Party 3D Tele-Immersive Environments

    Three-dimensional tele-immersive (3DTI) environments have great potential to promote collaborative work among geographically distributed participants. However, extensive application of 3DTI environments is still hindered by problems pertaining to scalability, manageability and reliance on special-purpose components. Thus, one critical question is how to organize the acquisition, transmission and display of large-volume real-time 3D visual data over commercially available computing and networking infrastructures so that "everybody" would be able to install and enjoy 3DTI environments for high-quality tele-collaboration. In the thesis, we explore the design space from the angle of multi-stream Quality-of-Service (QoS) management to support multi-party 3DTI communication. In 3DTI environments, multiple correlated 3D video streams are deployed to provide a comprehensive representation of the physical scene. Traditional QoS approaches designed for 2D, single-stream scenarios have become inadequate. On the other hand, the existence of multiple streams provides a unique opportunity for QoS provisioning. We propose an innovative cross-layer, hierarchical and distributed multi-stream management middleware framework for QoS provisioning to fully enable multi-party 3DTI communication over general delivery infrastructure. The major contributions are as follows. First, we introduce the view model for representing the user interest in the application layer. The design revolves around the concept of view-aware multi-stream coordination, which leverages the central role of view semantics in 3D video systems. Second, in the stream differentiation layer we present the design of view-to-stream mapping, where a subset of relevant streams is selected based on the relative importance of each stream to the current view. Conventional streaming controllers focus on a fixed set of streams specified by the application. In contrast, in our management framework the application layer only specifies the view information, while the underlying controller dynamically determines the set of streams to be managed. Third, in the stream coordination layer we present two designs applicable in different situations. In the case of end-to-end 3DTI communication, a learning-based controller is embedded which provides bandwidth allocation for relevant streams. In the case of multi-party 3DTI communication, we propose a novel ViewCast protocol to coordinate the multi-stream content dissemination upon an end-system overlay network.
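
The view-to-stream mapping described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the "relative importance" of a stream is computed, so the sketch assumes it is the cosine between a camera's viewing direction and the user's view direction. The names `stream_importance`, `select_streams`, the camera layout and the cutoff `k` are all hypothetical and are not taken from the thesis.

```python
import math

def stream_importance(camera_dir, view_dir):
    """Cosine of the angle between a camera direction and the view direction.

    Assumption: importance is modelled as cosine similarity; the abstract
    only states that streams are ranked by importance to the current view.
    """
    dot = sum(c * v for c, v in zip(camera_dir, view_dir))
    norm = math.sqrt(sum(c * c for c in camera_dir)) * math.sqrt(
        sum(v * v for v in view_dir)
    )
    return dot / norm if norm else 0.0

def select_streams(cameras, view_dir, k):
    """Return the names of up to k streams most relevant to the view."""
    scored = [(stream_importance(d, view_dir), name) for name, d in cameras.items()]
    # Drop streams whose cameras face away from the requested view.
    scored = [(score, name) for score, name in scored if score > 0]
    # Highest importance first; ties broken by name for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:k]]

# Hypothetical four-camera rig, one 3D video stream per camera.
cameras = {
    "front": (0, 0, 1),
    "left": (-1, 0, 0),
    "right": (1, 0, 0),
    "back": (0, 0, -1),
}

# The application layer supplies only the view; the controller picks streams.
selected = select_streams(cameras, view_dir=(0.6, 0.0, 0.8), k=2)
print(selected)
```

In this sketch the application never names streams directly, which mirrors the abstract's point that the controller, not the application, determines the managed set.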

    Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure

    Big data research has attracted great attention in science, technology, industry and society. It is developing with the evolving scientific paradigm, the fourth industrial revolution, and the transformational innovation of technologies. However, its nature and fundamental challenge have not been recognized, and its own methodology has not been formed. This paper explores and answers the following questions: What is big data? What are the basic methods for representing, managing and analyzing big data? What is the relationship between big data and knowledge? Can we find a mapping from big data into knowledge space? What kind of infrastructure is required to support not only big data management and analysis but also knowledge discovery, sharing and management? What is the relationship between big data and the scientific paradigm? What is the nature and fundamental challenge of big data computing? A multi-dimensional perspective is presented toward a methodology of big data computing. Comment: 59 pages

    A review of multi-instance learning assumptions

    Multi-instance (MI) learning is a variant of inductive machine learning, where each learning example contains a bag of instances instead of a single feature vector. The term commonly refers to the supervised setting, where each bag is associated with a label. This type of representation is a natural fit for a number of real-world learning scenarios, including drug activity prediction and image classification, hence many MI learning algorithms have been proposed. Any MI learning method must relate instances to bag-level class labels, but many types of relationships between instances and class labels are possible. Although all early work in MI learning assumes a specific MI concept class known to be appropriate for a drug activity prediction domain, this ‘standard MI assumption’ is not guaranteed to hold in other domains. Much of the recent work in MI learning has concentrated on a relaxed view of the MI problem, where the standard MI assumption is dropped, and alternative assumptions are considered instead. However, often it is not clearly stated what particular assumption is used and how it relates to other assumptions that have been proposed. In this paper, we aim to clarify the use of alternative MI assumptions by reviewing the work done in this area.
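
As a small illustration of how instance labels can relate to a bag label, the sketch below contrasts the standard MI assumption, as it is usually stated (a bag is positive if and only if at least one of its instances is positive), with one possible relaxation based on a minimum count of positive instances. The abstract does not spell out either rule, and the function names and the `min_positive` parameter are illustrative assumptions.

```python
def bag_label_standard(instance_labels):
    """Standard MI assumption: positive iff any instance is positive."""
    return any(instance_labels)

def bag_label_threshold(instance_labels, min_positive):
    """Illustrative relaxation: positive iff at least `min_positive`
    instances are positive. With min_positive=1 this reduces to the
    standard assumption."""
    return sum(1 for label in instance_labels if label) >= min_positive

# Hypothetical bags of per-instance labels (True = positive instance).
bag_a = [False, True, False]
bag_b = [False, False]

print(bag_label_standard(bag_a))      # one positive instance is enough
print(bag_label_standard(bag_b))      # no positive instance
print(bag_label_threshold(bag_a, 2))  # needs two positives under this rule
```

In practice the instance labels are hidden and only bag labels are observed, so a learner has to infer the instance-level concept under whichever assumption it adopts.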

    Strategies for Searching Video Content with Text Queries or Video Examples

    The large number of user-generated videos uploaded to the Internet every day has led to many commercial video search engines, which mainly rely on text metadata for search. However, metadata is often lacking for user-generated videos, so these videos are unsearchable by current search engines. Content-based video retrieval (CBVR) tackles this metadata-scarcity problem by directly analyzing the visual and audio streams of each video. CBVR encompasses multiple research topics, including low-level feature design, feature fusion, semantic detector training and video search/reranking. We present novel strategies in these topics to enhance CBVR in both accuracy and speed under different query inputs, including pure textual queries and query by video examples. Our proposed strategies have been incorporated into our submission for the TRECVID 2014 Multimedia Event Detection evaluation, where our system outperformed other submissions in both text queries and video example queries, thus demonstrating the effectiveness of our proposed approaches.

    Common Representation Learning Using Step-based Correlation Multi-Modal CNN

    Deep learning techniques have been successfully used in learning a common representation for multi-view data, wherein the different modalities are projected onto a common subspace. From a broader perspective, the techniques used to investigate common representation learning fall under the categories of canonical correlation-based approaches and autoencoder-based approaches. In this paper, we investigate the performance of deep autoencoder-based methods on multi-view data. We propose a novel step-based correlation multi-modal CNN (CorrMCNN) which reconstructs one view of the data given the other while increasing the interaction between the representations at each hidden layer or every intermediate step. Finally, we evaluate the performance of the proposed model on two benchmark datasets, MNIST and XRMB. Through extensive experiments, we find that the proposed model achieves better performance than the current state-of-the-art techniques on joint common representation learning and transfer learning tasks. Comment: Accepted in Asian Conference of Pattern Recognition (ACPR-2017)