469 research outputs found

    Low-latency compression of mocap data using learned spatial decorrelation transform

    Full text link
    Due to the growing needs of human motion capture (mocap) in movie, video games, sports, etc., it is highly desired to compress mocap data for efficient storage and transmission. This paper presents two efficient frameworks for compressing human mocap data with low latency. The first framework processes the data in a frame-by-frame manner so that it is ideal for mocap data streaming and time critical applications. The second one is clip-based and provides a flexible tradeoff between latency and compression performance. Since mocap data exhibits some unique spatial characteristics, we propose a very effective transform, namely learned orthogonal transform (LOT), for reducing the spatial redundancy. The LOT problem is formulated as minimizing square error regularized by orthogonality and sparsity and solved via alternating iteration. We also adopt a predictive coding and temporal DCT for temporal decorrelation in the frame- and clip-based frameworks, respectively. Experimental results show that the proposed frameworks can produce higher compression performance at lower computational cost and latency than the state-of-the-art methods.Comment: 15 pages, 9 figure

    Human Motion Capture Data Tailored Transform Coding

    Full text link
    Human motion capture (mocap) is a widely used technique for digitalizing human movements. With growing usage, compressing mocap data has received increasing attention, since compact data size enables efficient storage and transmission. Our analysis shows that mocap data have some unique characteristics that distinguish themselves from images and videos. Therefore, directly borrowing image or video compression techniques, such as discrete cosine transform, does not work well. In this paper, we propose a novel mocap-tailored transform coding algorithm that takes advantage of these features. Our algorithm segments the input mocap sequences into clips, which are represented in 2D matrices. Then it computes a set of data-dependent orthogonal bases to transform the matrices to frequency domain, in which the transform coefficients have significantly less dependency. Finally, the compression is obtained by entropy coding of the quantized coefficients and the bases. Our method has low computational cost and can be easily extended to compress mocap databases. It also requires neither training nor complicated parameter setting. Experimental results demonstrate that the proposed scheme significantly outperforms state-of-the-art algorithms in terms of compression performance and speed

    A human motion feature based on semi-supervised learning of GMM

    Get PDF
    Using motion capture to create naturally looking motion sequences for virtual character animation has become a standard procedure in the games and visual effects industry. With the fast growth of motion data, the task of automatically annotating new motions is gaining an importance. In this paper, we present a novel statistic feature to represent each motion according to the pre-labeled categories of key-poses. A probabilistic model is trained with semi-supervised learning of the Gaussian mixture model (GMM). Each pose in a given motion could then be described by a feature vector of a series of probabilities by GMM. A motion feature descriptor is proposed based on the statistics of all pose features. The experimental results and comparison with existing work show that our method performs more accurately and efficiently in motion retrieval and annotation

    Online MoCap Data Coding with Bit Allocation, Rate Control, and Motion-Adaptive Post-Processing

    Get PDF
    With the advancements in methods for capturing 3D object motion, motion capture (MoCap) data are starting to be used beyond their traditional realm of animation and gaming in areas such as the arts, rehabilitation, automotive industry, remote interactions, and so on. As the amount of MoCap data increases, compression becomes crucial for further expansion and adoption of these technologies. In this paper, we extend our previous work on low-delay MoCap data compression by introducing two improvements. The first improvement is the bit allocation to long-term and short-term reference MoCap frames, which provides a 10-15% reduction in coded bitrate at the same quality. The second improvement is the post-processing in the form of motion-adaptive temporal low-pass filtering, which is able to provide another 9-13%savings in the bitrate. The experimental results also indicate that the proposed online MoCap codec is competitive with several state-of-the-art offline codecs. Overall, the proposed techniques integrate into a highly effective online MoCap codec that is suitable for low-delay applications, whose implementation is provided alongside this paper to aid further research in the field

    Video Indexing and Retrieval Techniques Using Novel Approaches to Video Segmentation, Characterization, and Similarity Matching

    Get PDF
    Multimedia applications are rapidly spread at an ever-increasing rate introducing a number of challenging problems at the hands of the research community, The most significant and influential problem, among them, is the effective access to stored data. In spite of the popularity of keyword-based search technique in alphanumeric databases, it is inadequate for use with multimedia data due to their unstructured nature. On the other hand, a number of content-based access techniques have been developed in the context of image indexing and retrieval; meanwhile video retrieval systems start to gain wide attention, This work proposes a number of techniques constituting a fully content-based system for retrieving video data. These techniques are primarily targeting the efficiency, reliability, scalability, extensibility, and effectiveness requirements of such applications. First, an abstract representation of the video stream, known as the DC sequence, is extracted. Second, to deal with the problem of video segmentation, an efficient neural network model is introduced. The novel use of the neural network improves the reliability while the efficiency is achieved through the instantaneous use of the recall phase to identify shot boundaries. Third, the problem of key frames extraction is addressed using two efficient algorithms that adapt their selection decisions based on the amount of activity found in each video shot enabling the selection of a near optimal expressive set of key frames. Fourth, the developed system employs an indexing scheme that supports two low-level features, color and texture, to represent video data, Finally, we propose, in the retrieval stage, a novel model for performing video data matching task that integrates a number of human-based similarity factors. All our software implementations are in Java, which enables it to be used across heterogeneous platforms. The retrieval system performance has been evaluated yielding a very good retrieval rate and accuracy, which demonstrate the effectiveness of the developed system

    A Parametric Sound Object Model for Sound Texture Synthesis

    Get PDF
    This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided about the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identifi ed, and existing sound texture systems are reviewed, using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fi xed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of diff erent length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of di fferent sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed

    Semi-synchronous video for deaf telephony with an adapted synchronous codec

    Get PDF
    Magister Scientiae - MScCommunication tools such as text-based instant messaging, voice and video relay services, real-time video chat and mobile SMS and MMS have successfully been used among Deaf people. Several years of field research with a local Deaf community revealed that disadvantaged South African Deaf people preferred to communicate with both Deaf and hearing peers in South African Sign Language as opposed to text. Synchronous video chat and video relay services provided such opportunities. Both types of services are commonly available in developed regions, but not in developing countries like South Africa. This thesis reports on a workaround approach to design and develop an asynchronous video communication tool that adapted synchronous video codecs to store-and-forward video delivery. This novel asynchronous video tool provided high quality South African Sign Language video chat at the expense of some additional latency. Synchronous video codec adaptation consisted of comparing codecs, and choosing one to optimise in order to minimise latency and preserve video quality. Traditional quality of service metrics only addressed real-time video quality and related services. There was no such standard for asynchronous video communication. Therefore, we also enhanced traditional objective video quality metrics with subjective assessment metrics conducted with the local Deaf community.South Afric

    Subseries Join and Compression of Time Series Data Based on Non-uniform Segmentation

    Get PDF
    A time series is composed of a sequence of data items that are measured at uniform intervals. Many application areas generate or manipulate time series, including finance, medicine, digital audio, and motion capture. Efficiently searching a large time series database is still a challenging problem, especially when partial or subseries matches are needed. This thesis proposes a new denition of subseries join, a symmetric generalization of subseries matching, which finds similar subseries in two or more time series datasets. A solution is proposed to compute the subseries join based on a hierarchical feature representation. This hierarchical feature representation is generated by an anisotropic diffusion scale-space analysis and a non-uniform segmentation method. Each segment is represented by a minimal polynomial envelope in a reduced-dimensionality space. Based on the hierarchical feature representation, all features in a dataset are indexed in an R-tree, and candidate matching features of two datasets are found by an R-tree join operation. Given candidate matching features, a dynamic programming algorithm is developed to compute the final subseries join. To improve storage efficiency, a hierarchical compression scheme is proposed to compress features. The minimal polynomial envelope representation is transformed to a Bezier spline envelope representation. The control points of each Bezier spline are then hierarchically differenced and an arithmetic coding is used to compress these differences. To empirically evaluate their effectiveness, the proposed subseries join and compression techniques are tested on various publicly available datasets. A large motion capture database is also used to verify the techniques in a real-world application. The experiments show that the proposed subseries join technique can better tolerate noise and local scaling than previous work, and the proposed compression technique can also achieve about 85% higher compression rates than previous work with the same distortion error
    • …
    corecore