358 research outputs found

    Increasing Compression Ratio of Low Complexity Compressive Sensing Video Encoder with Application-Aware Configurable Mechanism

    With the development of embedded video acquisition nodes and wireless video surveillance systems, traditional video coding methods can no longer meet the demand for low computational complexity and low power consumption. This paper therefore proposes a low-complexity compressive sensing video encoder framework with an application-aware configurable mechanism, in which novel encoding methods are tailored to the practical purpose of the target application to reduce coding complexity and improve the compression ratio (CR). Moreover, the group of processing (GOP) size and the measurement matrix size can be configured on the encoder side according to the post-analysis requirements of an example application, object tracking, to increase the encoder's CR as far as possible. Simulations show that the proposed encoder framework achieves a CR of 60X while the tracking success rate (SR) remains above 90%.
    Comment: 5 pages with 6 figures and 1 table, conferenc
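
As a rough illustration of how the measurement matrix size drives the compression ratio, the sketch below projects one image block onto a small number of random measurements. It is only a minimal sketch under stated assumptions: the Gaussian matrix, the 16x16 block size, the choice of 8 measurements, and the definition CR = n / m are all illustrative and are not taken from the paper, whose CR may also depend on the GOP size and other encoder stages. The names `cs_measure` and `compression_ratio` are hypothetical.

```python
import numpy as np


def cs_measure(block, m, rng):
    """Project a flattened block of n pixels onto m random measurements (y = Phi x)."""
    x = np.asarray(block, dtype=np.float64).reshape(-1)
    n = x.size
    # Assumption: a dense Gaussian measurement matrix. The abstract does not
    # say which kind of matrix the encoder actually uses.
    phi = rng.standard_normal((m, n)) / np.sqrt(m)
    return phi @ x


def compression_ratio(n, m):
    """Assumed definition: raw samples per block divided by measurements kept."""
    return n / m


rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(16, 16))  # hypothetical 16x16 pixel block
y = cs_measure(block, m=8, rng=rng)          # fewer measurements gives a higher CR
cr = compression_ratio(block.size, y.size)
print(f"{block.size} pixels -> {y.size} measurements, CR = {cr:.0f}")
```

Shrinking `m` raises the ratio but leaves less information for the decoder, which is presumably why the abstract ties the configuration to a tracking success rate.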

    3D nonrigid medical image registration using a new information theoretic measure.

    This work presents a novel method for the nonrigid registration of medical images based on the Arimoto entropy, a generalization of the Shannon entropy. The proposed method employs the Jensen-Arimoto divergence as a similarity metric to measure the statistical dependence between medical images. Free-form deformations are adopted as the transformation model, and Parzen window estimation is applied to compute the probability distributions. A penalty term is incorporated into the objective function to smooth the nonrigid transformation. Registration minimizes an objective function consisting of a dissimilarity term and a penalty term, which is minimal when the two images are perfectly aligned; the limited-memory BFGS method is used for the optimization and yields the optimal geometric transformation. To validate the performance of the proposed method, experiments on both simulated 3D brain MR images and real 3D thoracic CT data sets were designed and performed using the open-source elastix package. For the simulated experiments, the registration errors of 3D brain MR images with various magnitudes of known deformation and different levels of noise were measured. For the real data tests, four 4D thoracic CT data sets from four patients were selected to assess the registration performance of the method, each comprising ten 3D CT images covering an entire respiration cycle. The results were compared with those of the normalized cross correlation and mutual information methods and show a slight but real improvement in registration accuracy.
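
The abstract gives no formulas, so the following is only a sketch of the two quantities it names, using the commonly cited form of the Arimoto entropy of order alpha, A(P) = alpha / (alpha - 1) * (1 - (sum_i p_i^alpha)^(1/alpha)), and a Jensen-type divergence built from it (entropy of the weighted mixture minus the weighted mean of the entropies). Both definitions are assumptions here, as are the function names `arimoto_entropy` and `jensen_arimoto`; how the paper derives the distributions and weights from the two images is not stated in the abstract.

```python
import numpy as np


def arimoto_entropy(p, alpha):
    """Arimoto entropy of order alpha (assumed form); Shannon entropy at alpha = 1."""
    p = np.asarray(p, dtype=np.float64)
    p = p / p.sum()  # normalise so histograms can be passed directly
    if np.isclose(alpha, 1.0):
        nz = p[p > 0]
        return float(-(nz * np.log(nz)).sum())
    return float(alpha / (alpha - 1.0) * (1.0 - (p ** alpha).sum() ** (1.0 / alpha)))


def jensen_arimoto(dists, weights, alpha):
    """Entropy of the weighted mixture minus the weighted mean of the entropies."""
    dists = np.asarray(dists, dtype=np.float64)
    dists = dists / dists.sum(axis=1, keepdims=True)
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()
    mixture = weights @ dists
    mean_entropy = sum(w * arimoto_entropy(d, alpha) for w, d in zip(weights, dists))
    return arimoto_entropy(mixture, alpha) - mean_entropy


# Two hypothetical intensity histograms with equal weights.
hist_a = [4, 3, 2, 1]
hist_b = [1, 2, 3, 4]
print(jensen_arimoto([hist_a, hist_b], [0.5, 0.5], alpha=2.0))
print(jensen_arimoto([hist_a, hist_a], [0.5, 0.5], alpha=2.0))  # identical inputs
```

Because this form of the entropy is concave, the divergence is non-negative and vanishes when all input distributions coincide.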

    Construction of a complete set of orthogonal Fourier-Mellin moment invariants for pattern recognition applications

    The completeness of a set of invariant descriptors is of fundamental importance from both theoretical and practical points of view. In this paper, we propose a general approach to constructing a complete set of orthogonal Fourier-Mellin moment (OFMM) invariants. By establishing a relationship between the OFMMs of the original image and those of an image having the same shape but a different orientation and scale, a complete set of scale and rotation invariants is derived. The efficiency of the method and its robustness to noise in recognition tasks are shown by comparing it with existing methods on several data sets.

    SLNSpeech: solving extended speech separation problem by the help of sign language

    Speech separation tasks can be roughly divided into audio-only separation and audio-visual separation. To make speech separation technology applicable to real-world scenarios involving people with disabilities, this paper presents an extended speech separation problem, namely sign language assisted speech separation. However, most existing datasets for speech separation consist of audio and video recordings that contain only audio and/or visual modalities. To address the extended speech separation problem, we introduce a large-scale dataset named Sign Language News Speech (SLNSpeech), in which three modalities (audio, visual, and sign language) coexist. We then design a general deep learning network for self-supervised learning over the three modalities, in particular using sign language embeddings together with audio or audio-visual information to better solve the speech separation task. Specifically, we use a 3D residual convolutional network to extract sign language features and a pretrained VGGNet model to extract visual features. After that, an improved U-Net with skip connections in the feature extraction stage is applied to learn the embeddings among the mixed spectrogram computed from the source audio, the sign language features, and the visual features. Experimental results show that, besides the visual modality, the sign language modality can also be used alone to supervise the speech separation task. Moreover, we show the effectiveness of sign language assisted speech separation when the visual modality is disturbed. Source code will be released at http://cheertt.top/homepage/
    Comment: 33 pages, 8 figures, 5 table