Increasing Compression Ratio of Low Complexity Compressive Sensing Video Encoder with Application-Aware Configurable Mechanism
With the development of embedded video acquisition nodes and wireless video
surveillance systems, traditional video coding methods can no longer meet the
demand for low computational complexity and low power consumption. This paper
therefore proposes a low-complexity compressive sensing video encoder framework
with an application-aware configurable mechanism, in which novel encoding
methods are chosen according to the practical purpose of the real application
in order to reduce coding complexity and improve the compression ratio (CR).
Moreover, the group of processing (GOP) size and the measurement matrix size
can be configured on the encoder side according to the post-analysis
requirements of an example application, object tracking, to increase the
encoder's CR as much as possible. Simulations show that the proposed encoder
framework can achieve a CR of 60X while the tracking success rate (SR) remains
above 90%.
Comment: 5 pages with 6 figures and 1 table, conferenc
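To make the link between measurement matrix size and CR concrete, here is a minimal sketch of a compressive sensing measurement step. It is not the encoder from the paper. The Gaussian measurement matrix, the block length of 240 samples, the helper names, and the definition CR = N / M (input samples over measurements) are all illustrative assumptions; the paper's CR may also account for GOP size and quantization.

```python
import random


def measurement_matrix(m, n, seed=0):
    """Build an m x n Gaussian random measurement matrix.

    Assumption: Gaussian entries scaled by 1/sqrt(m), a common choice in
    compressive sensing; the paper's actual matrix is not specified here.
    """
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def measure(phi, x):
    """Compute the measurement vector y = phi * x (plain matrix-vector product)."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


def compression_ratio(n, m):
    """Assumed CR: number of input samples divided by number of measurements."""
    return n / m


# Hypothetical flattened image block of 240 samples.
block = [float(i % 16) for i in range(240)]

# Fewer rows in the measurement matrix means fewer measurements and a higher CR.
phi = measurement_matrix(4, len(block))
y = measure(phi, block)

print(len(y), compression_ratio(len(block), len(y)))  # prints: 4 60.0
```

Under these assumptions, shrinking the matrix from 240 rows to 4 rows is what yields a 60X ratio; whether tracking still succeeds at that ratio is the question the simulations in the paper address.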
3D nonrigid medical image registration using a new information theoretic measure.
This work presents a novel method for the nonrigid registration of medical images based on the Arimoto entropy, a generalization of the Shannon entropy. The proposed method uses the Jensen-Arimoto divergence as a similarity metric to measure the statistical dependence between medical images. Free-form deformations are adopted as the transformation model, and Parzen window estimation is applied to compute the probability distributions. A penalty term is incorporated into the objective function to smooth the nonrigid transformation. The goal of registration is to find the optimal geometric transformation by minimizing, with the limited-memory BFGS optimization method, an objective function consisting of a dissimilarity term and a penalty term; this function is minimal when the two images are perfectly aligned. To validate the performance of the proposed method, experiments on both simulated 3D brain MR images and real 3D thoracic CT data sets were designed and performed with the open source elastix package. For the simulated experiments, the registration errors of 3D brain MR images were measured under known deformations of various magnitudes and different levels of noise. For the real data tests, four 4D thoracic CT data sets from four patients were selected to assess the registration performance of the method, each comprising ten 3D CT images that cover an entire respiration cycle. The results were compared with those of the normalized cross correlation and mutual information methods and show a slight but genuine improvement in registration accuracy.
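As a small illustration of the quantities named in this abstract, the sketch below computes the Arimoto entropy and a Jensen-Arimoto divergence for discrete distributions. The formulas are the commonly cited definitions, stated here as an assumption rather than taken from the paper: A_alpha(P) = alpha / (alpha - 1) * (1 - (sum_i p_i^alpha)^(1/alpha)) for alpha > 0, alpha != 1, and JA_alpha = A_alpha(sum_i w_i P_i) - sum_i w_i A_alpha(P_i). The function names are hypothetical, and the way the paper builds distributions from image pairs (Parzen windows, free-form deformations) is not reproduced.

```python
import math


def arimoto_entropy(p, alpha):
    """Arimoto entropy of a discrete distribution p (assumed standard definition)."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    power_sum = sum(pi ** alpha for pi in p if pi > 0)
    return alpha / (alpha - 1.0) * (1.0 - power_sum ** (1.0 / alpha))


def shannon_entropy(p):
    """Shannon entropy in nats, the limit of the Arimoto entropy as alpha -> 1."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def jensen_arimoto(dists, weights, alpha):
    """Entropy of the weighted mixture minus the weighted mean of the entropies."""
    size = len(dists[0])
    mixture = [sum(w * d[i] for w, d in zip(weights, dists)) for i in range(size)]
    mean_entropy = sum(w * arimoto_entropy(d, alpha) for w, d in zip(weights, dists))
    return arimoto_entropy(mixture, alpha) - mean_entropy


uniform = [0.5, 0.5]
# Near alpha = 1 the Arimoto entropy approaches the Shannon entropy.
print(arimoto_entropy(uniform, 1.0001), shannon_entropy(uniform))

# Identical distributions give zero divergence; disjoint ones give a positive value.
print(jensen_arimoto([uniform, uniform], [0.5, 0.5], 2.0))
print(jensen_arimoto([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5], 2.0))
```

Because the Arimoto entropy is concave, the divergence defined this way is non-negative, which is what makes it usable as a (dis)similarity term in an objective function.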
Construction of a complete set of orthogonal Fourier-Mellin moment invariants for pattern recognition applications
The completeness property of a set of invariant descriptors is of fundamental importance from both theoretical and practical points of view. In this paper, we propose a general approach to construct a complete set of orthogonal Fourier-Mellin moment (OFMM) invariants. By establishing a relationship between the OFMMs of the original image and those of an image having the same shape but a different orientation and scale, a complete set of scale and rotation invariants is derived. The efficiency of the method and its robustness to noise in recognition tasks are shown by comparing it with some existing methods on several data sets.
SLNSpeech: solving extended speech separation problem by the help of sign language
Speech separation tasks can be roughly divided into audio-only separation
and audio-visual separation. To make speech separation technology applicable
to real scenarios involving people with disabilities, this paper presents an
extended speech separation problem, namely sign language assisted speech
separation. However, most existing datasets for speech separation contain
only audio and/or visual modalities. To address the extended speech separation
problem, we introduce a large-scale dataset named Sign Language News Speech
(SLNSpeech), in which three modalities (audio, visual, and sign language)
coexist. We then design a general deep learning network for the
self-supervised learning of the three modalities, in particular using sign
language embeddings together with audio or audio-visual information to better
solve the speech separation task. Specifically, we use a 3D residual
convolutional network to extract sign language features and a pretrained
VGGNet model to extract visual features. After that, an improved U-Net with
skip connections in the feature extraction stage is applied to learn the
embeddings among the mixed spectrogram transformed from the source audio, the
sign language features, and the visual features. Experimental results show
that, besides the visual modality, the sign language modality can also be used
alone to supervise the speech separation task. Moreover, we show the
effectiveness of sign language assisted speech separation when the visual
modality is disturbed. Source code will be released at
http://cheertt.top/homepage/
Comment: 33 pages, 8 figures, 5 table