4,455 research outputs found
Log-Euclidean Bag of Words for Human Action Recognition
Representing videos by densely extracted local space-time features has
recently become a popular approach for analysing actions. In this paper, we
tackle the problem of categorising human actions by devising Bag of Words (BoW)
models based on covariance matrices of spatio-temporal features, with the
features formed from histograms of optical flow. Since covariance matrices form
a special type of Riemannian manifold, the space of Symmetric Positive Definite
(SPD) matrices, non-Euclidean geometry should be taken into account while
discriminating between covariance matrices. To this end, we propose to embed
SPD manifolds to Euclidean spaces via a diffeomorphism and extend the BoW
approach to its Riemannian version. The proposed BoW approach takes into
account the manifold geometry of SPD matrices during the generation of the
codebook and histograms. Experiments on challenging human action datasets show
that the proposed method obtains notable improvements in discrimination
accuracy, in comparison to several state-of-the-art methods
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
In this work, we introduce Semantic Pyramid AutoEncoder (SPAE) for enabling
frozen LLMs to perform both understanding and generation tasks involving
non-linguistic modalities such as images or videos. SPAE converts between raw
pixels and interpretable lexical tokens (or words) extracted from the LLM's
vocabulary. The resulting tokens capture both the semantic meaning and the
fine-grained details needed for visual reconstruction, effectively translating
the visual content into a language comprehensible to the LLM, and empowering it
to perform a wide array of multimodal tasks. Our approach is validated through
in-context learning experiments with frozen PaLM 2 and GPT 3.5 on a diverse set
of image understanding and generation tasks. Our method marks the first
successful attempt to enable a frozen LLM to generate image content while
surpassing state-of-the-art performance in image understanding tasks, under the
same setting, by over 25%.Comment: NeurIPS 2023 spotligh
e-Counterfeit: a mobile-server platform for document counterfeit detection
This paper presents a novel application to detect counterfeit identity
documents forged by a scan-printing operation. Texture analysis approaches are
proposed to extract validation features from security background that is
usually printed in documents as IDs or banknotes. The main contribution of this
work is the end-to-end mobile-server architecture, which provides a service for
non-expert users and therefore can be used in several scenarios. The system
also provides a crowdsourcing mode so labeled images can be gathered,
generating databases for incremental training of the algorithms.Comment: 6 pages, 5 figure
A Method of Protein Model Classification and Retrieval Using Bag-of-Visual-Features
In this paper we propose a novel visual method for protein model classification and retrieval. Different from the conventional methods, the key idea of the proposed method is to extract image features of proteins and measure the visual similarity between proteins. Firstly, the multiview images are captured by vertices and planes of a given octahedron surrounding the protein. Secondly, the local features are extracted from each image of the different views by the SURF algorithm and are vector quantized into visual words using a visual codebook. Finally, KLD is employed to calculate the similarity distance between two feature vectors. Experimental results show that the proposed method has encouraging performances for protein retrieval and categorization as shown in the comparison with other methods
- …