Fusing MPEG-7 visual descriptors for image classification
This paper proposes three content-based image classification techniques based on fusing various low-level MPEG-7 visual descriptors. Fusion is necessary because the descriptors would otherwise be incompatible and inappropriate to include directly in, e.g., a Euclidean distance. Three approaches are described: a “merging” fusion combined with an SVM classifier, a back-propagation fusion combined with a KNN classifier, and a Fuzzy-ART neurofuzzy network. In the latter case, fuzzy rules can be extracted in an effort to bridge the “semantic gap” between the low-level descriptors and the high-level semantics of an image. All networks were evaluated using content from the repository of the aceMedia project, and more specifically on a beach/urban scene classification problem.
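The incompatibility mentioned above can be illustrated with a minimal sketch. The abstract does not say how the "merging" fusion is implemented, so the per-descriptor L2 normalisation and concatenation below are assumptions chosen for illustration, and the descriptor values are hypothetical toy numbers rather than real MPEG-7 output.

```python
import math

def l2_normalise(vec):
    """Scale a descriptor to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def merge_descriptors(descriptors):
    """Concatenate per-descriptor normalised vectors into one feature vector.

    Assumption: this is one simple way to make descriptors comparable,
    not necessarily the fusion scheme used in the paper.
    """
    merged = []
    for vec in descriptors:
        merged.extend(l2_normalise(vec))
    return merged

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical toy descriptors with very different value ranges.
colour_a, texture_a = [120.0, 30.0, 60.0], [0.2, 0.1]
colour_b, texture_b = [118.0, 33.0, 58.0], [0.1, 0.2]

image_a = merge_descriptors([colour_a, texture_a])
image_b = merge_descriptors([colour_b, texture_b])
distance = euclidean(image_a, image_b)
```

Without the normalisation step, the colour values would dominate the distance simply because of their larger numeric range; after it, each descriptor contributes on a comparable scale.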
Metadata Augmentation for Semantic- and Context- Based Retrieval of Digital Cultural Objects
Cultural objects are increasingly stored and generated in digital form, yet effective methods for their indexing and retrieval remain an open area of research. The main problem arises from the disconnection between the content-based indexing approach used by computer scientists and the description-based approach used by information scientists. There is also a lack of representational schemes that allow the alignment of the semantics and context with keywords and low-level features that can be automatically extracted from the content of these cultural objects. This paper presents an integrated approach to address these problems, taking advantage of both computer science and information science approaches. The focus is on the rationale and conceptual design of the system and its various components. In particular, we discuss techniques for augmenting commonly used metadata with visual features and domain knowledge to generate high-level abstract metadata, which in turn can be used for semantic and context-based indexing and retrieval. We use a sample collection of Vietnamese traditional woodcuts to demonstrate the usefulness of this approach.
Thick 2D Relations for Document Understanding
We use a propositional language of qualitative rectangle relations to detect the reading order from document images. To this end, we define the notion of a document encoding rule and we analyze possible formalisms to express document encoding rules, such as LaTeX and SGML. Document encoding rules expressed in the propositional language of rectangles are used to build a reading order detector for document images. In order to achieve robustness and avoid brittleness when applying the system to real-life document images, the notion of a thick boundary interpretation for a qualitative relation is introduced. The framework is tested on a collection of heterogeneous document images, with recall rates of up to 89%.
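A thick boundary interpretation can be sketched as follows. This is an illustrative assumption, not the paper's formalism: it uses a coarse subset of interval relations (the remaining cases are collapsed into "overlaps"), treats coordinates closer than a hypothetical `thickness` parameter as equal, and describes a rectangle relation as a pair of interval relations, one per axis.

```python
def compare(a, b, thickness):
    """Three-way comparison where values closer than `thickness` count as equal."""
    if abs(a - b) <= thickness:
        return 0
    return -1 if a < b else 1

def interval_relation(i, j, thickness=0.0):
    """Coarse qualitative relation between intervals i=(start, end) and j=(start, end)."""
    end_vs_start = compare(i[1], j[0], thickness)
    if end_vs_start < 0:
        return "before"
    if end_vs_start == 0:
        return "meets"
    start_vs_end = compare(j[1], i[0], thickness)
    if start_vs_end < 0:
        return "after"
    if start_vs_end == 0:
        return "met_by"
    if compare(i[0], j[0], thickness) == 0 and compare(i[1], j[1], thickness) == 0:
        return "equals"
    return "overlaps"  # coarse: collapses all remaining relations

def rectangle_relation(r, s, thickness=0.0):
    """Relation between rectangles (x0, y0, x1, y1) as a pair of axis-wise interval relations."""
    x_rel = interval_relation((r[0], r[2]), (s[0], s[2]), thickness)
    y_rel = interval_relation((r[1], r[3]), (s[1], s[3]), thickness)
    return (x_rel, y_rel)

# Two hypothetical text blocks, stacked vertically and misaligned by a few pixels.
upper = (10, 10, 100, 50)
lower = (12, 52, 101, 90)

thin = rectangle_relation(upper, lower, thickness=0.0)
thick = rectangle_relation(upper, lower, thickness=3.0)
```

With exact boundaries the blocks merely overlap horizontally and the upper one lies before the lower one; with a tolerance of 3 pixels they are treated as horizontally aligned and vertically adjacent, which is the kind of relation a reading-order rule would plausibly want to match on scanned pages.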
Perceptual-based textures for scene labeling: a bottom-up and a top-down approach
Due to the semantic gap, the automatic interpretation of digital images is a very challenging task. Both segmentation and classification are intricate because of the high variation of the data. Therefore, the application of appropriate features is of utmost importance. This paper presents biologically inspired texture features for material classification and for interpreting outdoor scenery images. Experiments show that the presented texture features obtain the best classification results for material recognition compared to other well-known texture features, with an average classification rate of 93.0%. For scene analysis, both a bottom-up and a top-down strategy are employed to bridge the semantic gap. First, images are segmented into regions based on the perceptual texture; next, a semantic label is calculated for these regions. Since this emerging interpretation is still error-prone, domain knowledge is incorporated to achieve a more accurate description of the depicted scene. By applying both strategies, 91.9% of the pixels from outdoor scenery images obtained a correct label.
PointGrow: Autoregressively Learned Point Cloud Generation with Self-Attention
Generating 3D point clouds is challenging yet highly desired. This work
presents a novel autoregressive model, PointGrow, which can generate diverse
and realistic point cloud samples from scratch or conditioned on semantic
contexts. This model operates recurrently, with each point sampled according to
a conditional distribution given its previously-generated points, allowing
inter-point correlations to be well-exploited and 3D shape generative processes
to be better interpreted. Since point cloud object shapes are typically encoded
by long-range dependencies, we augment our model with dedicated self-attention
modules to capture such relations. Extensive evaluations show that PointGrow
achieves satisfying performance on both unconditional and conditional point
cloud generation tasks, with respect to realism and diversity. Several
important applications, such as unsupervised feature learning and shape
arithmetic operations, are also demonstrated.
Relation Structure-Aware Heterogeneous Information Network Embedding
Heterogeneous information network (HIN) embedding aims to embed multiple
types of nodes into a low-dimensional space. Although most existing HIN
embedding methods consider heterogeneous relations in HINs, they usually employ
one single model for all relations without distinction, which inevitably
restricts the capability of network embedding. In this paper, we take the
structural characteristics of heterogeneous relations into consideration and
propose a novel Relation structure-aware Heterogeneous Information Network
Embedding model (RHINE). By exploring the real-world networks with thorough
mathematical analysis, we present two structure-related measures which can
consistently distinguish heterogeneous relations into two categories:
Affiliation Relations (ARs) and Interaction Relations (IRs). To respect the
distinctive characteristics of relations, in our RHINE, we propose different
models specifically tailored to handle ARs and IRs, which can better capture
the structures and semantics of the networks. Finally, we combine and optimize
these models in a unified and elegant manner. Extensive experiments on three
real-world datasets demonstrate that our model significantly outperforms the
state-of-the-art methods in various tasks, including node clustering, link
prediction, and node classification.