
    Fusing MPEG-7 visual descriptors for image classification

    This paper proposes three content-based image classification techniques based on fusing various low-level MPEG-7 visual descriptors. Fusion is necessary because the descriptors are otherwise mutually incompatible and cannot be combined directly, e.g. in a single Euclidean distance. Three approaches are described: a “merging” fusion combined with an SVM classifier, a back-propagation fusion combined with a KNN classifier, and a Fuzzy-ART neurofuzzy network. In the last case, fuzzy rules can be extracted in an effort to bridge the “semantic gap” between the low-level descriptors and the high-level semantics of an image. All networks were evaluated using content from the repository of the aceMedia project, specifically on a beach/urban scene classification problem.
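The incompatibility argument above can be illustrated with a minimal sketch. It assumes, hypothetically, that fusion means scaling each descriptor to a common range and concatenating the results; the abstract does not specify the paper's actual "merging" scheme, and the function names and toy values below are illustrative only.

```python
import math

def minmax_scale(vectors):
    """Scale every dimension of a list of vectors to [0, 1]."""
    dims = list(zip(*vectors))
    lows = [min(d) for d in dims]
    # A constant dimension has span 0; use 1.0 to avoid division by zero.
    spans = [(max(d) - min(d)) or 1.0 for d in dims]
    return [[(x - lo) / sp for x, lo, sp in zip(v, lows, spans)] for v in vectors]

def fuse(descriptor_sets):
    """Normalize each descriptor separately, then concatenate per image.

    This is an assumed fusion scheme, not the one from the paper.
    """
    scaled = [minmax_scale(vectors) for vectors in descriptor_sets]
    return [sum(parts, []) for parts in zip(*scaled)]

def nearest_label(query, vectors, labels):
    """Return the label of the vector closest to the query (Euclidean)."""
    best = min(range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))
    return labels[best]

# Toy data (made up): two descriptors with very different value ranges.
color = [[10, 200], [12, 190], [240, 30], [250, 20]]   # range roughly 0..255
edge = [[0.1], [0.2], [0.9], [0.8]]                    # range roughly 0..1
labels = ["beach", "beach", "urban", "urban"]

fused = fuse([color, edge])
# Classify the second image using the other three as references.
predicted = nearest_label(fused[1], [fused[0], fused[2], fused[3]],
                          ["beach", "urban", "urban"])
```

Without the per-descriptor scaling, the colour values would dominate the Euclidean distance and the edge descriptor would contribute almost nothing.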

    Metadata Augmentation for Semantic- and Context-Based Retrieval of Digital Cultural Objects

    Cultural objects are increasingly stored and generated in digital form, yet effective methods for their indexing and retrieval remain an open area of research. The main problem arises from the disconnect between the content-based indexing approach used by computer scientists and the description-based approach used by information scientists. There is also a lack of representational schemes that align semantics and context with the keywords and low-level features that can be automatically extracted from the content of these cultural objects. This paper presents an integrated approach to address these problems, taking advantage of both computer science and information science approaches. The focus is on the rationale and conceptual design of the system and its various components. In particular, we discuss techniques for augmenting commonly used metadata with visual features and domain knowledge to generate high-level abstract metadata, which in turn can be used for semantic and context-based indexing and retrieval. We use a sample collection of Vietnamese traditional woodcuts to demonstrate the usefulness of this approach.

    Thick 2D Relations for Document Understanding

    We use a propositional language of qualitative rectangle relations to detect the reading order from document images. To this end, we define the notion of a document encoding rule and we analyze possible formalisms to express document encoding rules, such as LaTeX and SGML. Document encoding rules expressed in the propositional language of rectangles are used to build a reading order detector for document images. To achieve robustness and avoid brittleness when applying the system to real-life document images, the notion of a thick boundary interpretation for a qualitative relation is introduced. The framework is tested on a collection of heterogeneous document images, achieving recall rates of up to 89%.
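The idea of a thick boundary can be sketched as a tolerance on a qualitative rectangle relation. The example below is only an illustration under assumptions: the abstract does not give the formal definition, so the `Rect` type, the `above` relation, and the `thickness` parameter are hypothetical.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    """Axis-aligned bounding box; y grows downward as in image coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float

def above(a: Rect, b: Rect, thickness: float = 0.0) -> bool:
    """True if a ends before b starts vertically, within a tolerance.

    With thickness == 0 this is the strict (brittle) relation; a positive
    thickness lets slightly overlapping boxes still count as 'a above b'.
    """
    return a.y1 <= b.y0 + thickness

# Toy layout: the title box overlaps the body box by 2 pixels,
# as can happen with noisy segmentation of a scanned page.
title = Rect(0, 0, 100, 20)
body = Rect(0, 18, 100, 80)

strict = above(title, body)                # strict relation fails
thick = above(title, body, thickness=3)    # thick relation still holds
```

A reading order rule such as "a block is read before any block it is above" would then keep working when bounding boxes are slightly misaligned.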

    Perceptual-based textures for scene labeling: a bottom-up and a top-down approach

    Due to the semantic gap, the automatic interpretation of digital images is a very challenging task. Both segmentation and classification are intricate because of the high variation of the data. Therefore, the application of appropriate features is of utmost importance. This paper presents biologically inspired texture features for material classification and for interpreting outdoor scenery images. Experiments show that the presented texture features obtain the best classification results for material recognition compared to other well-known texture features, with an average classification rate of 93.0%. For scene analysis, both a bottom-up and a top-down strategy are employed to bridge the semantic gap. First, images are segmented into regions based on the perceptual texture; next, a semantic label is calculated for these regions. Since this emerging interpretation is still error prone, domain knowledge is incorporated to achieve a more accurate description of the depicted scene. By applying both strategies, 91.9% of the pixels from outdoor scenery images obtained a correct label.

    PointGrow: Autoregressively Learned Point Cloud Generation with Self-Attention

    Generating 3D point clouds is challenging yet highly desired. This work presents a novel autoregressive model, PointGrow, which can generate diverse and realistic point cloud samples from scratch or conditioned on semantic contexts. This model operates recurrently, with each point sampled according to a conditional distribution given the previously generated points, allowing inter-point correlations to be well exploited and 3D shape generative processes to be better interpreted. Since point cloud object shapes are typically encoded by long-range dependencies, we augment our model with dedicated self-attention modules to capture such relations. Extensive evaluations show that PointGrow achieves satisfactory performance on both unconditional and conditional point cloud generation tasks with respect to realism and diversity. Several important applications, such as unsupervised feature learning and shape arithmetic operations, are also demonstrated.
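The recurrent sampling scheme described above, where each point is drawn conditioned on the points generated so far, can be sketched generically. This is not the PointGrow model: `toy_sampler` is a hypothetical stand-in for a learned conditional distribution, and all names and constants are assumptions made for illustration.

```python
import random

def sample_point_cloud(n_points, conditional_sampler, rng):
    """Generate points one at a time, each conditioned on all previous ones."""
    points = []
    for _ in range(n_points):
        points.append(conditional_sampler(points, rng))
    return points

def toy_sampler(previous, rng):
    """Hypothetical conditional: a Gaussian around the centroid so far.

    A learned model would replace this with a network that maps the
    previous points to the distribution of the next point.
    """
    if not previous:
        # First point has nothing to condition on: draw it uniformly.
        return tuple(rng.uniform(-1.0, 1.0) for _ in range(3))
    centroid = [sum(p[i] for p in previous) / len(previous) for i in range(3)]
    return tuple(rng.gauss(c, 0.1) for c in centroid)

cloud = sample_point_cloud(5, toy_sampler, random.Random(0))
```

The outer loop is the autoregressive part; only the sampler would change when moving from this toy to a trained model.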

    Relation Structure-Aware Heterogeneous Information Network Embedding

    Heterogeneous information network (HIN) embedding aims to embed multiple types of nodes into a low-dimensional space. Although most existing HIN embedding methods consider heterogeneous relations in HINs, they usually employ a single model for all relations without distinction, which inevitably restricts the capability of network embedding. In this paper, we take the structural characteristics of heterogeneous relations into consideration and propose a novel Relation structure-aware Heterogeneous Information Network Embedding model (RHINE). By exploring real-world networks with thorough mathematical analysis, we present two structure-related measures which can consistently distinguish heterogeneous relations into two categories: Affiliation Relations (ARs) and Interaction Relations (IRs). To respect the distinctive characteristics of relations, RHINE uses different models specifically tailored to handle ARs and IRs, which can better capture the structures and semantics of the networks. Finally, we combine and optimize these models in a unified and elegant manner. Extensive experiments on three real-world datasets demonstrate that our model significantly outperforms state-of-the-art methods in various tasks, including node clustering, link prediction, and node classification.