309 research outputs found

    Web Page Retrieval by Structure

    Get PDF
    Our research explores the possibility of categorizing webpages and webpage genre by structure or layout. Based on our results, we believe that webpage structure could play an important role, along with textual and visual keywords, in webpage categorization and searching

    Shingle 2.0: generalising self-consistent and automated domain discretisation for multi-scale geophysical models

    Full text link
    The approaches taken to describe and develop spatial discretisations of the domains required for geophysical simulation models are commonly ad hoc, model or application specific and under-documented. This is particularly acute for simulation models that are flexible in their use of multi-scale, anisotropic, fully unstructured meshes where a relatively large number of heterogeneous parameters are required to constrain their full description. As a consequence, it can be difficult to reproduce simulations, ensure a provenance in model data handling and initialisation, and a challenge to conduct model intercomparisons rigorously. This paper takes a novel approach to spatial discretisation, considering it much like a numerical simulation model problem of its own. It introduces a generalised, extensible, self-documenting approach to carefully describe, and necessarily fully, the constraints over the heterogeneous parameter space that determine how a domain is spatially discretised. This additionally provides a method to accurately record these constraints, using high-level natural language based abstractions, that enables full accounts of provenance, sharing and distribution. Together with this description, a generalised consistent approach to unstructured mesh generation for geophysical models is developed, that is automated, robust and repeatable, quick-to-draft, rigorously verified and consistent to the source data throughout. This interprets the description above to execute a self-consistent spatial discretisation process, which is automatically validated to expected discrete characteristics and metrics.Comment: 18 pages, 10 figures, 1 table. Submitted for publication and under revie

    Local Geometric Consensus: A General Purpose Point Pattern-Based Tracking Algorithm

    Get PDF
    Proceedings of ACM ISMAR 2015, Fukuoka, JapanInternational audienceWe present a method which can quickly and robustly match 2D and 3D point patterns based on their sole spatial distribution , but it can also handle other cues if available. This method can be easily adapted to many transformations such as similarity transformations in 2D/3D, and affine and perspective transformations in 2D. It is based on local geometric consensus among several local matchings and a refinement scheme. We provide two implementations of this general scheme, one for the 2D homography case (which can be used for marker or image tracking) and one for the 3D similarity case. We demonstrate the robustness and speed performance of our proposal on both synthetic and real images and show that our method can be used to augment any (textured/textureless) planar objects but also 3D objects

    Features and Algorithms for Visual Parsing of Handwritten Mathematical Expressions

    Get PDF
    Math expressions are an essential part of scientific documents. Handwritten math expressions recognition can benefit human-computer interaction especially in the education domain and is a critical part of document recognition and analysis. Parsing the spatial arrangement of symbols is an essential part of math expression recognition. A variety of parsing techniques have been developed during the past three decades, and fall into two groups. The first group is graph-based parsing. It selects a path or sub-graph which obeys some rule to form a possible interpretation for the given expression. The second group is grammar driven parsing. Grammars and related parameters are defined manually for different tasks. The time complexity of these two groups parsing is high, and they often impose some strict constraints to reduce the computation. The aim of this thesis is working towards building a straightforward and effective parser with as few constraints as possible. First, we propose using a line of sight graph for representing the layout of strokes and symbols in math expressions. It achieves higher F-score than other graph representations and reduces search space for parsing. Second, we modify the shape context feature with Parzen window density estimation. This feature set works well for symbol segmentation, symbol classification and symbol layout analysis. We get a higher symbol segmentation F-score than other systems on CROHME 2014 dataset. Finally, we develop a Maximum Spanning Tree (MST) based parser using Edmonds\u27 algorithm, which extracts an MST from the directed line of sight graph in two passes: first symbols are segmented, and then symbols and spatial relationship are labeled. The time complexity of our MST-based parsing is lower than the time complexity of CYK parsing with context-free grammars. Also, our MST-based parsing obtains higher structure rate and expression rate than CYK parsing when symbol segmentation is accurate. Correct structure means we get the structure of the symbol layout tree correct, even though the label of the edge in the symbol layout tree might be wrong. The performance of our math expression recognition system with MST-based parsing is competitive on CROHME 2012 and 2014 datasets. For future work, how to incorporate symbol classifier result and correct segmentation error in MST-based parsing needs more research

    Exploration visuelle et interactive d'une large collection d'images en libre accès

    Get PDF
    A la 15ème Conférence Internationale sur l'Extraction et la Gestion des Connaissances (EGC) - Atelier VIFCet article décrit l'approche et les résultats de travaux préliminaires sur l'exploration visuelle et interactive d'une large collection d'images en libre accès (Open Access images). Cette exploration s'appuie sur des descripteurs vi-suels et structure les images dans un graphe de proximité de manière incrémen-tale, afin de pouvoir traiter de grandes collections d'images. L'approche utilisée consiste à : (i) collecter des images et des métadonnées, (ii) extraire des des-cripteurs, (iii) structurer des données dans le graphe et (iv) visualiser le graphe obtenu à l'aide d'algorithmes existants de dessin de graphes

    Exploring the deep structure of images

    Get PDF

    Visual Analysis of High-Dimensional Point Clouds using Topological Abstraction

    Get PDF
    This thesis is about visualizing a kind of data that is trivial to process by computers but difficult to imagine by humans because nature does not allow for intuition with this type of information: high-dimensional data. Such data often result from representing observations of objects under various aspects or with different properties. In many applications, a typical, laborious task is to find related objects or to group those that are similar to each other. One classic solution for this task is to imagine the data as vectors in a Euclidean space with object variables as dimensions. Utilizing Euclidean distance as a measure of similarity, objects with similar properties and values accumulate to groups, so-called clusters, that are exposed by cluster analysis on the high-dimensional point cloud. Because similar vectors can be thought of as objects that are alike in terms of their attributes, the point cloud\''s structure and individual cluster properties, like their size or compactness, summarize data categories and their relative importance. The contribution of this thesis is a novel analysis approach for visual exploration of high-dimensional point clouds without suffering from structural occlusion. The work is based on implementing two key concepts: The first idea is to discard those geometric properties that cannot be preserved and, thus, lead to the typical artifacts. Topological concepts are used instead to shift away the focus from a point-centered view on the data to a more structure-centered perspective. The advantage is that topology-driven clustering information can be extracted in the data\''s original domain and be preserved without loss in low dimensions. The second idea is to split the analysis into a topology-based global overview and a subsequent geometric local refinement. The occlusion-free overview enables the analyst to identify features and to link them to other visualizations that permit analysis of those properties not captured by the topological abstraction, e.g. cluster shape or value distributions in particular dimensions or subspaces. The advantage of separating structure from data point analysis is that restricting local analysis only to data subsets significantly reduces artifacts and the visual complexity of standard techniques. That is, the additional topological layer enables the analyst to identify structure that was hidden before and to focus on particular features by suppressing irrelevant points during local feature analysis. This thesis addresses the topology-based visual analysis of high-dimensional point clouds for both the time-invariant and the time-varying case. Time-invariant means that the points do not change in their number or positions. That is, the analyst explores the clustering of a fixed and constant set of points. The extension to the time-varying case implies the analysis of a varying clustering, where clusters appear as new, merge or split, or vanish. Especially for high-dimensional data, both tracking---which means to relate features over time---but also visualizing changing structure are difficult problems to solve
    • …
    corecore