87,903 research outputs found
Two Approaches for Text Segmentation in Web Images
There is a significant need to recognise the text in images on web pages, both for effective indexing and for presentation by non-visual means (e.g., audio). This paper presents and compares two novel methods for the segmentation of characters for subsequent extraction and recognition. The novelty of both approaches is the combination of (different in each case) topological features of characters with an anthropocentric perspective of colour perceptionâ in preference to RGB space analysis. Both approaches enable the extraction of text in complex situations such as in the presence of varying colour and texture (characters and background)
Beyond Hartigan Consistency: Merge Distortion Metric for Hierarchical Clustering
Hierarchical clustering is a popular method for analyzing data which
associates a tree to a dataset. Hartigan consistency has been used extensively
as a framework to analyze such clustering algorithms from a statistical point
of view. Still, as we show in the paper, a tree which is Hartigan consistent
with a given density can look very different than the correct limit tree.
Specifically, Hartigan consistency permits two types of undesirable
configurations which we term over-segmentation and improper nesting. Moreover,
Hartigan consistency is a limit property and does not directly quantify
difference between trees.
In this paper we identify two limit properties, separation and minimality,
which address both over-segmentation and improper nesting and together imply
(but are not implied by) Hartigan consistency. We proceed to introduce a merge
distortion metric between hierarchical clusterings and show that convergence in
our distance implies both separation and minimality. We also prove that uniform
separation and minimality imply convergence in the merge distortion metric.
Furthermore, we show that our merge distortion metric is stable under
perturbations of the density.
Finally, we demonstrate applicability of these concepts by proving
convergence results for two clustering algorithms. First, we show convergence
(and hence separation and minimality) of the recent robust single linkage
algorithm of Chaudhuri and Dasgupta (2010). Second, we provide convergence
results on manifolds for topological split tree clustering
Two Approaches for Text Segmentation in Web Images
There is a significant need to recognise the text in images on web pages, both for effective indexing and for presentation by non-visual means (e.g., audio). This paper presents and compares two novel methods for the segmentation of characters for subsequent extraction and recognition. The novelty of both approaches is the combination of (different in each case) topological features of characters with an anthropocentric perspective of colour perceptionâ in preference to RGB space analysis. Both approaches enable the extraction of text in complex situations such as in the presence of varying colour and texture (characters and background)
Task-based Augmented Contour Trees with Fibonacci Heaps
This paper presents a new algorithm for the fast, shared memory, multi-core
computation of augmented contour trees on triangulations. In contrast to most
existing parallel algorithms our technique computes augmented trees, enabling
the full extent of contour tree based applications including data segmentation.
Our approach completely revisits the traditional, sequential contour tree
algorithm to re-formulate all the steps of the computation as a set of
independent local tasks. This includes a new computation procedure based on
Fibonacci heaps for the join and split trees, two intermediate data structures
used to compute the contour tree, whose constructions are efficiently carried
out concurrently thanks to the dynamic scheduling of task parallelism. We also
introduce a new parallel algorithm for the combination of these two trees into
the output global contour tree. Overall, this results in superior time
performance in practice, both in sequential and in parallel thanks to the
OpenMP task runtime. We report performance numbers that compare our approach to
reference sequential and multi-threaded implementations for the computation of
augmented merge and contour trees. These experiments demonstrate the run-time
efficiency of our approach and its scalability on common workstations. We
demonstrate the utility of our approach in data segmentation applications
Text Extraction from Web Images Based on A Split-and-Merge Segmentation Method Using Color Perception
This paper describes a complete approach to the segmentation and extraction of text from Web images for subsequent recognition, to ultimately achieve both effective indexing and presentation by non-visual means (e.g., audio). The method described here (the first in the authorsâ systematic approach to exploit human colour perception) enables the extraction of text in complex situations such as in the presence of varying colour (characters and background). More precisely, in addition to using structural features, the segmentation follows a split-and-merge strategy based on the Hue-Lightness- Saturation (HLS) representation of colour as a first approximation of an anthropocentric expression of the differences in chromaticity and lightness. Character-like components are then extracted as forming textlines in a number of orientations and along curves
Dynamic Planar Embeddings of Dynamic Graphs
We present an algorithm to support the dynamic embedding in the plane of a
dynamic graph. An edge can be inserted across a face between two vertices on
the face boundary (we call such a vertex pair linkable), and edges can be
deleted. The planar embedding can also be changed locally by flipping
components that are connected to the rest of the graph by at most two vertices.
Given vertices , linkable decides whether and are
linkable in the current embedding, and if so, returns a list of suggestions for
the placement of in the embedding. For non-linkable vertices , we
define a new query, one-flip-linkable providing a suggestion for a flip
that will make them linkable if one exists. We support all updates and queries
in O(log) time. Our time bounds match those of Italiano et al. for a
static (flipless) embedding of a dynamic graph.
Our new algorithm is simpler, exploiting that the complement of a spanning
tree of a connected plane graph is a spanning tree of the dual graph. The
primal and dual trees are interpreted as having the same Euler tour, and a main
idea of the new algorithm is an elegant interaction between top trees over the
two trees via their common Euler tour.Comment: Announced at STACS'1
- âŠ