62,923 research outputs found
Text Extraction from Web Images Based on A Split-and-Merge Segmentation Method Using Color Perception
This paper describes a complete approach to the segmentation and extraction of text from Web images for subsequent recognition, to ultimately achieve both effective indexing and presentation by non-visual means (e.g., audio). The method described here (the first in the authors’ systematic approach to exploit human colour perception) enables the extraction of text in complex situations such as in the presence of varying colour (characters and background). More precisely, in addition to using structural features, the segmentation follows a split-and-merge strategy based on the Hue-Lightness- Saturation (HLS) representation of colour as a first approximation of an anthropocentric expression of the differences in chromaticity and lightness. Character-like components are then extracted as forming textlines in a number of orientations and along curves
Abmash: Mashing Up Legacy Web Applications by Automated Imitation of Human Actions
Many business web-based applications do not offer applications programming
interfaces (APIs) to enable other applications to access their data and
functions in a programmatic manner. This makes their composition difficult (for
instance to synchronize data between two applications). To address this
challenge, this paper presents Abmash, an approach to facilitate the
integration of such legacy web applications by automatically imitating human
interactions with them. By automatically interacting with the graphical user
interface (GUI) of web applications, the system supports all forms of
integrations including bi-directional interactions and is able to interact with
AJAX-based applications. Furthermore, the integration programs are easy to
write since they deal with end-user, visual user-interface elements. The
integration code is simple enough to be called a "mashup".Comment: Software: Practice and Experience (2013)
Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provided a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques allow
to gather a large amount of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users and this
offers unprecedented opportunities to analyze human behavior at a very large
scale. We discuss also the potential of cross-fertilization, i.e., on the
possibility of re-using Web Data Extraction techniques originally designed to
work in a given domain, in other domains.Comment: Knowledge-based System
Two Approaches for Text Segmentation in Web Images
There is a significant need to recognise the text in images on web pages, both for effective indexing and for presentation by non-visual means (e.g., audio). This paper presents and compares two novel methods for the segmentation of characters for subsequent extraction and recognition. The novelty of both approaches is the combination of (different in each case) topological features of characters with an anthropocentric perspective of colour perception— in preference to RGB space analysis. Both approaches enable the extraction of text in complex situations such as in the presence of varying colour and texture (characters and background)
Two Approaches for Text Segmentation in Web Images
There is a significant need to recognise the text in images on web pages, both for effective indexing and for presentation by non-visual means (e.g., audio). This paper presents and compares two novel methods for the segmentation of characters for subsequent extraction and recognition. The novelty of both approaches is the combination of (different in each case) topological features of characters with an anthropocentric perspective of colour perception— in preference to RGB space analysis. Both approaches enable the extraction of text in complex situations such as in the presence of varying colour and texture (characters and background)
- …