Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provide a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques make it
possible to gather large amounts of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users, which
offers unprecedented opportunities to analyze human behavior at a very large
scale. We also discuss the potential of cross-fertilization, i.e., the
possibility of re-using Web Data Extraction techniques originally designed to
work in a given domain in other domains.
Comment: Knowledge-based System
Effective Cloud Detection and Segmentation using a Gradient-Based Algorithm for Satellite Imagery; Application to improve PERSIANN-CCS
Being able to effectively identify clouds and monitor their evolution is an
important step toward more accurate quantitative precipitation estimation and
forecasting. In this study, a new gradient-based cloud-image segmentation
technique is developed using tools from image processing. This method
integrates morphological image gradient magnitudes to separate cloud systems
and patch boundaries. A kernel of varying scale is implemented to reduce the
sensitivity of image segmentation to noise and to capture objects with edges of
varying fineness in remote-sensing images. The proposed method is
flexible and extendable from single- to multi-spectral imagery. Case studies
were carried out to validate the algorithm by applying the proposed
segmentation algorithm to synthetic radiances for channels of the Geostationary
Operational Environmental Satellites (GOES-R) simulated by a high-resolution
weather prediction model. The proposed method compares favorably with the
existing cloud-patch-based segmentation technique implemented in the
PERSIANN-CCS (Precipitation Estimation from Remotely Sensed Information using
Artificial Neural Network - Cloud Classification System) rainfall retrieval
algorithm. Evaluation of event-based images indicates that the proposed
algorithm has the potential to improve rain detection and estimation skill,
with an average gain of more than 45% compared to the segmentation technique
used in PERSIANN-CCS, and identifies cloud regions as objects with accuracy
rates of up to 98%.
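
The abstract describes the segmentation only at a high level, so the following
is a minimal sketch of the general idea rather than the authors' algorithm. It
computes a morphological gradient (local maximum minus local minimum) at
several window sizes, averages the results, and thresholds them to mark
candidate cloud boundaries. The function names, the window radii, the threshold
and the toy brightness-temperature field are all assumptions made for
illustration.

```python
def morph_gradient(img, radius):
    """Morphological gradient: local max (dilation) minus local min (erosion)
    over a (2*radius+1)-wide square window, clipped at the image borders."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                img[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out[r][c] = max(window) - min(window)
    return out


def multiscale_gradient(img, radii=(1, 2)):
    """Average the morphological gradient over several window radii
    (a simple stand-in for a kernel of varying scale)."""
    grads = [morph_gradient(img, k) for k in radii]
    rows, cols = len(img), len(img[0])
    return [
        [sum(g[r][c] for g in grads) / len(grads) for c in range(cols)]
        for r in range(rows)
    ]


def edge_mask(img, radii=(1, 2), threshold=50.0):
    """Flag pixels whose averaged gradient reaches the (assumed) threshold."""
    grad = multiscale_gradient(img, radii)
    return [[v >= threshold for v in row] for row in grad]


# Hypothetical toy field (kelvin): a cold 5x5 "cloud" on a warm background.
field = [[290.0] * 11 for _ in range(11)]
for r in range(3, 8):
    for c in range(3, 8):
        field[r][c] = 220.0

mask = edge_mask(field)
for row in mask:
    print("".join("#" if flagged else "." for flagged in row))
```

On this toy field the flagged pixels form a ring around the cold block, while
the block's interior and the distant background stay unflagged. A real
implementation would more likely rely on an array library and would need a
further step, such as region growing or watershed, to turn boundaries into
labelled cloud patches.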