
    Web Data Extraction, Applications and Techniques: A Survey

    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims to provide a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather the large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential for cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed for a given domain in other domains.
    Comment: Knowledge-based System

    Effective Cloud Detection and Segmentation using a Gradient-Based Algorithm for Satellite Imagery; Application to improve PERSIANN-CCS

    Being able to effectively identify clouds and monitor their evolution is an important step toward more accurate quantitative precipitation estimation and forecasting. In this study, a new gradient-based cloud-image segmentation technique is developed using tools from image processing. The method uses morphological image gradient magnitudes to separate cloud systems and patch boundaries. A varying-scale kernel is implemented to reduce the sensitivity of image segmentation to noise and to capture objects with edges of varying fineness in remote-sensing images. The proposed method is flexible and extendable from single- to multi-spectral imagery. Case studies were carried out to validate the algorithm by applying it to synthetic radiances for channels of the Geostationary Operational Environmental Satellites (GOES-R) simulated by a high-resolution weather prediction model. The proposed method compares favorably with the existing cloud-patch-based segmentation technique implemented in the PERSIANN-CCS (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Network - Cloud Classification System) rainfall retrieval algorithm. Evaluation of event-based images indicates that the proposed algorithm has the potential to improve rain detection and estimation skill, with an average gain of more than 45% compared to the segmentation technique used in PERSIANN-CCS, and that it identifies cloud regions as objects with accuracy rates of up to 98%.
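
The abstract above does not give the algorithm's details, so the following is only a minimal sketch of the general idea it names: a morphological gradient (dilation minus erosion) computed at several kernel sizes and thresholded to mark candidate boundaries. The function names, the square window, the averaging across scales, and the threshold value are all assumptions made for illustration, not the authors' method.

```python
def morphological_gradient(image, radius):
    """Dilation minus erosion over a square window of half-width `radius`.

    `image` is a list of equal-length rows of numbers. The window is
    clipped at the image borders (an illustrative choice).
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            # Local maximum (dilation) minus local minimum (erosion).
            out[r][c] = max(window) - min(window)
    return out


def multiscale_gradient(image, radii=(1, 2)):
    """Average the morphological gradient over several kernel radii.

    Averaging is a hypothetical way to combine scales; larger radii
    respond to broad edges, smaller radii to sharp ones.
    """
    grads = [morphological_gradient(image, k) for k in radii]
    rows, cols = len(image), len(image[0])
    return [
        [sum(g[r][c] for g in grads) / len(grads) for c in range(cols)]
        for r in range(rows)
    ]


def boundary_mask(image, radii=(1, 2), threshold=10.0):
    """Mark pixels whose multi-scale gradient exceeds `threshold`."""
    grad = multiscale_gradient(image, radii)
    return [[value > threshold for value in row] for row in grad]


if __name__ == "__main__":
    # Toy scene: warm background on the left, a colder patch on the right.
    scene = [[280.0] * 3 + [200.0] * 3 for _ in range(5)]
    for row in boundary_mask(scene, threshold=50.0):
        print("".join("#" if flag else "." for flag in row))
```

On the toy scene, only the pixels on either side of the warm-to-cold transition are flagged. A practical version would more likely rely on an array library and a tuned threshold.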