Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims to provide a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provide a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool for data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques make it
possible to gather large amounts of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users, which
offers unprecedented opportunities to analyze human behavior at a very large
scale. We also discuss the potential for cross-fertilization, i.e., the
possibility of reusing Web Data Extraction techniques originally designed to
work in one domain in other domains.
Comment: Knowledge-based Systems
Integration, management and communication of heterogeneous design resources with WWW technologies
Recently, advanced information technologies have opened new possibilities for collaborative design. In this paper, a Web-based collaborative design environment is proposed, in which heterogeneous design applications can be integrated through a common interface, managed dynamically for publishing and searching, and made to communicate with each other for integrated multi-objective design. CORBA (Common Object Request Broker Architecture) is employed as the implementation tool that enables integration of and communication between design application programs, and XML (eXtensible Markup Language) is used as a common data description language for data exchange between heterogeneous applications and for resource description and recording. The paper also introduces the implementation of the system and the issues involved in encapsulating existing legacy applications. Finally, an example of gear design based on the system is presented to illustrate the methods and procedures developed in this research.
JetWeb: A WWW Interface and Database for Monte Carlo Tuning and Validation
A World Wide Web interface to a Monte Carlo validation and tuning facility is
described. The aim of the package is to allow rapid and reproducible
comparisons to be made between detailed measurements at high-energy physics
colliders and general physics simulation packages. The package includes a
relational database, a Java servlet query and display facility, and clean
interfaces to simulation packages and their parameters.
Comment: See http://jetweb.hep.ucl.ac.uk for further information
Supporting text mining for e-Science: the challenges for Grid-enabled natural language processing
Over the last few years, language technology has moved rapidly from 'applied research' to 'engineering', and from small-scale to large-scale engineering. Applications such as advanced text mining systems are feasible but very resource-intensive, while research seeking to address the underlying language processing questions faces very real practical and methodological limitations. The e-Science vision, and the creation of the e-Science Grid, promise the level of integrated large-scale technological support required to sustain this important and successful new technology area. In this paper, we discuss the foundations for deploying text mining and other language technology on the Grid: the protocols and tools required to build distributed large-scale language technology systems that meet the needs of users, application builders and researchers.
Generating and visualizing a soccer knowledge base
This demo abstract describes the SmartWeb Ontology-based Information Extraction System (SOBIE). A key feature of SOBIE is that all information is extracted and stored with respect to the SmartWeb ontology. In this way, other components of the system that use the same ontology can access this information in a straightforward way. We will show how information extracted by SOBIE is visualized within its original context, thus enhancing the browsing experience of the end user.