7,911 research outputs found

Web Data Extraction, Applications and Techniques: A Survey

Author: Abel
Amalfitano
Balduzzi
Baumgartner
Berger
Berthold
Bettencourt
Califf
Catanese
Chang
Chen
Collins
Conover
Crandall
Crescenzi
Dalvi
De Meo
Doan
Emilio Ferrara
Ferrara
Flesca
Freitag
Furche
Gatterbauer
Giacomo Fiumara
Gjoka
Gkotsis
Gottlob
Hammersley
Han
Hecht
Hsu
Irmak
Khare
Kim
Kinsella
Kleinberg
Kohlschütter
Kokkoras
Krüpl
Kushmerick
Kwak
Laender
Liu
Manning
Masanès
Mathes
Meng
Mislove
Monge
Muslea
Oro
Pan
Pasquale De Meo
Perito
Phan
Plake
Rahm
Reis
Robert Baumgartner
Sahuguet
Sarawagi
Schifanella
Selkow
Shi
Soderland
Szomszor
Turmo
Vosecky
Wang
Weikum
Wilson
Winograd
Yang
Ye
Zafarani
Zanasi
Zhai
Zhang
Publication venue: Elsevier BV
Publication date: 09/06/2014

Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather the large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential of cross-fertilization, i.e., the possibility of re-using Web Data Extraction techniques originally designed for one domain in other domains.
Comment: Knowledge-based System

arXiv.org e-Print Archive

Crossref

GRIDKIT: Pluggable overlay networks for Grid computing

Author: A. El-Sayed
A. Grimshaw
A. Rowstron
B. Li
F. Dabek
F. Kon
G. Coulson
H. Balakrishnan
K. Czajkowski
L. Mathy
M. Castro
M. Clark
N. Furmento
N. Parlavantzas
P. Grace
S. Floyd
S. Pallickara
Publication venue: SPRINGER-VERLAG BERLIN
Publication date: 01/01/2004

A 'second generation' approach to the provision of Grid middleware is now emerging, built on service-oriented architecture and web services standards and technologies. However, advanced Grid applications have significant demands that are not addressed by present-day web services platforms. As one prime example, current platforms do not support the rich diversity of communication 'interaction types' demanded by advanced applications (e.g. publish-subscribe, media streaming, peer-to-peer interaction). In this paper we describe the Gridkit middleware, which augments the basic service-oriented architecture to address this particular deficiency. We focus in particular on the communications infrastructure needed to support multiple interaction types in a unified, principled and extensible manner, which we present in terms of the novel concept of pluggable overlay networks.

CiteSeerX

Crossref

Lancaster E-Prints

Transition of legacy systems to semantically enabled applications: TAO method and tools

Author: Bontcheva Kalina
Damljanovic Danica
Gibbins Nicholas
Payne Terry
Wang Hai
Publication venue: IOS Press
Publication date: 01/01/2012

Despite high expectations, the industrial take-up of Semantic Web technologies in developing services and applications has been slower than expected. One of the main reasons is that many legacy systems have been developed without considering the potential of the Web in integrating services and sharing resources. Without a systematic methodology and proper tool support, the migration from legacy systems to Semantic Web Service-based systems can be a tedious and expensive process, which carries a significant risk of failure. There is an urgent need for strategies that allow the migration of legacy systems to Semantic Web Services platforms, and for tools to support such strategies. In this paper we propose a methodology and its tool support for transitioning these applications to Semantic Web Services, which allow users to migrate their applications to Semantic Web Services platforms automatically or semi-automatically. The transition of the GATE system is used as a case study.

Crossref

Aston Publications Explorer

AMP: A Science-driven Web-based Application for the TeraGrid

Author: Metcalfe Travis
Shorrock Ian
Woitaszek Matthew
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2009

The Asteroseismic Modeling Portal (AMP) provides a web-based interface for astronomers to run and view simulations that derive the properties of Sun-like stars from observations of their pulsation frequencies. In this paper, we describe the architecture and implementation of AMP, highlighting the lightweight design principles and tools used to produce a functional, fully custom web-based science application in less than a year. Targeted as a TeraGrid science gateway, AMP's architecture and implementation are intended to simplify its orchestration of TeraGrid computational resources. AMP's web-based interface was developed as a traditional standalone database-backed web application using the Python-based Django web development framework, allowing us to leverage the framework's capabilities while cleanly separating the user interface development from the grid interface development. We have found this combination of tools flexible and effective for rapid gateway development and deployment.
Comment: 7 pages, 2 figures, in Proceedings of the 5th Grid Computing Environments Worksho

arXiv.org e-Print Archive

Crossref

Design of Automatically Adaptable Web Wrappers

Author: Baumgartner Robert
Ferrara Emilio
Publication venue
Publication date: 01/01/2011

Nowadays, the huge amount of information distributed through the Web motivates the study of techniques for extracting relevant data efficiently and reliably. Both academia and enterprises have developed several approaches to Web data extraction, for example using techniques from artificial intelligence or machine learning. Some commonly adopted procedures, namely wrappers, ensure a high degree of precision in the information extracted from Web pages and, at the same time, must prove robust so as not to compromise the quality and reliability of the data themselves. In this paper we focus on some experimental aspects related to the robustness of the data extraction process and the possibility of automatically adapting wrappers. We discuss the implementation of algorithms for finding similarities between two different versions of a Web page, in order to handle modifications, avoid the failure of data extraction tasks and ensure the reliability of the information extracted. Our purpose is to evaluate the performance, advantages and drawbacks of our novel system of automatic wrapper adaptation.

arXiv.org e-Print Archive

CiteSeerX

CogPrints Cognitive Sciences Eprint Archive

Body language, security and e-commerce

Author: Desmarais Norman
Publication venue: DigitalCommons@Providence
Publication date: 01/03/2000

Security is becoming an increasingly important concern both at the desktop level and at the network level. This article discusses several approaches to authenticating individuals through the use of biometric devices. While libraries might not implement such devices, they may appear in the near future of desktop computing, particularly for access to institutional computers or to sensitive information. Other approaches to computer security focus on protecting the contents of electronic transmissions and verifying individual users. After a brief overview of encryption technologies, the article examines public-key cryptography, which is getting a lot of attention in the business world in the form of public key infrastructure. It also examines other efforts, such as IBM’s Cryptolope, the Secure Sockets Layer of Web browsers, and Digital Certificates and Signatures. Secure electronic transmissions are an important condition for conducting business on the Net. These business transactions are not limited to purchase orders, invoices, and contracts. This could become an important tool for information vendors and publishers to control access to the electronic resources they license. As license negotiators and contract administrators, librarians need to be aware of what is happening in these new technologies and the impact they will have on their operations.

DigitalCommons@Providence

HELIN Digital Commons

Development of an online collaborative working environment for design and manufacturing

Author: Yu X
Publication venue
Publication date: 01/01/2008

This research develops a novel collaborative working environment (CWE) for manufacturing and design using advanced Web/Internet technologies such as Web Service, Grid Service and other related software tools and packages. To achieve this, the author developed the following research modules. A service-oriented framework for computer-aided design, which acts as an online collaboration system, has been developed using Web Service technology. The concept of Service-Oriented Architecture has been implemented in the framework. Users from anywhere in the world can join the design process from their PCs, no matter what operating system they are using. The service-oriented system can pass through firewalls and can support multiple users owing to the characteristics of Web service. The loosely coupled structure also makes the system easy to update. Another module of the CWE addresses the software sharing problem that arises when the platform is used among several geographically dispersed users or organisations. A software package bank system has been developed, which follows the service-oriented approach and solves traditional problems in this field. Based on these outcomes, the research finally developed a more powerful infrastructure using Grid service, which is a further development of Grid computing and Web service. The Grid service is considered to be the most important future solution for Internet

Nottingham Trent Institutional Repository (IRep)

OpenGrey Repository

Description and interaction of RESTful services for automatic discovery and execution

Author: De Roo Jos
Steiner Thomas
Vallés Joaquim Gabarro
Van de Walle Rik
Van Deursen Davy
Verborgh Ruben
Publication venue: Future Technology Research Association International (FTRA)
Publication date: 01/01/2011

Ghent University Academic Bibliography