Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provide a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques make it
possible to gather large amounts of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users and this
offers unprecedented opportunities to analyze human behavior at a very large
scale. We also discuss the potential for cross-fertilization, i.e., the
possibility of reusing Web Data Extraction techniques originally designed to
work in one domain in other domains.
Comment: Knowledge-based System
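To make the idea of extracting structured data from Web pages concrete, the following is a minimal sketch of a hand-written extractor (often called a wrapper) built on Python's standard `html.parser` module. It assumes the target data sits in `<td>` cells of an HTML table; `TableRowWrapper` and `extract_rows` are hypothetical names, and the sketch does not reflect any specific technique from the survey.

```python
from html.parser import HTMLParser


class TableRowWrapper(HTMLParser):
    """Hypothetical minimal wrapper: collects the text of every <td> cell,
    grouping cells by their enclosing <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new record
        elif tag == "td" and self.rows:
            self._in_cell = True
            self.rows[-1].append("")      # start a new field in the current record

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:                 # ignore text outside <td> cells
            self.rows[-1][-1] += data.strip()


def extract_rows(html: str) -> list[list[str]]:
    wrapper = TableRowWrapper()
    wrapper.feed(html)
    wrapper.close()
    # Drop rows without <td> cells (e.g., header rows that only use <th>).
    return [row for row in wrapper.rows if row]


page = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>19.50</td></tr>
</table>
"""
print(extract_rows(page))  # [['Widget', '9.99'], ['Gadget', '19.50']]
```

A wrapper like this is precise but brittle: any change to the page layout can break it, which is the robustness problem that automatic wrapper adaptation addresses.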
GRIDKIT: Pluggable overlay networks for Grid computing
A 'second generation' approach to the provision of Grid middleware is now emerging, built on service-oriented architecture and web services standards and technologies. However, advanced Grid applications have significant demands that present-day web services platforms do not address. As one prime example, current platforms do not support the rich diversity of communication 'interaction types' demanded by advanced applications (e.g., publish-subscribe, media streaming, peer-to-peer interaction). In this paper we describe the Gridkit middleware, which augments the basic service-oriented architecture to address this particular deficiency. We focus in particular on the communications infrastructure needed to support multiple interaction types in a unified, principled, and extensible manner, which we present in terms of the novel concept of pluggable overlay networks.
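A minimal sketch of the pluggable-overlay idea, assuming a toy in-process model: each interaction type is a plug-in implementing one common interface, and a registry lets applications select or add overlays at runtime without changing their own code. `Overlay`, `PublishSubscribeOverlay`, `PeerToPeerOverlay`, and `OverlayRegistry` are hypothetical names, not Gridkit's API; real overlays would span networked nodes rather than local callbacks.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Callable

Handler = Callable[[str], None]


class Overlay(ABC):
    """Hypothetical plug-in interface: each overlay implements one interaction type."""

    @abstractmethod
    def subscribe(self, key: str, handler: Handler) -> None: ...

    @abstractmethod
    def send(self, key: str, message: str) -> int:
        """Deliver a message; return how many handlers received it."""


class PublishSubscribeOverlay(Overlay):
    # Every subscriber of a topic receives each message published to it.
    def __init__(self) -> None:
        self._subs: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, key, handler):
        self._subs[key].append(handler)

    def send(self, key, message):
        for handler in self._subs[key]:
            handler(message)
        return len(self._subs[key])


class PeerToPeerOverlay(Overlay):
    # Each key names exactly one peer; a message goes only to that peer.
    def __init__(self) -> None:
        self._peers: dict[str, Handler] = {}

    def subscribe(self, key, handler):
        self._peers[key] = handler  # re-registering a peer replaces its handler

    def send(self, key, message):
        handler = self._peers.get(key)
        if handler is None:
            return 0
        handler(message)
        return 1


class OverlayRegistry:
    """Applications request an interaction type; overlays are plugged in at runtime."""

    def __init__(self) -> None:
        self._plugins: dict[str, Overlay] = {}

    def plug(self, interaction_type: str, overlay: Overlay) -> None:
        self._plugins[interaction_type] = overlay

    def get(self, interaction_type: str) -> Overlay:
        return self._plugins[interaction_type]


registry = OverlayRegistry()
registry.plug("publish-subscribe", PublishSubscribeOverlay())
registry.plug("peer-to-peer", PeerToPeerOverlay())

log: list[str] = []
pubsub = registry.get("publish-subscribe")
pubsub.subscribe("telemetry", log.append)
pubsub.subscribe("telemetry", lambda m: log.append(m.upper()))
print(pubsub.send("telemetry", "cpu=0.7"))  # 2

inbox: list[str] = []
p2p = registry.get("peer-to-peer")
p2p.subscribe("node-b", inbox.append)
print(p2p.send("node-b", "hello"), p2p.send("node-c", "hello"))  # 1 0
```

The design choice being illustrated is that application code depends only on the shared interface, so a new interaction type (for example, a media-streaming overlay) could be added by plugging in another implementation.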
Transition of legacy systems to semantically enabled applications: TAO method and tools
Despite high expectations, the industrial take-up of Semantic Web technologies in developing services and applications has been slower than expected. One of the main reasons is that many legacy systems were developed without considering the potential of the Web for integrating services and sharing resources. Without a systematic methodology and proper tool support, migrating legacy systems to Semantic Web Service-based systems can be a tedious and expensive process that carries a significant risk of failure. There is an urgent need for strategies that allow legacy systems to be migrated to Semantic Web Services platforms, and for tools that support such strategies. In this paper we propose a methodology, with supporting tools, for transitioning these applications to Semantic Web Services, which allows users to migrate their applications to Semantic Web Services platforms automatically or semi-automatically. The transition of the GATE system is used as a case study.
AMP: A Science-driven Web-based Application for the TeraGrid
The Asteroseismic Modeling Portal (AMP) provides a web-based interface for
astronomers to run and view simulations that derive the properties of Sun-like
stars from observations of their pulsation frequencies. In this paper, we
describe the architecture and implementation of AMP, highlighting the
lightweight design principles and tools used to produce a functional
fully-custom web-based science application in less than a year. Targeted as a
TeraGrid science gateway, AMP's architecture and implementation are intended to
simplify its orchestration of TeraGrid computational resources. AMP's web-based
interface was developed as a traditional standalone database-backed web
application using the Python-based Django web development framework, allowing
us to leverage the Django framework's capabilities while cleanly separating the
user interface development from the grid interface development. We have found
this combination of tools flexible and effective for rapid gateway development
and deployment.
Comment: 7 pages, 2 figures, in Proceedings of the 5th Grid Computing
Environments Workshop
Design of Automatically Adaptable Web Wrappers
Nowadays, the huge amount of information distributed through the Web motivates the study of techniques for
extracting relevant data efficiently and reliably. Both academia and enterprises
have developed several approaches to Web data extraction, for example using artificial intelligence or
machine learning techniques. Some commonly adopted procedures, namely wrappers, ensure a high degree of precision
in the information extracted from Web pages and, at the same time, must prove robust so as not to
compromise the quality and reliability of the data themselves.
In this paper we focus on some experimental aspects related to the robustness of the data extraction process
and the possibility of automatically adapting wrappers. We discuss the implementation of algorithms for
finding similarities between two different versions of a Web page, in order to handle modifications, avoid
the failure of data extraction tasks, and ensure the reliability of the extracted information. Our purpose is to evaluate
the performance, advantages, and drawbacks of our novel system for automatic wrapper adaptation.
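The abstract does not specify which similarity algorithm is used. As an assumption, the sketch below uses the classic Simple Tree Matching algorithm, which computes the size of the largest top-down, order-preserving matching between two DOM-like trees; normalizing by the average tree size gives a score in [0, 1], where values near 1 suggest the new page version is structurally close to the old one. `Node`, `size`, `simple_tree_matching`, and `similarity` are hypothetical names, and the paper's own algorithm may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                  # e.g., an HTML tag name
    children: list["Node"] = field(default_factory=list)


def size(t: Node) -> int:
    # Number of nodes in the tree rooted at t.
    return 1 + sum(size(c) for c in t.children)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Size of the largest top-down, order-preserving matching between a and b."""
    if a.label != b.label:
        return 0                                # roots differ: nothing below can match
    m, n = len(a.children), len(b.children)
    # M[i][j]: best matching using the first i children of a and the first j of b.
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = simple_tree_matching(a.children[i - 1], b.children[j - 1])
            M[i][j] = max(M[i][j - 1], M[i - 1][j], M[i - 1][j - 1] + w)
    return M[m][n] + 1                          # +1 for the matched roots


def similarity(a: Node, b: Node) -> float:
    # Normalize by the average tree size so identical trees score 1.0.
    return simple_tree_matching(a, b) / ((size(a) + size(b)) / 2)


# Old and new versions of a page: the new one inserts a <span> between paragraphs.
old = Node("html", [Node("body", [Node("div", [Node("p"), Node("p")])])])
new = Node("html", [Node("body", [Node("div", [Node("p"), Node("span"), Node("p")])])])
print(simple_tree_matching(old, new))   # 5
print(round(similarity(old, new), 3))   # 0.909
```

In a wrapper-adaptation setting, a score like this could be used to decide whether a node targeted by the old wrapper has a sufficiently similar counterpart in the modified page.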
Body language, security and e-commerce
Security is becoming an increasingly important concern both at the desktop level and at the network level. This article discusses several approaches to authenticating individuals through the use of biometric devices. While libraries might not implement such devices, they may appear in the near future of desktop computing, particularly for access to institutional computers or to sensitive information. Other approaches to computer security focus on protecting the contents of electronic transmissions and verifying individual users. After a brief overview of encryption technologies, the article examines public-key cryptography, which is getting a lot of attention in the business world in the form of what is called public key infrastructure. It also examines other efforts, such as IBM's Cryptolope, the Secure Sockets Layer of Web browsers, and Digital Certificates and Signatures. Secure electronic transmissions are an important condition for conducting business on the Net. These business transactions are not limited to purchase orders, invoices, and contracts: secure transmission could also become an important tool for information vendors and publishers to control access to the electronic resources they license. As license negotiators and contract administrators, librarians need to be aware of what is happening in these new technologies and the impact they will have on their operations.
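To illustrate how a digital signature relies on public-key cryptography, here is a toy sketch of "textbook" RSA signing with tiny primes, assuming a SHA-256 digest reduced modulo the key size. It is deliberately insecure and only shows the principle: only the private-key holder can produce a signature, while anyone with the public key can check it. The key values and the `digest`, `sign`, and `verify` functions are illustrative assumptions; real systems use vetted libraries, much larger keys, and padding schemes such as RSA-PSS.

```python
import hashlib

# Toy key pair from small textbook primes (far too small to be secure).
P, Q = 61, 53
N = P * Q                  # public modulus (3233)
PHI = (P - 1) * (Q - 1)    # Euler's totient of N (3120)
E = 17                     # public exponent
D = pow(E, -1, PHI)        # private exponent: modular inverse of E (Python 3.8+)


def digest(message: bytes) -> int:
    # Hash the message and reduce it into [0, N). Toy only: no padding scheme.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    # Requires the private exponent D, so only the key owner can sign.
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    # Uses only the public key (N, E), so anyone can check the signature.
    return pow(signature, E, N) == digest(message)


contract = b"License terms v1"
sig = sign(contract)
print(verify(contract, sig))                # True
print(verify(contract, (sig + 1) % N))      # False: an altered signature fails
```

A Digital Certificate, in turn, binds a public key such as `(N, E)` to an identity, so that a verifier can trust whose key it is using.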
Development of an online collaborative working environment for design and manufacturing
This research develops a novel collaborative working environment (CWE) for manufacturing and design using advanced Web/Internet technologies such as Web Services, Grid Services and other related software tools and packages. To achieve this, the author developed the following research modules. A service-oriented framework for computer-aided design, which acts as an online collaboration system, has been developed using the latest technology, Web Services. The concept of Service-Oriented Architecture has been implemented in the framework. Users anywhere in the world can join the design process from their PCs, regardless of the operating system they use. Owing to the characteristics of Web Services, the service-oriented system can pass through firewalls and support multiple users. In addition, its loosely coupled structure makes the system easy to update. Another module of the CWE addresses the software sharing problem that arises when the platform is used by several geographically dispersed users or organisations. A software package bank system has been developed, which applies the service-oriented approach and successfully solves traditional problems in this field. Building on these outcomes, the research finally developed a more powerful infrastructure using Grid Services, a further development of Grid computing and Web Services. Grid Services are considered the most important future solution for the Internet.