
    Web Data Extraction, Applications and Techniques: A Survey

    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather the large amounts of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential of cross-fertilization, i.e., the possibility of re-using Web Data Extraction techniques originally designed for a given domain in other domains. Comment: Knowledge-based System

    GRIDKIT: Pluggable overlay networks for Grid computing

    A 'second generation' approach to the provision of Grid middleware is now emerging, built on service-oriented architecture and web services standards and technologies. However, advanced Grid applications have significant demands that are not addressed by present-day web services platforms. As one prime example, current platforms do not support the rich diversity of communication 'interaction types' that are demanded by advanced applications (e.g. publish-subscribe, media streaming, peer-to-peer interaction). In the paper we describe the Gridkit middleware, which augments the basic service-oriented architecture to address this particular deficiency. We particularly focus on the communications infrastructure required to support multiple interaction types in a unified, principled and extensible manner, which we present in terms of the novel concept of pluggable overlay networks.

    Transition of legacy systems to semantically enabled applications: TAO method and tools

    Despite high expectations, the industrial take-up of Semantic Web technologies in developing services and applications has been slower than expected. One of the main reasons is that many legacy systems have been developed without considering the potential of the Web in integrating services and sharing resources. Without a systematic methodology and proper tool support, the migration from legacy systems to Semantic Web Service-based systems can be a tedious and expensive process, which carries a significant risk of failure. There is an urgent need for strategies that allow the migration of legacy systems to Semantic Web Services platforms, and for tools to support such strategies. In this paper we propose a methodology and its tool support for transitioning these applications to Semantic Web Services, which allow users to migrate their applications to Semantic Web Services platforms automatically or semi-automatically. The transition of the GATE system is used as a case study.

    AMP: A Science-driven Web-based Application for the TeraGrid

    The Asteroseismic Modeling Portal (AMP) provides a web-based interface for astronomers to run and view simulations that derive the properties of Sun-like stars from observations of their pulsation frequencies. In this paper, we describe the architecture and implementation of AMP, highlighting the lightweight design principles and tools used to produce a functional fully-custom web-based science application in less than a year. Targeted as a TeraGrid science gateway, AMP's architecture and implementation are intended to simplify its orchestration of TeraGrid computational resources. AMP's web-based interface was developed as a traditional standalone database-backed web application using the Python-based Django web development framework, allowing us to leverage the Django framework's capabilities while cleanly separating the user interface development from the grid interface development. We have found this combination of tools flexible and effective for rapid gateway development and deployment. Comment: 7 pages, 2 figures, in Proceedings of the 5th Grid Computing Environments Worksho

    Design of Automatically Adaptable Web Wrappers

    Nowadays, the huge amount of information distributed through the Web motivates the study of techniques for extracting relevant data in an efficient and reliable way. Both academia and enterprises have developed several approaches to Web data extraction, for example using techniques from artificial intelligence or machine learning. Some commonly adopted procedures, namely wrappers, ensure a high degree of precision in the information extracted from Web pages and, at the same time, have to prove robust so as not to compromise the quality and reliability of the data themselves. In this paper we focus on some experimental aspects related to the robustness of the data extraction process and the possibility of automatically adapting wrappers. We discuss the implementation of algorithms for finding similarities between two different versions of a Web page, in order to handle modifications, avoid the failure of data extraction tasks and ensure the reliability of the information extracted. Our purpose is to evaluate the performance, advantages and drawbacks of our novel system of automatic wrapper adaptation.
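
As a rough illustration of what "finding similarities between two different versions of a Web page" can look like, the sketch below uses the classic Simple Tree Matching recurrence on two small, well-formed markup snippets. This is a generic textbook technique chosen for illustration, not necessarily the algorithm the paper implements; the function names (`simple_tree_matching`, `count_nodes`, `similarity`), the normalization, and the sample pages are all assumptions.

```python
import xml.etree.ElementTree as ET


def simple_tree_matching(a, b):
    """Return the size of the largest order-preserving match between two trees."""
    # Roots with different tags cannot be matched at all.
    if a.tag != b.tag:
        return 0
    kids_a, kids_b = list(a), list(b)
    # m[i][j] = best match using the first i children of a and first j of b.
    m = [[0] * (len(kids_b) + 1) for _ in range(len(kids_a) + 1)]
    for i, child_a in enumerate(kids_a, 1):
        for j, child_b in enumerate(kids_b, 1):
            m[i][j] = max(
                m[i][j - 1],
                m[i - 1][j],
                m[i - 1][j - 1] + simple_tree_matching(child_a, child_b),
            )
    # Add 1 for the matched roots themselves.
    return m[-1][-1] + 1


def count_nodes(tree):
    """Count the elements in a tree, root included."""
    return sum(1 for _ in tree.iter())


def similarity(a, b):
    """Matched nodes divided by the average tree size (assumed normalization)."""
    return simple_tree_matching(a, b) / ((count_nodes(a) + count_nodes(b)) / 2)


# Hypothetical old and new versions of the same page fragment.
old_page = ET.fromstring("<div><h1>t</h1><ul><li>a</li><li>b</li></ul></div>")
new_page = ET.fromstring(
    "<div><h1>t</h1><p>ad</p><ul><li>a</li><li>b</li><li>c</li></ul></div>"
)

if __name__ == "__main__":
    print(simple_tree_matching(old_page, new_page))
    print(round(similarity(old_page, new_page), 3))
```

A wrapper could use such a score to decide whether a changed page is still close enough to the version it was built for, and which subtrees to re-anchor on; real pages would first need a tolerant HTML parser, since `xml.etree.ElementTree` only accepts well-formed markup.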

    Body language, security and e-commerce

    Security is becoming an increasingly important concern at both the desktop level and the network level. This article discusses several approaches to authenticating individuals through the use of biometric devices. While libraries might not implement such devices, they may appear in the near future of desktop computing, particularly for access to institutional computers or to sensitive information. Other approaches to computer security focus on protecting the contents of electronic transmissions and on verifying individual users. After a brief overview of encryption technologies, the article examines public-key cryptography, which is getting a lot of attention in the business world in the form of what is called public key infrastructure. It also examines other efforts, such as IBM’s Cryptolope, the Secure Sockets Layer of Web browsers, and Digital Certificates and Signatures. Secure electronic transmissions are an important condition for conducting business on the Net. These business transactions are not limited to purchase orders, invoices, and contracts. Secure transmission could also become an important tool for information vendors and publishers to control access to the electronic resources they license. As license negotiators and contract administrators, librarians need to be aware of what is happening in these new technologies and the impact they will have on library operations.