5,863 research outputs found

    Web Data Extraction, Applications and Techniques: A Survey

    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool to perform data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather a large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, and this offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential of cross-fertilization, i.e., the possibility of re-using Web Data Extraction techniques originally designed to work in a given domain, in other domains. Comment: Knowledge-based System

    Advanced Knowledge Technologies at the Midterm: Tools and Methods for the Semantic Web

    The University of Edinburgh and research sponsors are authorised to reproduce and distribute reprints and on-line copies for their purposes notwithstanding any copyright annotation hereon. The views and conclusions contained herein are the author’s and shouldn’t be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of other parties. In a celebrated essay on the new electronic media, Marshall McLuhan wrote in 1962: Our private senses are not closed systems but are endlessly translated into each other in that experience which we call consciousness. Our extended senses, tools, technologies, through the ages, have been closed systems incapable of interplay or collective awareness. Now, in the electric age, the very instantaneous nature of co-existence among our technological instruments has created a crisis quite new in human history. Our extended faculties and senses now constitute a single field of experience which demands that they become collectively conscious. Our technologies, like our private senses, now demand an interplay and ratio that makes rational co-existence possible. As long as our technologies were as slow as the wheel or the alphabet or money, the fact that they were separate, closed systems was socially and psychically supportable. This is not true now when sight and sound and movement are simultaneous and global in extent. (McLuhan 1962, p.5, emphasis in original) Over forty years later, the seamless interplay that McLuhan demanded between our technologies is still barely visible. McLuhan’s predictions of the spread, and increased importance, of electronic media have of course been borne out, and the worlds of business, science and knowledge storage and transfer have been revolutionised.
Yet the integration of electronic systems as open systems remains in its infancy. Advanced Knowledge Technologies (AKT) aims to address this problem, to create a view of knowledge and its management across its lifecycle, and to research and create the services and technologies that such unification will require. Halfway through its six-year span, the results are beginning to come through, and this paper will explore some of the services, technologies and methodologies that have been developed. We hope to give a sense in this paper of the potential for the next three years, to discuss the insights and lessons learnt in the first phase of the project, and to articulate the challenges and issues that remain. The WWW provided the original context that made the AKT approach to knowledge management (KM) possible. AKT was initially proposed in 1999; it brought together an interdisciplinary consortium with the technological breadth and complementarity to create the conditions for a unified approach to knowledge across its lifecycle. The combination of this expertise, and the time and space afforded the consortium by the IRC structure, suggested the opportunity for a concerted effort to develop an approach to advanced knowledge technologies, based on the WWW as a basic infrastructure. The technological context of AKT altered for the better in the short period between the development of the proposal and the beginning of the project itself with the development of the semantic web (SW), which foresaw much more intelligent manipulation and querying of knowledge. The opportunities that the SW provided, e.g. for more intelligent retrieval, put AKT in the centre of information technology innovation and knowledge management services; the AKT skill set would clearly be central for the exploitation of those opportunities. The SW, as an extension of the WWW, provides an interesting set of constraints to the knowledge management services AKT tries to provide.
As a medium for the semantically-informed coordination of information, it has suggested a number of ways in which the objectives of AKT can be achieved, most obviously through the provision of knowledge management services delivered over the web as opposed to the creation and provision of technologies to manage knowledge. AKT is working on the assumption that many web services will be developed and provided for users. The KM problem in the near future will be one of deciding which services are needed and of coordinating them. Many of these services will be largely or entirely legacies of the WWW, and so the capabilities of the services will vary. As well as providing useful KM services in their own right, AKT will be aiming to exploit this opportunity, by reasoning over services, brokering between them, and providing essential meta-services for SW knowledge service management. Ontologies will be a crucial tool for the SW. The AKT consortium brings a lot of expertise on ontologies together, and ontologies were always going to be a key part of the strategy. All kinds of knowledge sharing and transfer activities will be mediated by ontologies, and ontology management will be an important enabling task. Different applications will need to cope with inconsistent ontologies, or with the problems that will follow the automatic creation of ontologies (e.g. merging of pre-existing ontologies to create a third). Ontology mapping, and the elimination of conflicts of reference, will be important tasks. All of these issues are discussed along with our proposed technologies. Similarly, specifications of tasks will be used for the deployment of knowledge services over the SW, but in general it cannot be expected that in the medium term there will be standards for task (or service) specifications.
The brokering meta-services that are envisaged will have to deal with this heterogeneity. The emerging picture of the SW is one of great opportunity but it will not be a well-ordered, certain or consistent environment. It will comprise many repositories of legacy data, outdated and inconsistent stores, and requirements for common understandings across divergent formalisms. There is clearly a role for standards to play to bring much of this context together; AKT is playing a significant role in these efforts. But standards take time to emerge, they take political power to enforce, and they have been known to stifle innovation (in the short term). AKT is keen to understand the balance between principled inference and statistical processing of web content. Logical inference on the Web is tough. Complex queries using traditional AI inference methods bring most distributed computer systems to their knees. Do we set up semantically well-behaved areas of the Web? Is any part of the Web in which semantic hygiene prevails interesting enough to reason in? These and many other questions need to be addressed if we are to provide effective knowledge technologies for our content on the web.

    A knowledge hub to enhance the learning processes of an industrial cluster

    Industrial clusters have been defined as “networks of production of strongly interdependent firms (including specialised suppliers), knowledge producing agents (universities, research institutes, engineering companies), institutions (brokers, consultants), linked to each other in a value adding production chain” (OECD Focus Group, 1999). The industrial cluster's distinctive mode of production is specialisation, based on a sophisticated division of labour, that leads to interlinked activities and a need for cooperation, with the consequent emergence of communities of practice (CoPs). CoPs are here conceived as groups of people and/or organisations bound together by shared expertise and a propensity towards joint work (Wenger and Suyden, 1999). Cooperation needs closeness for just-in-time delivery, for communication, and for the exchange of knowledge, especially in its tacit form. Indeed, the knowledge exchanges between the CoPs' specialised actors, in geographical proximity, lead to spillovers and synergies. In the digital economy landscape, collaborative technologies such as shared repositories, chat rooms and videoconferences can, when appropriately used, have a positive impact on the development of the CoP's process of exchanging codified knowledge. On the other hand, systems for managing individuals' profiles, e-learning platforms and intelligent agents can also trigger some socialisation mechanisms of tacit knowledge. In this perspective, we have set up a model of a Knowledge Hub (KH), driven by Information and Communication Technologies (ICT-driven), that enables the knowledge exchanges of a CoP.
In order to present the model, the paper is organised in the following logical steps: - an overview of the most seminal and consolidated approaches to CoPs; - a description of the KH model, ICT-driven, conceived as a booster of the knowledge exchanges of a CoP, that adds to the economic benefits coming from geographical proximity, the advantages coming from organizational proximity, based on the ICTs; - a discussion of some preliminary results that we are obtaining during the implementation of the model.

    Towards a killer app for the Semantic Web

    Killer apps are highly transformative technologies that create new markets and widespread patterns of behaviour. IT generally, and the Web in particular, has benefited from killer apps to create new networks of users and increase its value. The Semantic Web community on the other hand is still awaiting a killer app that proves the superiority of its technologies. There are certain features that distinguish killer apps from other ordinary applications. This paper examines those features in the context of the Semantic Web, in the hope that a better understanding of the characteristics of killer apps might encourage their consideration when developing Semantic Web applications

    Features for Killer Apps from a Semantic Web Perspective

    There are certain features that distinguish killer apps from other ordinary applications. This chapter examines those features in the context of the semantic web, in the hope that a better understanding of the characteristics of killer apps might encourage their consideration when developing semantic web applications. Killer apps are highly transformative technologies that create new e-commerce venues and widespread patterns of behaviour. Information technology, generally, and the Web, in particular, have benefited from killer apps to create new networks of users and increase their value. The semantic web community on the other hand is still awaiting a killer app that proves the superiority of its technologies. The authors hope that this chapter will help to highlight some of the common ingredients of killer apps in e-commerce, and discuss how such applications might emerge in the semantic web

    DRIVER Technology Watch Report

    This report is part of the Discovery Workpackage (WP4) and is the third of four deliverables. The objective of this report is to give an overview of the latest technical developments in the world of digital repositories, digital libraries and beyond, in order to serve as theoretical and practical input for the technical DRIVER developments, especially those focused on enhanced publications. This report consists of two main parts: one focuses on interoperability standards for enhanced publications, while the other consists of three subchapters, which give a landscape picture of current and surfacing technologies and communities crucial to DRIVER. These three subchapters cover the GRID, CRIS and LTP communities and technologies. Every chapter contains a theoretical explanation, followed by case studies and the outcomes and opportunities for DRIVER in this field

    An Information Extraction Approach to Reorganizing and Summarizing Specifications

    Materials and Process Specifications are complex semi-structured documents containing numeric data, text, and images. This article describes a coarse-grain extraction technique to automatically reorganize and summarize spec content. Specifically, a strategy for semantic-markup, to capture content within a semantic ontology, relevant to semi-automatic extraction, has been developed and experimented with. The working prototypes were built in the context of Cohesia's existing software infrastructure, and use techniques from Information Extraction, XML technology, etc

    Service-oriented architecture for device lifecycle support in industrial automation

    Dissertation for the degree of Doctor in Electrical and Computer Engineering. Specialty: Robotics and Integrated Manufacturing. This thesis addresses the topic of device lifecycle support within the service-oriented industrial automation domain. This domain is known for its plethora of heterogeneous equipment encompassing distinct functions, form factors, network interfaces, or I/O specifications supported by dissimilar software and hardware platforms. There is therefore an evident and growing need to take every device into account and improve agility during the setup, control, management, monitoring and diagnosis phases. The Service-oriented Architecture (SOA) paradigm is currently a widely endorsed approach for both business and enterprise systems integration. SOA concepts and technology are continuously spreading across the layers of the enterprise organization, envisioning a unified interoperability solution. SOA promotes discoverability, loose coupling, abstraction, autonomy and composition of services relying on open web standards – features that can provide an important contribution to the industrial automation domain. The present work examined industrial automation device-level requirements, constraints and needs to determine how and where SOA can be employed to solve some of the existing difficulties. Supported by these outcomes, a reference architecture shaped by distributed, adaptive and composable modules is proposed. This architecture will assist and ease the role of systems integrators during reengineering-related interventions throughout the system lifecycle. In a converging direction, the present work also proposes a service-oriented device model to support the architecture's vision and goals by including embedded added value in terms of service-oriented peer-to-peer discovery and identification, configuration, management, as well as agile customization of device resources.
In this context, the implementation and validation work proved not simply the feasibility and fitness of the proposed solution to two distinct test-benches but also its relevance to the expanding domain of SOA applications to support device lifecycle in the industrial automation domain

    A Reference Architecture for Mobile Knowledge Management

    Although mobile knowledge management (mKM) is perceived as an emerging R&D field, its concepts and approaches are not well settled, as opposed to the general field of Knowledge Management (KM). In this work, we try to establish a definition for mKM. Taking into account the building blocks of KM in enterprises and the abstract use cases of mKM systems, we introduce a reference architecture for mKM systems as a basis for verifying and comparing concepts and system architectures. Finally, we address the potential of mKM to serve as a prototype model for mobile, situation-aware information processing in the field of Ambient Intelligence Environments