5,863 research outputs found

Web Data Extraction, Applications and Techniques: A Survey

Author: Emilio Ferrara
Pasquale De Meo
Giacomo Fiumara
Robert Baumgartner
Publication venue: Elsevier BV
Publication date: 09/06/2014

Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool to perform data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather a large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, and this offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential of cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed to work in a given domain in other domains.
Comment: Knowledge-based System

arXiv.org e-Print Archive

Crossref

Advanced Knowledge Technologies at the Midterm: Tools and Methods for the Semantic Web

Author: Ciravegna Fabio
Domingue John
Hall Wendy
Motta Enrico
O'Hara Kieron
Robertson David
Shadbolt Nigel
Sleeman Derek
Tate Austin
Wilks Yorick
Publication venue: School of Electronics and Computer Science, University of Southampton
Publication date: 01/01/2004

The University of Edinburgh and research sponsors are authorised to reproduce and distribute reprints and on-line copies for their purposes notwithstanding any copyright annotation hereon. The views and conclusions contained herein are the author’s and shouldn’t be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of other parties.

In a celebrated essay on the new electronic media, Marshall McLuhan wrote in 1962:

"Our private senses are not closed systems but are endlessly translated into each other in that experience which we call consciousness. Our extended senses, tools, technologies, through the ages, have been closed systems incapable of interplay or collective awareness. Now, in the electric age, the very instantaneous nature of co-existence among our technological instruments has created a crisis quite new in human history. Our extended faculties and senses now constitute a single field of experience which demands that they become collectively conscious. Our technologies, like our private senses, now demand an interplay and ratio that makes rational co-existence possible. As long as our technologies were as slow as the wheel or the alphabet or money, the fact that they were separate, closed systems was socially and psychically supportable. This is not true now when sight and sound and movement are simultaneous and global in extent." (McLuhan 1962, p.5, emphasis in original)

Over forty years later, the seamless interplay that McLuhan demanded between our technologies is still barely visible. McLuhan’s predictions of the spread, and increased importance, of electronic media have of course been borne out, and the worlds of business, science and knowledge storage and transfer have been revolutionised. Yet the integration of electronic systems as open systems remains in its infancy.

Advanced Knowledge Technologies (AKT) aims to address this problem, to create a view of knowledge and its management across its lifecycle, and to research and create the services and technologies that such unification will require. Halfway through its six-year span, the results are beginning to come through, and this paper will explore some of the services, technologies and methodologies that have been developed. We hope to give a sense in this paper of the potential for the next three years, to discuss the insights and lessons learnt in the first phase of the project, and to articulate the challenges and issues that remain.

The WWW provided the original context that made the AKT approach to knowledge management (KM) possible. AKT was initially proposed in 1999; it brought together an interdisciplinary consortium with the technological breadth and complementarity to create the conditions for a unified approach to knowledge across its lifecycle. The combination of this expertise, and the time and space afforded the consortium by the IRC structure, suggested the opportunity for a concerted effort to develop an approach to advanced knowledge technologies, based on the WWW as a basic infrastructure.

The technological context of AKT altered for the better in the short period between the development of the proposal and the beginning of the project itself with the development of the semantic web (SW), which foresaw much more intelligent manipulation and querying of knowledge. The opportunities that the SW provided for, e.g., more intelligent retrieval put AKT in the centre of information technology innovation and knowledge management services; the AKT skill set would clearly be central for the exploitation of those opportunities.

The SW, as an extension of the WWW, provides an interesting set of constraints to the knowledge management services AKT tries to provide. As a medium for the semantically-informed coordination of information, it has suggested a number of ways in which the objectives of AKT can be achieved, most obviously through the provision of knowledge management services delivered over the web as opposed to the creation and provision of technologies to manage knowledge.

AKT is working on the assumption that many web services will be developed and provided for users. The KM problem in the near future will be one of deciding which services are needed and of coordinating them. Many of these services will be largely or entirely legacies of the WWW, and so the capabilities of the services will vary. As well as providing useful KM services in their own right, AKT will be aiming to exploit this opportunity, by reasoning over services, brokering between them, and providing essential meta-services for SW knowledge service management.

Ontologies will be a crucial tool for the SW. The AKT consortium brings a lot of expertise on ontologies together, and ontologies were always going to be a key part of the strategy. All kinds of knowledge sharing and transfer activities will be mediated by ontologies, and ontology management will be an important enabling task. Different applications will need to cope with inconsistent ontologies, or with the problems that will follow the automatic creation of ontologies (e.g. merging of pre-existing ontologies to create a third). Ontology mapping, and the elimination of conflicts of reference, will be important tasks. All of these issues are discussed along with our proposed technologies.

Similarly, specifications of tasks will be used for the deployment of knowledge services over the SW, but in general it cannot be expected that in the medium term there will be standards for task (or service) specifications. The brokering meta-services that are envisaged will have to deal with this heterogeneity.

The emerging picture of the SW is one of great opportunity, but it will not be a well-ordered, certain or consistent environment. It will comprise many repositories of legacy data, outdated and inconsistent stores, and requirements for common understandings across divergent formalisms. There is clearly a role for standards to play to bring much of this context together; AKT is playing a significant role in these efforts. But standards take time to emerge, they take political power to enforce, and they have been known to stifle innovation (in the short term). AKT is keen to understand the balance between principled inference and statistical processing of web content. Logical inference on the Web is tough. Complex queries using traditional AI inference methods bring most distributed computer systems to their knees. Do we set up semantically well-behaved areas of the Web? Is any part of the Web in which semantic hygiene prevails interesting enough to reason in? These and many other questions need to be addressed if we are to provide effective knowledge technologies for our content on the web.

Southampton (e-Prints Soton)

Edinburgh Research Archive

A knowledge hub to enhance the learning processes of an industrial cluster

Author: Angelo Corallo
Giuseppina Passiante

Industrial clusters have been defined as "networks of production of strongly interdependent firms (including specialised suppliers), knowledge producing agents (universities, research institutes, engineering companies), institutions (brokers, consultants), linked to each other in a value adding production chain" (OECD Focus Group, 1999). The industrial cluster's distinctive mode of production is specialisation, based on a sophisticated division of labour, which leads to interlinked activities and a need for cooperation, with the consequent emergence of communities of practice (CoPs). CoPs are here conceived as groups of people and/or organisations bound together by shared expertise and a propensity towards joint work (Wenger and Snyder, 1999). Cooperation needs closeness for just-in-time delivery, for communication, and for the exchange of knowledge, especially in its tacit form. Indeed, the knowledge exchanges between the CoPs' specialised actors, in geographical proximity, lead to spillovers and synergies. In the digital economy landscape, collaborative technologies such as shared repositories, chat rooms and videoconferences can, when appropriately used, have a positive impact on the development of the CoPs' process of exchanging codified knowledge. On the other hand, systems for individual profile management, e-learning platforms and intelligent agents can also trigger some socialisation mechanisms of tacit knowledge. In this perspective, we have set up a model of a Knowledge Hub (KH), driven by Information and Communication Technologies (ICT-driven), that enables the knowledge exchanges of a CoP.
To present the model, the paper is organised in the following logical steps:
- an overview of the most seminal and consolidated approaches to CoPs;
- a description of the ICT-driven KH model, conceived as a booster of the knowledge exchanges of a CoP, which adds the advantages of ICT-based organizational proximity to the economic benefits of geographical proximity;
- a discussion of some preliminary results that we are obtaining during the implementation of the model.

Research Papers in Economics

Towards a killer app for the Semantic Web

Author: C. Marshall
C.M. Christensen
D. Fensel
D. Gruhl
D.L. McGuinness
E. Wenger
G. Schreiber
N. Shadbolt
O.E. Williamson
P. Evans
P. Mika
R. Coase
Y. Kalfoglou
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2005

Killer apps are highly transformative technologies that create new markets and widespread patterns of behaviour. IT generally, and the Web in particular, have benefited from killer apps to create new networks of users and increase their value. The Semantic Web community, on the other hand, is still awaiting a killer app that proves the superiority of its technologies. There are certain features that distinguish killer apps from other ordinary applications. This paper examines those features in the context of the Semantic Web, in the hope that a better understanding of the characteristics of killer apps might encourage their consideration when developing Semantic Web applications.

Southampton (e-Prints Soton)

Crossref

Open Research Online (The Open University)

Features for Killer Apps from a Semantic Web Perspective

Author: Alani Harith
Kalfoglou Yannis
O'Hara Kieron
Shadbolt Nigel
Publication venue: Information Science Reference
Publication date: 01/01/2008

There are certain features that distinguish killer apps from other ordinary applications. This chapter examines those features in the context of the semantic web, in the hope that a better understanding of the characteristics of killer apps might encourage their consideration when developing semantic web applications. Killer apps are highly transformative technologies that create new e-commerce venues and widespread patterns of behaviour. Information technology generally, and the Web in particular, have benefited from killer apps to create new networks of users and increase their value. The semantic web community, on the other hand, is still awaiting a killer app that proves the superiority of its technologies. The authors hope that this chapter will help to highlight some of the common ingredients of killer apps in e-commerce, and discuss how such applications might emerge in the semantic web.

Southampton (e-Prints Soton)

Open Research Online (The Open University)

DRIVER Technology Watch Report

Author: Hochstenbach Patrick
Karstens Elbaek Mikael
Russell Rosemary
Schmelz Pedersen Gerd
Van Godtsenhoven Karen
Vanderfeesten Maurice
Publication venue: DRIVER project
Publication date: 01/01/2008

This report is part of the Discovery Workpackage (WP4) and is the third of four deliverables. The objective of this report is to give an overview of the latest technical developments in the world of digital repositories, digital libraries and beyond, in order to serve as theoretical and practical input for the technical DRIVER developments, especially those focused on enhanced publications. This report consists of two main parts. One part focuses on interoperability standards for enhanced publications; the other consists of three subchapters, which give a landscape picture of current and surfacing technologies and communities crucial to DRIVER. These three subchapters cover the GRID, CRIS and LTP communities and technologies. Every chapter contains a theoretical explanation, followed by case studies and the outcomes and opportunities for DRIVER in this field.

Ghent University Academic Bibliography

An Information Extraction Approach to Reorganizing and Summarizing Specifications

Author: Berkovich Aaron
Sokol Dan Z.
Thirunarayan Krishnaprasad
Publication venue: CORE Scholar
Publication date: 01/03/2005

Materials and Process Specifications are complex semi-structured documents containing numeric data, text, and images. This article describes a coarse-grain extraction technique to automatically reorganize and summarize spec content. Specifically, a strategy for semantic markup, which captures content within a semantic ontology relevant to semi-automatic extraction, has been developed and tested experimentally. The working prototypes were built in the context of Cohesia's existing software infrastructure, and use techniques from Information Extraction, XML technology, etc.

CORE

Service-oriented architecture for device lifecycle support in industrial automation

Author: Cândido Gonçalo Moreira
Publication venue: Faculdade de Ciências e Tecnologia
Publication date: 01/01/2013

Dissertation submitted for the degree of Doctor in Electrical and Computer Engineering, Specialty: Robotics and Integrated Manufacturing.

This thesis addresses the topic of device lifecycle support within the service-oriented industrial automation domain. This domain is known for its plethora of heterogeneous equipment encompassing distinct functions, form factors, network interfaces, or I/O specifications supported by dissimilar software and hardware platforms. There is therefore an evident and growing need to take every device into account and to improve agility during the setup, control, management, monitoring and diagnosis phases. The Service-oriented Architecture (SOA) paradigm is currently a widely endorsed approach for both business and enterprise systems integration. SOA concepts and technology are continuously spreading across the layers of the enterprise organization, envisioning a unified interoperability solution. SOA promotes discoverability, loose coupling, abstraction, autonomy and composition of services relying on open web standards, features that can provide an important contribution to the industrial automation domain. The present work gathered industrial automation device-level requirements, constraints and needs to determine how and where SOA can be employed to solve some of the existing difficulties. Supported by these outcomes, a reference architecture shaped by distributed, adaptive and composable modules is proposed. This architecture will assist and ease the role of systems integrators during reengineering-related interventions throughout the system lifecycle. In a converging direction, the present work also proposes a service-oriented device model to support the vision and goals of that architecture by including embedded added value in terms of service-oriented peer-to-peer discovery and identification, configuration, management, as well as agile customization of device resources. In this context, the implementation and validation work proved not only the feasibility and fitness of the proposed solution on two distinct test-benches but also its relevance to the expanding domain of SOA applications that support the device lifecycle in industrial automation.

Repositório da Universidade Nova de Lisboa

A Reference Architecture for Mobile Knowledge Management

Author: Balfanz Dirk
Grimm Matthias
Tazari Mohammad-Reza
Publication venue: Dagstuhl Seminar Proceedings. 05181 - Mobile Computing and Ambient Intelligence: The Challenge of Multimedia
Publication date: 01/01/2005

Although mobile knowledge management (mKM) is perceived as an emerging R&D field, its concepts and approaches are not well settled, as opposed to the general field of Knowledge Management (KM). In this work, we try to establish a definition for mKM. Taking into account the building blocks of KM in enterprises and the abstract use cases of mKM systems, we introduce a reference architecture for mKM systems as a basis for verifying and comparing concepts and system architectures. Finally, we address the potential of mKM to serve as a prototype model for mobile, situation-aware information processing in the field of Ambient Intelligence Environments.

Fraunhofer-ePrints

Dagstuhl Research Online Publication Server