
Web Data Extraction, Applications and Techniques: A Survey

Author: Emilio Ferrara, Pasquale De Meo, Giacomo Fiumara, Robert Baumgartner
Publication venue: 'Elsevier BV'
Publication date: 09/06/2014

Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims to provide a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool to perform data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather large amounts of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential of cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed to work in a given domain in other domains.
Comment: Knowledge-based System

arXiv.org e-Print Archive

Crossref

A conceptual model of enterprise application integration in higher education institutions

Author: Alshawi SN, Aserey N
Publication venue: Brunel University
Publication date: 01/01/2013

Copyright © 2013 EMCIS. It is evident that several application systems are deployed at different levels in Higher Education (HE), ranging from academic and administrative systems to staff and student record systems. Many of these systems suffer from problems caused by a lack of integration, such as data redundancy, inconsistency and maintenance costs. Enterprise Application Integration (EAI) can provide substantial benefits to these systems, such as assisting with business process integration, facilitating e-service-based transformation and supporting collaborative decision-making. However, the factors that influence the EAI adoption process in HE first need to be defined. This paper introduces a conceptual model to explain the outcome of using EAI in Higher Education Institutions (HEIs). Analyzing the combination of the existing classification of EAI factors with HE factors will enhance the implementation of EAI in HEIs at both the organizational and operational levels. A pilot study at King Abdulaziz University (KAU), Kingdom of Saudi Arabia, is presented to show that integrating multiple information systems gives an integrated view that facilitates information access and reuse. Moreover, data from different information systems are combined to provide a more comprehensive basis for satisfying educational needs.

Brunel University Research Archive

Improving lifecycle query in integrated toolchains using linked data and MQTT-based data warehousing

Author: Berezovskyi Andrii, El-khoury Jad, Kacimi Omar, Loiret Frédéric
Publication date: 01/01/2017

The development of increasingly complex IoT systems requires large engineering environments. These environments generally consist of tools from different vendors that are not necessarily well integrated with each other. In order to automate various analyses, queries across resources from multiple tools have to be executed in parallel with the engineering activities. In this paper, we identify the necessary requirements for such a query capability and evaluate different architectures against these requirements. We propose an improved lifecycle query architecture, which builds upon the existing Tracked Resource Set (TRS) protocol and complements it with the MQTT messaging protocol so that the data in the warehouse can be kept up to date in real time. As part of a case study focusing on the development of an IoT automated warehouse, this architecture was implemented for a toolchain integrated using RESTful microservices and linked data.
Comment: 12 pages, worksho

arXiv.org e-Print Archive

Publikationer från KTH

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Challenges for the comprehensive management of cloud services in a PaaS framework

Author: Andrikopoulos Vasilios, Biro József, García-Gómez Sergio, Jiménez Gañán Miguel, Junker Frederic, Menychtas Andreas, Momm Christof, Strauch Steve, Taher Yehia
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2012

The 4CaaSt project aims at developing a PaaS framework that enables flexible definition, marketing, deployment and management of Cloud-based services and applications. The major innovations proposed by 4CaaSt are the blueprint and its lifecycle management, a one-stop shop for Cloud services, and PaaS-level resource management featuring elasticity. 4CaaSt also provides a portfolio of ready-to-use Cloud-native services and Cloud-aware immigrant technologies.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Tilburg University Repository

Enriched biodiversity data as a resource and service

Background: Recent years have seen a surge in projects that produce large volumes of structured, machine-readable biodiversity data. To make these data amenable to processing by generic, open-source “data enrichment” workflows, they are increasingly being represented in a variety of standards-compliant interchange formats. Here, we report on an initiative in which software developers and taxonomists came together to address the challenges and highlight the opportunities in the enrichment of such biodiversity data by engaging in intensive, collaborative software development: the Biodiversity Data Enrichment Hackathon.

Results: The hackathon brought together 37 participants (including developers and taxonomists, i.e. scientific professionals who gather, identify, name and classify species) from 10 countries: Belgium, Bulgaria, Canada, Finland, Germany, Italy, the Netherlands, New Zealand, the UK, and the US. The participants brought expertise in processing structured data, text mining, development of ontologies, digital identification keys, geographic information systems, niche modeling, natural language processing, provenance annotation, semantic integration, taxonomic name resolution, web service interfaces, workflow tools and visualisation. Most use cases and exemplar data were provided by taxonomists. One goal of the meeting was to facilitate re-use and enhancement of biodiversity knowledge by a broad range of stakeholders, such as taxonomists, systematists, ecologists, niche modelers, informaticians and ontologists. The suggested use cases resulted in nine breakout groups addressing three main themes: i) mobilising heritage biodiversity knowledge; ii) formalising and linking concepts; and iii) addressing interoperability between service platforms. Another goal was to further foster a community of experts in biodiversity informatics and to build human links between research projects and institutions, in response to recent calls to further such integration in this research domain.

Conclusions: Beyond deriving prototype solutions for each use case, areas of inadequacy were discussed and are being pursued further. It was striking how many possible applications for biodiversity data there were and how quickly solutions could be put together when the normal constraints to collaboration were broken down for a week. Conversely, mobilising biodiversity knowledge from its silos in heritage literature and natural history collections will continue to require formalisation of the concepts (and the links between them) that define the research domain, as well as increased interoperability between the software platforms that operate on these concepts.

Shared Research Repository

ZENODO

Directory of Open Access Journals

Open Research Online (The Open University)

PubMed Central

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Enlighten

The University of Manchester - Institutional Repository

ARPHA OAI-PMH Endpoint

ARPHA Preprints

A collaborative engineering platform for supporting design optimisation of advanced aero engine sub-systems

Author: Bosco P., Corallo A., De Poli G.P., Peraudo Paolo Nestore, Zizzari A.
Publication venue: Simulia
Publication date: 01/01/2011

PORTO Publications Open Repository TOrino

Developing front-end Web 2.0 technologies to access services, content and things in the future Internet

Author: Aghaee, Alonso, Anderson, Bianchini, Daniel, David Lizcano, Dey, Hierro, Juan Alfonso Lara, Juan Pazos, Keidl, Lizcano, María Aurora Martínez, McAfee, Soriano
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013

The future Internet is expected to be composed of a mesh of interoperable web services accessible from all over the web. This approach has not yet caught on, since global user-service interaction is still an open issue. This paper presents a vision of next-generation front-end Web 2.0 technology that will enable integrated access to services, content and things in the future Internet. We illustrate how front-ends that wrap traditional services and resources can be tailored to the needs of end users, converting end users into prosumers (creators and consumers of service-based applications). To do this, we propose an architecture that end users without programming skills can use to create front-ends, consult catalogues of resources tailored to their needs, easily integrate and coordinate front-ends, and create composite applications to orchestrate services in their back-end. The paper includes a case study illustrating that current user-centred web development tools are at a very early stage of evolution. We provide statistical data on how the proposed architecture improves these tools. This paper is based on research conducted by the Service Front End (SFE) Open Alliance initiative.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Linking social, open, and enterprise data

Author: Davies John, Duke Alistair, Glaser Hugh, Omitola Tope, Shadbolt Nigel
Publication venue: 'Association for Computing Machinery (ACM)'

Southampton (e-Prints Soton)