7,835 research outputs found
Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provided a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques allow
to gather a large amount of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users and this
offers unprecedented opportunities to analyze human behavior at a very large
scale. We discuss also the potential of cross-fertilization, i.e., on the
possibility of re-using Web Data Extraction techniques originally designed to
work in a given domain, in other domains.Comment: Knowledge-based System
Mapping Large Scale Research Metadata to Linked Data: A Performance Comparison of HBase, CSV and XML
OpenAIRE, the Open Access Infrastructure for Research in Europe, comprises a
database of all EC FP7 and H2020 funded research projects, including metadata
of their results (publications and datasets). These data are stored in an HBase
NoSQL database, post-processed, and exposed as HTML for human consumption, and
as XML through a web service interface. As an intermediate format to facilitate
statistical computations, CSV is generated internally. To interlink the
OpenAIRE data with related data on the Web, we aim at exporting them as Linked
Open Data (LOD). The LOD export is required to integrate into the overall data
processing workflow, where derived data are regenerated from the base data
every day. We thus faced the challenge of identifying the best-performing
conversion approach.We evaluated the performances of creating LOD by a
MapReduce job on top of HBase, by mapping the intermediate CSV files, and by
mapping the XML output.Comment: Accepted in 0th Metadata and Semantics Research Conferenc
Handling Confidential Data on the Untrusted Cloud: An Agent-based Approach
Cloud computing allows shared computer and storage facilities to be used by a
multitude of clients. While cloud management is centralized, the information
resides in the cloud and information sharing can be implemented via
off-the-shelf techniques for multiuser databases. Users, however, are very
diffident for not having full control over their sensitive data. Untrusted
database-as-a-server techniques are neither readily extendable to the cloud
environment nor easily understandable by non-technical users. To solve this
problem, we present an approach where agents share reserved data in a secure
manner by the use of simple grant-and-revoke permissions on shared data.Comment: 7 pages, 9 figures, Cloud Computing 201
CLOUD-BASED SOLUTIONS IMPROVING TRANSPARENCY, OPENNESS AND EFFICIENCY OF OPEN GOVERNMENT DATA
A central pillar of open government programs is the disclosure of data held by public agencies using Information and Communication Technologies (ICT). This disclosure relies on the creation of open data portals (e.g. Data.gov) and has subsequently been associated with the expression Open Government Data (OGD). The overall goal of these governmental initiatives is not limited to enhance transparency of public sectors but aims to raise awareness of how released data can be put to use in order to enable the creation of new products and services by private sectors.
Despite the usage of technological platforms to facilitate access to government data, open data portals continue to be organized in order to serve the goals of public agencies without opening the doors to public accountability, information transparency, public scrutiny, etc. This thesis considers the basic aspects of OGD including the definition of technical models for organizing such complex contexts, the identification of techniques for combining data from several portals and the proposal of user interfaces that focus on citizen-centred usability.
In order to deal with the above issues, this thesis presents a holistic approach to OGD that aims to go beyond problems inherent their simple disclosure by providing a tentative answer to the following questions:
1) To what extent do the OGD-based applications contribute towards the creation of innovative, value-added services?
2) What technical solutions could increase the strength of this contribution?
3) Can Web 2.0 and Cloud technologies favour the development of OGD apps?
4) How should be designed a common framework for developing OGD apps that rely on multiple OGD portals and external web resources?
In particular, this thesis is focused on devising computational environments that leverage the content of OGD portals (supporting the initial phase of data disclosure) for the creation of new services that add value to the original data.
The thesis is organized as follows. In order to offer a general view about OGD, some important aspects about open data initiatives are presented including their state of art, the existing approaches for publishing and consuming OGD across web resources, and the factors shaping the value generated through government data portals.
Then, an architectural framework is proposed that gathers OGD from multiple sites and supports the development of cloud-based apps that leverage these data according to potentially different exploitation roots ranging from traditional business to specialized supports for citizens.
The proposed framework is validated by two cloud-based apps, namely ODMap (Open Data Mapping) and NESSIE (A Network-based Environment Supporting Spatial Information Exploration). In particular, ODMap supports citizens in searching and accessing OGD from several web sites. NESSIE organizes data captured from real estate agencies and public agencies (i.e. municipalities, cadastral offices and chambers of commerce) in order to provide citizens with a geographic representation of real estate offers and relevant statistics about the price trend.A central pillar of open government programs is the disclosure of data held by public agencies using Information and Communication Technologies (ICT). This disclosure relies on the creation of open data portals (e.g. Data.gov) and has subsequently been associated with the expression Open Government Data (OGD). The overall goal of these governmental initiatives is not limited to enhance transparency of public sectors but aims to raise awareness of how released data can be put to use in order to enable the creation of new products and services by private sectors.
Despite the usage of technological platforms to facilitate access to government data, open data portals continue to be organized in order to serve the goals of public agencies without opening the doors to public accountability, information transparency, public scrutiny, etc. This thesis considers the basic aspects of OGD including the definition of technical models for organizing such complex contexts, the identification of techniques for combining data from several portals and the proposal of user interfaces that focus on citizen-centred usability.
In order to deal with the above issues, this thesis presents a holistic approach to OGD that aims to go beyond problems inherent their simple disclosure by providing a tentative answer to the following questions:
1) To what extent do the OGD-based applications contribute towards the creation of innovative, value-added services?
2) What technical solutions could increase the strength of this contribution?
3) Can Web 2.0 and Cloud technologies favour the development of OGD apps?
4) How should be designed a common framework for developing OGD apps that rely on multiple OGD portals and external web resources?
In particular, this thesis is focused on devising computational environments that leverage the content of OGD portals (supporting the initial phase of data disclosure) for the creation of new services that add value to the original data.
The thesis is organized as follows. In order to offer a general view about OGD, some important aspects about open data initiatives are presented including their state of art, the existing approaches for publishing and consuming OGD across web resources, and the factors shaping the value generated through government data portals.
Then, an architectural framework is proposed that gathers OGD from multiple sites and supports the development of cloud-based apps that leverage these data according to potentially different exploitation roots ranging from traditional business to specialized supports for citizens.
The proposed framework is validated by two cloud-based apps, namely ODMap (Open Data Mapping) and NESSIE (A Network-based Environment Supporting Spatial Information Exploration). In particular, ODMap supports citizens in searching and accessing OGD from several web sites. NESSIE organizes data captured from real estate agencies and public agencies (i.e. municipalities, cadastral offices and chambers of commerce) in order to provide citizens with a geographic representation of real estate offers and relevant statistics about the price trend
- …