A Generalized Framework for Ontology-Based Information Retrieval: Application to a public-transportation system
In this paper we present a generic framework for ontology-based information
retrieval. We focus on the recognition of semantic information extracted from
data sources and the mapping of this knowledge into an ontology. In order to
achieve better scalability, we propose a semantic indexing approach based on an
entity retrieval model. In addition, we have used an ontology of the public
transportation domain to validate these proposals. Finally, we
evaluated our system using ontology mapping and real-world data sources.
Experiments show that our framework can provide meaningful search results.
OILSW: A New System for Ontology Instance Learning
The Semantic Web is expected to extend the current Web by
providing structured content via the addition of annotations. Because of
the large number of pages on the Web, manual annotation is very
time-consuming. Finding an automatic or semiautomatic method to change
the current Web to the Semantic Web would be very helpful. In a specific
domain, Web pages are the instances of that domain ontology. So we
need semiautomatic tools to find these instances and fill their attributes.
In this article, we propose a new system named OILSW for instance
learning of an ontology from Web pages of Websites in a common
domain. This system is the first comprehensive system for automatically
populating the ontology for websites. By using this system, any Website
in a certain domain can be automatically annotated.
Discovering Implicit Schemas in JSON Data
JSON has become a very popular lightweight format for data exchange. JSON is human-readable and easy for computers to parse and use. However, JSON is schemaless. Though this brings some benefits (e.g., flexibility in the representation of the data), it can become a problem when consuming and integrating data from different JSON services, since developers need to be aware of the structure of the schemaless data. We believe that a mechanism to discover (and visualize) the implicit schema of the JSON data would largely facilitate the creation and usage of JSON services. For instance, this would help developers to understand the links between a set of services belonging to the same domain or API. In this sense, we propose a model-based approach to generate the underlying schema of a set of JSON documents.
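To make the idea of an implicit schema concrete, here is a minimal Python sketch that merges the structure observed across several JSON documents. It is not the model-based approach proposed in the paper, only an illustration of the general idea; the function names (`type_name`, `collect_paths`, `infer_schema`) and the path notation (`$` for the root, `[]` for array items) are assumptions made for this example.

```python
import json
from collections import defaultdict


def type_name(value):
    """Map a decoded JSON value to the name of its JSON type."""
    if value is None:
        return "null"
    # bool must be tested before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def collect_paths(value, path, seen):
    """Record the type observed at every path inside one document."""
    seen[path].add(type_name(value))
    if isinstance(value, dict):
        for key, child in value.items():
            collect_paths(child, f"{path}.{key}", seen)
    elif isinstance(value, list):
        for item in value:
            collect_paths(item, f"{path}[]", seen)


def infer_schema(documents):
    """Merge per-document observations into one implicit schema.

    A path is marked optional when it is absent from at least one document.
    """
    types = defaultdict(set)
    doc_counts = defaultdict(int)
    for doc in documents:
        seen = defaultdict(set)
        collect_paths(doc, "$", seen)
        for path, names in seen.items():
            types[path] |= names
            doc_counts[path] += 1
    total = len(documents)
    return {
        path: {"types": sorted(types[path]), "optional": doc_counts[path] < total}
        for path in sorted(types)
    }


if __name__ == "__main__":
    # Hypothetical sample documents, invented for illustration.
    docs = [
        json.loads('{"id": 1, "name": "A", "stops": [{"lat": 1.5}]}'),
        json.loads('{"id": 2, "name": null}'),
    ]
    for path, info in infer_schema(docs).items():
        print(path, info)
```

Reporting several types for one path, or marking a path as optional, is exactly the kind of information a developer would otherwise have to discover by reading many sample responses.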
Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provide a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques make it
possible to gather a large amount of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users, and this
offers unprecedented opportunities to analyze human behavior at a very large
scale. We also discuss the potential of cross-fertilization, i.e., the
possibility of re-using Web Data Extraction techniques originally designed to
work in a given domain, in other domains.
Comment: Knowledge-based System
From the web of data to a world of action
This is the author's version of a work that was accepted for publication in Web Semantics: Science, Services and Agents on the World Wide Web. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Web Semantics: Science, Services and Agents on the World Wide Web 8.4 (2010): 10.1016/j.websem.2010.04.007
This paper takes as its premise that the web is a place of action, not just information, and that the purpose of
global data is to serve human needs. The paper presents several component technologies, which together work
towards a vision where many small micro-applications can be threaded together using automated assistance to
enable a unified and rich interaction. These technologies include data detector technology to enable any text to
become a start point of semantic interaction; annotations for web-based services so that they can link data to
potential actions; spreading activation over personal ontologies, to allow modelling of context; algorithms for
automatically inferring 'typing' of web-form input data based on previous user inputs; and early work on inferring
task structures from action traces. Some of these have already been integrated within an experimental web-based
(extended) bookmarking tool, Snip!t, and a prototype desktop application On Time, and the paper discusses how the
components could be more fully, yet more openly, linked in terms of both architecture and interaction. As well as
contributing to the goal of an action and activity-focused web, the work also exposes a number of broader issues,
theoretical, practical, social and economic, for the Semantic Web.
Parts of this work were supported by the Information Society Technologies (IST) Program of the European Commission as part of the DELOS Network of Excellence on Digital Libraries (Contract G038-507618). Thanks also to Emanuele Tracanna, Marco Piva, and Raffaele Giuliano for their work on On Time.
BlogForever D2.6: Data Extraction Methodology
This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
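As an illustration of what exploiting microdata can look like in practice, the following Python sketch collects `itemprop` values from a blog page using only the standard library. It is not the BlogForever methodology; the class and function names (`ItempropExtractor`, `extract_itemprops`) are assumptions made for this example, and the sketch deliberately ignores `itemscope` nesting, which the full microdata processing rules take into account.

```python
from html.parser import HTMLParser

# Elements whose microdata value comes from an attribute rather than from text.
VALUE_ATTRIBUTE = {"meta": "content", "link": "href", "a": "href",
                   "img": "src", "time": "datetime"}
# Elements that never have an end tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ItempropExtractor(HTMLParser):
    """Collect itemprop values into a flat dict of name -> list of values."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        # Each entry: [tag, itemprop name, text parts, nested same-tag depth]
        self._open = []

    def _record(self, name, value):
        self.properties.setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = attributes.get("itemprop")
        value_attr = VALUE_ATTRIBUTE.get(tag)
        value = attributes.get(value_attr) if value_attr else None
        if name is not None and value is not None:
            self._record(name, value)
            name = None  # value already taken from the attribute
        if tag in VOID_TAGS:
            return
        if name is not None:
            self._open.append([tag, name, [], 0])
            return
        # Count nested elements with the same tag so that their end tags
        # are not mistaken for the end of the property element.
        for entry in reversed(self._open):
            if entry[0] == tag:
                entry[3] += 1
                break

    def handle_data(self, data):
        for entry in self._open:
            entry[2].append(data)

    def handle_endtag(self, tag):
        for index in range(len(self._open) - 1, -1, -1):
            entry = self._open[index]
            if entry[0] != tag:
                continue
            if entry[3] > 0:
                entry[3] -= 1
            else:
                # Normalise whitespace in the collected text content.
                self._record(entry[1], " ".join("".join(entry[2]).split()))
                del self._open[index:]
            return


def extract_itemprops(html):
    """Parse an HTML string and return its itemprop values."""
    parser = ItempropExtractor()
    parser.feed(html)
    parser.close()
    return parser.properties
```

Calling `extract_itemprops` on a post marked up with properties such as `headline`, `datePublished` and `articleBody` returns those fields directly, without the page-specific extraction rules that unannotated HTML would require.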