19 research outputs found

A Generalized Framework for Ontology-Based Information Retrieval: Application to a public-transportation system

Author: Abed Mourad
Zidi Amir
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 02/09/2014

In this paper we present a generic framework for ontology-based information retrieval. We focus on recognizing semantic information extracted from data sources and mapping this knowledge into an ontology. To improve scalability, we propose a semantic indexing approach based on an entity retrieval model. In addition, we used an ontology of the public transportation domain to validate these proposals. Finally, we evaluated our system using ontology mapping and real-world data sources. Experiments show that our framework can provide meaningful search results.

arXiv.org e-Print Archive

Crossref

OILSW: A New System for Ontology Instance Learning

Author: Soltani Sima
Publication date: 01/01/2007

The Semantic Web is expected to extend the current Web by providing structured content via the addition of annotations. Because of the large number of pages on the Web, manual annotation is very time-consuming. An automatic or semiautomatic method for converting the current Web into the Semantic Web would therefore be very helpful. In a specific domain, Web pages are the instances of that domain's ontology, so we need semiautomatic tools to find these instances and fill in their attributes. In this article, we propose a new system named OILSW for instance learning of an ontology from Web pages of websites in a common domain. This system is the first comprehensive system for automatically populating the ontology for websites. By using this system, any website in a certain domain can be automatically annotated.

Librarians' Digital Library

Discovering Implicit Schemas in JSON Data

Author: Cabot Jordi
Cánovas Javier
Publication venue: HAL CCSD
Publication date: 08/07/2013

JSON has become a very popular lightweight format for data exchange. JSON is human readable and easy for computers to parse and use. However, JSON is schemaless. Though this brings some benefits (e.g., flexibility in the representation of the data), it can become a problem when consuming and integrating data from different JSON services, since developers need to be aware of the structure of the schemaless data. We believe that a mechanism to discover (and visualize) the implicit schema of the JSON data would largely facilitate the creation and usage of JSON services. For instance, this would help developers understand the links between a set of services belonging to the same domain or API. To this end, we propose a model-based approach to generate the underlying schema of a set of JSON documents.

INRIA a CCSD electronic archive server

HAL Mines Nantes
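
To make the idea of an "implicit schema" in the JSON entry above concrete, here is a minimal sketch in Python. It is not the authors' model-based approach: it only merges the top-level fields seen across a few documents and records each field's observed types and whether it is optional. The function names (`type_name`, `infer_schema`) and the sample documents are hypothetical, chosen for illustration.

```python
import json


def type_name(value):
    """Map a parsed JSON value to a JSON type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_schema(documents):
    """Merge top-level fields of JSON objects into one schema summary.

    Returns {field: {"types": sorted type names, "optional": bool}},
    where a field is optional if it is missing from at least one document.
    """
    seen = {}
    for doc in documents:
        for key, value in doc.items():
            info = seen.setdefault(key, {"types": set(), "count": 0})
            info["types"].add(type_name(value))
            info["count"] += 1
    total = len(documents)
    return {
        key: {"types": sorted(info["types"]), "optional": info["count"] < total}
        for key, info in seen.items()
    }


# Hypothetical sample documents, as two services might return them.
docs = [
    json.loads('{"id": 1, "name": "Line 12", "stops": ["A", "B"]}'),
    json.loads('{"id": 2, "name": "Line 7", "operator": null}'),
]
schema = infer_schema(docs)
for field, summary in schema.items():
    print(field, summary)
```

A fuller treatment would recurse into nested objects and arrays and relate schemas across services, which is the part the abstract attributes to the model-based approach.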

Web Data Extraction, Applications and Techniques: A Survey

Author: Emilio Ferrara
Giacomo Fiumara
Pasquale De Meo
Robert Baumgartner
Publication venue: 'Elsevier BV'
Publication date: 09/06/2014

Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims to provide a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather the large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential for cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed to work in a given domain in other domains. Comment: Knowledge-based System

arXiv.org e-Print Archive

Crossref

From the web of data to a world of action

Author: Catarci Tiziana
Daradimos Ilias
Dix Alan
Humayoun Shah Rukh
Ioannidis Yannis
Katifori Akrivi
Lepouras Giorgos
Md.akim Nazihah
Mora Miguel A.
Poggi Antonella
Terella Fabio
Vassilakis Costas
Publication venue: 'Elsevier BV'
Publication date: 01/01/2010

This is the author’s version of a work that was accepted for publication in Web Semantics: Science, Services and Agents on the World Wide Web. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Web Semantics: Science, Services and Agents on the World Wide Web 8.4 (2010): 10.1016/j.websem.2010.04.007
This paper takes as its premise that the web is a place of action, not just information, and that the purpose of global data is to serve human needs. The paper presents several component technologies, which together work towards a vision where many small micro-applications can be threaded together using automated assistance to enable a unified and rich interaction. These technologies include data detector technology to enable any text to become a start point of semantic interaction; annotations for web-based services so that they can link data to potential actions; spreading activation over personal ontologies, to allow modelling of context; algorithms for automatically inferring 'typing' of web-form input data based on previous user inputs; and early work on inferring task structures from action traces. Some of these have already been integrated within an experimental web-based (extended) bookmarking tool, Snip!t, and a prototype desktop application On Time, and the paper discusses how the components could be more fully, yet more openly, linked in terms of both architecture and interaction. As well as contributing to the goal of an action and activity-focused web, the work also exposes a number of broader issues, theoretical, practical, social and economic, for the Semantic Web.
Parts of this work were supported by the Information Society Technologies (IST) Program of the European Commission as part of the DELOS Network of Excellence on Digital Libraries (Contract G038-507618). Thanks also to Emanuele Tracanna, Marco Piva, and Raffaele Giuliano for their work on On Time.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivio della ricerca- Università di Roma La Sapienza

Biblos-e Archivo

BlogForever D2.6: Data Extraction Methodology

Author: Banos V.
Davis R.
Gkotsis G.
Pincent E.
Stepanyan K.
Publication date: 25/10/2013

This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report then proposes a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY