Search CORE

83 research outputs found

MDDQL: an ontology driven, multi-lingual query language and system for an integrated view of heterogeneous data sources

Author: Chountas P.
Kapetanios E.
Publication date: 01/01/2005

Query languages and keyword-based search engines are conventionally specified and implemented with the emphasis on syntactic rules to which query typing and answering must be bound. MDDQL is a query language and system that operates on a semantic model in the form of a graph-based ontology. As a software technology, MDDQL allows the meaning of, and associations between, information to be known and processed at execution time at the following levels: (a) guiding the user, through an advanced concept-based search method, toward constructing queries that are as meaningful as possible, and (b) resolving high-level queries into statements specific to the various data sources. In addition, queries can be posed in more than one natural sub-language. The major goal behind this approach has been to simplify and scale two tasks: query construction, even within multilingual user communities, and the addressing of a large number of possibly semantically heterogeneous data sources in a distributed environment.

WestminsterResearch
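The MDDQL entry above describes resolving concept-level queries, possibly posed in different natural languages, into source-specific query statements. The Python sketch below illustrates only that translation step, under stated assumptions: the lexicon, concept names, source names, table and column names, and the SQL target are all hypothetical, and a flat dictionary stands in for MDDQL's graph-based ontology. It is not MDDQL's actual design.

```python
# Hypothetical lexicon: user terms in English or German -> ontology concept.
LEXICON = {
    "city": "City",
    "stadt": "City",
    "population": "Population",
    "bevölkerung": "Population",
}

# Hypothetical per-source mappings: concept -> (table, column) in that source.
SOURCE_MAPPINGS = {
    "source_a": {"City": ("cities", "name"), "Population": ("cities", "pop")},
    "source_b": {"City": ("towns", "town_name"), "Population": ("towns", "inhabitants")},
}


def to_concepts(terms):
    """Resolve user terms (in any supported language) to ontology concepts."""
    concepts = []
    for term in terms:
        key = term.lower()
        if key not in LEXICON:
            raise KeyError(f"unknown term: {term!r}")
        concepts.append(LEXICON[key])
    return concepts


def to_sql(concepts, source):
    """Rewrite a concept-level query into a SQL statement for one source."""
    mapping = SOURCE_MAPPINGS[source]
    tables = {mapping[c][0] for c in concepts}
    if len(tables) != 1:
        # Joins across tables are out of scope for this sketch.
        raise ValueError("this sketch only handles single-table mappings")
    columns = ", ".join(mapping[c][1] for c in concepts)
    return f"SELECT {columns} FROM {tables.pop()}"


if __name__ == "__main__":
    concepts = to_concepts(["Stadt", "Bevölkerung"])
    for source in SOURCE_MAPPINGS:
        print(source, "->", to_sql(concepts, source))
```

The same German query terms yield `SELECT name, pop FROM cities` for `source_a` and `SELECT town_name, inhabitants FROM towns` for `source_b`, so the user never needs to know either source's schema.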

A performance of comparative study for semi-structured web data extraction model

Author: Man Mustafa
Sabri Ily Amalina Ahmad
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/12/2019

The extraction of information from multiple web sources is an essential yet complicated step for data analysis in many domains. In this paper, we present a data extraction model based on visual segmentation, the DOM tree and JSON, called Wrapper Extraction of Image using DOM and JSON (WEIDJ), for extracting semi-structured data from biodiversity websites. A large amount of image information from multiple web sources is extracted using three different approaches: Document Object Model (DOM), Wrapper image using Hybrid DOM and JSON (WHDJ), and Wrapper Extraction of Image using DOM and JSON (WEIDJ). Experiments were conducted on several biodiversity websites. The experimental results show that the WEIDJ approach gives promising results with respect to time analysis values. The WEIDJ wrapper successfully extracted more than 100 images from over 15 different biodiversity websites.

Crossref

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science
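The WEIDJ entry above pairs a DOM-based pass over each page with JSON output. As a hedged illustration of that general idea (not WEIDJ itself, and without its visual segmentation step), the sketch below uses Python's standard `html.parser` module, an event-based parser rather than a full DOM tree, to collect `<img>` attributes and serialize them as JSON. The function name `extract_images` and the JSON field names are hypothetical.

```python
import json
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Records the src and alt attributes of every <img> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names; attrs is a list of (name, value) pairs.
        if tag == "img":
            attributes = dict(attrs)
            self.images.append({
                "src": attributes.get("src"),
                "alt": attributes.get("alt") or "",
            })


def extract_images(html, source_url):
    """Return a JSON document listing the images found in one page."""
    parser = ImageCollector()
    parser.feed(html)  # self-closing <img/> tags also reach handle_starttag
    parser.close()
    return json.dumps({"source": source_url, "images": parser.images})
```

Calling `extract_images` once per page and tagging each result with its `source` URL keeps records from different websites distinguishable when they are merged later.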

Building Intelligent Web Applications Using Lightweight Wrappers

Author: Azavant Fabien
Sahuguet Arnaud
Publication venue: ScholarlyCommons
Publication date: 01/01/2000

The Web so far has been incredibly successful at delivering information to human users. So successful, in fact, that there is now an urgent need to go beyond a browsing human. Unfortunately, the Web is not yet a well-organized repository of nicely structured documents but rather a conglomerate of volatile HTML pages. To address this problem, we present the World Wide Web Wrapper Factory (W4F), a toolkit for the generation of wrappers for Web sources, that offers: (1) an expressive language to specify the extraction of complex structures from HTML pages; (2) a declarative mapping to various data formats like XML; (3) some visual tools to make the engineering of wrappers faster and easier.

CiteSeerX

ScholarlyCommons@Penn
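W4F, described above, pairs an extraction language with a declarative mapping to formats such as XML. The sketch below does not use W4F's language; it only illustrates, with hypothetical names (`rows_to_xml`, and a list of field names standing in for the mapping), how cells extracted from an HTML table can be mapped declaratively onto XML elements using Python's standard library.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows built from <th> have no <td> cells
                self.rows.append(self._row)
            self._row, self._cell = None, None


def rows_to_xml(html, root_tag, record_tag, fields):
    """Map the i-th cell of every row to an XML element named fields[i]."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    root = ET.Element(root_tag)
    for row in parser.rows:
        record = ET.SubElement(root, record_tag)
        for name, value in zip(fields, row):
            ET.SubElement(record, name).text = value
    return ET.tostring(root, encoding="unicode")
```

Keeping the field list separate from the parsing code means the same extractor can target a different XML vocabulary by changing only the mapping.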

AMMO-Prot: amine system project 3D-model finder

Author: Aldana-Montes José F
Montañez Raúl
Moya-García Aurelio A
Navas-Delgado Ismael
Pino-Ángeles Almudena
Sánchez-Jiménez Francisca
Urdiales José Luis
Publication venue: BioMed Central
Publication date: 01/01/2008

BACKGROUND: Amines are biogenic amino acid derivatives that play pleiotropic, very important, yet complex roles in animal physiology. As with many other relevant biomolecules, biochemical and molecular data are being accumulated that need to be integrated in order to be effective in advancing biological knowledge in the field. For this purpose, a multidisciplinary group has started an ontology-based system named the Amine System Project (ASP), for which amine-related information is the validation bench. RESULTS: In this paper, we describe the Ontology-Based Mediator developed in the Amine System Project (http://asp.uma.es) using the infrastructure of Semantic Directories, and how this system has been used to solve a case related to amine metabolism-related protein structures. CONCLUSIONS: This infrastructure is used to publish and manage not only ontologies and their relationships, but also metadata relating to the resources committed to the ontologies. The system developed is available at http://asp.uma.es/WebMediator

Crossref

Springer - Publisher Connector

PubMed Central

UCL Discovery

User-friendly and Extensible Web Data Extraction

Author: Holubová Irena
Novella Tomáš
Publication venue: AIS Electronic Library (AISeL)
Publication date: 27/09/2017

Creation of web wrappers is a subject of study in the field of web data extraction. Designing a domain-specific language for a web wrapper is a challenging task, because it introduces trade-offs between the expressiveness of a wrapper's language and safety. In addition, little attention has been paid to the execution of a wrapper in a restricted environment. In this paper we present a new wrapping language, Serrano, that has three goals: (1) the ability to run in a restricted environment, such as a browser extension, (2) extensibility to balance the trade-offs between the expressiveness of a command set and safety, and (3) processing capabilities that eliminate the need for additional programs to clean the extracted data. Serrano has been successfully deployed in a number of projects and has provided encouraging results.

AIS Electronic Library (AISeL)
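Serrano's actual syntax and command set are not given in the abstract above. The sketch below is a hypothetical mini-interpreter that illustrates its three stated goals in miniature: only whitelisted commands can run (safety), new commands can be registered (extensibility), and cleaning steps run inside the wrapper program itself. All command names and the `[command, *args]` program format are assumptions.

```python
# Registry of allowed commands; anything not registered is rejected.
COMMANDS = {}


def command(name):
    """Decorator that registers a function as an allowed wrapper command."""
    def register(fn):
        COMMANDS[name] = fn
        return fn
    return register


def _each(value, fn):
    """Apply fn to a single string or to every string in a list."""
    return [fn(v) for v in value] if isinstance(value, list) else fn(value)


@command("split")
def cmd_split(value, sep):
    return value.split(sep)


@command("trim")
def cmd_trim(value):
    return _each(value, str.strip)


@command("upper")
def cmd_upper(value):
    return _each(value, str.upper)


@command("drop_empty")
def cmd_drop_empty(value):
    return [v for v in value if v]


def run(program, value):
    """Execute a list of [command, *args] steps against an extracted value."""
    for name, *args in program:
        if name not in COMMANDS:
            raise PermissionError(f"command {name!r} is not allowed")
        value = COMMANDS[name](value, *args)
    return value
```

For example, `run([["split", ","], ["trim"], ["drop_empty"]], " apple, pear ,, plum ")` returns `["apple", "pear", "plum"]`, while a program containing an unregistered command such as `eval` is refused rather than executed.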

WEB SCALE INFORMATION EXTRACTION USING WRAPPER INDUCTION APPROACH

Author: GADGE JAYANT
ZAMBAD RINA
Publication venue: Institute for Project Management Pvt. Ltd
Publication date: 08/09/2020

Information extraction from unstructured, ungrammatical data such as classified listings is difficult because traditional structural and grammatical extraction methods do not apply. The proposed architecture extracts unstructured and ungrammatical data using wrapper induction and presents the results in a structured format. The data are collected from various posting websites. The obtained post pages are processed by page parsing, cleansing and data extraction to obtain new reference sets. The reference sets are used to map the user's search query, which improves the scale of search over unstructured and ungrammatical post data. We validate our approach with experimental results.

Interscience Research Network
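The abstract above does not detail its induction algorithm. As a swapped-in illustration of wrapper induction in general, the sketch below learns a wrapper in the style of the classic LR (left-right delimiter) wrapper class: from labeled example pages it infers the longest left context shared by all labeled values and the longest shared right context, then extracts every string between those delimiters from new pages. It does not model the paper's reference sets; the function names and sample listings are hypothetical.

```python
import os


def learn_lr(examples):
    """Infer (left, right) delimiters from (page, [labeled values]) pairs."""
    lefts, rights = [], []
    for page, values in examples:
        cursor = 0
        for value in values:
            start = page.find(value, cursor)
            if start == -1:
                raise ValueError(f"label {value!r} not found in page")
            end = start + len(value)
            lefts.append(page[:start])  # text before the value
            rights.append(page[end:])   # text after the value
            cursor = end
    # Left delimiter: longest common suffix of all left contexts.
    left = os.path.commonprefix([s[::-1] for s in lefts])[::-1]
    # Right delimiter: longest common prefix of all right contexts.
    right = os.path.commonprefix(rights)
    if not left or not right:
        raise ValueError("no shared delimiters; more or better examples needed")
    return left, right


def extract(page, left, right):
    """Return every substring found between a left and a right delimiter."""
    results, pos = [], 0
    while True:
        start = page.find(left, pos)
        if start == -1:
            break
        start += len(left)
        end = page.find(right, start)
        if end == -1:
            break
        results.append(page[start:end])
        pos = end + len(right)
    return results


if __name__ == "__main__":
    train = [(
        "<ul><li><b>Honda Civic</b> $4000</li><li><b>Ford Focus</b> $3500</li></ul>",
        ["Honda Civic", "Ford Focus"],
    )]
    left, right = learn_lr(train)
    print(extract("<ul><li><b>Toyota Yaris</b> $5200</li></ul>", left, right))
```

On the sample listing the learned delimiters are `"><li><b>"` and `"</b> $"`, which extract `['Toyota Yaris']` from the unseen page. Delimiters learned from a single page can be brittle, so more varied labeled examples generally help.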

Design and analysis of quality information for data warehouses

Author: Jarke M.
Jeusfeld M.A.
Quix C.

Research Papers in Economics

Integrating External Sources in a Corporate Semantic Web Managed by a Multi-agent System

Author: Gandon Fabien
Tuan-Dung Cao
Publication venue: Springer Science and Business Media LLC
Publication date: 24/03/2003

We first describe a multi-agent system managing a corporate memory in the form of a corporate semantic web. We then focus on a newly introduced society of agents in charge of wrapping external HTML documents that are relevant to the activities of the organization, by extracting semantic Web annotations using tailored XSLT templates.

HAL-UNICE

INRIA a CCSD electronic archive server

HAL-Rennes 1

Lion: Listen online. Using GraphQL as a mediator for data integration and ingestion

Author: Tubbs Dustyn James
Publication date: 01/08/2018

Data integration is the task of providing a unified view of multiple data sources. These sources can be, and typically are, heterogeneous in their data model, data query language (DQL), and data manipulation language (DML). This thesis describes a system called "Listen Online", or Lion for short. Lion uses the GraphQL specification to provide integrated querying of web services. Lion provides a general structure through which arbitrary mediators can be used within a query. Lastly, by building on top of open source libraries, Lion provides the open source community with the components that enable it to function, in the form of GraphQL servers, visual layout libraries, and query builders.

Illinois Digital Environment for Access to Learning and Scholarship Repository
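The Lion entry above describes mediators that expose heterogeneous sources through one query interface. The sketch below does not use GraphQL or any GraphQL library; it is a hypothetical, dependency-free illustration of the mediator idea only. Each source is registered with a mapping from its own field names to a unified schema, and a query returns just the requested fields from all sources, much as a GraphQL selection set names the fields it wants. The class, method, type and field names are assumptions.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


class Mediator:
    """Routes a query for a type to every registered source and unifies rows."""

    def __init__(self) -> None:
        self._sources: Dict[str, List[Callable[[], List[Record]]]] = {}

    def register(self, type_name: str, fetch: Callable[[], List[Record]],
                 field_map: Dict[str, str]) -> None:
        """Add a source; field_map maps unified field names to source field names."""
        def adapted() -> List[Record]:
            return [{unified: row[src] for unified, src in field_map.items()}
                    for row in fetch()]
        self._sources.setdefault(type_name, []).append(adapted)

    def query(self, type_name: str, fields: List[str]) -> List[Record]:
        """Return only the requested fields, like a selection set."""
        results: List[Record] = []
        for adapted in self._sources.get(type_name, []):
            for row in adapted():
                results.append({f: row.get(f) for f in fields})
        return results


if __name__ == "__main__":
    mediator = Mediator()
    # Two hypothetical sources that name the same data differently.
    mediator.register("Track", lambda: [{"title": "Song A", "artist_name": "X"}],
                      {"title": "title", "artist": "artist_name"})
    mediator.register("Track", lambda: [{"song": "Song B", "performer": "Y"}],
                      {"title": "song", "artist": "performer"})
    print(mediator.query("Track", ["title", "artist"]))
```

Adding a new source only requires a fetch function and a field mapping; clients keep querying the unified `Track` fields unchanged.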