
    MDDQL: an ontology driven, multi-lingual query language and system for an integrated view of heterogeneous data sources

    Query languages and keyword-based search engines are conventionally specified and implemented with the emphasis on syntactic rules to which query typing and answering must be bound. MDDQL is a query language and system that operates on a semantic model in the form of a graph-based ontology. As a software technology, MDDQL allows the meaning of, and associations between, pieces of information to be known and processed at execution time at the following levels: (a) guiding the user, through an advanced concept-based search method, to construct queries that are as meaningful as possible, and (b) resolving high-level queries into statements specific to the various data sources. In addition, queries can be posed in more than one natural sub-language. The major goal behind this approach has been to simplify and scale two tasks: query construction, even within multi-lingual user communities, and addressing a large number of possibly semantically heterogeneous data sources in a distributed environment.
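
    Point (b), resolving a high-level query into source-specific statements, can be pictured with a minimal sketch. It assumes a hypothetical lexicon that maps terms in several languages to language-neutral concept identifiers, and hypothetical per-source mappings from concepts to table and column names; it is not MDDQL's actual ontology model or API.

```python
# Hypothetical lexicon: (language, term) -> language-neutral concept identifier.
LEXICON = {
    ("en", "patient"): "Patient",
    ("de", "patient"): "Patient",
    ("en", "age"): "Patient.age",
    ("de", "alter"): "Patient.age",
}

# Hypothetical per-source mappings: concept -> table or column name in that source.
SOURCE_MAPPINGS = {
    "clinic_a": {"Patient": "patients", "Patient.age": "age_years"},
    "clinic_b": {"Patient": "tbl_person", "Patient.age": "alter"},
}

ALLOWED_OPS = {"=", "<", ">", "<=", ">="}


def resolve(lang: str, entity_term: str, attr_term: str, op: str, value: int) -> dict[str, str]:
    """Translate one concept-level condition into one SQL statement per source."""
    entity = LEXICON[(lang, entity_term.lower())]  # KeyError for unknown terms
    attr = LEXICON[(lang, attr_term.lower())]
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {op}")
    statements = {}
    for source, mapping in SOURCE_MAPPINGS.items():
        table, column = mapping[entity], mapping[attr]
        # int() restricts the literal to a number, keeping this sketch injection-free.
        statements[source] = f"SELECT * FROM {table} WHERE {column} {op} {int(value)}"
    return statements
```

    A German query such as `resolve("de", "Patient", "Alter", ">", 65)` and its English equivalent yield the same pair of source-specific statements, which is the point of separating the concept layer from both the user's language and each source's schema.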

    A performance of comparative study for semi-structured web data extraction model

    The extraction of information from multiple web sources is an essential yet complicated step for data analysis in many domains. In this paper, we present a data extraction model based on visual segmentation, the DOM tree and JSON, known as Wrapper Extraction of Image using DOM and JSON (WEIDJ), for extracting semi-structured data from biodiversity websites. Image information from multiple web sources is extracted using three different approaches: Document Object Model (DOM), Wrapper image using Hybrid DOM and JSON (WHDJ), and Wrapper Extraction of Image using DOM and JSON (WEIDJ). Experiments were conducted on several biodiversity websites. The experimental results show that the WEIDJ approach gives promising results with respect to time analysis. The WEIDJ wrapper successfully extracted more than 100 images from over 15 different biodiversity websites.
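
    The general idea of walking parsed HTML to pull out image information and emit it as JSON can be sketched with the Python standard library. This is an assumption-laden illustration, not the WEIDJ wrapper: it uses the event-based `html.parser` rather than a full DOM tree, has no visual segmentation, and the `extract_images` helper and output layout are hypothetical.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageExtractor(HTMLParser):
    """Collects <img> elements (src and alt) while the HTML is parsed."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.images: list[dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img .../> also arrives here via the default handle_startendtag.
        if tag != "img":
            return
        a = dict(attrs)
        src = a.get("src")
        if not src:
            return  # skip images without a usable source
        self.images.append({
            "src": urljoin(self.base_url, src),  # resolve relative URLs
            "alt": a.get("alt") or "",
        })


def extract_images(html: str, base_url: str) -> str:
    """Return the images found on one page as a JSON document."""
    parser = ImageExtractor(base_url)
    parser.feed(html)
    parser.close()
    return json.dumps({"source": base_url, "images": parser.images}, indent=2)
```

    Running the same function over pages from several sites and concatenating the `images` lists gives a uniform JSON view of heterogeneous pages, which is the role JSON plays in the approach described above.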

    Building Intelligent Web Applications Using Lightweight Wrappers

    The Web so far has been incredibly successful at delivering information to human users; so successful, in fact, that there is now an urgent need to go beyond the browsing human. Unfortunately, the Web is not yet a well-organized repository of nicely structured documents but rather a conglomerate of volatile HTML pages. To address this problem, we present the World Wide Web Wrapper Factory (W4F), a toolkit for the generation of wrappers for Web sources that offers: (1) an expressive language to specify the extraction of complex structures from HTML pages; (2) a declarative mapping to various data formats such as XML; and (3) visual tools to make the engineering of wrappers faster and easier.
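
    Point (2), a declarative mapping from extracted data to XML, can be illustrated with a small sketch. The mapping format below (a root name, a record name and field-to-key pairs) is a hypothetical stand-in, not W4F's own mapping language; the point is that the output structure is described as data rather than hand-written code.

```python
import xml.etree.ElementTree as ET

# Hypothetical declarative mapping: XML element names -> keys in each extracted record.
MAPPING = {
    "root": "movies",
    "record": "movie",
    "fields": {"title": "t", "year": "y"},
}


def to_xml(records: list[dict], mapping: dict) -> str:
    """Serialize extracted records to XML as described by the mapping."""
    root = ET.Element(mapping["root"])
    for rec in records:
        el = ET.SubElement(root, mapping["record"])
        for tag, key in mapping["fields"].items():
            child = ET.SubElement(el, tag)
            child.text = str(rec[key])
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")
```

    Changing the target format then means editing `MAPPING`, not the extraction code.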

    AMMO-Prot: amine system project 3D-model finder

    BACKGROUND: Amines are biogenic amino acid derivatives that play pleiotropic, very important, yet complex roles in animal physiology. As for many other relevant biomolecules, biochemical and molecular data are being accumulated that need to be integrated to be effective in advancing biological knowledge in the field. For this purpose, a multidisciplinary group has started an ontology-based system named the Amine System Project (ASP), for which amine-related information is the validation bench. RESULTS: In this paper, we describe the Ontology-Based Mediator developed in the Amine System Project (http://asp.uma.es) using the infrastructure of Semantic Directories, and how this system has been used to solve a case involving protein structures related to amine metabolism. CONCLUSIONS: This infrastructure is used to publish and manage not only ontologies and their relationships, but also metadata relating to the resources committed to the ontologies. The system developed is available at http://asp.uma.es/WebMediator

    User-friendly and Extensible Web Data Extraction

    Creation of web wrappers is a subject of study in the field of web data extraction. Designing a domain-specific language for a web wrapper is a challenging task because it introduces trade-offs between the expressiveness of the wrapper's language and safety. In addition, little attention has been paid to executing a wrapper in a restricted environment. In this paper we present a new wrapping language, Serrano, that has three goals: (1) the ability to run in a restricted environment, such as a browser extension; (2) extensibility to balance the trade-offs between the expressiveness of a command set and safety; and (3) processing capabilities that eliminate the need for additional programs to clean the extracted data. Serrano has been successfully deployed in a number of projects and has provided encouraging results.
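
    The trade-off between an extensible command set and safety can be pictured as a whitelist interpreter. The sketch below is hypothetical and does not reproduce Serrano's syntax or commands: a wrapper program is a list of `[command, *args]` steps, only registered commands may run, and cleaning steps (`trim`, `lower`, `nonempty`) live inside the language, in the spirit of goal (3).

```python
from typing import Any, Callable

# Hypothetical command registry. Only registered commands can run, so a wrapper
# cannot execute arbitrary code; extending the language means adding an entry here.
COMMANDS: dict[str, Callable[..., Any]] = {}


def command(name: str):
    """Decorator that adds a function to the whitelist under `name`."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        COMMANDS[name] = fn
        return fn
    return register


def _each(fn: Callable[[str], str], value: Any) -> Any:
    """Apply fn to a single string, or element-wise to a list of strings."""
    return [fn(v) for v in value] if isinstance(value, list) else fn(value)


@command("split")
def _split(value: str, sep: str) -> list[str]:
    return value.split(sep)


@command("trim")
def _trim(value: Any) -> Any:
    return _each(str.strip, value)


@command("lower")
def _lower(value: Any) -> Any:
    return _each(str.lower, value)


@command("nonempty")
def _nonempty(value: list[str]) -> list[str]:
    return [v for v in value if v]


def run(program: list[list[Any]], data: Any) -> Any:
    """Execute a wrapper program step by step, rejecting unknown commands."""
    value = data
    for name, *args in program:
        if name not in COMMANDS:
            raise ValueError(f"command not allowed: {name}")
        value = COMMANDS[name](value, *args)
    return value
```

    For example, `run([["split", ","], ["trim"], ["lower"], ["nonempty"]], " Apple, Banana ,, Cherry ")` returns `["apple", "banana", "cherry"]`, while a program containing an unregistered command such as `eval` is rejected before anything runs.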

    WEB SCALE INFORMATION EXTRACTION USING WRAPPER INDUCTION APPROACH

    Information extraction from unstructured, ungrammatical data such as classified listings is difficult because traditional structural and grammatical extraction methods do not apply. The proposed architecture extracts unstructured and ungrammatical data using wrapper induction and presents the result in a structured format. The data are collected from various classified-post websites. The collected post pages are processed by page parsing, cleansing and data extraction to obtain new reference sets. The reference sets are used to map the user's search query, which improves the scale of search over unstructured and ungrammatical post data. We validate our approach with experimental results.
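
    Wrapper induction in general can be illustrated with the classic LR scheme: from a few labeled pages, learn the longest left delimiter shared by the text before each labeled value and the longest right delimiter shared by the text after it, then extract everything between those delimiters on new pages. This is a minimal sketch of that textbook technique for a single field, not the architecture proposed in this paper; `induce_lr` and `extract` are hypothetical names.

```python
import os


def _common_suffix(strings: list[str]) -> str:
    """Longest common suffix, computed as the common prefix of reversed strings."""
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]


def induce_lr(examples: list[tuple[str, str]]) -> tuple[str, str]:
    """Learn (left, right) delimiters from (page, labeled value) pairs."""
    lefts, rights = [], []
    for page, value in examples:
        i = page.index(value)  # assumes the first occurrence is the labeled one
        lefts.append(page[:i])
        rights.append(page[i + len(value):])
    left = _common_suffix(lefts)
    right = os.path.commonprefix(rights)
    if not left or not right:
        raise ValueError("examples share no common delimiters")
    return left, right


def extract(page: str, left: str, right: str) -> list[str]:
    """Return every substring found between the learned delimiters."""
    results, pos = [], 0
    while (start := page.find(left, pos)) != -1:
        start += len(left)
        end = page.find(right, start)
        if end == -1:
            break
        results.append(page[start:end])
        pos = end + len(right)
    return results
```

    Given the labeled pages `"Bike <b>120</b> EUR, Sevilla"` and `"Sofa <b>95</b> EUR, Madrid"`, the learned delimiters are `" <b>"` and `"</b> EUR, "`, which then pick out the prices `"15"` and `"60"` from an unseen listing page.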

    Integrating External Sources in a Corporate Semantic Web Managed by a Multi-agent System

    We first describe a multi-agent system managing a corporate memory in the form of a corporate semantic web. We then focus on a newly introduced society of agents in charge of wrapping external HTML documents that are relevant to the activities of the organization, by extracting semantic Web annotations using tailored XSLT templates.

    Lion: Listen online. Using GraphQL as a mediator for data integration and ingestion

    Data integration is the task of providing a unified view of multiple data sources. These sources can be, and typically are, heterogeneous in their data model, data query language (DQL), and data manipulation language (DML). This thesis describes a system called "Listen Online", or Lion for short. Lion uses the GraphQL specification to provide integration for querying web services. Lion provides a general structure through which arbitrary mediators can be used within a query. Lastly, by building on top of open-source libraries, Lion provides the open-source community with components in the form of GraphQL servers, visual layout libraries, and query builders.
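
    The mediator idea behind this kind of integration, one resolver per exposed field with each resolver hiding its own source's data model, can be sketched without any GraphQL library. Everything below is hypothetical: the two in-memory stand-ins replace a web service and a relational table, and `execute` only imitates GraphQL's behavior of resolving exactly the requested fields; it is not Lion's code.

```python
from typing import Any, Callable

# Hypothetical source 1: a web service returning JSON-like objects (stubbed in memory).
ARTIST_SERVICE = {7: {"id": 7, "name": "Example Artist"}}

# Hypothetical source 2: a relational table of (artist_id, title) rows (stubbed in memory).
TRACK_ROWS = [(7, "Track A"), (7, "Track B"), (8, "Other Track")]

# One resolver per exposed field; each one knows only its own source's data model.
RESOLVERS: dict[str, Callable[[int], Any]] = {
    "name": lambda artist_id: ARTIST_SERVICE[artist_id]["name"],
    "tracks": lambda artist_id: [t for a, t in TRACK_ROWS if a == artist_id],
}


def execute(artist_id: int, fields: list[str]) -> dict[str, Any]:
    """Resolve only the requested fields and merge them into one unified result."""
    unknown = [f for f in fields if f not in RESOLVERS]
    if unknown:
        raise KeyError(f"unknown fields: {unknown}")
    return {"artist": {f: RESOLVERS[f](artist_id) for f in fields}}
```

    Calling `execute(7, ["name", "tracks"])` merges data from both stand-in sources into a single object, while `execute(7, ["name"])` never touches the track table.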