
    The DIGMAP geo-temporal web gazetteer service

    This paper presents the DIGMAP geo-temporal Web gazetteer service, a system providing access to names of places, historical periods, and associated geo-temporal information. Within the DIGMAP project, this gazetteer serves as the unified repository of geographic and temporal information, assisting in the recognition and disambiguation of geo-temporal expressions over text, as well as in resource searching and indexing. We describe the data integration methodology, the handling of temporal information, and some of the applications that use the gazetteer. Initial evaluation results show that the proposed system can adequately support several tasks related to geo-temporal information extraction and retrieval.
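    To make the service concrete, the following minimal sketch shows how a client might query such a gazetteer over HTTP; the endpoint URL and query parameters are invented for illustration and are not the actual DIGMAP API.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Minimal sketch of a client for a geo-temporal gazetteer lookup.
 *  The endpoint and query parameters are hypothetical illustrations,
 *  not the actual DIGMAP API. */
public class GazetteerClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Hypothetical request: resolve the place name "Lisboa" and
        // restrict matches to a historical period.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://example.org/gazetteer/search"
                        + "?name=Lisboa&period=19th-century"))
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        // A real service would return structured records (e.g. XML or JSON)
        // with coordinates, place types, and temporal footprints.
        System.out.println(response.body());
    }
}
```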

    Implementation and Web Mounting of the WebOMiner_S Recommendation System

    The ability to quickly extract information from the large amount of heterogeneous data available on the web from various Business-to-Consumer (B2C) or e-commerce stores selling similar products (such as laptops) for comparative querying and knowledge discovery remains a challenge, because different web sites structure their data differently and web data are unstructured. For example: find the best and cheapest deal for a Dell laptop across BestBuy.ca and Amazon.com given the specification model: Inspiron 15 series, RAM: 16 GB, processor: i5, HDD: 1 TB. The "WebOMiner" and "WebOMiner_S" systems perform automatic extraction by first parsing web HTML source code into a document object model (DOM) tree and then applying pattern mining techniques to discover heterogeneous data types (e.g., text, images, links, lists), so that product schemas are extracted and stored in a back-end data warehouse for querying and recommendation. However, a web interface for this system still needs to be developed to make it accessible to all users on the web. This thesis proposes a web recommendation system with a graphical user interface, mounted readily on the web and accessible to all users. It also integrates the web data retained from the extraction process, consisting of product features such as product model name, product description, and market price per retailer. The implementation uses Java Server Pages (JSP) for the GUI, designed in HTML, CSS, and JavaScript, with the Spring framework serving as the bridge between the GUI and the data warehouse. A SQL database stores the extracted product schemas for further integration, querying, and knowledge discovery. All the technologies used are compatible with UNIX systems for hosting the application.
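    As an illustration of the DOM-based extraction step described above, the following sketch uses the jsoup library to parse an HTML fragment into a DOM tree and pull out a product record; the HTML snippet and CSS selectors are assumptions, since every retailer's page is structured differently.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

/** Illustrative sketch of the DOM-tree extraction step, using the jsoup
 *  library. The HTML snippet and CSS selectors are invented for the
 *  example; a real product-listing page would differ per retailer. */
public class ProductExtractor {
    public static void main(String[] args) {
        String html = "<div class='product'>"
                + "<span class='model'>Inspiron 15</span>"
                + "<span class='ram'>16 GB</span>"
                + "<span class='price'>749.99</span></div>";
        Document dom = Jsoup.parse(html);         // HTML source -> DOM tree
        for (Element product : dom.select("div.product")) {
            // Each matched subtree yields one product record (schema tuple)
            String model = product.select(".model").text();
            String ram   = product.select(".ram").text();
            double price = Double.parseDouble(product.select(".price").text());
            System.out.printf("model=%s ram=%s price=%.2f%n", model, ram, price);
        }
    }
}
```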

    Documenting Data Integration Using Knowledge Graphs

    With the increasing volume of data on the Web and the proliferation of published knowledge graphs, there is a growing need for improved data management and information extraction. However, heterogeneity across data sources, i.e., various formats and systems, negatively impacts efficient access to, management of, reuse of, and analysis of the data. A data integration system (DIS) provides uniform access to heterogeneous data sources and their relationships; it offers a unified and comprehensive view of the data. DISs resort to mapping rules, expressed in declarative languages like RML, to align data from various sources to classes and properties defined in an ontology. This work defines a knowledge graph in which data integration systems are represented as factual statements. The aim is to provide the basis for integrated analysis of data collected from heterogeneous data silos. The proposed knowledge graph is itself specified as a data integration system that integrates all data integration systems. The proposed solution includes a unified schema, which defines and explains the relationships between all elements of the data integration system DIS = ⟨G, S, M, F⟩. The results suggest that factual statements from the proposed knowledge graph improve the understanding of the features that characterize knowledge graphs declaratively defined as data integration systems.
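    The following sketch, using Apache Jena, illustrates the general idea of representing components of a DIS = ⟨G, S, M, F⟩ as factual RDF statements; the vocabulary URIs and resource names are placeholders, not the ontology actually used in the paper.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;

/** Sketch of representing a data integration system DIS = <G, S, M, F>
 *  as factual RDF statements with Apache Jena. The vocabulary URIs are
 *  placeholders, not the ontology actually used in the paper. */
public class DisAsKnowledgeGraph {
    public static void main(String[] args) {
        String ns = "http://example.org/dis#";
        Model kg = ModelFactory.createDefaultModel();
        Property hasSource  = kg.createProperty(ns, "hasSource");
        Property hasMapping = kg.createProperty(ns, "hasMapping");

        Resource dis     = kg.createResource(ns + "DIS1");
        Resource source  = kg.createResource(ns + "CsvSilo");
        Resource mapping = kg.createResource(ns + "RmlRule42");

        // Each component of <G, S, M, F> becomes a factual statement
        dis.addProperty(hasSource, source);    // S: a heterogeneous source
        dis.addProperty(hasMapping, mapping);  // M: an RML mapping rule
        kg.write(System.out, "TURTLE");
    }
}
```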

    Automated integration of transport schedule information

    The ever-growing Web contains a large amount of data, which is useful when combined with applications that can refine it and use it to improve its users' lives. However, using the available data is not an easy task, since most of the information is not represented in machine-friendly formats. Instead, it is represented in formats ideal for human users, which requires additional effort to have machines interpret, extract, and integrate it, while at the same time ensuring the consistency of information from different sources. In this project, a solution combining ontology-based integration with extraction by web robots automates the process of updating information about public transport schedules. An existing application receives that information and uses it to calculate efficient routes for commuters. The proposed solution can extract information from multiple online sources, including PDFs and HTML, and transform it into different formats. The system provides a web service through which a route optimization system can export these formats. This document details the design and construction of the integration system, describes the alternatives and choices that led to the resulting application, and finally evaluates the solution by performing extraction from several sources relevant to the project's domain.
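    The sketch below illustrates the extraction step for an HTML timetable using the jsoup library; the table layout, column order, and station names are assumptions for illustration, as each operator's page would require its own extraction rules.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

/** Minimal sketch of the web-robot extraction step for an HTML timetable.
 *  The table layout and column order are assumptions for illustration;
 *  each operator's page would need its own extraction rules. */
public class TimetableExtractor {
    public static void main(String[] args) {
        String html = "<table id='schedule'>"
                + "<tr><td>Porto</td><td>Lisboa</td><td>08:30</td></tr>"
                + "<tr><td>Porto</td><td>Braga</td><td>09:15</td></tr></table>";
        Document page = Jsoup.parse(html);
        for (Element row : page.select("#schedule tr")) {
            // Map each row to a uniform (origin, destination, departure)
            // record that downstream route optimization can consume.
            String origin      = row.select("td:eq(0)").text();
            String destination = row.select("td:eq(1)").text();
            String departure   = row.select("td:eq(2)").text();
            System.out.printf("%s -> %s at %s%n", origin, destination, departure);
        }
    }
}
```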

    Building HyperView web sites

    This report presents a framework for building "virtual" web sites using the HyperView system. Virtual web sites offer information extracted and integrated from other web sites on the fly. The HyperView system already supports the demand-driven integration of information from different semistructured information sources into a graph database. The problem we deal with here is querying the database and generating HTML pages from the results in response to HTTP requests received from the user. The returned HTML pages should hide the details of data extraction and integration and give the user the impression of a single, coherent web site. We first show how HyperViews composed of graph-transformation rules can be defined to generate HTML pages from the database; in this way, web sites for individual application schemata can be designed. In the second part we present a generic rule set that defines a web interface for HyperView graph databases with arbitrary schemata. This generic web interface can be customized for a particular application by annotating the database schema and choosing appropriate styles. The work presented in this report completes the HyperView approach in the sense that it closes the circle of extracting and integrating information from the web by publishing the integrated data back on the web. Our approach applies equally to the integration and generation of XML documents on the web.
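    The following toy sketch conveys the flavor of a rule that rewrites nodes of a graph database into HTML fragments; the node representation and the rule itself are invented for illustration and greatly simplify the actual HyperView graph-transformation machinery.

```java
import java.util.List;
import java.util.Map;

/** Toy sketch of the idea behind a HyperView-style rule: match a node
 *  pattern in the graph database and rewrite it into an HTML fragment.
 *  The node representation and rule are invented for illustration. */
public class HtmlViewRule {
    public static void main(String[] args) {
        // A trivial "graph": nodes as attribute maps
        List<Map<String, String>> nodes = List.of(
                Map.of("type", "Paper", "title", "Building HyperView web sites"),
                Map.of("type", "Author", "name", "Example Author"));
        StringBuilder html = new StringBuilder("<ul>\n");
        for (Map<String, String> node : nodes) {
            // Rule: a node of type Paper is rendered as a list item;
            // other node types would have their own rendering rules.
            if ("Paper".equals(node.get("type"))) {
                html.append("  <li>").append(node.get("title")).append("</li>\n");
            }
        }
        html.append("</ul>");
        System.out.println(html);  // the page returned for an HTTP request
    }
}
```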

    Code generator for integrating warehouse XML data sources

    XML, the eXtensible Markup Language, has been recognized as the standard for data representation and exchange on the World Wide Web, and vast amounts of XML data are available on the web. Since information on the web is stored on separate web pages, it is very hard to combine pieces of information for decision-support purposes. Data warehouse integration provides a solution by integrating the different XML source data into a unique format with meaningful information for decision support systems. A data warehouse is a large integrated database organized around the major subjects of an enterprise for the purpose of decision-support querying. Many enterprises create their own data warehouse systems from scratch in varying formats, making it important to build data warehouse systems that are more efficient, more reliable, cost-effective, and easy to use. Building a code generator that creates programs to automatically integrate XML data sources into a target data warehouse is one solution, and there is little research applying the newest XML techniques to code generators for data warehouse XML data integration. This thesis proposes a Warehouse Integrator code generator for XML (WIG4X), which integrates XML data sources into a target data warehouse by first generating Java programs for extracting, cleaning, and loading XML data into the warehouse. The WIG4X system also generates programs for creating XML views from the data warehouse. An XML schema mapping strategy is employed for the structural integration of each XML data source into the data warehouse, using a first-order-logic-like language similar to that used in INFOMASTER. Content integration is handled through XML data extraction, conversion constraints, data cleaning, and data loading. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2001 .L57. Source: Masters Abstracts International, Volume: 40-06, page: 1549. Adviser: Christie Ezeife. Thesis (M.Sc.), University of Windsor (Canada), 2002.
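    The sketch below illustrates the extract-and-load idea in miniature: parse an XML source with the standard Java DOM API and emit SQL INSERT statements for a warehouse fact table. The XML layout and table name are assumptions; the actual WIG4X system generates complete Java programs rather than individual statements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch of the extract-and-load idea behind WIG4X: parse an XML source
 *  and emit SQL INSERTs for a warehouse table. The XML layout and table
 *  name are assumptions; the real system generates full Java programs. */
public class XmlToWarehouse {
    public static void main(String[] args) throws Exception {
        String xml = "<orders><order><id>7</id><total>19.90</total></order></orders>";
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList orders = doc.getElementsByTagName("order");
        for (int i = 0; i < orders.getLength(); i++) {
            Element order = (Element) orders.item(i);
            String id    = order.getElementsByTagName("id").item(0).getTextContent();
            String total = order.getElementsByTagName("total").item(0).getTextContent();
            // Generated load statement for the target fact table
            System.out.printf("INSERT INTO fact_orders(id, total) VALUES (%s, %s);%n",
                    id, total);
        }
    }
}
```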

    Integration of Wikipedia and a Geography Digital Library

    In this paper, we address the problem of integrating Wikipedia, an online encyclopedia, and G-Portal, a web-based digital library, in the geography domain. The integration facilitates the sharing of data and services between the two web applications, which is of great value in learning. We first present an overall system architecture for supporting such an integration and address the metadata extraction problem associated with it. In metadata extraction, we focus on extracting and constructing metadata for geo-political regions, namely cities and countries. Some empirical performance results are presented. The paper also describes the adaptations of G-Portal and Wikipedia required to meet the integration requirements.
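    As a rough illustration of infobox-style metadata extraction, the sketch below pulls attribute/value pairs for a country from simplified HTML using the jsoup library; real Wikipedia markup is far more irregular and would need more robust extraction rules.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

/** Illustrative sketch of infobox-style metadata extraction for a country
 *  page. The HTML and field names are simplified assumptions; real
 *  Wikipedia markup is more irregular and needs more robust rules. */
public class RegionMetadataExtractor {
    public static void main(String[] args) {
        String html = "<table class='infobox'>"
                + "<tr><th>Capital</th><td>Singapore</td></tr>"
                + "<tr><th>Population</th><td>5,454,000</td></tr></table>";
        Document page = Jsoup.parse(html);
        for (Element row : page.select("table.infobox tr")) {
            // Each header/value pair becomes a metadata attribute that
            // G-Portal could index alongside its own geography records.
            System.out.printf("%s = %s%n",
                    row.select("th").text(), row.select("td").text());
        }
    }
}
```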