
    Building Hyper View web sites

    In this report, a framework for building “virtual” web sites using the HyperView system is presented. Virtual web sites are web sites that offer information extracted and integrated from other web sites on the fly. The HyperView system already supports the demand-driven integration of information from different semistructured information sources into a graph database. The problem we address here is to query the database and generate HTML pages from the results in response to HTTP requests received from the user. The returned HTML pages should hide the aspects of data extraction and integration and should give the user the impression of a single, coherent web site. We first show how HyperViews composed of graph-transformation rules can be defined to generate HTML pages from the database; in this way, web sites for individual application schemata can be designed. In the second part we present a generic rule set that defines a web interface for HyperView graph databases with arbitrary schemata. This generic web interface can be customized for a particular application by annotating the database schema and choosing appropriate styles. The work presented in this report completes the HyperView approach in the sense that it closes the circle of extracting and integrating information from the web by publishing the integrated data on the web again. Our approach applies equally to the integration and generation of XML documents on the web.
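    The abstract above describes serving HTML pages that are generated on demand from a graph database. Below is a minimal Python sketch of that request-to-page path, assuming a toy in-memory graph and a single rendering rule; the names (GRAPH, render_node, Handler) are hypothetical illustrations and do not reflect HyperView's actual graph-transformation rule language.

        # Sketch: answer an HTTP request by querying a small in-memory graph
        # and rendering the result as HTML (toy stand-in, not HyperView's API).
        from http.server import BaseHTTPRequestHandler, HTTPServer

        # Toy graph database: node id -> attributes and outgoing edges.
        GRAPH = {
            "p1": {"type": "Person", "name": "Ada", "edges": {"authored": ["d1"]}},
            "d1": {"type": "Document", "title": "Graph Views", "edges": {}},
        }

        def render_node(node_id):
            """Stand-in for one rendering rule: map a graph node to an HTML page."""
            node = GRAPH[node_id]
            links = "".join(
                f'<li><a href="/{t}">{GRAPH[t].get("title", t)}</a></li>'
                for targets in node["edges"].values() for t in targets
            )
            label = node.get("name") or node.get("title") or node_id
            return f"<html><body><h1>{label}</h1><ul>{links}</ul></body></html>"

        class Handler(BaseHTTPRequestHandler):
            def do_GET(self):
                node_id = self.path.strip("/") or "p1"
                found = node_id in GRAPH
                body = render_node(node_id).encode() if found else b"not found"
                self.send_response(200 if found else 404)
                self.send_header("Content-Type", "text/html")
                self.end_headers()
                self.wfile.write(body)

        if __name__ == "__main__":
            HTTPServer(("localhost", 8080), Handler).serve_forever()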

    Neogeography: The Challenge of Channelling Large and Ill-Behaved Data Streams

    Neogeography is the combination of user-generated data and experiences with mapping technologies. In this article we present a research project to extract valuable structured information with a geographic component from unstructured user-generated text in wikis, forums, or SMS messages. The extracted information is to be integrated to form collective knowledge about a certain domain. This structured information can then be used to help users from the same domain who want to obtain information through a simple question answering system. The project intends to help workers' communities in developing countries share their knowledge, providing a simple and cheap way to contribute and to benefit from the available communication technology.
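    As a rough illustration of the pipeline sketched in the abstract, the Python snippet below extracts (place, topic) facts from short messages and answers a simple question over the collected facts. The gazetteer, topic keywords, and messages are invented for illustration; the project's actual extraction components are not specified here.

        # Sketch, under the assumption that extraction can be approximated with
        # a small gazetteer and keyword lists (toy data, not the project's NLP).
        GAZETTEER = {"nairobi", "kampala", "accra"}          # known place names (toy)
        TOPICS = {"maize": "crop prices", "rain": "weather"} # keyword -> topic (toy)

        def extract(message):
            """Pull (place, topic) facts out of one unstructured message."""
            words = {w.strip(".,!?").lower() for w in message.split()}
            places = words & GAZETTEER
            topics = {TOPICS[w] for w in words if w in TOPICS}
            return [(p, t) for p in places for t in topics]

        # Integrate facts from several user-generated messages into shared knowledge.
        knowledge = {}
        for sms in ["Maize is selling well in Nairobi", "Heavy rain near Kampala today"]:
            for place, topic in extract(sms):
                knowledge.setdefault(place, set()).add(topic)

        def answer(question):
            """Very simple question answering over the collected facts."""
            for place, topics in knowledge.items():
                if place in question.lower():
                    return f"Known about {place}: {', '.join(sorted(topics))}"
            return "No information yet."

        print(answer("What do we know about Nairobi?"))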

    Comparative Analysis of Web of Science and Scopus on the Energy Efficiency and Climate Impact of Buildings

    Although the body of scientific publications on energy efficiency and climate mitigation from buildings has been growing quickly in recent years, very few previous bibliometric analyses exist that examine the literature in terms of specific content (trends or options for zero‐energy buildings) or coverage across different scientific databases. We evaluate the scientific literature published since January 2013 concerning alternative methods for improving the energy efficiency and mitigating the climate impacts of buildings. We quantify and describe the literature through a bibliometric approach, comparing the databases Web of Science (WoS) and Scopus. A total of 19,416 (Scopus) and 17,468 (WoS) publications are analyzed, with only 11% common documents. The literature has grown steadily during this time period, with a peak in the year 2017. Most of the publications are in English, in the areas of Engineering and Energy Fuels, and from institutions in China and the USA. Strong links are observed between the most published authors and institutions worldwide. An analysis of keywords reveals that most of the research focuses on technologies for heating, ventilation, and air‐conditioning, phase change materials, as well as information and communication technologies. A significantly smaller segment of the literature takes a broader perspective (greenhouse gas emissions, life cycle, and sustainable development), investigating implementation issues (policies and costs) or renewable energy (solar). Knowledge gaps are detected in the areas of behavioral changes, the circular economy, and some renewable energy sources (geothermal, biomass, small wind). We conclude that i) the contents of WoS and Scopus are radically different in the studied fields; ii) research seems to focus on technological aspects; and iii) there are weak links between research on energy and research on climate mitigation and sustainability, the latter themes being misrepresented in the literature. These conclusions should be validated with further analyses of the documents identified in this study. We recommend that future research focus on filling the gaps identified above, assessing the contents of several scientific databases, and extending energy analyses to their effects in terms of mitigation potentials. This work was funded by the Ministerio de Ciencia, Innovación y Universidades de España (RTI2018‐093849‐B‐C31), by ICREA under the ICREA Academia programme, and by the foundation SIVL.
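    The 11% overlap figure above implies a document-matching step between the two databases. The Python sketch below shows one plausible way to compute such a share by matching normalized DOIs; the toy DOIs are invented, and the authors' actual matching procedure may differ.

        # Sketch: share of the combined corpus indexed by both databases,
        # assuming records can be matched on a normalized DOI.
        def normalize_doi(doi):
            doi = doi.strip().lower()
            return doi.split("doi.org/")[-1]  # drop a leading https://doi.org/ if present

        def overlap_share(scopus_dois, wos_dois):
            a = {normalize_doi(d) for d in scopus_dois if d}
            b = {normalize_doi(d) for d in wos_dois if d}
            return len(a & b) / len(a | b)

        # Toy example: 1 of 5 distinct records appears in both sources (20%).
        scopus = ["10.1016/j.enbuild.2020.1", "10.1016/j.apenergy.2019.2", "10.1234/x.3"]
        wos = ["https://doi.org/10.1016/j.enbuild.2020.1", "10.5555/y.4", "10.5555/y.5"]
        print(f"{overlap_share(scopus, wos):.0%} common documents")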

    Hybrid Information Retrieval Model For Web Images

    The Big Bang of the Internet in the early 90's dramatically increased the number of images being distributed and shared over the web. As a result, image information retrieval systems were developed to index and retrieve image files spread over the Internet. Most of these systems are keyword-based and search for images based on their textual metadata; they are therefore imprecise, since describing an image in human language is inherently vague. There also exist content-based image retrieval systems, which search for images based on their visual information. However, content-based systems are still immature and less effective, as they suffer from low retrieval recall and precision rates. This paper proposes a new hybrid image information retrieval model for indexing and retrieving web images published in HTML documents. The distinguishing mark of the proposed model is that it is based on both graphical content and textual metadata. The graphical content is denoted by the color features and color histogram of the image, while the textual metadata are denoted by the terms that surround the image in the HTML document, more particularly the terms that appear in the p, h1, and h2 tags, in addition to the terms that appear in the image's alt attribute, filename, and class-label. Moreover, this paper presents a new term weighting scheme called VTF-IDF, short for Variable Term Frequency-Inverse Document Frequency, which, unlike traditional schemes, exploits the HTML tag structure and assigns an extra bonus weight to terms that appear within certain HTML tags that are correlated with the semantics of the image. Experiments conducted to evaluate the proposed IR model showed a high retrieval precision rate that outpaced other current models. Comment: LACSC - Lebanese Association for Computational Sciences, http://www.lacsc.org/; International Journal of Computer Science & Emerging Technologies (IJCSET), Vol. 3, No. 1, February 201
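    The following Python sketch illustrates the VTF-IDF idea described in the abstract: term frequency is weighted by the HTML tag a term appears in and then scaled by inverse document frequency. The tag bonus values are illustrative assumptions, not the weights used in the paper.

        # Sketch of tag-aware term weighting; TAG_BONUS values are guesses.
        import math
        from collections import Counter

        TAG_BONUS = {"alt": 3.0, "h1": 2.5, "h2": 2.0, "filename": 2.0, "p": 1.0}

        def vtf(term_occurrences):
            """Variable term frequency: occurrences given as (term, tag) pairs."""
            weights = Counter()
            for term, tag in term_occurrences:
                weights[term] += TAG_BONUS.get(tag, 0.5)
            return weights

        def vtf_idf(term_occurrences, doc_freq, n_docs):
            """Scale each variable term frequency by inverse document frequency."""
            return {
                term: w * math.log(n_docs / (1 + doc_freq.get(term, 0)))
                for term, w in vtf(term_occurrences).items()
            }

        # One image's textual context: each term plus the tag it was found in.
        occurrences = [("sunset", "alt"), ("sunset", "p"), ("beach", "h1"), ("beach", "p")]
        scores = vtf_idf(occurrences, doc_freq={"sunset": 20, "beach": 80}, n_docs=1000)
        print(scores)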

    Web Data Extraction, Applications and Techniques: A Survey

    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for performing data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather the large amount of structured data continuously generated and disseminated by Web 2.0, Social Media, and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential for cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed to work in a given domain in other domains. Comment: Knowledge-based System
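    As a concrete, minimal instance of the task surveyed above, the Python snippet below implements a tiny hand-written wrapper that pulls structured records out of an HTML fragment whose layout is known in advance. The page snippet and field names are invented for illustration.

        # Sketch: a hand-written wrapper using only the standard library.
        from html.parser import HTMLParser

        class ProductWrapper(HTMLParser):
            """Collects the text of every <span class="name"> and <span class="price">."""
            def __init__(self):
                super().__init__()
                self.records, self.current, self.field = [], {}, None

            def handle_starttag(self, tag, attrs):
                cls = dict(attrs).get("class")
                if tag == "span" and cls in ("name", "price"):
                    self.field = cls

            def handle_data(self, data):
                if self.field:
                    self.current[self.field] = data.strip()
                    self.field = None
                    if {"name", "price"} <= self.current.keys():
                        self.records.append(self.current)
                        self.current = {}

        PAGE = """
        <li><span class="name">Widget</span> <span class="price">9.99</span></li>
        <li><span class="name">Gadget</span> <span class="price">24.50</span></li>
        """
        wrapper = ProductWrapper()
        wrapper.feed(PAGE)
        print(wrapper.records)  # [{'name': 'Widget', 'price': '9.99'}, ...]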

    Impliance: A Next Generation Information Management Appliance

    ably successful in building a large market and adapting to the changes of the last three decades, its impact on the broader market of information management is surprisingly limited. If we were to design an information management system from scratch, based upon today's requirements and hardware capabilities, would it look anything like today's database systems?" In this paper, we introduce Impliance, a next-generation information management system consisting of hardware and software components integrated to form an easy-to-administer appliance that can store, retrieve, and analyze all types of structured, semi-structured, and unstructured information. We first summarize the trends that will shape information management for the foreseeable future. Those trends imply three major requirements for Impliance: (1) to be able to store, manage, and uniformly query all data, not just structured records; (2) to be able to scale out as the volume of this data grows; and (3) to be simple and robust in operation. We then describe four key ideas that are uniquely combined in Impliance to address these requirements, namely the ideas of: (a) integrating software and off-the-shelf hardware into a generic information appliance; (b) automatically discovering, organizing, and managing all data - unstructured as well as structured - in a uniform way; (c) achieving scale-out by exploiting simple, massively parallel processing; and (d) virtualizing compute and storage resources to unify, simplify, and streamline the management of Impliance. Impliance is an ambitious, long-term effort to define simpler, more robust, and more scalable information systems for tomorrow's enterprises. Comment: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/). You may copy, distribute, display, and perform the work, make derivative works, and make commercial use of the work, but you must attribute the work to the author and CIDR 2007. 3rd Biennial Conference on Innovative Data Systems Research (CIDR), January 7-10, 2007, Asilomar, California, US
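    Requirement (1) above, uniformly querying structured and unstructured data, can be illustrated with a toy Python sketch; everything below is a hypothetical stand-in and not Impliance's actual design.

        # Sketch: one keyword interface over both structured records and raw text.
        structured = [{"customer": "Acme", "status": "open", "amount": 1200}]
        unstructured = ["Support ticket: Acme reports a billing issue on invoice 42."]

        def uniform_query(keyword):
            """Return every structured record or text snippet mentioning the keyword."""
            hits = [r for r in structured if keyword.lower() in str(r).lower()]
            hits += [t for t in unstructured if keyword.lower() in t.lower()]
            return hits

        print(uniform_query("acme"))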