62,554 research outputs found
Building Hyper View web sites
In this report a framework for building "virtual" web sites using the
HyperView system is presented. Virtual web sites are web sites that offer
information extracted and integrated from other web sites on the fly. The
HyperView system already supports the demand-driven integration of information
from different semistructured information sources into a graph database. The
problem we are dealing with here is to query the database and generate HTML
pages from the results as a response to HTTP requests received from the user.
The returned HTML pages should hide the aspects of data extraction and
integration and should give the user the impression of a single, coherent web
site. We show first how HyperViews comprised of graph-transformation rules can
be defined that generate HTML pages from the database. This way web sites for
individual application schemata can be designed. In the second part we present
a generic rule set that defines a web interface for HyperView graph databases
with arbitrary schemata. This generic web interface can be customized for the
particular application by annotating the database schema and choosing
appropriate styles. The work presented in this report completes the HyperView
approach in the sense that it closes the circle of extracting and integrating
information from the web by again publishing the integrated data on the web.
Our approach applies equally to the integration and generation of XML
documents on the web.
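The idea of generating HTML pages from a graph database with a generic, schema-independent rule can be illustrated with a minimal Python sketch. This is not HyperView's graph-transformation rule language; the `GRAPH` layout, the `render_node` function, and the `/node/<id>` URL scheme are all hypothetical and chosen only for illustration.

```python
from html import escape

# Toy graph database (hypothetical data): node id -> type, attributes, labeled edges.
GRAPH = {
    "p1": {"type": "Person", "attrs": {"name": "Ada <Lovelace>"}, "edges": {"wrote": ["d1"]}},
    "d1": {"type": "Document", "attrs": {"title": "Notes"}, "edges": {}},
}

def render_node(graph, node_id):
    """Generic rule: render any node as an HTML page, whatever its schema."""
    node = graph[node_id]
    # One table row per attribute, with values escaped for safe embedding in HTML.
    rows = "".join(
        f"<tr><th>{escape(key)}</th><td>{escape(str(value))}</td></tr>"
        for key, value in sorted(node["attrs"].items())
    )
    # One hyperlink per outgoing edge, so the pages form a browsable site.
    links = "".join(
        f'<li>{escape(label)}: <a href="/node/{escape(target)}">{escape(target)}</a></li>'
        for label, targets in sorted(node["edges"].items())
        for target in targets
    )
    return (
        f"<html><body><h1>{escape(node['type'])} {escape(node_id)}</h1>"
        f"<table>{rows}</table><ul>{links}</ul></body></html>"
    )
```

Calling `render_node(GRAPH, "p1")` yields a page whose edge list links to `/node/d1`, so a request handler could serve one such page per node without knowing the application schema in advance.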
Neogeography: The Challenge of Channelling Large and Ill-Behaved Data Streams
Neogeography is the combination of user-generated data and experiences with mapping technologies. In this article we present a research project to extract valuable structured information with a geographic component from unstructured user-generated text in wikis, forums, or SMS messages. The extracted information should be integrated to form collective knowledge about a certain domain. This structured information can then be used to help users from the same domain who want to get information through a simple question-answering system. The project intends to help worker communities in developing countries share their knowledge, providing a simple and cheap way to contribute and benefit using the available communication technology.
Comparative Analysis of Web of Science and Scopus on the Energy Efficiency and Climate Impact of Buildings
Although the body of scientific publications on energy efficiency and climate mitigation from buildings has been growing quickly in recent years, very few previous bibliometric studies analyze the literature in terms of specific content (trends or options for zero-energy buildings) or coverage of different scientific databases. We evaluate the scientific literature published since January 2013 concerning alternative methods for improving the energy efficiency and mitigating climate impacts from buildings. We quantify and describe the literature through a bibliometric approach, comparing the databases Web of Science (WoS) and Scopus. A total of 19,416 (Scopus) and 17,468 (WoS) publications are analyzed, with only 11% common documents. The literature has grown steadily during this time period, with a peak in the year 2017. Most of the publications are in English, in the area of Engineering and Energy Fuels, and from institutions from China and the USA. Strong links are observed between the most published authors and institutions worldwide. An analysis of keywords reveals that most research focuses on technologies for heating, ventilation, and air-conditioning, phase change materials, as well as information and communication technologies. A significantly smaller segment of the literature takes a broader perspective (greenhouse gas emissions, life cycle, and sustainable development), investigating implementation issues (policies and costs) or renewable energy (solar). Knowledge gaps are detected in the areas of behavioral changes, the circular economy, and some renewable energy sources (geothermal, biomass, small wind). We conclude that i) the contents of WoS and Scopus are radically different in the studied fields; ii) research seems to focus on technological aspects; and iii) there are weak links between research on energy and on climate mitigation and sustainability, the latter themes being misrepresented in the literature.
These conclusions should be validated with further analyses of the documents identified in this study. We recommend that future research focuses on filling the above identified gaps, assessing the contents of several scientific databases, and extending energy analyses to their effects in terms of mitigation potentials. This work was funded by the Ministerio de Ciencia, Innovación y Universidades de España (RTI2018-093849-B-C31), by ICREA under the ICREA Academia programme, and by the foundation SIVL
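A database-coverage comparison of this kind comes down to matching records across two exports. The sketch below is one possible way to compute a share of common documents, assuming records are matched by DOI and that the share is taken over the union of both sets. The abstract does not state how its 11% figure was defined or computed, so `normalize_doi` and `overlap_share` are hypothetical helpers, not the authors' method.

```python
def normalize_doi(doi):
    """Lowercase a DOI and strip common resolver prefixes so variants compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi

def overlap_share(dois_a, dois_b):
    """Share of distinct documents that appear in both collections (intersection over union)."""
    set_a = {normalize_doi(d) for d in dois_a}
    set_b = {normalize_doi(d) for d in dois_b}
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0
```

For example, `overlap_share(["10.1/A", "doi:10.1/b"], ["https://doi.org/10.1/a", "10.1/c"])` counts one shared document out of three distinct ones. Records without a DOI would need a fallback key such as normalized title plus year, which this sketch leaves out.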
Hybrid Information Retrieval Model For Web Images
The Big Bang of the Internet in the early 90's dramatically increased the
number of images being distributed and shared over the web. As a result, image
information retrieval systems were developed to index and retrieve image files
spread over the Internet. Most of these systems are keyword-based: they search
for images based on their textual metadata and are therefore imprecise, since
describing an image in human language is inherently vague. There are also
content-based image retrieval systems, which search for images based on their
visual information. However, content-based systems are still immature and
not very effective, as they suffer from low retrieval recall and precision.
This paper proposes a new hybrid image information retrieval model for indexing
and retrieving web images published in HTML documents. The distinguishing mark
of the proposed model is that it is based on both graphical content and textual
metadata. The graphical content is denoted by color features and color
histogram of the image; while textual metadata are denoted by the terms that
surround the image in the HTML document, more particularly, the terms that
appear in the tags p, h1, and h2, in addition to the terms that appear in the
image's alt attribute, filename, and class-label. Moreover, this paper presents
a new term weighting scheme called VTF-IDF, short for Variable Term
Frequency-Inverse Document Frequency, which, unlike traditional schemes,
exploits the HTML tag structure and assigns an extra bonus weight to terms
that appear within certain particular HTML tags that are correlated to the
semantics of the image. Experiments conducted to evaluate the proposed IR model
showed a high retrieval precision rate that outpaced other current models.
Comment: LACSC - Lebanese Association for Computational Sciences,
http://www.lacsc.org/; International Journal of Computer Science & Emerging
Technologies (IJCSET), Vol. 3, No. 1, February 201
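The tag-aware weighting described in this abstract can be sketched in Python as follows. The abstract names the HTML contexts (p, h1, h2, alt attribute, filename, class label) but gives no formula or numeric bonuses, so the `TAG_BONUS` values, the function names, and the use of a plain logarithmic IDF are assumptions made for illustration, not the paper's actual VTF-IDF definition.

```python
import math
from collections import defaultdict

# Hypothetical bonus weights per HTML context; the abstract gives no numeric values.
TAG_BONUS = {"alt": 3.0, "filename": 3.0, "class": 2.0, "h1": 2.0, "h2": 1.5, "p": 1.0}

def variable_tf(occurrences):
    """Tag-weighted term frequency for one image.

    occurrences: list of (term, tag) pairs found around the image.
    Each occurrence adds its tag's bonus instead of a flat count of 1.
    """
    tf = defaultdict(float)
    for term, tag in occurrences:
        tf[term.lower()] += TAG_BONUS.get(tag, 1.0)
    return dict(tf)

def vtf_idf(docs):
    """docs: image id -> list of (term, tag). Returns image id -> {term: weight}."""
    vtfs = {doc_id: variable_tf(occ) for doc_id, occ in docs.items()}
    n_docs = len(vtfs)
    # Document frequency: number of images whose context contains the term.
    df = defaultdict(int)
    for tf in vtfs.values():
        for term in tf:
            df[term] += 1
    return {
        doc_id: {term: w * math.log(n_docs / df[term]) for term, w in tf.items()}
        for doc_id, tf in vtfs.items()
    }
```

With two images where "cat" appears in the alt text and a paragraph of the first one only, "cat" receives a weight of (3.0 + 1.0) times log 2 for that image, while a term shared by both images gets weight zero under this IDF choice.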
Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provide a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques make it
possible to gather the large amount of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users, which
offers unprecedented opportunities to analyze human behavior at a very large
scale. We also discuss the potential of cross-fertilization, i.e., the
possibility of re-using Web Data Extraction techniques originally designed to
work in a given domain, in other domains.
Comment: Knowledge-based System
Impliance: A Next Generation Information Management Appliance
ably successful in building a large market and adapting to the changes of the
last three decades, its impact on the broader market of information management
is surprisingly limited. If we were to design an information management system
from scratch, based upon today's requirements and hardware capabilities, would
it look anything like today's database systems?" In this paper, we introduce
Impliance, a next-generation information management system consisting of
hardware and software components integrated to form an easy-to-administer
appliance that can store, retrieve, and analyze all types of structured,
semi-structured, and unstructured information. We first summarize the trends
that will shape information management for the foreseeable future. Those trends
imply three major requirements for Impliance: (1) to be able to store, manage,
and uniformly query all data, not just structured records; (2) to be able to
scale out as the volume of this data grows; and (3) to be simple and robust in
operation. We then describe four key ideas that are uniquely combined in
Impliance to address these requirements, namely the ideas of: (a) integrating
software and off-the-shelf hardware into a generic information appliance; (b)
automatically discovering, organizing, and managing all data - unstructured as
well as structured - in a uniform way; (c) achieving scale-out by exploiting
simple, massive parallel processing, and (d) virtualizing compute and storage
resources to unify, simplify, and streamline the management of Impliance.
Impliance is an ambitious, long-term effort to define simpler, more robust, and
more scalable information systems for tomorrow's enterprises.
Comment: This article is published under a Creative Commons License Agreement
(http://creativecommons.org/licenses/by/2.5/.) You may copy, distribute,
display, and perform the work, make derivative works and make commercial use
of the work, but, you must attribute the work to the author and CIDR 2007.
3rd Biennial Conference on Innovative Data Systems Research (CIDR) January
7-10, 2007, Asilomar, California, US