97,492 research outputs found

    Probabilistic Web Data Management

    Full text link

    A best-effort integration framework for imperfect information spaces

    Get PDF
    Entity resolution (ER) with imperfection management has been accepted as a major aspect while integrating heterogeneous information sources that exhibit entities with varied identifiers, abbreviated names, and multi-valued attributes. Many of novel integration applications such as personal information management and web-scale information management require the ability to represent and manipulate imperfect data. This requirement signifies the issues of starting with imperfect data to the production of probabilistic database. However, classical data integration (CDI) framework fails to cope with such requirement of explicit imperfect information management. This paper introduces an alternative integration framework based on the best-effort perspective to support instance integration automation. The new framework explicitly incorporates probabilistic management to the ER tasks. The probabilistic management includes a new probabilistic global entity, a new pair-wise-source-to-target ER process, and probabilistic decision model logic as alternatives. Together, the paper presents how these processes operate to support the current heterogeneous sources integration challenges

    Neogeography: The Challenge of Channelling Large and Ill-Behaved Data Streams

    Get PDF
    Neogeography is the combination of user generated data and experiences with mapping technologies. In this article we present a research project to extract valuable structured information with a geographic component from unstructured user generated text in wikis, forums, or SMSes. The extracted information should be integrated together to form a collective knowledge about certain domain. This structured information can be used further to help users from the same domain who want to get information using simple question answering system. The project intends to help workers communities in developing countries to share their knowledge, providing a simple and cheap way to contribute and get benefit using the available communication technology

    Energy-Aware Cloud Management through Progressive SLA Specification

    Full text link
    Novel energy-aware cloud management methods dynamically reallocate computation across geographically distributed data centers to leverage regional electricity price and temperature differences. As a result, a managed VM may suffer occasional downtimes. Current cloud providers only offer high availability VMs, without enough flexibility to apply such energy-aware management. In this paper we show how to analyse past traces of dynamic cloud management actions based on electricity prices and temperatures to estimate VM availability and price values. We propose a novel SLA specification approach for offering VMs with different availability and price values guaranteed over multiple SLAs to enable flexible energy-aware cloud management. We determine the optimal number of such SLAs as well as their availability and price guaranteed values. We evaluate our approach in a user SLA selection simulation using Wikipedia and Grid'5000 workloads. The results show higher customer conversion and 39% average energy savings per VM.Comment: 14 pages, conferenc

    Information Retrieval Models

    Get PDF
    Many applications that handle information on the internet would be completely\ud inadequate without the support of information retrieval technology. How would\ud we find information on the world wide web if there were no web search engines?\ud How would we manage our email without spam filtering? Much of the development\ud of information retrieval technology, such as web search engines and spam\ud filters, requires a combination of experimentation and theory. Experimentation\ud and rigorous empirical testing are needed to keep up with increasing volumes of\ud web pages and emails. Furthermore, experimentation and constant adaptation\ud of technology is needed in practice to counteract the effects of people that deliberately\ud try to manipulate the technology, such as email spammers. However,\ud if experimentation is not guided by theory, engineering becomes trial and error.\ud New problems and challenges for information retrieval come up constantly.\ud They cannot possibly be solved by trial and error alone. So, what is the theory\ud of information retrieval?\ud There is not one convincing answer to this question. There are many theories,\ud here called formal models, and each model is helpful for the development of\ud some information retrieval tools, but not so helpful for the development others.\ud In order to understand information retrieval, it is essential to learn about these\ud retrieval models. In this chapter, some of the most important retrieval models\ud are gathered and explained in a tutorial style
    • 

    corecore