
    Cleaning uncertain data for top-k queries

    The information managed in emerging applications, such as sensor networks, location-based services, and data integration, is inherently imprecise. To handle data uncertainty, probabilistic databases have recently been developed. In this paper, we study how to quantify the ambiguity of answers returned by a probabilistic top-k query. We develop efficient algorithms to compute the quality of this query under the possible world semantics. We further address the cleaning of a probabilistic database in order to improve top-k query quality. Cleaning involves the reduction of ambiguity associated with the database entities. For example, the uncertainty of a temperature value acquired from a sensor can be reduced, or cleaned, by requesting its newest value from the sensor. While this 'cleaning operation' may produce a better query result, it incurs a cost and may fail. We investigate the problem of selecting entities to be cleaned under a limited budget. In particular, we propose an optimal solution and several heuristics. Experiments show that the greedy algorithm is efficient and close to optimal. © 2013 IEEE.
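
    A minimal sketch of the budget-constrained greedy heuristic the abstract evaluates, assuming each entity carries a known cleaning cost and an estimated ambiguity reduction; both inputs and the benefit-per-cost ranking are illustrative assumptions, not the paper's exact quality model.

```python
# Hypothetical sketch of greedy entity selection for cleaning under a budget.
# 'entities' maps an entity id to (cleaning_cost, expected_ambiguity_reduction);
# both values are assumed inputs, not the paper's actual quality measure.

def greedy_clean(entities: dict, budget: float) -> list:
    """Pick entities to clean, best ambiguity-reduction-per-cost first."""
    ranked = sorted(entities.items(),
                    key=lambda kv: kv[1][1] / kv[1][0],  # benefit per unit cost
                    reverse=True)
    chosen, spent = [], 0.0
    for entity, (cost, benefit) in ranked:
        if spent + cost <= budget:
            chosen.append(entity)
            spent += cost
    return chosen

# Example: sensors with (cost, expected ambiguity reduction)
sensors = {"s1": (2.0, 0.30), "s2": (1.0, 0.25), "s3": (3.0, 0.20)}
print(greedy_clean(sensors, budget=3.0))  # -> ['s2', 's1']
```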

    Adaptive kNN using Expected Accuracy for Classification of Geo-Spatial Data

    The k-Nearest Neighbor (kNN) classification approach is conceptually simple, yet widely applied since it often performs well in practical applications. However, using a global constant k does not always provide an optimal solution, e.g., for datasets with an irregular density distribution of data points. This paper proposes an adaptive kNN classifier where k is chosen dynamically for each instance (point) to be classified, such that the expected accuracy of classification is maximized. We define the expected accuracy as the accuracy of a set of structurally similar observations. An arbitrary similarity function can be used to find these observations. We introduce and evaluate different similarity functions. For the evaluation, we use five different classification tasks based on geo-spatial data. Each classification task consists of (tens of) thousands of items. We demonstrate that the presented expected accuracy measures can be a good estimator for kNN performance, and that the proposed adaptive kNN classifier outperforms common kNN and previously introduced adaptive kNN algorithms. We also show that the range of considered k can be significantly reduced to speed up the algorithm without negatively affecting classification accuracy.
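
    A rough sketch of the per-instance k selection idea, under simplifying assumptions: the "structurally similar observations" are taken to be the query's nearest training points, and the expected accuracy of each candidate k is estimated by leave-one-out kNN over that neighbourhood. The paper's actual similarity functions may differ.

```python
import numpy as np

def knn_predict(X, y, q, k):
    """Plain kNN: majority label among the k points nearest to query q."""
    idx = np.argsort(np.linalg.norm(X - q, axis=1))[:k]
    vals, counts = np.unique(y[idx], return_counts=True)
    return vals[np.argmax(counts)]

def adaptive_knn_predict(X, y, q, candidate_ks=(1, 3, 5, 7), n_similar=20):
    """Choose k per query by maximizing estimated (expected) accuracy."""
    # Assumption: 'similar observations' = the query's nearest training points.
    similar = np.argsort(np.linalg.norm(X - q, axis=1))[:n_similar]
    best_k, best_acc = candidate_ks[0], -1.0
    for k in candidate_ks:
        hits = 0
        for i in similar:  # leave-one-out over the local neighbourhood
            mask = np.arange(len(X)) != i
            hits += knn_predict(X[mask], y[mask], X[i], k) == y[i]
        acc = hits / len(similar)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return knn_predict(X, y, q, best_k), best_k
```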

    Modelling the species jump: towards assessing the risk of human infection from novel avian influenzas

    The scientific understanding of the driving factors behind zoonotic and pandemic influenzas is hampered by complex interactions between viruses, animal hosts and humans. This complexity makes identifying influenza viruses of high zoonotic or pandemic risk, before they emerge from animal populations, extremely difficult and uncertain. As a first step towards assessing the zoonotic risk of influenza, we demonstrate a risk assessment framework to assess the relative likelihood of influenza A viruses, circulating in animal populations, making the species jump into humans. The intention is that such a risk assessment framework could assist decision-makers to compare multiple influenza viruses for zoonotic potential and hence to develop appropriate strain-specific control measures. It also provides a first step towards showing proof of principle for an eventual pandemic risk model. We show that the spatial and temporal epidemiology is as important in assessing the risk of an influenza A species jump as understanding the innate molecular capability of the virus. We also demonstrate data deficiencies that need to be addressed in order to consistently combine both epidemiological and molecular virology data into a risk assessment framework.
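
    Purely as illustration of how epidemiological and molecular inputs might be combined into one relative likelihood, here is a toy scoring sketch; the factor names, weights, and linear combination are invented for this example and are not the paper's framework.

```python
# Toy risk-combination sketch: all factor names, values, and weights are
# hypothetical; the paper's framework is not a simple weighted average.

def species_jump_score(epi_factors: dict, mol_factors: dict,
                       epi_weight: float = 0.5) -> float:
    """All factor values assumed normalised to [0, 1]; higher = riskier."""
    epi = sum(epi_factors.values()) / len(epi_factors)
    mol = sum(mol_factors.values()) / len(mol_factors)
    return epi_weight * epi + (1 - epi_weight) * mol

score = species_jump_score(
    epi_factors={"prevalence_in_poultry": 0.8, "human_contact_rate": 0.6},
    mol_factors={"receptor_binding": 0.4, "polymerase_adaptation": 0.3},
)
print(f"relative species-jump score: {score:.2f}")  # one comparable number
```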

    Oxidant and precursor trends in the metropolitan Los Angeles region

    This paper describes recent historical trends in oxidant and precursors in the Los Angeles region. Control strategies and basin-wide emission trends for nitrogen oxides and reactive hydrocarbons are documented year by year from 1965 to 1974. Trends in the geographic distribution of emissions are illustrated by computing net percentage emission changes over the decade for individual counties. The changes in emissions are compared to changes in ambient precursor concentrations and oxidant concentrations. It is found that many of the changes in monitored air quality can be explained by trends in total emissions and in the spatial distribution of emissions.
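
    The county-level trend computation amounts to a net percentage change over the decade; a small sketch with invented figures:

```python
# Net percentage emission change per county over a decade.
# County names and tonnage values below are invented for illustration.

def net_pct_change(start: float, end: float) -> float:
    return 100.0 * (end - start) / start

# Hypothetical NOx emissions (tons/day) by county, 1965 vs 1974
counties = {"Los Angeles": (850, 1020), "Orange": (120, 190)}
for county, (e1965, e1974) in counties.items():
    print(f"{county}: {net_pct_change(e1965, e1974):+.1f}%")
# Los Angeles: +20.0%
# Orange: +58.3%
```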

    InterCloud: Utility-Oriented Federation of Cloud Computing Environments for Scaling of Application Services

    Cloud computing providers have set up several data centers at different geographical locations over the Internet in order to optimally serve the needs of their customers around the world. However, existing systems do not support mechanisms and policies for dynamically coordinating load distribution among different Cloud-based data centers in order to determine the optimal location for hosting application services to achieve reasonable QoS levels. Further, Cloud computing providers are unable to predict the geographic distribution of users consuming their services, hence the load coordination must happen automatically, and the distribution of services must change in response to changes in the load. To counter this problem, we advocate the creation of a federated Cloud computing environment (InterCloud) that facilitates just-in-time, opportunistic, and scalable provisioning of application services, consistently achieving QoS targets under variable workload, resource and network conditions. The overall goal is to create a computing environment that supports dynamic expansion or contraction of capabilities (VMs, services, storage, and database) for handling sudden variations in service demands. This paper presents the vision, challenges, and architectural elements of InterCloud for utility-oriented federation of Cloud computing environments. The proposed InterCloud environment supports scaling of applications across multiple vendor clouds. We have validated our approach by conducting a set of rigorous performance evaluation studies using the CloudSim toolkit. The results demonstrate that the federated Cloud computing model has immense potential as it offers significant performance gains with regard to response time and cost savings under dynamic workload scenarios. Comment: 20 pages, 4 figures, 3 tables, conference paper.
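
    As a hedged illustration of the kind of load-coordination decision a federation must make, the sketch below routes a request to the data center with the best weighted load/latency score. The scoring rule and data are assumptions for illustration, not InterCloud's actual coordination mechanism.

```python
# Illustrative federated placement policy: pick the data center minimising a
# weighted mix of utilisation and user latency. Names and numbers are invented.

from dataclasses import dataclass

@dataclass
class DataCenter:
    name: str
    utilisation: float   # fraction of capacity in use, 0..1
    latency_ms: float    # network latency to the requesting user

def pick_data_center(centers, load_weight: float = 0.7) -> DataCenter:
    # Normalise latency to roughly 0..1 by dividing by 100 ms.
    return min(centers, key=lambda dc: load_weight * dc.utilisation
                                       + (1 - load_weight) * dc.latency_ms / 100)

centers = [DataCenter("us-east", 0.90, 20),
           DataCenter("eu-west", 0.40, 80)]
print(pick_data_center(centers).name)  # -> eu-west (less loaded wins here)
```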

    The State of the Art in Cartograms

    Cartograms combine statistical and geographical information in thematic maps, where areas of geographical regions (e.g., countries, states) are scaled in proportion to some statistic (e.g., population, income). Cartograms make it possible to gain insight into patterns and trends in the world around us and have been very popular visualizations for geo-referenced data for over a century. This work surveys cartogram research in visualization, cartography and geometry, covering a broad spectrum of different cartogram types: from the traditional rectangular and table cartograms, to Dorling and diffusion cartograms. A particular focus is the study of the major cartogram dimensions: statistical accuracy, geographical accuracy, and topological accuracy. We review the history of cartograms, describe the algorithms for generating them, and consider task taxonomies. We also review quantitative and qualitative evaluations, and we use these to arrive at design guidelines and research challenges.
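
    Statistical accuracy, one of the dimensions surveyed, is commonly quantified by comparing each region's share of rendered map area with its share of the mapped statistic. A minimal sketch of one such error metric follows; the exact formulations in the survey may differ.

```python
# Mean absolute difference between each region's area share and its share of
# the mapped statistic (0 = statistically perfect cartogram). Data invented.

def mean_statistical_error(areas: dict, stats: dict) -> float:
    total_area = sum(areas.values())
    total_stat = sum(stats.values())
    return sum(abs(areas[r] / total_area - stats[r] / total_stat)
               for r in areas) / len(areas)

# Hypothetical cartogram: rendered areas (cm^2) vs populations (millions)
areas = {"A": 4.0, "B": 2.0, "C": 2.0}
population = {"A": 40.0, "B": 30.0, "C": 10.0}
print(f"{mean_statistical_error(areas, population):.3f}")  # -> 0.083
```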

    An integrated Bayesian model for estimating the long-term health effects of air pollution by fusing modelled and measured pollution data: a case study of nitrogen dioxide concentrations in Scotland

    The long-term health effects of air pollution can be estimated using a spatio-temporal ecological study, where the disease data are counts of hospital admissions from populations in small areal units at yearly intervals. Spatially representative pollution concentrations for each areal unit are typically estimated by applying Kriging to data from a sparse monitoring network, or by computing averages over grid-level concentrations from an atmospheric dispersion model. We propose a novel fusion model for estimating spatially aggregated pollution concentrations using both the modelled and monitored data, and relate these concentrations to respiratory disease in a new study in Scotland between 2007 and 2011.
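
    A minimal sketch of the fusion step only, assuming the dispersion-model output needs a linear bias correction learned at monitored locations; the paper's integrated Bayesian spatio-temporal model is far richer, and all numbers here are invented.

```python
import numpy as np

# Calibrate dispersion-model output against sparse monitor readings with a
# linear bias correction, then predict concentrations for unmonitored units.
# This least-squares version only illustrates the data-fusion idea.

modelled_at_monitors = np.array([22.0, 35.0, 18.0, 41.0])  # ug/m3, invented
monitored            = np.array([25.0, 33.0, 21.0, 38.0])  # co-located monitors

# Fit monitored ~ a + b * modelled
A = np.vstack([np.ones_like(modelled_at_monitors), modelled_at_monitors]).T
(a, b), *_ = np.linalg.lstsq(A, monitored, rcond=None)

modelled_all_units = np.array([20.0, 30.0, 45.0])  # units with no monitor
fused = a + b * modelled_all_units
print(np.round(fused, 1))  # bias-corrected NO2 estimates per areal unit
```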

    Towards Semantic Integration of Heterogeneous Sensor Data with Indigenous Knowledge for Drought Forecasting

    In the Internet of Things (IoT) domain, various heterogeneous ubiquitous devices would be able to connect and communicate with each other seamlessly, irrespective of the domain. Semantic representation of data through detailed standardized annotation has been shown to improve the integration of interconnected heterogeneous devices. However, the semantic representation of these heterogeneous data sources for environmental monitoring systems is not yet well supported. To achieve the maximum benefits of IoT for drought forecasting, a dedicated semantic middleware solution is required. This research proposes a middleware that semantically represents and integrates heterogeneous data sources with indigenous knowledge based on a unified ontology for an accurate IoT-based drought early warning system (DEWS). Comment: 5 pages, 3 figures. In Proceedings of the Doctoral Symposium of the 16th International Middleware Conference (Middleware Doct Symposium 2015), Ivan Beschastnikh and Wouter Joosen (Eds.). ACM, New York, NY, USA.
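
    To illustrate what semantic representation under a unified ontology can look like in practice, the sketch below annotates a sensor reading and an indigenous-knowledge indicator as RDF triples with rdflib. The dews namespace, class names, and properties are hypothetical, not the middleware's actual ontology.

```python
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

# Hypothetical drought ontology namespace; all class/property names invented.
DEWS = Namespace("http://example.org/dews#")
g = Graph()
g.bind("dews", DEWS)

# A heterogeneous sensor reading...
g.add((DEWS.reading42, RDF.type, DEWS.SoilMoistureObservation))
g.add((DEWS.reading42, DEWS.hasValue, Literal(0.12, datatype=XSD.float)))
g.add((DEWS.reading42, DEWS.observedBy, DEWS.sensor7))

# ...and an indigenous-knowledge indicator, annotated in the same vocabulary
# so both sources can be queried uniformly (e.g., via SPARQL).
g.add((DEWS.obs99, RDF.type, DEWS.IndigenousIndicator))
g.add((DEWS.obs99, DEWS.hasDescription,
       Literal("Early flowering of marula trees signals a dry season")))

print(g.serialize(format="turtle"))
```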