20,428 research outputs found

    Integrating and Ranking Uncertain Scientific Data

    Mediator-based data integration systems resolve exploratory queries by joining data elements across sources. In the presence of uncertainties, such join expansions can quickly lead to spurious connections and incorrect results. The BioRank project investigates formalisms for modeling uncertainty during scientific data integration and for ranking uncertain query results. Our motivating application is protein function prediction. In this paper we show that: (i) explicitly modeling uncertainties as probabilities increases our ability to predict less-known or previously unknown functions (though it does not improve prediction of well-known ones). This suggests that probabilistic uncertainty models offer utility for scientific knowledge discovery; (ii) small perturbations in the input probabilities tend to produce only minor changes in the quality of our result rankings. This suggests that our methods are robust against slight variations in the way uncertainties are transformed into probabilities; and (iii) several techniques allow us to evaluate our probabilistic rankings efficiently. This suggests that probabilistic query evaluation is not as hard for real-world problems as theory indicates.
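    A minimal sketch of the kind of probabilistic ranking the abstract describes (not BioRank's actual algorithm): score a candidate prediction by the product of the edge probabilities along each supporting join path (assuming independence), combine alternative paths with a noisy-OR, and rank by the resulting score. All names and probabilities below are hypothetical.

```python
# Sketch only: probabilistic scoring of join-based query results,
# assuming independent edge probabilities and noisy-OR path combination.
from math import prod

def path_probability(edge_probs):
    """Probability that one join path is correct: product of its
    (assumed independent) edge probabilities."""
    return prod(edge_probs)

def result_score(paths):
    """Noisy-OR over alternative supporting paths: probability that at
    least one path is correct."""
    p_none = prod(1.0 - path_probability(p) for p in paths)
    return 1.0 - p_none

# Hypothetical candidate function predictions, each backed by one or
# more join paths (lists of edge probabilities).
candidates = {
    "kinase activity": [[0.9, 0.8], [0.6, 0.7]],
    "DNA binding":     [[0.5, 0.4]],
}

for f in sorted(candidates, key=lambda f: result_score(candidates[f]), reverse=True):
    print(f"{f}: {result_score(candidates[f]):.3f}")
```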

    Designing marine fishery reserves using passive acoustic telemetry

    Marine Fishery Reserves (MFRs) are being adopted, in part, as a strategy to replenish depleted fish stocks and serve as a source of recruits for adjacent fisheries. By necessity, their design must consider the biological parameters of the species under consideration to ensure that the spawning stock is conserved while simultaneously providing propagules for dispersal. We describe how acoustic telemetry can be employed to design effective MFRs by elucidating important life-history parameters of the species under consideration, including home range, and ecological preferences, including habitat utilization. We then designed a reserve based on these parameters using data from two acoustic telemetry studies that examined two closely linked subpopulations of queen conch (Strombus gigas) at Conch Reef in the Florida Keys. The union of the home ranges of the individual conch (aggregation home range: AgHR) within each subpopulation was used to construct a shape delineating the area within which a conch would be located with high probability. Together with habitat utilization information acquired during both the spawning and non-spawning seasons, as well as landscape features (i.e., corridors), we designed a 66.5 ha MFR to conserve the conch population. Consideration was also given to further expansion of the population into suitable habitats.
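    A hedged sketch of the aggregation-home-range (AgHR) idea: build a simple home-range polygon per animal from its telemetry fixes (a convex hull here; the study may have used a different estimator) and take the union across the subpopulation. The coordinates, individuals, and metre-based projected CRS below are assumptions for illustration.

```python
# Sketch of an AgHR-style union of individual home ranges.
# Requires shapely; coordinates assumed to be in metres, so
# area / 10_000 gives hectares.
from shapely.geometry import MultiPoint
from shapely.ops import unary_union

# Hypothetical telemetry fixes per individual: (x, y) in metres.
fixes = {
    "conch_01": [(0, 0), (120, 30), (60, 150), (10, 90)],
    "conch_02": [(80, 60), (220, 90), (150, 210), (90, 170)],
}

home_ranges = [MultiPoint(pts).convex_hull for pts in fixes.values()]
aghr = unary_union(home_ranges)  # aggregation home range polygon
print(f"AgHR area: {aghr.area / 10_000:.2f} ha")
```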

    Structure-Aware Sampling: Flexible and Accurate Summarization

    In processing large quantities of data, a fundamental problem is to obtain a summary which supports approximate query answering. Random sampling yields flexible summaries which naturally support subset-sum queries with unbiased estimators and well-understood confidence bounds. Classic sample-based summaries, however, are designed for arbitrary subset queries and are oblivious to the structure in the set of keys. The particular structure, such as hierarchy, order, or product space (multi-dimensional), makes range queries much more relevant for most analyses of the data. Dedicated summarization algorithms for range-sum queries have also been extensively studied. They can outperform existing sampling schemes in terms of accuracy on range queries per summary size. Their accuracy, however, rapidly degrades when, as is often the case, the query spans multiple ranges. They are also less flexible, being targeted at range-sum queries alone, and are often quite costly to build and use. In this paper we propose and evaluate variance-optimal sampling schemes that are structure-aware. These summaries improve over the accuracy of existing structure-oblivious sampling schemes on range queries while retaining the benefits of sample-based summaries: flexible summaries, with high accuracy on both range queries and arbitrary subset queries.
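    To make the "unbiased estimators" claim concrete, here is a minimal sketch of sample-based subset-sum estimation: keep each key with a known inclusion probability and answer any subset-sum query with the Horvitz-Thompson estimator (weight divided by inclusion probability). The paper's structure-aware schemes choose the inclusion probabilities jointly to cut variance on range queries; the uniform probabilities and data below are assumptions for illustration.

```python
# Sketch: Poisson sampling plus the Horvitz-Thompson subset-sum estimator.
import random

def poisson_sample(data, probs):
    """Each key k survives independently with probability probs[k]."""
    return {k: (w, probs[k]) for k, w in data.items() if random.random() < probs[k]}

def subset_sum_estimate(smpl, query_keys):
    """Unbiased estimate of sum(w_k for k in query_keys)."""
    return sum(w / p for k, (w, p) in smpl.items() if k in query_keys)

data = {k: float(k % 7 + 1) for k in range(1000)}  # hypothetical key weights
probs = {k: 0.1 for k in data}                     # uniform 10% sample (structure-oblivious)
smpl = poisson_sample(data, probs)

query = set(range(100, 300))                       # a range query over the key space
print("estimate:", subset_sum_estimate(smpl, query))
print("truth:   ", sum(data[k] for k in query))
```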

    Record-Linkage from a Technical Point of View

    Record linkage is used for preparing sampling frames, deduplicating lists, and combining information on the same object from two different databases. If the same objects in two different databases carry error-free, unique common identifiers such as personal identification numbers (PIDs), record linkage is a simple file-merge operation. If the identifiers contain errors, record linkage is a challenging task. In many applications the files have widely different numbers of observations, for example a few thousand records of a sample survey and a few million records of an administrative database of social security numbers. Available software, privacy issues and future research topics are discussed.
    Keywords: record linkage, data mining, privacy-preserving protocols
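    A toy illustration of linkage when identifiers contain errors: compare name fields with a string-similarity score and accept pairs above a threshold. Real record-linkage software typically adds blocking (essential when one file is millions of records) and probabilistic match weights in the Fellegi-Sunter style; the files, names, and threshold below are hypothetical.

```python
# Sketch: similarity-based matching of noisy identifiers.
from difflib import SequenceMatcher

def similarity(a, b):
    """Crude string similarity in [0, 1]; real systems use tuned comparators."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

survey = {1: "Jonh Smiht", 2: "Maria Garcia"}      # hypothetical sample survey file
admin = {901: "John Smith", 902: "Marya Garcia"}   # hypothetical administrative file

THRESHOLD = 0.8
for sid, sname in survey.items():
    best_id, best_name = max(admin.items(), key=lambda kv: similarity(sname, kv[1]))
    score = similarity(sname, best_name)
    if score >= THRESHOLD:
        print(f"link survey {sid} -> admin {best_id} (score {score:.2f})")
```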

    A Survey on Wireless Sensor Network Security

    Wireless sensor networks (WSNs) have recently attracted a lot of interest in the research community due to their wide range of applications. Owing to the distributed nature of these networks and their deployment in remote areas, they are vulnerable to numerous security threats that can adversely affect their proper functioning. The problem is more critical if the network is deployed for a mission-critical application, such as a tactical battlefield. Random failure of nodes is also very likely in real-life deployment scenarios. Because of resource constraints in the sensor nodes, traditional security mechanisms with large computation and communication overheads are infeasible in WSNs. Security in sensor networks is, therefore, a particularly challenging task. This paper discusses the current state of the art in security mechanisms for WSNs. Various types of attacks are discussed and their countermeasures presented. A brief discussion on the future direction of research in WSN security is also included.
    Comment: 24 pages, 4 figures, 2 tables

    A revival of integrity constraints for data cleaning

    Integrity constraints, a.k.a. data dependencies, are widely used for improving the quality of schemas. Recently, constraints have enjoyed a revival for improving the quality of data. The tutorial aims to provide an overview of recent advances in constraint-based data cleaning.
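    A small example of the kind of constraint-based cleaning the tutorial surveys: detecting violations of a functional dependency, here the classic zip -> city. The rows and the chosen dependency are hypothetical.

```python
# Sketch: find violations of a functional dependency lhs -> rhs.
from collections import defaultdict

rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},            # violates zip -> city
    {"zip": "94105", "city": "San Francisco"},
]

def fd_violations(rows, lhs, rhs):
    """Group rows by the LHS attribute; any group with more than one
    distinct RHS value violates lhs -> rhs."""
    groups = defaultdict(set)
    for r in rows:
        groups[r[lhs]].add(r[rhs])
    return {k: v for k, v in groups.items() if len(v) > 1}

print(fd_violations(rows, "zip", "city"))       # {'10001': {'New York', 'NYC'}}
```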