
    Integrating and Ranking Uncertain Scientific Data

    Mediator-based data integration systems resolve exploratory queries by joining data elements across sources. In the presence of uncertainty, these multiple expansions can quickly lead to spurious connections and incorrect results. The BioRank project investigates formalisms for modeling uncertainty during scientific data integration and for ranking uncertain query results. Our motivating application is protein function prediction. In this paper we show that: (i) explicitly modeling uncertainties as probabilities improves our ability to predict less-known or previously unknown functions (though it does not improve the prediction of well-known functions), which suggests that probabilistic uncertainty models are useful for scientific knowledge discovery; (ii) small perturbations in the input probabilities tend to produce only minor changes in the quality of our result rankings, which suggests that our methods are robust to slight variations in how uncertainties are transformed into probabilities; and (iii) several techniques allow us to evaluate our probabilistic rankings efficiently, which suggests that probabilistic query evaluation is not as hard for real-world problems as theory indicates.
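To make the idea of ranking uncertain results concrete, here is a minimal sketch that is not BioRank's own evaluation method. It assumes the integrated data form a directed graph whose edges exist independently with given probabilities, and it ranks every node by its estimated probability of being reachable from the query node. Because exact reachability probabilities are #P-hard to compute in general, the sketch estimates them by Monte Carlo sampling of possible worlds. The function names and the toy graph (a protein linked to two records that point to candidate functions) are hypothetical.

```python
import random
from collections import defaultdict, deque


def reachability_probabilities(edges, source, num_samples=10_000, seed=0):
    """Monte Carlo estimate of P(node is reachable from source) for every node.

    edges: list of (u, v, p) where p is the probability that the directed edge
    u -> v exists. Edges are assumed to be independent (a simplification).
    """
    rng = random.Random(seed)
    hits = defaultdict(int)
    for _ in range(num_samples):
        # Sample one possible world by keeping each edge with its probability.
        adj = defaultdict(list)
        for u, v, p in edges:
            if rng.random() < p:
                adj[u].append(v)
        # Breadth-first search from the source within this world.
        seen = {source}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        for node in seen - {source}:
            hits[node] += 1
    return {node: count / num_samples for node, count in hits.items()}


def rank_results(edges, source, **kwargs):
    """Return (node, probability) pairs sorted from most to least probable."""
    probs = reachability_probabilities(edges, source, **kwargs)
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy graph: a query protein linked to two database records,
# which in turn point to candidate functions.
edges = [
    ("protein", "A", 0.9),
    ("protein", "B", 0.5),
    ("A", "func1", 0.8),
    ("B", "func1", 0.6),
    ("B", "func2", 0.7),
]
for node, prob in rank_results(edges, "protein", num_samples=20_000):
    print(f"{node}: {prob:.2f}")
```

Here func1 is supported by two edge-disjoint paths, so its exact probability is 1 - (1 - 0.9·0.8)(1 - 0.5·0.6) = 0.804, higher than either path alone (0.72 and 0.30). The sampled estimates should land close to this value and to 0.9, 0.5 and 0.35 for A, B and func2.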

    Sensitive and Scalable Online Evaluation with Theoretical Guarantees

    Multileaved comparison methods generalize interleaved comparison methods to provide a scalable approach for comparing ranking systems based on regular user interactions. Such methods enable the increasingly rapid research and development of search engines. However, existing multileaved comparison methods that provide reliable outcomes do so by degrading the user experience during evaluation. Conversely, current multileaved comparison methods that maintain the user experience cannot guarantee correctness. Our contribution is two-fold. First, we propose a theoretical framework for systematically comparing multileaved comparison methods using the notions of considerateness, which concerns maintaining the user experience, and fidelity, which concerns reliably correct outcomes. Second, we introduce a novel multileaved comparison method, Pairwise Preference Multileaving (PPM), that performs comparisons based on document-pair preferences, and prove that it is considerate and has fidelity. We show empirically that, compared to previous multileaved comparison methods, PPM is more sensitive to user preferences and more scalable with respect to the number of rankers being compared.
    Comment: CIKM 2017, Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
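The general idea of comparing rankers through document-pair preferences can be illustrated with a simplified sketch. It is not PPM itself: the skip-above inference rule (a clicked document is preferred over every unclicked document displayed above it), the agreement-minus-disagreement score, and all function names are assumptions, and the sketch omits both the construction of the multileaved result list and whatever corrections a method needs to guarantee fidelity.

```python
from itertools import combinations


def infer_preferences(displayed, clicked):
    """Infer (preferred, other) document pairs from clicks on a displayed list.

    Assumed skip-above rule: a clicked document is preferred over every
    unclicked document that was shown above it.
    """
    clicked = set(clicked)
    prefs = []
    for i, doc in enumerate(displayed):
        if doc in clicked:
            for above in displayed[:i]:
                if above not in clicked:
                    prefs.append((doc, above))
    return prefs


def score_rankers(rankings, prefs):
    """Score each ranker by agreements minus disagreements with the inferred pairs.

    rankings: dict ranker_name -> list of document ids, best first.
    Documents missing from a ranking are treated as ranked below all listed ones.
    """
    scores = {}
    for name, ranking in rankings.items():
        pos = {doc: rank for rank, doc in enumerate(ranking)}
        worst = len(ranking)
        score = 0
        for better, worse in prefs:
            pb, pw = pos.get(better, worst), pos.get(worse, worst)
            if pb < pw:
                score += 1  # ranker agrees with the inferred preference
            elif pb > pw:
                score -= 1  # ranker contradicts it
        scores[name] = score
    return scores


def pairwise_outcomes(scores):
    """+1 if the first ranker of each pair wins, -1 if the second wins, 0 on a tie."""
    return {
        (a, b): (scores[a] > scores[b]) - (scores[a] < scores[b])
        for a, b in combinations(sorted(scores), 2)
    }


displayed = ["d1", "d2", "d3", "d4"]          # multileaved list shown to the user
prefs = infer_preferences(displayed, ["d3"])  # [("d3", "d1"), ("d3", "d2")]
rankings = {
    "r1": ["d3", "d1", "d2", "d4"],
    "r2": ["d1", "d2", "d3", "d4"],
    "r3": ["d1", "d3", "d2", "d4"],
}
scores = score_rankers(rankings, prefs)       # {"r1": 2, "r2": -2, "r3": 0}
print(pairwise_outcomes(scores))              # {('r1', 'r2'): 1, ('r1', 'r3'): 1, ('r2', 'r3'): -1}
```

Scoring rankers only on pairs the user actually expressed a preference about is what lets every ranker be compared against the same single interaction, regardless of how many rankers there are.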

    Beyond English text: Multilingual and multimedia information retrieval.


    People on Drugs: Credibility of User Statements in Health Communities

    Online health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by exploiting linguistic cues and distant supervision from expert sources. To this end, we introduce a probabilistic graphical model that jointly learns user trustworthiness, statement credibility, and language objectivity. We apply this methodology to the task of extracting rare or unknown side-effects of medical drugs, one of the problems where large-scale non-expert data has the potential to complement expert medical knowledge. We show that our method can reliably extract side-effects and filter out false statements, while identifying trustworthy users who are likely to contribute valuable medical information.
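The mutual dependence between user trustworthiness and statement credibility can be illustrated with a far simpler scheme than the paper's probabilistic graphical model. The sketch below swaps in a HITS-style fixed-point iteration: a statement is as credible as the average trust of the users asserting it, a user is as trustworthy as the average credibility of their statements, and statements confirmed or refuted by an expert source are pinned to 1 or 0 as a crude stand-in for distant supervision. It ignores linguistic cues and objectivity entirely; the function name and toy data are hypothetical.

```python
from collections import defaultdict


def trust_and_credibility(assertions, expert_labels, iterations=100, prior=0.5):
    """Jointly estimate user trust and statement credibility by fixed-point iteration.

    assertions: list of (user, statement) pairs.
    expert_labels: dict statement -> 1.0 (confirmed) or 0.0 (refuted); these
        values are never updated (a simplified stand-in for distant supervision).
    """
    by_user = defaultdict(list)
    by_statement = defaultdict(list)
    for user, stmt in assertions:
        by_user[user].append(stmt)
        by_statement[stmt].append(user)

    trust = {u: prior for u in by_user}
    cred = {}
    for _ in range(iterations):
        # A statement is as credible as the average trust of its supporters,
        # unless an expert label pins it down.
        cred = {
            s: expert_labels.get(s, sum(trust[u] for u in users) / len(users))
            for s, users in by_statement.items()
        }
        # A user is as trustworthy as the average credibility of their statements.
        trust = {
            u: sum(cred[s] for s in stmts) / len(stmts)
            for u, stmts in by_user.items()
        }
    return trust, cred


assertions = [
    ("alice", "s1"), ("alice", "s3"),
    ("bob", "s2"), ("bob", "s4"),
    ("carol", "s3"), ("carol", "s4"),
]
expert = {"s1": 1.0, "s2": 0.0}  # s1 confirmed, s2 refuted by an expert source
trust, cred = trust_and_credibility(assertions, expert)
print({s: round(v, 3) for s, v in cred.items()})
```

Because alice also backs a confirmed statement and bob a refuted one, the unverified claims s3 and s4 converge to credibilities of 2/3 and 1/3, and carol, who backs both, ends up with a trust of 0.5.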

    A Statistical Inverse Method for Gridding Passive Microwave Data with Mixed Measurements

    When a passive microwave footprint intersects objects on the ground with different spectral characteristics, the corresponding observation is mixed, and this mixture limits the retrieval of geophysical parameters. We propose to partition the study region into objects following an object-based image analysis procedure and then to refine this partition into small cells. We then introduce a statistical method to estimate the brightness temperature (TB) of each cell. The method assumes that the TBs of cells belonging to the same object are identically distributed and that TB heterogeneity within each cell can be neglected. The implementation is based on an iterative expectation-maximization (EM) algorithm. We evaluated the proposed method on synthetic images and applied it to grid a sample of real AMSR-2 TB data over a coastal region in Argentina.
    Affiliation: Grimson, Rafael. Universidad Nacional de San Martín; Argentina
    Affiliation: Bali, Juan Lucas. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
    Affiliation: Rajngewerc, Mariela. Ministerio de Defensa. Instituto de Investigaciones Científicas y Técnicas para la Defensa; Argentina
    Affiliation: Martin, Laura San. Universidad Nacional de San Martín; Argentina
    Affiliation: Salvia, Maria Mercedes. Universidad Nacional de San Martín; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Astronomía y Física del Espacio. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Astronomía y Física del Espacio; Argentina
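The EM estimation step can be illustrated with a simple linear-Gaussian model. This is a sketch under stated assumptions, not the authors' formulation: each cell's TB is drawn independently from a Gaussian whose mean and variance are shared by all cells of the same object, each observation is a footprint-weighted sum of cell TBs plus Gaussian noise with a known standard deviation `tau`, and the footprint weights `W` are known. The E-step computes the Gaussian posterior of the cell TBs given the observations; the M-step re-estimates each object's mean and variance. The function name, the 1-D toy scene, and the three-cell footprints are hypothetical.

```python
import numpy as np


def em_cell_tb(W, y, labels, tau=0.5, iterations=200):
    """EM estimate of per-cell brightness temperature (TB) from mixed observations.

    Assumed (hypothetical) model:
      x[j] ~ N(mu[labels[j]], var[labels[j]]) independently for each cell j,
      y = W @ x + noise, noise ~ N(0, tau**2 * I), with W and tau known.
    W[i, j] is the weight of cell j in footprint i; labels[j] is the object of cell j.
    Returns (posterior mean of cell TBs, per-object means, per-object variances).
    """
    W = np.asarray(W, dtype=float)
    y = np.asarray(y, dtype=float)
    labels = np.asarray(labels)
    n_obj = int(labels.max()) + 1
    mu = np.full(n_obj, y.mean())       # start every object at the mean observation
    var = np.full(n_obj, y.var() + 1.0)  # broad initial variance
    WtW = W.T @ W / tau**2
    Wty = W.T @ y / tau**2
    mean = mu[labels]
    for _ in range(iterations):
        # E-step: Gaussian posterior of the latent cell TBs given y.
        prior_prec = 1.0 / var[labels]
        cov = np.linalg.inv(np.diag(prior_prec) + WtW)
        mean = cov @ (prior_prec * mu[labels] + Wty)
        post_var = np.diag(cov)
        # M-step: cells of the same object share one mean and one variance.
        for k in range(n_obj):
            cells = labels == k
            mu[k] = mean[cells].mean()
            var[k] = np.mean((mean[cells] - mu[k]) ** 2 + post_var[cells])
    return mean, mu, var


# Hypothetical 1-D scene: 10 land cells next to 10 water cells.
n_cells = 20
labels = np.array([0] * 10 + [1] * 10)
x_true = np.where(labels == 0, 280.0, 170.0) + 2.0 * (-1.0) ** np.arange(n_cells)

# Each footprint averages three adjacent cells, so footprints near the coast are mixed.
W = np.zeros((n_cells - 2, n_cells))
for i in range(n_cells - 2):
    W[i, i:i + 3] = 1.0 / 3.0

cell_tb, obj_mean, obj_var = em_cell_tb(W, W @ x_true, labels)
print(np.round(obj_mean, 1))
```

With fewer footprints than cells the system is underdetermined, and the shared per-object distribution is what fills in the missing information; the estimated object means should come out close to the 280 K and 170 K used to build the toy scene.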

    Integrating musicological knowledge into a probabilistic framework for chord and key extraction

    In this contribution, a previously developed probabilistic framework for the simultaneous detection of chords and keys in polyphonic audio is further extended and validated. The system's behaviour is controlled by a small set of carefully defined free parameters. This has allowed us to conduct an experimental study that sheds new light on the importance of musicological knowledge in the context of chord extraction. Some of the results obtained are, to say the least, surprising and, to our knowledge, have never been reported as such before.