
    Towards Semantic Modeling of Contradictions and Disagreements: A Case Study of Medical Guidelines

    We introduce a formal distinction between contradictions and disagreements in natural language texts, motivated by the need to reason formally about contradictory medical guidelines. This is a novel and potentially very useful distinction that has not been discussed so far in NLP or logic. We also describe an NLP system capable of automatically finding contradictory medical guidelines; the system uses a combination of text analysis and information retrieval modules. We report positive evaluation results on a small corpus of contradictory medical recommendations. Comment: 5 pages, 1 figure, accepted at the 12th International Conference on Computational Semantics (IWCS 2017).

    Development and evaluation of a geographic information retrieval system using fine grained toponyms

    Geographic information retrieval (GIR) is concerned with returning information in response to an information need, typically expressed in terms of a thematic and a spatial component linked by a spatial relationship. However, evaluation initiatives have often failed to show significant differences between simple text baselines and more complex spatially enabled GIR approaches. We explore the effectiveness of three systems (a text baseline, spatial query expansion, and a full GIR system utilizing both text and spatial indexes) at retrieving documents from a corpus describing mountaineering expeditions, centred around fine grained toponyms. To allow evaluation, we use user-generated content (UGC) in the form of metadata associated with individual articles to build a test collection of queries and judgments. The test collection allowed us to demonstrate that a GIR-based method significantly outperformed a text baseline for all but very specific queries associated with very small query radii. We argue that such approaches to test collection development have much to offer in the evaluation of GIR.
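
    A rough sketch of how a text baseline can be combined with a spatial index and a query radius is shown below. The names, toy scoring, and sample data are purely illustrative assumptions, not the systems evaluated in the paper: documents are kept only if one of their geocoded toponyms falls inside the query radius, then ranked by a thematic text score.

```python
# Hypothetical text + spatial filtering, in the spirit of a GIR system:
# keep documents with a toponym inside the query radius, rank by text score.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def text_score(query_terms, doc_terms):
    """Toy thematic score: term-overlap count (stand-in for BM25 and the like)."""
    return sum(doc_terms.count(t) for t in query_terms)

def gir_rank(query_terms, query_point, radius_km, docs):
    """docs: list of dicts with 'id', 'terms', and 'toponyms' -> [(lat, lon), ...]."""
    results = []
    for doc in docs:
        within = any(haversine_km(*query_point, *p) <= radius_km for p in doc["toponyms"])
        if within:
            results.append((text_score(query_terms, doc["terms"]), doc["id"]))
    return sorted(results, reverse=True)

docs = [
    {"id": "exp-1", "terms": ["ridge", "summit", "bivouac"], "toponyms": [(46.55, 7.98)]},
    {"id": "exp-2", "terms": ["summit", "glacier"], "toponyms": [(45.83, 6.86)]},
]
print(gir_rank(["summit"], (46.56, 7.97), 5.0, docs))
```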

    Incremental Test Collections

    Corpora and topics are readily available for information retrieval research. Relevance judgments, which are necessary for system evaluation, are expensive; the cost of obtaining them prohibits in-house evaluation of retrieval systems on new corpora or new topics. We present an algorithm for cheaply constructing sets of relevance judgments. Our method intelligently selects documents to be judged and decides when to stop in such a way that with very little work there can be a high degree of confidence in the result of the evaluation. We demonstrate the algorithm's effectiveness by showing that it produces small sets of relevance judgments that reliably discriminate between two systems. The algorithm can be used to incrementally design retrieval systems by simultaneously comparing sets of systems. The number of additional judgments needed after each incremental design change decreases at a rate reciprocal to the number of systems being compared. To demonstrate the effectiveness of our method, we evaluate TREC ad hoc submissions, showing that with 95% fewer relevance judgments we can reach a Kendall's tau rank correlation of at least 0.9.
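
    As a small illustration of the stability criterion used here (not the judgment-selection algorithm itself), Kendall's tau between the system ranking obtained with a full judgment set and the ranking obtained with a much smaller set can be computed as follows; the MAP values are invented for the example.

```python
# Illustrative only: Kendall's tau between the system orderings induced by a
# full judgment set and by a much smaller one. Scores are made up.
from scipy.stats import kendalltau

map_full = [0.31, 0.28, 0.25, 0.22, 0.19]      # MAP with the full judgment set
map_reduced = [0.30, 0.29, 0.24, 0.21, 0.20]   # MAP with far fewer judgments

# Kendall's tau depends only on the relative ordering of the systems, so it can
# be computed directly on the paired effectiveness scores.
tau, p_value = kendalltau(map_full, map_reduced)
print(f"Kendall's tau between the two rankings: {tau:.2f} (p = {p_value:.3f})")
```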

    The development of XML document retrieval using query weighting / Fauziah Mat Saat

    Querying is one of the processes involved in information retrieval and directly determines which documents are retrieved. Poorly formulated queries lead to poor retrieval results: the retrieved set may be too large or too small, and there may be no basis for ranking the documents. Many techniques can be used to process a user query, and different techniques give different retrieval results. One such technique is query weighting, in which either the system calculates weights for the important words or the user assigns weights to them. This research focuses on developing an XML document retrieval system that uses a query weighting algorithm to retrieve FTMSK official letters. The technique was tested successfully, showing that query weighting is very effective for XML document retrieval. Keywords: Query Weighting, XML Documents Retrieval, Query
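
    A minimal sketch of what query weighting can look like in practice is given below. It is purely illustrative: the thesis's own algorithm and the FTMSK letter collection are not reproduced, and the IDF-style weighting and sample XML are assumptions. Query terms are weighted by how rare they are in the collection, and each XML document is scored by the weighted overlap with its text content.

```python
# Illustrative query-weighting sketch, not the thesis's actual algorithm:
# query terms get IDF-style weights, and each XML document is scored by the
# weighted sum of matching terms extracted from its text nodes.
import math
import xml.etree.ElementTree as ET

def extract_terms(xml_string):
    """Lowercased word tokens from all text nodes of an XML document."""
    root = ET.fromstring(xml_string)
    return " ".join(root.itertext()).lower().split()

def query_weights(query_terms, doc_term_lists):
    """IDF-style weight per query term: rarer terms in the collection weigh more."""
    n_docs = len(doc_term_lists)
    weights = {}
    for term in query_terms:
        df = sum(1 for terms in doc_term_lists if term in terms)
        weights[term] = math.log((n_docs + 1) / (df + 1)) + 1.0
    return weights

def rank(query_terms, docs):
    """docs: list of (doc_id, xml_string). Returns (score, doc_id), best first."""
    doc_terms = [extract_terms(xml) for _, xml in docs]
    weights = query_weights(query_terms, doc_terms)
    scores = []
    for (doc_id, _), terms in zip(docs, doc_terms):
        score = sum(weights[t] * terms.count(t) for t in query_terms)
        scores.append((score, doc_id))
    return sorted(scores, reverse=True)

docs = [
    ("letter-001", "<letter><subject>meeting invitation</subject><body>project meeting</body></letter>"),
    ("letter-002", "<letter><subject>leave application</subject><body>annual leave request</body></letter>"),
]
print(rank(["meeting", "invitation"], docs))
```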

    Hyperspectral Remote Sensing of Atmosphere and Surface Properties

    Atmospheric Infrared Sounder (AIRS), Infrared Atmospheric Sounding Interferometer (IASI), and Cross-track Infrared Sounder (CrIS) are all hyperspectral satellite sensors with thousands of spectral channels. Top-of-atmosphere radiance spectra measured by these sensors contain high information content on atmospheric, cloud, and surface properties. Exploring the high information content of these high spectral resolution spectra is a challenging task due to the computational effort involved in modeling thousands of spectral channels. Usually, only a very small fraction (4-10 percent) of the available channels is included in physical retrieval systems or numerical weather prediction (NWP) satellite data assimilation. We describe a method for simultaneously retrieving atmospheric temperature, moisture, cloud, and surface properties using all available spectral channels without sacrificing computational speed. The essence of the method is to convert channel radiance spectra into super-channels by an Empirical Orthogonal Function (EOF) transformation. Because the EOFs are orthogonal to each other, about 100 super-channels are adequate to capture the information content of the radiance spectra. A Principal Component-based Radiative Transfer Model (PCRTM) developed at NASA Langley Research Center is used to calculate both the super-channel magnitudes and their derivatives with respect to atmospheric profiles and other properties. There is no need to perform EOF transformations to convert super-channels back to spectral space at each iteration step of a one-dimensional variational retrieval or an NWP data assimilation system. The PCRTM forward model is also capable of calculating radiative contributions due to multiple cloud layers, and the multiple scattering effects of the clouds are efficiently parameterized. A physical retrieval algorithm then performs an inversion of atmospheric, cloud, and surface properties directly in the super-channel domain, thereby both reducing the computational cost and preserving the information content of the IASI measurements. The inversion algorithm is based on a non-linear Levenberg-Marquardt method with climatology covariance matrices and a priori information as constraints. One advantage of this approach is that it uses all the information content of the hyperspectral data, so the retrieval is less sensitive to instrument noise and there is no need to select a subset of the channels.
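
    The EOF compression step can be sketched with synthetic data as follows. This is only an illustration of the general principal-component idea, with made-up spectra and dimensions; it is not PCRTM or the actual retrieval. Channel radiances are projected onto a truncated set of orthogonal components, and the retrieval would then operate on the compressed super-channel coefficients.

```python
# Illustrative sketch of the EOF / principal-component compression step:
# thousands of channel radiances are projected onto ~100 orthogonal components
# ("super-channels"). Synthetic data; this is not PCRTM.
import numpy as np

rng = np.random.default_rng(0)
n_spectra, n_channels, n_super = 500, 4000, 100

# Synthetic training spectra with a few smooth underlying modes plus noise.
modes = rng.normal(size=(8, n_channels))
spectra = rng.normal(size=(n_spectra, 8)) @ modes + 0.01 * rng.normal(size=(n_spectra, n_channels))

# EOFs = leading right singular vectors of the mean-removed training set.
mean_spectrum = spectra.mean(axis=0)
_, _, vt = np.linalg.svd(spectra - mean_spectrum, full_matrices=False)
eofs = vt[:n_super]                      # (n_super, n_channels), orthonormal rows

def to_super_channels(radiance):
    """Project a channel radiance spectrum onto the truncated EOF basis."""
    return eofs @ (radiance - mean_spectrum)

def to_spectrum(super_channels):
    """Reconstruct an approximate channel spectrum from super-channels."""
    return mean_spectrum + eofs.T @ super_channels

test = spectra[0]
coeffs = to_super_channels(test)          # 100 numbers instead of 4000
err = np.abs(to_spectrum(coeffs) - test).max()
print(f"{n_channels} channels -> {n_super} super-channels, max reconstruction error {err:.2e}")
```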

    Phase Retrieval for Partially Coherent Observations

    Phase retrieval is in general a non-convex and non-linear task, and the corresponding algorithms struggle with the issue of local minima. We consider the case where the measurement samples within typically very small and disconnected subsets are coherently linked to each other, which is a reasonable assumption for our objective of antenna measurements. Two classes of measurement setups that can provide this kind of extra information are discussed: multi-probe systems and holographic measurements with multiple reference signals. We propose several formulations of the corresponding phase retrieval problem. The simplest of these formulations poses a linear system of equations, similar to an eigenvalue problem, in which a unique non-trivial null-space vector needs to be found. Accurate phase reconstruction for partially coherent observations is thus possible via a reliable solution process, together with a judgment of the solution quality. Under ideal, noise-free conditions, the required sampling density is less than two times the number of unknowns; noise and other observation errors increase this value slightly. Simulations for Gaussian random matrices and for antenna measurement scenarios demonstrate that reliable phase reconstruction is possible with the presented approach. Comment: 12 pages, 14 figures.
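
    The null-space idea behind the simplest formulation can be illustrated numerically as follows. The system matrix here is synthetic and constructed so that the sought vector spans its null space; it is not the paper's antenna-measurement formulation, only a sketch of recovering the unique non-trivial null-space vector via an SVD.

```python
# Illustrative sketch of the "unique non-trivial null-space vector" idea:
# build a synthetic linear system A @ x = 0 whose one-dimensional null space
# is spanned by the unknown vector, then recover x from the SVD.
import numpy as np

rng = np.random.default_rng(1)
n = 20                                    # number of unknowns

# Ground-truth complex vector standing in for the unknown phases.
x_true = np.exp(1j * rng.uniform(0, 2 * np.pi, n))

# Synthetic measurement operator with x_true in its null space: take a random
# matrix and project the x_true direction out of every row.
B = rng.normal(size=(3 * n, n)) + 1j * rng.normal(size=(3 * n, n))
A = B - (B @ x_true[:, None]) @ x_true[None, :].conj() / np.vdot(x_true, x_true)

# The null-space vector is the right singular vector of the smallest singular value.
_, s, vh = np.linalg.svd(A)
x_est = vh[-1].conj()

# Remove the global phase and scale ambiguity before comparing with the truth.
x_est *= np.vdot(x_est, x_true) / abs(np.vdot(x_est, x_true))
x_est *= np.linalg.norm(x_true) / np.linalg.norm(x_est)
print("max error:", np.max(np.abs(x_est - x_true)))
```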

    Data Processing and the Envision

    Data is being generated very rapidly due to the increase of information in everyday life, and huge amounts of data accumulate from various organizations that are difficult to analyze and exploit. Data created by an expanding number of sensors in the environment, such as traffic cameras and satellites, by internet activity on social networking sites, and by healthcare, government, and sales databases are examples of such huge data. Processing, analyzing, and communicating these data is a challenge. Online shopping websites are flooded with voluminous sales data every day, and analyzing and visualizing these data for information retrieval is a difficult task. A large number of information visualization techniques have been developed over the last decade to support the exploration of large data sets. With today's data management systems, it is only possible to view quite small portions of the data: if the data is presented textually, the amount that can be displayed is on the order of a few hundred data items, which is a drop in the ocean when dealing with data sets containing millions of items. Therefore, a system is required that can effectively analyze and visualize data. This paper focuses on a system that visualizes sales data through interactive data visualization, helping users apply business intelligence, generate revenue, make decisions, manage business operations, and track the progress of tasks. Effective and efficient data visualization is a key part of the discovery process; it is the intermediary between human intuition and the quantitative context of the data, and thus an essential component of the scientific path from data to knowledge and understanding.

    Distributed Information Retrieval using Keyword Auctions

    This report motivates the need for large-scale distributed approaches to information retrieval, and proposes solutions based on keyword auctions.

    Statistical Significance Testing in Information Retrieval: An Empirical Analysis of Type I, Type II and Type III Errors

    Statistical significance testing is widely accepted as a means to assess how well a difference in effectiveness reflects an actual difference between systems, as opposed to random noise due to the selection of topics. According to recent surveys of SIGIR, CIKM, ECIR and TOIS papers, the t-test is the most popular choice among IR researchers. However, previous work has suggested computer-intensive tests like the bootstrap or the permutation test, based mainly on theoretical arguments. On empirical grounds, others have suggested non-parametric alternatives such as the Wilcoxon test. Indeed, the question of which tests we should use has accompanied IR and related fields for decades now. Previous theoretical studies on this matter were limited in that we know that test assumptions are not met in IR experiments, and empirical studies were limited in that we do not have the necessary control over the null hypotheses to compute actual Type I and Type II error rates under realistic conditions. Therefore, not only is it unclear which tests to use, but also how much trust we should put in them. In contrast to past studies, in this paper we employ a recent simulation methodology based on TREC data to get around these limitations. Our study comprises over 500 million p-values computed for a range of tests, systems, effectiveness measures, topic set sizes and effect sizes, and for both the 2-tailed and 1-tailed cases. Having such a large supply of IR evaluation data with full knowledge of the null hypotheses, we are finally in a position to evaluate how well statistical significance tests really behave with IR data, and make sound recommendations for practitioners. Comment: 10 pages, 6 figures, SIGIR 2019.
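
    As a small, self-contained illustration of the kind of paired tests compared in the paper (synthetic per-topic scores; this is not the paper's simulation methodology), the sketch below runs a paired t-test next to a sign-flipping permutation test over per-topic score differences.

```python
# Illustrative comparison of two paired significance tests over per-topic
# effectiveness differences (synthetic AP scores; not the paper's methodology).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_topics = 50

# Synthetic per-topic average precision for two systems, B slightly better.
scores_a = rng.beta(2, 5, n_topics)
scores_b = np.clip(scores_a + rng.normal(0.02, 0.05, n_topics), 0, 1)
diffs = scores_b - scores_a

# Paired two-sided t-test on the per-topic differences.
t_stat, p_t = stats.ttest_rel(scores_b, scores_a)

# Permutation test: under H0 the sign of each per-topic difference is arbitrary,
# so randomly flip signs and compare the resulting mean absolute differences.
observed = abs(diffs.mean())
flips = rng.choice([-1.0, 1.0], size=(10000, n_topics))
perm_means = np.abs((flips * diffs).mean(axis=1))
p_perm = (np.sum(perm_means >= observed) + 1) / (len(perm_means) + 1)

print(f"t-test p = {p_t:.4f}, permutation p = {p_perm:.4f}")
```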