Towards Semantic Modeling of Contradictions and Disagreements: A Case Study of Medical Guidelines
We introduce a formal distinction between contradictions and disagreements in
natural language texts, motivated by the need to formally reason about
contradictory medical guidelines. This is a novel and potentially very useful
distinction that has not been discussed so far in NLP or logic. We also
describe an NLP system capable of automatically finding contradictory medical
guidelines; the system uses a combination of text analysis and information
retrieval modules. We also report positive evaluation results on a small corpus
of contradictory medical recommendations.
Comment: 5 pages, 1 figure, accepted at the 12th International Conference on
Computational Semantics (IWCS-2017)
Development and evaluation of a geographic information retrieval system using fine grained toponyms
Geographic information retrieval (GIR) is concerned with returning information in response to an information need, typically expressed in terms of a thematic and a spatial component linked by a spatial relationship. However, evaluation initiatives have often failed to show significant differences between simple text baselines and more complex spatially enabled GIR approaches. We explore the effectiveness of three systems (a text baseline, spatial query expansion, and a full GIR system utilizing both text and spatial indexes) at retrieving documents from a corpus describing mountaineering expeditions, centred around fine-grained toponyms. To allow evaluation, we use user-generated content (UGC) in the form of metadata associated with individual articles to build a test collection of queries and judgments. The test collection allowed us to demonstrate that a GIR-based method significantly outperformed a text baseline for all but very specific queries associated with very small query radii. We argue that such approaches to test collection development have much to offer in the evaluation of GIR.
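The full GIR system described above combines a text index with a spatial index and a query radius. A minimal sketch of that idea, with toy documents and a hypothetical `gir_rank` function (the data and function names are illustrative, not from the paper):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in kilometres between two (lat, lon) points.
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def gir_rank(docs, query_terms, query_point, radius_km):
    # Score = text-term overlap, restricted to documents whose toponym
    # location falls within the query radius (a stand-in for the spatial index).
    results = []
    for doc in docs:
        if haversine_km(*query_point, *doc["latlon"]) > radius_km:
            continue
        text_score = len(query_terms & set(doc["text"].lower().split()))
        if text_score > 0:
            results.append((doc["id"], text_score))
    return sorted(results, key=lambda x: -x[1])

docs = [
    {"id": 1, "latlon": (46.55, 7.98), "text": "ascent of the eiger north face"},
    {"id": 2, "latlon": (27.99, 86.93), "text": "everest base camp expedition"},
    {"id": 3, "latlon": (46.54, 8.00), "text": "eiger west flank descent route"},
]
print(gir_rank(docs, {"eiger", "north", "face"}, (46.55, 7.98), 50.0))
```

With a small radius the Everest document is excluded before text scoring, which is the kind of spatial pruning a text-only baseline cannot do.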
Incremental Test Collections
Corpora and topics are readily available for information retrieval research. Relevance judgments, which are necessary for system evaluation, are expensive; the cost of obtaining them prohibits in-house evaluation of retrieval systems on new corpora or new topics. We present an algorithm for cheaply constructing sets of relevance judgments. Our method intelligently selects documents to be judged and decides when to stop in such a way that with very little work there can be a high degree of confidence in the result of the evaluation. We demonstrate the algorithm's effectiveness by showing that it produces small sets of relevance judgments that reliably discriminate between two systems. The algorithm can be used to incrementally design retrieval systems by simultaneously comparing sets of systems. The number of additional judgments needed after each incremental design change decreases at a rate reciprocal to the number of systems being compared. To demonstrate the effectiveness of our method, we evaluate TREC ad hoc submissions, showing that with 95% fewer relevance judgments we can reach a Kendall's tau rank correlation of at least 0.9.
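The Kendall's tau figure above measures how well a system ranking built from the cheap judgments agrees with the full ranking: it is the fraction of concordant pairs minus discordant pairs over all pairs. A minimal sketch (not the paper's code) for two rankings of the same systems:

```python
from itertools import combinations

def kendalls_tau(rank_a, rank_b):
    # rank_a, rank_b: lists of the same system names, best to worst.
    # tau = (concordant pairs - discordant pairs) / total pairs.
    pos_b = {s: i for i, s in enumerate(rank_b)}
    concordant = discordant = 0
    for i, j in combinations(range(len(rank_a)), 2):
        # A pair is concordant if both rankings order it the same way.
        if pos_b[rank_a[i]] < pos_b[rank_a[j]]:
            concordant += 1
        else:
            discordant += 1
    n = len(rank_a)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Identical rankings give tau = 1.0; one adjacent swap among four
# systems already drops tau to 2/3, so a 0.9 threshold over many
# TREC systems tolerates only a few swapped pairs.
print(kendalls_tau(["A", "B", "C", "D"], ["A", "B", "D", "C"]))
```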
The development of XML document retrieval using query weighting / Fauziah Mat Saat
Querying is one of the processes involved in information retrieval in order to
obtain the retrieved documents. The query is important in information retrieval
because it determines which documents are retrieved. Poorly formulated queries
lead to poor document retrieval: the retrieved set may be too large or too
small, with no provision for ranking documents. There are many techniques for
processing the user query, and different techniques give different retrieval
results. One such technique is a query weighting algorithm, in which either the
system calculates the weights of the important words or the user assigns
weights to them. This research focuses on developing XML document retrieval
using a query weighting algorithm in order to retrieve FTMSK official letters.
The technique has been successfully tested, showing that query weighting is
very effective in XML document retrieval.
Keywords: Query Weighting, XML Documents Retrieval, Query
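The system-side variant of query weighting mentioned above is commonly realized with inverse document frequency: rare query terms get higher weight than common ones. A hedged sketch with toy documents (the data and helper names are illustrative, not taken from the thesis):

```python
import math
from collections import Counter

def idf_weights(query_terms, docs):
    # Weight each query term by inverse document frequency: terms that
    # appear in few documents are more discriminative and count more.
    n = len(docs)
    weights = {}
    for t in query_terms:
        df = sum(1 for d in docs if t in d)
        weights[t] = math.log((n + 1) / (df + 1)) + 1
    return weights

def score(doc_tokens, weights):
    # Sum of (term frequency in document) x (query term weight).
    tf = Counter(doc_tokens)
    return sum(tf[t] * w for t, w in weights.items())

docs = [
    "official letter faculty meeting agenda".split(),
    "official letter examination schedule".split(),
    "student club announcement".split(),
]
w = idf_weights(["official", "examination"], docs)
ranked = sorted(range(len(docs)), key=lambda i: -score(docs[i], w))
print(ranked)
```

Here "examination" occurs in only one document, so it outweighs the common term "official" and pulls that document to the top of the ranking.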
Hyperspectral Remote Sensing of Atmosphere and Surface Properties
Atmospheric Infrared Sounder (AIRS), Infrared Atmospheric Sounding Interferometer (IASI), and Cross-track Infrared Sounder (CrIS) are all hyperspectral satellite sensors with thousands of spectral channels. Top-of-atmosphere radiance spectra measured by these sensors contain high information content on atmospheric, cloud, and surface properties. Exploiting the high information content contained in these high-spectral-resolution spectra is a challenging task due to the computational effort involved in modeling thousands of spectral channels. Usually, only very small fractions (4-10 percent) of the available channels are included in physical retrieval systems or numerical weather prediction (NWP) satellite data assimilations. We will describe a method of simultaneously retrieving atmospheric temperature, moisture, cloud, and surface properties using all available spectral channels without sacrificing computational speed. The essence of the method is to convert channel radiance spectra into super-channels by an Empirical Orthogonal Function (EOF) transformation. Because the EOFs are orthogonal to each other, about 100 super-channels are adequate to capture the information content of the radiance spectra. A Principal Component-based Radiative Transfer Model (PCRTM) developed at NASA Langley Research Center is used to calculate both the super-channel magnitudes and their derivatives with respect to atmospheric profiles and other properties. There is no need to perform EOF transformations to convert super-channels back to spectral space at each iteration step in a one-dimensional variational retrieval or an NWP data assimilation system. The PCRTM forward model is also capable of calculating radiative contributions due to multiple-layer clouds, with the multiple scattering effects of the clouds efficiently parameterized.
A physical retrieval algorithm then performs an inversion of atmospheric, cloud, and surface properties in the super-channel domain directly, thereby both reducing the computational cost and preserving the information content of the IASI measurements. The inversion algorithm is based on a non-linear Levenberg-Marquardt method with climatology covariance matrices and a priori information as constraints. One advantage of this approach is that it uses all information content from the hyperspectral data, so the retrieval is less sensitive to instrument noise and eliminates the need for selecting a subset of the channels.
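The EOF transformation above is, in essence, a principal-component compression of the training spectra. A minimal numerical sketch with synthetic low-rank "spectra" standing in for real sounder radiances (sizes and data are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-in: 500 training radiance spectra over 2000 channels,
# generated from 20 latent modes so most variance is low-rank,
# mimicking the strong channel correlations in real sounder spectra.
n_channels, n_train, n_modes = 2000, 500, 20
basis = rng.normal(size=(n_channels, n_modes))
spectra = rng.normal(size=(n_train, n_modes)) @ basis.T

# EOF transformation: principal components of the centered training set.
mean = spectra.mean(axis=0)
_, _, vt = np.linalg.svd(spectra - mean, full_matrices=False)
eofs = vt[:100]                      # keep ~100 super-channels

# Project one spectrum into super-channel space and reconstruct it.
x = spectra[0]
super_channels = eofs @ (x - mean)   # 2000 channels -> 100 numbers
x_rec = mean + eofs.T @ super_channels
print(np.allclose(x, x_rec, atol=1e-6))
```

Because the orthogonal EOFs capture essentially all of the variance, the 100 super-channel coefficients reproduce the 2000-channel spectrum, which is why a retrieval can work in the super-channel domain without losing information.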
Phase Retrieval for Partially Coherent Observations
Phase retrieval is in general a non-convex and non-linear task and the
corresponding algorithms struggle with the issue of local minima. We consider
the case where the measurement samples within typically very small and
disconnected subsets are coherently linked to each other - which is a
reasonable assumption for our objective of antenna measurements. Two classes of
measurement setups are discussed which can provide this kind of extra
information: multi-probe systems and holographic measurements with multiple
reference signals. We propose several formulations of the corresponding phase
retrieval problem. The simplest of these formulations poses a linear system of
equations similar to an eigenvalue problem where a unique non-trivial
null-space vector needs to be found. Accurate phase reconstruction for
partially coherent observations is, thus, possible by a reliable solution
process and with judgment of the solution quality. Under ideal, noise-free
conditions, the required sampling density is less than two times the number of
unknowns. Noise and other observation errors increase this value slightly.
Simulations for Gaussian random matrices and for antenna measurement scenarios
demonstrate that reliable phase reconstruction is possible with the presented
approach.
Comment: 12 pages, 14 figures
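The simplest formulation above reduces phase retrieval to finding the unique non-trivial null-space vector of a linear system. A standard, noise-robust way to do this is to take the right singular vector of the smallest singular value; a toy sketch (the matrix construction is illustrative, not the antenna-measurement model itself):

```python
import numpy as np

rng = np.random.default_rng(1)
# Build a matrix A whose null space is spanned by a known vector v,
# then recover v as the right singular vector belonging to the
# smallest singular value -- robust to small perturbations, unlike
# solving A x = 0 directly.
n = 50
v = rng.normal(size=n)
v /= np.linalg.norm(v)
B = rng.normal(size=(n + 10, n))
A = B - (B @ v)[:, None] * v[None, :]   # project out v, so A v = 0
A += 1e-8 * rng.normal(size=A.shape)    # mimic measurement noise

_, s, vt = np.linalg.svd(A)
x = vt[-1]                              # smallest-singular-value vector
x *= np.sign(x @ v)                     # null vector is defined up to sign
print(np.linalg.norm(x - v) < 1e-5)
```

The clear gap between the smallest singular value (driven only by noise) and the rest is also a natural way to judge solution quality, in the spirit of the reliability assessment the abstract mentions.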
Data Processing and the Envision
Data is being generated very rapidly due to the increase in information in everyday life. Huge amounts of data accumulate from various organizations and are difficult to analyze and exploit. Data created by an expanding number of sensors in the environment, such as traffic cameras and satellites, activity on social networking sites, healthcare databases, government databases, and sales data are examples of such huge data. Processing, analyzing and communicating this data is a challenge. Online shopping websites are flooded with voluminous sales data every day, and analyzing and visualizing this data for information retrieval is a difficult task. A large number of information visualization techniques have been developed over the last decade to support the exploration of large data sets. With today's data management systems, it is only possible to view quite small portions of the data: if the data is presented textually, the amount that can be displayed is in the range of some 100 data items, a drop in the ocean when dealing with data sets containing millions of items.
Effective and efficient data visualization is a key part of the discovery process. It mediates between human intuition and the quantitative context of the data, and is thus an essential component of the scientific path from data to knowledge and understanding. Therefore, a system is required that will effectively analyze and visualize data. This paper focuses on a system that visualizes sales data to help users apply business intelligence in revenue generation, decision making, managing business operations, and tracking the progress of tasks.
Distributed Information Retrieval using Keyword Auctions
This report motivates the need for large-scale distributed approaches to information retrieval, and proposes solutions based on keyword auctions.
Statistical Significance Testing in Information Retrieval: An Empirical Analysis of Type I, Type II and Type III Errors
Statistical significance testing is widely accepted as a means to assess how
well a difference in effectiveness reflects an actual difference between
systems, as opposed to random noise because of the selection of topics.
According to recent surveys on SIGIR, CIKM, ECIR and TOIS papers, the t-test is
the most popular choice among IR researchers. However, previous work has
suggested computer intensive tests like the bootstrap or the permutation test,
based mainly on theoretical arguments. On empirical grounds, others have
suggested non-parametric alternatives such as the Wilcoxon test. Indeed, the
question of which tests we should use has accompanied IR and related fields for
decades now. Previous theoretical studies on this matter were limited in that
we know that test assumptions are not met in IR experiments, and empirical
studies were limited in that we do not have the necessary control over the null
hypotheses to compute actual Type I and Type II error rates under realistic
conditions. Therefore, not only is it unclear which test to use, but also how
much trust we should put in them. In contrast to past studies, in this paper we
employ a recent simulation methodology from TREC data to go around these
limitations. Our study comprises over 500 million p-values computed for a range
of tests, systems, effectiveness measures, topic set sizes and effect sizes,
and for both the 2-tail and 1-tail cases. Having such a large supply of IR
evaluation data with full knowledge of the null hypotheses, we are finally in a
position to evaluate how well statistical significance tests really behave with
IR data, and make sound recommendations for practitioners.Comment: 10 pages, 6 figures, SIGIR 201