
    Parsimonious Language Models for a Terabyte of Text

    The aims of this paper are twofold. Our first aim is to compare results of the earlier Terabyte tracks to the Million Query track. We submitted a number of runs using different document representations (such as full-text, title-fields, or incoming anchor-texts) to increase pool diversity. The initial results show broad agreement in system rankings over various measures on topic sets judged at both Terabyte and Million Query tracks, with runs using the full-text index giving superior results on all measures, but also some noteworthy upsets. Our second aim is to explore the use of parsimonious language models for retrieval on terabyte-scale collections. These models are smaller and thus more efficient than standard language models when used at indexing time, and they may also improve retrieval performance. We have conducted initial experiments using parsimonious models in combination with pseudo-relevance feedback, for both the Terabyte and Million Query track topic sets, and obtained promising initial results.
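    As a rough illustration of the idea, the sketch below estimates a parsimonious document model with EM, concentrating probability mass on terms that distinguish a document from the background collection. The mixing weight `lam`, the iteration count and the pruning threshold are illustrative assumptions, not values taken from the paper.

```python
def parsimonious_model(doc_tf, collection_prob, lam=0.1, iters=20, threshold=1e-4):
    """EM estimation of a parsimonious document language model (sketch).

    doc_tf: term -> raw term frequency in the document
    collection_prob: term -> background probability P(t|C)
    lam: weight of the document-specific component (assumed value)
    """
    total = sum(doc_tf.values())
    p_doc = {t: tf / total for t, tf in doc_tf.items()}  # start from the MLE
    for _ in range(iters):
        # E-step: expected term counts attributed to the document component
        e = {}
        for t, tf in doc_tf.items():
            num = lam * p_doc.get(t, 0.0)
            denom = num + (1 - lam) * collection_prob.get(t, 1e-12)
            e[t] = tf * num / denom if denom > 0 else 0.0
        # M-step: renormalise and prune terms whose mass falls below the threshold
        norm = sum(e.values())
        if norm == 0.0:
            break
        p_doc = {t: v / norm for t, v in e.items() if v / norm >= threshold}
    return p_doc
```

    Only the surviving terms need to be stored at indexing time, which is where the claimed reduction in index size comes from.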

    Retrieving Temperatures and Abundances of Exoplanet Atmospheres with High-Resolution Cross-Correlation Spectroscopy

    High-resolution spectroscopy (R > 25,000) has recently emerged as one of the leading methods to detect atomic and molecular species in the atmospheres of exoplanets. However, it has so far lacked a robust method for extracting quantitative constraints on temperature structure and molecular/atomic abundances. In this work we present a novel Bayesian atmospheric retrieval framework applicable to high-resolution cross-correlation spectroscopy (HRCCS) that relies upon the cross-correlation between data and models to extract the planetary spectral signal. We successfully test the framework on simulated data and show that it can correctly determine Bayesian credibility intervals on atmospheric temperatures and abundances, allowing for a quantitative exploration of the inherent degeneracies. Furthermore, our new framework permits us to trivially combine and explore the synergies between HRCCS and low-resolution spectroscopy (LRS) to provide maximal leverage on the information contained within each. This framework also allows us to quantitatively assess the impact of molecular line opacities at high resolution. We apply the framework to VLT CRIRES K-band spectra of HD 209458 b and HD 189733 b and retrieve abundant carbon monoxide but sub-solar abundances for water, largely invariant under different model assumptions. This confirms previous analyses of these datasets, but is possibly at odds with detections of water at different wavelengths and spectral resolutions. The framework presented here is the first step towards a true synergy between space observatories and ground-based high-resolution observations.
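    To make the cross-correlation-to-likelihood idea concrete, the sketch below shows one commonly used mapping from the cross-covariance between a data spectrum and a Doppler-shifted model to a log-likelihood. It is a minimal illustration under simple assumptions (mean-subtracted spectra, one exposure and order at a time); the exact form used in the paper should be checked against the published equations.

```python
import numpy as np

def cc_log_likelihood(data, model):
    """Map the cross-covariance between data and model to a log-likelihood.

    data, model: 1-D arrays of equal length, assumed mean-subtracted.
    """
    n = data.size
    sf2 = np.mean(data ** 2)     # data variance term
    sg2 = np.mean(model ** 2)    # model variance term
    r = np.mean(data * model)    # cross-covariance between data and model
    return -0.5 * n * np.log(sf2 - 2.0 * r + sg2)

# A retrieval would evaluate this for every exposure and spectral order,
# shifting the model to the planet's radial velocity at each epoch, and sum
# the per-spectrum log-likelihoods inside a Bayesian sampler.
```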

    Relevance feedback for best match term weighting algorithms in information retrieval

    Personalisation in full-text retrieval or full-text filtering implies reweighting of the query terms based on some explicit or implicit feedback from the user. Relevance feedback uses the user's judgements on previously retrieved documents to construct a personalised query or user profile. This paper studies relevance feedback within two probabilistic models of information retrieval: the first based on statistical language models and the second based on the binary independence probabilistic model. The paper shows the resemblance between the approaches to relevance feedback in these models, introduces new approaches to relevance feedback for both models, and evaluates the new relevance feedback algorithms on the TREC collection. The paper shows that there are no significant differences between simple and sophisticated approaches to relevance feedback.
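    For the binary independence side, the classic Robertson/Sparck Jones relevance weight is a convenient concrete example of how judged documents re-weight a query term. This is the standard textbook formula, not necessarily the exact variant evaluated in the paper.

```python
import math

def rsj_relevance_weight(N, n, R, r):
    """Robertson/Sparck Jones relevance weight for one query term.

    N: total number of documents in the collection
    n: number of documents containing the term
    R: number of documents judged relevant
    r: number of judged-relevant documents containing the term
    The 0.5 terms are the usual smoothing to avoid zero counts.
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

# Example: a term occurring in 3 of 10 judged-relevant documents and in
# 1,000 of 100,000 documents overall receives a strongly positive weight.
print(rsj_relevance_weight(N=100_000, n=1_000, R=10, r=3))
```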

    A probabilistic justification for using tf.idf term weighting in information retrieval

    This paper presents a new probabilistic model of information retrieval. The most important modelling assumption made is that documents and queries are defined by an ordered sequence of single terms. This assumption is not made in well-known existing models of information retrieval, but is essential in the field of statistical natural language processing. Advances already made in statistical natural language processing are used in this paper to formulate a probabilistic justification for using tf.idf term weighting. The paper shows that the new probabilistic interpretation of tf.idf term weighting might lead to a better understanding of statistical ranking mechanisms, for example by explaining how they relate to coordination level ranking. A pilot experiment on the TREC collection shows that the linguistically motivated weighting algorithm outperforms the popular BM25 weighting algorithm.
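    As a small illustration of the kind of rewriting the argument relies on, the sketch below scores a document with a linear mixture of a document and a collection language model; dividing each term's contribution by the document-independent background part yields a tf.idf-like weight. The smoothing parameter `lam` is an assumed value, not one taken from the paper.

```python
import math

def lm_score(query_terms, doc_tf, doc_len, coll_tf, coll_len, lam=0.5):
    """Mixture language-model score with the background term divided out.

    Each term contributes log(1 + lam*P(t|D) / ((1-lam)*P(t|C))), i.e. a
    tf component scaled by an idf-like factor from the collection model.
    """
    score = 0.0
    for t in query_terms:
        p_doc = doc_tf.get(t, 0) / doc_len        # term-frequency component
        p_coll = coll_tf.get(t, 0) / coll_len     # idf-like background component
        if p_coll == 0.0:
            continue  # term unseen in the collection: skip (or smooth further)
        score += math.log(1.0 + (lam * p_doc) / ((1.0 - lam) * p_coll))
    return score
```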

    Maximum Likelihood Associative Memories

    Associative memories are structures that store data in such a way that it can later be retrieved given only part of its content, a sort of error/erasure-resilience property. They are used in applications ranging from caches and memory management in CPUs to database engines. In this work we study associative memories built on the maximum likelihood principle. First, we derive minimum residual error rates when the stored data comes from a uniform binary source. Second, we determine the minimum amount of memory required to store the same data. Finally, we bound the computational complexity of message retrieval. We then compare these bounds with two existing associative memory architectures: the celebrated Hopfield neural networks and a neural network architecture introduced more recently by Gripon and Berrou.
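    As a minimal sketch of maximum likelihood retrieval, the code below stores binary messages in a plain table and, given a partially observed query, returns the stored message with the fewest disagreements on the observed bits. This is an illustration of the principle under a simple symmetric-noise assumption, not the paper's construction or its complexity bounds.

```python
import numpy as np

def ml_retrieve(stored, query, mask):
    """Maximum-likelihood retrieval from a table of stored binary messages.

    stored: (M, n) array of stored binary messages
    query:  (n,) array with observed bits (values ignored where mask is False)
    mask:   (n,) boolean array, True where the query bit is observed

    Under a memoryless symmetric noise model on the observed bits, the ML
    estimate is the stored message with the fewest disagreements on the
    observed positions (ties broken arbitrarily here).
    """
    disagreements = np.sum(stored[:, mask] != query[mask], axis=1)
    return stored[np.argmin(disagreements)]

# Toy usage: store 4 random 16-bit messages, erase roughly half the bits of
# one of them, and retrieve; with high probability the original is returned.
rng = np.random.default_rng(0)
stored = rng.integers(0, 2, size=(4, 16))
mask = rng.random(16) < 0.5
print(ml_retrieve(stored, stored[2], mask))
```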

    Retrieval of Leaf Area Index (LAI) and Soil Water Content (WC) Using Hyperspectral Remote Sensing under Controlled Glass House Conditions for Spring Barley and Sugar Beet

    Leaf area index (LAI) and water content (WC) in the root zone are two major hydro-meteorological parameters that exert a dominant control on water, energy and carbon fluxes, and are therefore important for any regional eco-hydrological or climatological study. To investigate the potential for retrieving these parameters from hyperspectral remote sensing, we measured plant spectral reflectance (400-2,500 nm, ASD FieldSpec3) for two major mid-latitude agricultural crops (sugar beet and spring barley), treated under different water and nitrogen (N) conditions in a greenhouse experiment over the 2008 growing period. Along with the spectral response, we measured soil water content and LAI in 15 intensive measurement campaigns spread over the growing season and could demonstrate a significant response of plant reflectance characteristics to variations in water content and nutrient conditions. Linear and non-linear dimensionality analysis suggests that the full-band reflectance information is well represented by a set of 28 vegetation spectral indices (SIs), with most of the variance explained by three to at most eight variables. Investigation of linear dependencies between LAI, soil WC and pre-selected SIs indicates that: (1) linear regression using a single SI is not sufficient to describe the plant/soil variables over the range of experimental conditions, although some improvement is possible when the crop species is known beforehand; (2) results improve considerably when multiple linear regression with three explanatory SIs is applied. In addition to the linear investigations, we applied the non-linear CART (Classification and Regression Trees) technique, which ultimately did not show any potential for improving the retrieval.
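    The multiple-regression step can be sketched as an ordinary least-squares fit of LAI (or soil WC) against three explanatory spectral indices. Which three of the 28 SIs are used, and the data themselves, are placeholders here; this is not the paper's calibration.

```python
import numpy as np

def fit_lai_regression(si_matrix, lai):
    """Fit LAI = b0 + b1*SI1 + b2*SI2 + b3*SI3 by ordinary least squares.

    si_matrix: (n_samples, 3) array of three explanatory spectral indices
    lai:       (n_samples,) array of measured leaf area index values
    Returns the intercept and the three regression coefficients.
    """
    X = np.column_stack([np.ones(len(lai)), si_matrix])   # add intercept column
    coeffs, *_ = np.linalg.lstsq(X, lai, rcond=None)
    return coeffs

def predict_lai(coeffs, si_matrix):
    """Apply the fitted coefficients to new spectral-index values."""
    X = np.column_stack([np.ones(si_matrix.shape[0]), si_matrix])
    return X @ coeffs
```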

    3D inference and modelling for video retrieval

    A new scheme is proposed for extracting planar surfaces from 2D image sequences. We first perform feature correspondence over two neighboring frames, followed by the estimation of disparity and depth maps, assuming a calibrated camera. We then apply iterative Random Sample Consensus (RANSAC) plane fitting to the generated 3D points to find a dominant plane in a maximum-likelihood-estimation style. Object points on or off this dominant plane are determined by measuring their Euclidean distance to the plane. Experimental work shows that the proposed scheme leads to better plane-fitting results than the classical RANSAC method.
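    As a minimal illustration of the plane-fitting step, the sketch below runs a basic RANSAC loop over a 3D point cloud. The iteration count and inlier threshold are illustrative assumptions; the paper's iterative, maximum-likelihood-style variant refines this basic loop further.

```python
import numpy as np

def ransac_plane(points, n_iters=500, inlier_tol=0.01, rng=None):
    """Fit a dominant plane to 3-D points with basic RANSAC.

    points: (N, 3) array of 3-D points
    Returns ((normal, d), inlier_mask) for the plane n.x + d = 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    best_inliers = np.zeros(len(points), dtype=bool)
    best_plane = None
    for _ in range(n_iters):
        # sample three distinct points and build a candidate plane
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:          # degenerate (collinear) sample, skip
            continue
        normal /= norm
        d = -normal @ p0
        # keep the plane with the most points within the distance tolerance
        dist = np.abs(points @ normal + d)
        inliers = dist < inlier_tol
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (normal, d)
    return best_plane, best_inliers
```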