574 research outputs found

    Non-Compositional Term Dependence for Information Retrieval

    Modelling term dependence in IR aims to identify co-occurring terms that are too heavily dependent on each other to be treated as a bag of words, and to adapt the indexing and ranking accordingly. Dependent terms are predominantly identified using lexical frequency statistics, assuming that (a) if terms co-occur often enough in some corpus, they are semantically dependent; (b) the more often they co-occur, the more semantically dependent they are. This assumption is not always correct: the frequency of co-occurring terms can be separate from the strength of their semantic dependence. For example, "red tape" might be less frequent overall than "tape measure" in some corpus, but this does not mean that "red"+"tape" are less dependent than "tape"+"measure". This is especially the case for non-compositional phrases, i.e. phrases whose meaning cannot be composed from the individual meanings of their terms (such as the phrase "red tape" meaning bureaucracy). Motivated by this lack of distinction between the frequency and strength of term dependence in IR, we present a principled approach for handling term dependence in queries, using both lexical frequency and semantic evidence. We focus on non-compositional phrases, extending a recent unsupervised model for their detection [21] to IR. Our approach, integrated into ranking using Markov Random Fields [31], yields effectiveness gains over competitive TREC baselines, showing that there is still room for improvement in the very well-studied area of term dependence in IR.

    Functional Uniform Priors for Nonlinear Modelling

    This paper considers the topic of finding prior distributions when a major component of the statistical model depends on a nonlinear function. Using results on how to construct uniform distributions in general metric spaces, we propose a prior distribution that is uniform in the space of functional shapes of the underlying nonlinear function and then back-transform to obtain a prior distribution for the original model parameters. The primary application considered in this article is nonlinear regression, but the idea might be of interest beyond this case. For nonlinear regression the priors so constructed have the advantage that they are parametrization invariant and do not violate the likelihood principle, as opposed to uniform distributions on the parameters or the Jeffreys prior, respectively. The utility of the proposed priors is demonstrated in the context of nonlinear regression modelling in clinical dose-finding trials, through a real data example and simulation. In addition, the proposed priors are used for calculation of an optimal Bayesian design. Comment: submitted for publication

    Through a Glass, Darkly: The CIA and Oral History

    This article broaches the thorny issue of how we may study the history of the CIA by utilizing oral history interviews. It argues that while oral history interviews impose particular demands upon the researcher, these demands are particularly pronounced in relation to studying the history of intelligence services. It nevertheless also argues that while intelligence history and oral history each harbour their own epistemological perils and biases, pitfalls which may in fact be pronounced when they are conjoined, the relationship between them may be a productive one. Indeed, each field may enrich the other provided we have thought carefully about the linkages between them: this article's point of departure. The first part of this article outlines some of the problems encountered in studying the CIA by relating them to the author's own work. This involved researching the CIA's role in US foreign policy towards Afghanistan since a landmark year in the history of the late Cold War, 1979 (i.e. the year the Soviet Union invaded that country). The second part then considers some of the issues historians must confront when applying oral history to the study of the CIA. To make this concrete for the reader, the author recounts some of his own experiences interviewing CIA officers in and around Washington DC. The third part then looks at some of the contributions oral history in particular can make towards a better understanding of the history of intelligence services and the CIA.

    A blind detection of a large, complex, Sunyaev-Zel'dovich structure

    We present an interesting Sunyaev-Zel'dovich (SZ) detection in the first of the Arcminute Microkelvin Imager (AMI) 'blind', degree-square fields to have been observed down to our target sensitivity of 100 $\mu$Jy/beam. In follow-up deep pointed observations the SZ effect is detected with a maximum peak decrement greater than $8\times$ the thermal noise. No corresponding emission is visible in the ROSAT all-sky X-ray survey and no cluster is evident in the Palomar all-sky optical survey. Compared with existing SZ images of distant clusters, the extent is large ($\approx 10'$) and complex; our analysis favours a model containing two clusters rather than a single cluster. Our Bayesian analysis is currently limited to modelling each cluster with an ellipsoidal or spherical beta-model, which do not do justice to this decrement. Fitting an ellipsoid to the deeper candidate we find the following. (a) Assuming that the Evrard et al. (2002) approximation to Press & Schechter (1974) correctly gives the number density of clusters as a function of mass and redshift, then, in the search area, the formal Bayesian probability ratio of the AMI detection of this cluster is $7.9 \times 10^{4}$:1; alternatively assuming Jenkins et al. (2001) as the true prior, the formal Bayesian probability ratio of detection is $2.1 \times 10^{5}$:1. (b) The cluster mass is $M_{T,200} = 5.5^{+1.2}_{-1.3} \times 10^{14}\,h_{70}^{-1}\,M_\odot$. (c) Abandoning a physical model with number density prior and instead simply modelling the SZ decrement using a phenomenological $\beta$-model of temperature decrement as a function of angular distance, we find a central SZ temperature decrement of $-295^{+36}_{-15}$ $\mu$K; this allows for CMB primary anisotropies, receiver noise and radio sources. We are unsure if the cluster system we observe is a merging system or two separate clusters. Comment: accepted MNRAS. 12 pages, 9 figures

    Statistically robust representation and comparison of mortality profiles in archaeozoology

    Archaeozoological mortality profiles have been used to infer site-specific subsistence strategies. There is, however, no common agreement on the best way to present these profiles and the confidence intervals around age class proportions. In order to deal with these issues, we propose the use of the Dirichlet distribution and present a new approach to perform age-at-death multivariate graphical comparisons. We demonstrate the efficiency of this approach using domestic sheep/goat dental remains from 10 Cardial sites (Early Neolithic) located in South France and the Iberian Peninsula. We show that the Dirichlet distribution in age-at-death analysis can be used: (i) to generate Bayesian credible intervals around each age class of a mortality profile, even when not all age classes are observed; and (ii) to create 95% kernel density contours around each age-at-death frequency distribution when multiple sites are compared using correspondence analysis. The statistical procedure we present is applicable to the analysis of any categorical count data and particularly well-suited to archaeological data (e.g. potsherds, arrow heads) where sample sizes are typically small.

    A mitotic recombination map proximal to the APC locus on chromosome 5q and assessment of influences on colorectal cancer risk

    Mitotic recombination is important for inactivating tumour suppressor genes by copy-neutral loss of heterozygosity (LOH). Although meiotic recombination maps are plentiful, little is known about mitotic recombination. The APC gene (chr5q21) is mutated in most colorectal tumours and its usual mode of LOH is mitotic recombination.

    Measurement of Dijet Angular Distributions and Search for Quark Compositeness

    We have measured the dijet angular distribution in $\sqrt{s}=1.8$ TeV $p\bar{p}$ collisions using the D0 detector. Order $\alpha_{s}^{3}$ QCD predictions are in good agreement with the data. At 95% confidence the data exclude models of quark compositeness in which the contact interaction scale is below 2 TeV. Comment: 11 pages, LaTeX, 3 postscript figures

    High performance computation of landscape genomic models including local indicators of spatial association

    With the increasing availability of both molecular and topo-climatic data, the main challenges facing landscape genomics – that is the combination of landscape ecology with population genomics – include processing large numbers of models and distinguishing between selection and demographic processes (e.g. population structure). Several methods address the latter, either by estimating a null model of population history or by simultaneously inferring environmental and demographic effects. Here we present samβada, an approach designed to study signatures of local adaptation, with special emphasis on high performance computing of large-scale genetic and environmental data sets. samβada identifies candidate loci using genotype–environment associations while also incorporating multivariate analyses to assess the effect of many environmental predictor variables. This enables the inclusion of explanatory variables representing population structure into the models to lower the occurrences of spurious genotype–environment associations. In addition, samβada calculates local indicators of spatial association for candidate loci to provide information on whether similar genotypes tend to cluster in space, which constitutes a useful indication of the possible kinship between individuals. To test the usefulness of this approach, we carried out a simulation study and analysed a data set from Ugandan cattle to detect signatures of local adaptation with samβada, bayenv, lfmm and an FST outlier method (FDIST approach in arlequin) and compared their results. samβada – an open source software for Windows, Linux and Mac OS X available at http://lasig.epfl.ch/sambada – outperforms other approaches and better suits whole-genome sequence data processing.

    Bayesian search for low-mass planets around nearby M dwarfs. Estimates for occurrence rate based on global detectability statistics

    Mikko Tuomi, 'Bayesian search for low-mass planets around nearby M dwarfs - estimates for occurrence rate based on global detectability statistics', Monthly Notices of the Royal Astronomical Society, Vol. 441 (2): 1545-1569, first published online 8 May 2014. The version of record is available online at doi: 10.1093/mnras/stu358 © 2014 The Authors. Published by Oxford University Press on behalf of the Royal Astronomical Society. Due to their higher planet-star mass ratios, M dwarfs are the easiest targets for detection of low-mass planets orbiting nearby stars using Doppler spectroscopy. Furthermore, because of their low masses and luminosities, Doppler measurements enable the detection of low-mass planets in their habitable zones that correspond to closer orbits than for solar-type stars. We re-analyse literature Ultraviolet and Visual Echelle Spectrograph (UVES) radial velocities of 41 nearby M dwarfs in combination with new velocities obtained from publicly available spectra from the HARPS-ESO spectrograph of these stars in an attempt to constrain any low-amplitude Keplerian signals. We apply Bayesian signal detection criteria, together with posterior sampling techniques, in combination with noise models that take into account correlations in the data and obtain estimates for the number of planet candidates in the sample. More generally, we use the estimated detection probability function to calculate the occurrence rate of low-mass planets around nearby M dwarfs. We report eight new planet candidates in the sample (orbiting GJ 27.1, GJ 160.2, GJ 180, GJ 229, GJ 422, and GJ 682), including two new multiplanet systems, and confirm two previously known candidates in the GJ 433 system based on detections of Keplerian signals in the combined UVES and High Accuracy Radial velocity Planet Searcher (HARPS) radial velocity data that cannot be explained by periodic and/or quasi-periodic phenomena related to stellar activities.
    Finally, we use the estimated detection probability function to calculate the occurrence rate of low-mass planets around nearby M dwarfs. According to our results, M dwarfs are hosts to an abundance of low-mass planets and the occurrence rate of planets less massive than 10 M⊕ is of the order of one planet per star, possibly even greater. Our results also indicate that planets with masses between 3 and 10 M⊕ are common in the stellar habitable zones of M dwarfs with an estimated occurrence rate of $0.21^{+0.03}_{-0.05}$ planets per star. Peer reviewed