    Probability models for information retrieval based on divergence from randomness

    This thesis devises a novel methodology, based on probability theory, for constructing term-weighting models for Information Retrieval. Our term-weighting functions are created within a general framework made up of three components, each built independently of the others. We obtain the term-weighting functions from the general model in a purely theoretical way, by instantiating each component with a different form of probability distribution. The thesis begins by investigating the nature of the statistical inference involved in Information Retrieval. We explore the estimation problem underlying the process of sampling. De Finetti’s theorem is used to show how to convert the frequentist approach into Bayesian inference, and we present and employ the derived estimation techniques in the context of Information Retrieval. We initially pay close attention to the construction of the basic sample spaces of Information Retrieval. The notion of single or multiple sampling from different populations in the context of Information Retrieval is extensively discussed and used throughout the thesis. The language modelling approach and the standard probabilistic model are studied under the same foundational view and are experimentally compared to the divergence-from-randomness approach. In revisiting the main information retrieval models in the literature, we show that even the language modelling approach can be exploited to assign term-frequency normalization to the models of divergence from randomness. We finally introduce a novel framework for query expansion. This framework is based on the models of divergence from randomness and can be applied to arbitrary IR models, including divergence-based, language modelling, and probabilistic models. We have conducted a very large number of experiments, and the results show that the framework generates highly effective Information Retrieval models.

    Biochemical polymorphisms in Drosophila populations

    Robust decision-making with model uncertainty in aerospace systems

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2008. Includes bibliographical references (p. 161-168). Actual performance of sequential decision-making problems can be extremely sensitive to errors in the models, and this research addresses the role of robustness in coping with this uncertainty. The first part of this thesis presents a computationally efficient sampling methodology, Dirichlet Sigma Points, for solving robust Markov Decision Processes with transition probability uncertainty. A Dirichlet prior is used to model the uncertainty in the transition probabilities. This approach uses the first two moments of the Dirichlet to generate samples of the uncertain probabilities and uses these samples to find the optimal robust policy. The Dirichlet Sigma Point method requires a much smaller number of samples than conventional Monte Carlo approaches, and is empirically demonstrated to be a very good approximation to the robust solution obtained with a very large number of samples. The second part of this thesis discusses the area of robust hybrid estimation. Model uncertainty in hybrid estimation can result in significant covariance mismatches and inefficient estimates. The specific problem of covariance underestimation is addressed, and a new robust estimator is developed that finds the largest covariance admissible within a prescribed uncertainty set. The robust estimator can be found by solving a small convex optimization problem in conjunction with Monte Carlo sampling, and reduces estimation errors in the presence of transition probability uncertainty. The Dirichlet Sigma Points are extended to this problem to reduce the computational requirements of the estimator. In the final part of the thesis, the Dirichlet Sigma Points are extended for real-time adaptation.
Using insight from estimation theory, a modified version of the Dirichlet Sigma Points is presented that significantly improves the response time of classical estimators. The thesis concludes with a hardware implementation of these robust and adaptive algorithms on the RAVEN testbed, demonstrating their applicability to real-life UAV missions. By Luca Francesco Bertuccelli. Ph.D.

    Chip-Level Thermal Analysis, Modeling, and Optimization Using Multilayer Green's Function.

    With the continual scaling of devices and interconnects, accurate analysis and effective optimization of the temperature distribution of a ULSI chip are increasingly important in predicting and ensuring the performance and reliability of the chip before fabrication. Motivated by the design challenges, this dissertation aims at a detailed study of the areas of thermal analysis, modeling, and optimization of ULSI chips. In particular, this dissertation introduces LOTAGre, a high-efficiency O(n lg n) multilayer Green's function-based thermal analysis method. LOTAGre can analyze ULSI chips consisting of multilayer heterogeneous heat conduction materials, with either wire-bonding packaging or flip-chip packaging, under uniform or non-uniform ambient temperatures. By integrating the eigen-expansion technique and the transmission line theory, this dissertation derives the multilayer heat conduction Green's function, including the s-domain version which can be used to compute the thermal transfer impedance between two arbitrary locations in the chip and establish compact thermal models for the critical components in the chip. To aid interconnect thermal analysis, this dissertation introduces a new Schafft-type interconnect temperature distribution model which is very flexible in addressing the effects of chip packaging, surrounding ambient temperatures, and the temperature gradients within the interconnect. An efficient O(n) method is introduced to solve the interconnect temperature distribution from the model. To optimize the chip temperature distribution, this dissertation introduces an optimal power budget model that determines the optimal allocation of cell powers to different regions of the chip so that the resultant temperature distribution most closely approximates the target temperature distribution for the chip. 
The generalized minimal residue method and the conjugate gradient method are employed to construct top-level and front-level thermal optimizers to solve the optimal power budget efficiently. Finally, the dissertation describes the procedure to incorporate the optimal power budget model into the widely distributed Capo placement tool to enable thermal optimization in the cell placement stage. Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/61761/1/baohuaw_1.pd

    Regularity Properties and Pathologies of Position-Space Renormalization-Group Transformations

    We reconsider the conceptual foundations of the renormalization-group (RG) formalism, and prove some rigorous theorems on the regularity properties and possible pathologies of the RG map. Regarding regularity, we show that the RG map, defined on a suitable space of interactions (= formal Hamiltonians), is always single-valued and Lipschitz continuous on its domain of definition. This rules out a recently proposed scenario for the RG description of first-order phase transitions. On the pathological side, we make rigorous some arguments of Griffiths, Pearce and Israel, and prove in several cases that the renormalized measure is not a Gibbs measure for any reasonable interaction. This means that the RG map is ill-defined, and that the conventional RG description of first-order phase transitions is not universally valid. For decimation or Kadanoff transformations applied to the Ising model in dimension $d \ge 3$, these pathologies occur in a full neighborhood $\{ \beta > \beta_0 ,\, |h| < \epsilon(\beta) \}$ of the low-temperature part of the first-order phase-transition surface. For block-averaging transformations applied to the Ising model in dimension $d \ge 2$, the pathologies occur at low temperatures for arbitrary magnetic-field strength. Pathologies may also occur in the critical region for Ising models in dimension $d \ge 4$. We discuss in detail the distinction between Gibbsian and non-Gibbsian measures, and give a rather complete catalogue of the known examples. Finally, we discuss the heuristic and numerical evidence on RG pathologies in the light of our rigorous theorems. Comment: 273 pages including 14 figures, Postscript. See also ftp.scri.fsu.edu:hep-lat/papers/9210/9210032.ps.

    Estimation in causal graphical models

    Pearl (2000), Spirtes et al. (1993) and Lauritzen (2001) set up a new framework to encode the causal relationships between random variables by a causal Bayesian network. The estimation of the conditional probabilities in a Bayesian network has received considerable attention from several investigators (e.g., Jordan (1998), Geiger and Heckerman (1997), Heckerman et al. (1995)), but this issue has not been studied in a causal Bayesian network. In this thesis, we define the multicausal essential graph on the equivalence class of Bayesian networks in which each member of this class manifests a strong type of invariance under (causal) manipulation called hypercausality. We then characterise the families of prior distributions on the parameters of the Bayesian networks which are consistent with hypercausality and show that their unmanipulated uncertain Bayesian networks must demonstrate the independence assumptions. As a result, such prior distributions satisfy a generalisation of the Geiger and Heckerman condition. In particular, when the corresponding essential graph is undirected, the mentioned class of prior distributions will reduce to the Hyper-Dirichlet family (see Chapter 6). In the second part of this thesis, we calculate certain local sensitivity measures, and through them we are able to answer the following questions: Is the network structure that is learned from data robust with respect to changes of the directionality of some specific arrows? Are the local conditional distributions associated with the specified node robust with respect to changes to its prior distribution or with respect to changes to the local conditional distribution of another node? Most importantly, is the posterior distribution associated with the parameters of any node robust with respect to changes to the prior distribution associated with the parameters of one specific node?
Finally, are the quantities mentioned above robust with respect to changes in the independence assumptions described in Chapter 3? Most of the local sensitivity measures (particularly, local measures of the overall posterior sensitivity) developed in the last decade tend to diverge to infinity as the sample size becomes very large (Gustafson (1994) and Gustafson et al. (1996)). This is in contrast to our knowledge that, starting from different priors, posteriors tend to agree as the data accumulate. Here we define a new class of metrics with more satisfactory asymptotic behaviour. The advantage of the corresponding local sensitivity measures is boundedness for large sample size.

    On the geodetic applications of simultaneous range-differencing to LAGEOS

    The possibility of improving the accuracy of geodetic results by use of simultaneously observed ranges to LAGEOS, in a differencing mode, from pairs of stations was studied. Simulation tests show that model errors can be effectively minimized by simultaneous range differencing (SRD) for a rather broad class of network satellite pass configurations. Least-squares approximation with monomials and Chebyshev polynomials is compared with cubic spline interpolation. Analysis of three types of orbital biases (radial, along-track and across-track) shows that radial biases are the ones most efficiently minimized in the SRD mode. The degree to which the other two can be minimized depends on the type of parameters under estimation and the geometry of the problem. Sensitivity analyses of the SRD observation show that for baseline length estimations the most useful data are those collected in a direction parallel to the baseline and at a low elevation. Estimating individual baseline lengths with respect to an assumed but fixed orbit not only decreases the cost but also further reduces the effects of model biases on the results, as opposed to a network solution. Analogous results and conclusions are obtained for the estimates of the coordinates of the pole.