
    Statistical Analysis of Complex Data: Bayesian Model Selection and Functional Data Depth.

    Big data of the modern era exhibit various types of complex structure. This dissertation addresses two important problems that arise in this context. First, consider high-dimensional data where the number of variables is much larger than the sample size. For model selection in a Bayesian framework, a novel approach using sample-size-dependent spike-and-slab priors is proposed. It is shown that the corresponding posterior achieves strong variable selection consistency even when the number of covariates grows nearly exponentially with the sample size, and that the posterior induces shrinkage similar to that of the L0 penalty. A new algorithm for posterior computation is proposed that is far more scalable, in both memory and computational efficiency, than existing Markov chain Monte Carlo algorithms. Second, for the analysis of functional data, a new notion of data depth is devised that possesses desirable properties and is especially well suited for obtaining central regions. In particular, the central regions achieve a desired simultaneous coverage probability and are useful in a wide range of applications, including boxplots and outlier detection for functional data and simultaneous confidence bands in regression problems.
    PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/120686/1/naveennn_1.pd
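
    The dissertation's scalable algorithm is not reproduced here, but a minimal Gibbs sampler for a continuous spike-and-slab prior (George-McCulloch-style stochastic search variable selection) sketches the core mechanism. The sample-size-dependent spike scale tau0 below is an illustrative choice, not the prior analyzed in the thesis, and all names and tuning constants are assumptions.

```python
import numpy as np

def ssvs_gibbs(X, y, n_iter=2000, tau1=1.0, q=0.1, sigma2=1.0, seed=0):
    """Toy Gibbs sampler for continuous spike-and-slab variable selection.

    Prior: beta_j ~ (1 - z_j) N(0, tau0^2) + z_j N(0, tau1^2), z_j ~ Bernoulli(q),
    with the spike scale tau0 shrinking with the sample size n (an
    illustrative rate only; the dissertation's prior may differ).
    Returns posterior inclusion probabilities for each covariate.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    tau0 = 1.0 / np.sqrt(n * p)               # assumption: sample-size dependent spike
    z = np.zeros(p, dtype=bool)
    inclusion = np.zeros(p)
    XtX, Xty = X.T @ X, X.T @ y
    for _ in range(n_iter):
        # beta | z, y is Gaussian; the prior precision of beta_j depends on z_j
        d = np.where(z, 1.0 / tau1**2, 1.0 / tau0**2)
        cov = np.linalg.inv(XtX / sigma2 + np.diag(d))
        beta = rng.multivariate_normal(cov @ Xty / sigma2, cov)
        # z_j | beta_j is Bernoulli; odds = q N(beta_j; 0, tau1^2) / ((1-q) N(beta_j; 0, tau0^2))
        logit = (np.log(q / (1 - q))
                 - 0.5 * beta**2 * (1.0 / tau1**2 - 1.0 / tau0**2)
                 - np.log(tau1 / tau0))
        z = rng.random(p) < 1.0 / (1.0 + np.exp(-logit))
        inclusion += z
    return inclusion / n_iter

# toy demo: only covariates 0 and 3 are truly active
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 20))
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + rng.standard_normal(100)
print(np.round(ssvs_gibbs(X, y), 2))
```

    As the spike variance shrinks with the sample size, coefficients that do not escape into the slab are pulled hard toward zero, which is loosely the sense in which such posteriors mimic L0-style shrinkage.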

    Separability as a modeling paradigm in large probabilistic models

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 185-191). Many interesting stochastic models can be formulated as finite-state vector Markov processes, with a state characterized by the values of a collection of random variables. In general, such models suffer from the curse of dimensionality: the size of the state space grows exponentially with the number of underlying random variables, precluding conventional modeling and analysis. A potential cure to this curse is to work with models that allow the propagation of partial information, e.g. marginal distributions, expectations, higher moments, or cross-correlations, as derived from the joint distribution of the network state. This thesis develops and rigorously investigates the notion of separability, associated with structure in probabilistic models that permits exact propagation of partial information. We show that when partial information can be propagated exactly, it can be propagated linearly. The matrices for propagating such partial information share many valuable spectral relationships with the underlying transition matrix of the Markov chain. Separability can be understood from the perspective of subspace invariance in linear systems, though it relates to invariance in a non-standard way. We analyze the asymptotic generality, as the number of random variables becomes large, of some special cases of separability that permit the propagation of marginal distributions. Within this discussion of separability, we introduce the generalized influence model, which incorporates as special cases two prominent models permitting the propagation of marginal distributions: the influence model and Markov chains on permutations (the symmetric group). The thesis proposes a potentially tractable solution to learning informative model parameters, and illustrates many advantageous properties of the estimator under the assumption of separability. Lastly, we illustrate separability in the general setting without any notion of time-homogeneity, and discuss potential benefits for inference in special cases. By William J. Richoux. Ph.D.
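
    A toy instance makes the separability claim concrete: in a simple "copying" influence model, each site copies the current value of a neighbour chosen according to a row-stochastic influence matrix D, and the site-wise marginals then propagate exactly through the small linear map D, even though the joint chain lives on an exponentially large state space. The model, names, and matrix below are illustrative assumptions, not the thesis's notation or its generalized influence model.

```python
import itertools
import numpy as np

m = 3
rng = np.random.default_rng(1)
D = rng.dirichlet(np.ones(m), size=m)        # row-stochastic influence matrix

states = list(itertools.product([0, 1], repeat=m))
idx = {s: k for k, s in enumerate(states)}

# Full joint transition matrix over 2^m states (sites update independently
# given the current joint state; site i copies neighbour j w.p. D[i, j]).
T = np.zeros((2**m, 2**m))
for s in states:
    # per-site probability that the next value is 1
    p1 = np.array([sum(D[i, j] * s[j] for j in range(m)) for i in range(m)])
    for t in states:
        T[idx[s], idx[t]] = np.prod([p1[i] if t[i] else 1 - p1[i] for i in range(m)])

# Compare joint propagation + marginalisation against linear marginal propagation.
joint = rng.dirichlet(np.ones(2**m))         # arbitrary initial joint distribution
marg = np.array([sum(joint[idx[s]] for s in states if s[i]) for i in range(m)])
for _ in range(5):
    joint = joint @ T                        # exponential-size propagation
    marg = D @ marg                          # linear, m-dimensional propagation
exact = np.array([sum(joint[idx[s]] for s in states if s[i]) for i in range(m)])
print(np.allclose(exact, marg))              # True: marginals propagate exactly
```

    The point of the comparison is that the m-dimensional update D @ marg reproduces the marginals of the 2^m-dimensional joint update exactly, which is the separability property in miniature.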

    Optimization of point grids in regional satellite gravity analysis using a Bayesian approach

    The subject of this thesis is global and regional gravity field determination from GOCE data using the short-arc approach. The focus is on extending the regional method to adapt the model resolution to the data by estimating an optimal nodal point configuration for the arrangement of the radial basis functions. Estimating the positions of the basis functions is a nonlinear problem that is not easy to solve with the means of classical adjustment theory, especially if the number of basis functions is also to be determined from the data. For this reason, the point grid has so far been fixed, and only the linear problem, the determination of the scaling coefficients for a given point grid, has been solved. Here, the problem is formulated within the framework of Bayesian statistics by specifying a joint posterior density for the number of basis functions and the remaining parameters. For the practical solution, the reversible jump Markov chain Monte Carlo sampling algorithm is employed, which allows this kind of variable-dimension problem to be simulated. Key points in the implementation are the marginalization of the scaling coefficients from the target density, which makes it possible to restrict the chain to sampling the point grid, and the use of a proposal distribution derived from a gravity field model. The final gravity field solution is taken to be the average of the generated gravity field solutions and thus accounts for the uncertainty about the choice of the model. The method is applied to real GOCE data and compared with the global spherical harmonic model ITG-Goce02 and with a regional solution that uses a regular distribution of basis functions; both comparison models are part of this work and are based on the same processing strategy. It turns out that optimizing the point grid greatly reduces the required number of basis functions and that the distribution of the grid points adapts to the structures of the gravity field signal. The solution becomes more stable and better reflects the characteristics of the signal, yielding an improvement of up to 13% over the comparison models.
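
    As a rough illustration of the approach (not the GOCE implementation), the sketch below runs a birth/death reversible jump chain over the number and positions of 1-D Gaussian radial basis functions, with the scaling coefficients marginalized out under a conjugate Gaussian prior, echoing the thesis's marginalization strategy. The Poisson prior on the model size, the uniform centre proposal, and all constants are assumptions chosen for the toy.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 60)
y = np.sin(6.0 * x) + 0.1 * rng.standard_normal(x.size)   # synthetic 1-D "signal"
sigma2, tau2, width, lam = 0.01, 1.0, 0.08, 5.0            # noise var, coeff prior var, RBF width, Poisson mean

def log_marglik(centers):
    """log p(y | centers), scaling coefficients integrated out analytically."""
    Phi = np.exp(-0.5 * ((x[:, None] - np.asarray(centers)[None, :]) / width) ** 2)
    cov = tau2 * Phi @ Phi.T + sigma2 * np.eye(x.size)
    return multivariate_normal.logpdf(y, mean=np.zeros(x.size), cov=cov)

centers = [0.5]
ll = log_marglik(centers)
for _ in range(5000):
    k = len(centers)
    if rng.random() < 0.5:                       # birth: propose a new uniform centre
        prop = centers + [float(rng.random())]
        log_prior_ratio = np.log(lam / (k + 1))  # Poisson(lam) prior on k; uniform centres
    elif k > 1:                                  # death: delete a randomly chosen centre
        j = int(rng.integers(k))
        prop = centers[:j] + centers[j + 1:]
        log_prior_ratio = np.log(k / lam)
    else:
        continue                                  # keep at least one basis function
    pll = log_marglik(prop)
    if np.log(rng.random()) < pll - ll + log_prior_ratio:  # Metropolis-Hastings-Green accept
        centers, ll = prop, pll

print(len(centers), sorted(round(c, 2) for c in centers))
```

    Because the coefficients are integrated out, the transdimensional moves act only on the point configuration, so the dimension-matching Jacobian is trivially one; this is the same device that lets the thesis restrict the chain to sampling the point grid.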

    Advances in Evaporation and Evaporative Demand

    The importance of evapotranspiration (ET) is well established in disciplines such as hydrology, agronomy, climatology, and other geosciences. Reliable estimates of evapotranspiration are also vital for developing criteria for in-season irrigation management, water resource allocation, long-term estimates of water supply, demand, and use, design and management of water resources infrastructure, and evaluation of the effect of land use and management changes on the water balance. The objective of this Special Issue is to define and discuss several ET terms, including potential, reference, and actual (crop) ET, and to present a wide spectrum of innovative research papers and case studies.
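
    To make the notion of reference ET concrete, here is a minimal sketch of one well-known formulation, the Hargreaves-Samani (1985) temperature-based equation; it is only one of many reference-ET methods of the kind the Special Issue discusses, and the example input values are illustrative.

```python
import math

def hargreaves_et0(tmax, tmin, ra):
    """Reference evapotranspiration ET0 (mm/day), Hargreaves-Samani (1985).

    tmax, tmin : daily maximum / minimum air temperature (deg C)
    ra         : extraterrestrial radiation in evaporation-equivalent mm/day
                 (multiply MJ m-2 day-1 by 0.408 to convert)
    """
    tmean = 0.5 * (tmax + tmin)
    return 0.0023 * ra * math.sqrt(max(tmax - tmin, 0.0)) * (tmean + 17.8)

# e.g. a warm summer day: Ra ~ 16 mm/day, Tmax 32 degC, Tmin 18 degC
print(round(hargreaves_et0(32.0, 18.0, 16.0), 2), "mm/day")   # ~ 5.9 mm/day
```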

    Spatio-temporal rainfall estimation and nowcasting for flash flood forecasting.

    Thesis (Ph.D.Eng.), University of KwaZulu-Natal, Durban, 2007. Floods cannot be prevented, but their devastating effects can be minimized if advance warning of the event is available. The South African Disaster Management Act (Act 57 of 2002) advocates a paradigm shift from the current "bucket and blanket brigade" response-based mindset to one where disaster prevention or mitigation are the preferred options. It is in the context of mitigating the effects of floods that the development and implementation of a reliable flood forecasting system has major significance. In the case of flash floods, a few hours of lead time can afford disaster managers the opportunity to take steps that may significantly reduce loss of life and damage to property. The engineering challenges in developing and implementing such a system are numerous. In this thesis, the design and implementation of a flash flood forecasting system in South Africa is critically examined. The technical aspects relating to spatio-temporal rainfall estimation and nowcasting are a key area in which new contributions are made. In particular, field and optical flow advection algorithms are adapted and refined to help predict the future paths of storms; fast and pragmatic algorithms for combining rain gauge and remote sensing (radar and satellite) estimates are refined and validated; and a two-dimensional adaptation of Empirical Mode Decomposition is devised to extract the temporally persistent structure embedded in rainfall fields. A second area of significant contribution relates to real-time forecast updates, made in response to the most recent observed information. A number of techniques embedded in the rich Kalman and adaptive filtering literature are adopted for this purpose. The work captures the current "state of play" in the South African context and aims to provide a blueprint for future development of an essential tool for disaster management. There are a number of natural spin-offs from this work for related fields in water resources management.
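
    The thesis's optical-flow advection algorithms are not reproduced here, but a minimal global-shift advection nowcast conveys the basic idea: estimate a storm displacement between two consecutive radar rainfall frames by cross-correlation, then extrapolate the latest frame along that motion vector. Real systems estimate spatially varying motion fields; the function names and synthetic frames below are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import shift as advect
from scipy.signal import fftconvolve

def displacement(prev, curr):
    """Dominant (dy, dx) shift maximising the cross-correlation of two frames."""
    corr = fftconvolve(curr, prev[::-1, ::-1], mode="same")
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    return int(dy) - prev.shape[0] // 2, int(dx) - prev.shape[1] // 2

def nowcast(curr, motion, steps=1):
    """Extrapolate the latest rainfall field `steps` time-steps along `motion`."""
    dy, dx = motion
    return advect(curr, (steps * dy, steps * dx), order=1, cval=0.0)

# synthetic storm cell drifting 2 px down and 3 px right per frame
frame0 = np.zeros((64, 64))
frame0[20:28, 20:28] = 5.0                      # 8x8 block of rain (mm/h)
frame1 = advect(frame0, (2, 3), order=1, cval=0.0)

motion = displacement(frame0, frame1)
print(motion)                                    # (2, 3)
forecast = nowcast(frame1, motion, steps=2)      # field expected two steps ahead
```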

    Multi-scale approaches for the statistical analysis of microarray data (with an application to 3D vesicle tracking)

    Recent developments in experimental methods for gene expression analysis, called microarrays, make it possible to interrogate changes in the expression of a vast number of genes in cell or tissue cultures, and thus to explore disease conditions in depth. As part of an ongoing program of research in the Guy A. Rutter (G.A.R.) laboratory, Department of Biochemistry, University of Bristol, UK, with support from the Wellcome Trust, we study the impact of established and potentially new methods on the statistical analysis of gene expression data.
    EThOS - Electronic Theses Online Service, United Kingdom.

    Cone Penetration Testing 2022

    This volume contains the proceedings of the 5th International Symposium on Cone Penetration Testing (CPT'22), held in Bologna, Italy, 8-10 June 2022. More than 500 authors (academics, researchers, practitioners, and manufacturers) contributed to the peer-reviewed papers in this book, which comprises three keynote lectures, four invited lectures, and 169 technical papers. The contributions provide a full picture of current knowledge and major trends in CPT research and development with respect to innovations in instrumentation, the latest advances in data interpretation, and emerging fields of CPT application. The papers fall into three well-established categories typically addressed in CPT events:
    - Equipment and Procedures
    - Data Interpretation
    - Applications
    Emphasis is placed on the use of statistical approaches and innovative numerical strategies for CPT data interpretation, liquefaction studies, the application of CPT to offshore engineering, and comparative studies between CPT and other in-situ tests. Cone Penetration Testing 2022 contains a wealth of information that will be useful to researchers, practitioners, and all those working in the broad and dynamic field of cone penetration testing.
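
    As a small, concrete example of the empirical CPT data interpretation covered in this volume, the soil behaviour type index Ic of Robertson can be computed from normalised cone resistance and friction ratio. The simplified stress exponent and the example values below are assumptions for illustration only.

```python
import math

def sbt_index(qt_mpa, fs_kpa, sigma_v0_kpa, sigma_v0_eff_kpa):
    """Soil behaviour type index Ic (Robertson), from normalised CPT data.

    qt: corrected cone resistance (MPa); fs: sleeve friction (kPa);
    sigma_v0 / sigma_v0_eff: total / effective overburden stress (kPa).
    Uses the n = 1 (clay-like) stress exponent for brevity; the full
    procedure iterates the exponent as a function of Ic.
    """
    qt_kpa = qt_mpa * 1000.0
    qtn = (qt_kpa - sigma_v0_kpa) / sigma_v0_eff_kpa    # normalised resistance, n = 1
    fr = 100.0 * fs_kpa / (qt_kpa - sigma_v0_kpa)       # normalised friction ratio (%)
    return math.hypot(3.47 - math.log10(qtn), math.log10(fr) + 1.22)

# e.g. a soft clay at ~10 m depth: qt = 0.8 MPa, fs = 20 kPa
print(round(sbt_index(0.8, 20.0, 180.0, 100.0), 2))     # Ic > 2.6 suggests clay-like soil
```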