
    Artificial Mixture Methods for Correlated Nominal Responses and Discrete Failure Time.

    Multinomial logit models with random effects are a common choice for modeling correlated nominal responses, but the presence of random effects and the complex form of the multinomial probabilities make computation costly. We generalize the artificial mixture method for independent nominal responses to correlated nominal responses. Our method transforms the complex multinomial likelihood into Poisson-type likelihoods, so that the estimates can be obtained by iteratively solving a set of independent low-dimensional problems. The methodology is applied to real data and studied by simulation. For discrete failure time data in large data sets, there are often many ties and a large number of distinct event time points, which poses the challenge of a high-dimensional model. We explore two ideas with the discrete proportional odds (PO) model because of its methodological and computational convenience. The log-likelihood of the discrete PO model is the difference of two convex functions, so the difference-of-convex-functions algorithm (DCA) carries over and brings computational efficiency. The alternative method we propose is a recursive procedure. In simulation studies, both methods outperform the Quasi-Newton method in terms of accuracy and computational time. The results on the discrete PO model motivate us to develop artificial mixture methods for discrete failure time data. We consider a general discrete transformation model and mediate the high-dimensional optimization problem by changing the model form at the "complete-data" level (conditional on artificial variables). Two complete-data representations are studied: proportional hazards (PH) and PO mixture frameworks. In the PH mixture framework, we reduce the high-dimensional optimization problem to many one-dimensional problems. In the PO mixture framework, both the recursive solution and DCA can be synthesized into the M-step of the EM algorithm, simplifying the optimization. The PO mixture method is recommended for its simplicity. It is applied to real data sets to fit discrete PH and PHPH models. A simulation study fitting the discrete PH model shows that the advocated PO mixture method outperforms the Quasi-Newton method in terms of both accuracy and speed.
    Ph.D. Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/91531/1/sfwang_1.pd
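For orientation on the DCA step mentioned in the abstract, the generic difference-of-convex iteration is as follows; this is the standard scheme, stated here as background rather than as the thesis's exact algorithm. Minimising f = g - h with g and h convex, the concave part is linearised at the current iterate, leaving a convex subproblem at each step:

\[
  \theta^{(k+1)} \;\in\; \arg\min_{\theta}\;\Big\{\, g(\theta) \;-\; \big\langle \nabla h\big(\theta^{(k)}\big),\, \theta \big\rangle \Big\},
  \qquad f(\theta) = g(\theta) - h(\theta), \quad g, h \text{ convex}.
\]

Applied to the discrete PO model, f is the negative log-likelihood and the decomposition into g and h follows from the difference-of-convex structure noted above.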

    An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach

    Continuously indexed Gaussian fields (GFs) are the most important ingredient in spatial statistical modelling and geostatistics. The specification through the covariance function gives an intuitive interpretation of the field properties. On the computational side, GFs are hampered by the big-n problem, since the cost of factorizing dense matrices is cubic in the dimension. Although computational power today is at an all-time high, this remains a computational bottleneck in many applications. Alongside GFs, there is the class of Gaussian Markov random fields (GMRFs), which are discretely indexed. The Markov property makes the precision matrix involved sparse, which enables the use of numerical algorithms for sparse matrices that, for fields in R^2, use only the square root of the time required by general algorithms. The specification of a GMRF is through its full conditional distributions, but its marginal properties are not transparent in such a parameterization. We show that, using an approximate stochastic weak solution to (linear) stochastic partial differential equations (SPDEs), we can, for some GFs in the Matérn class, provide an explicit link, for any triangulation of R^d, between GFs and GMRFs, formulated as a basis function representation. The consequence is that we can take the best from the two worlds and do the modelling using GFs but the computations using GMRFs. Perhaps more importantly, our approach generalizes to other covariance functions generated by SPDEs, including oscillating and non-stationary GFs, as well as GFs on manifolds. We illustrate our approach by analysing global temperature data with a non-stationary model defined on a sphere.
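For reference, the explicit link in question is between the Matérn covariance family and a linear fractional SPDE; in the standard notation (reproduced here for orientation, not quoted from the paper):

\[
  C(\mathbf{h}) \;=\; \frac{\sigma^2}{2^{\nu-1}\Gamma(\nu)}\,\big(\kappa\,\lVert\mathbf{h}\rVert\big)^{\nu} K_{\nu}\big(\kappa\,\lVert\mathbf{h}\rVert\big),
  \qquad
  \big(\kappa^{2} - \Delta\big)^{\alpha/2} x(\mathbf{u}) \;=\; \mathcal{W}(\mathbf{u}),
  \quad \alpha = \nu + d/2,
\]

where K_nu is the modified Bessel function of the second kind and W is Gaussian white noise; a basis function (finite element) representation of the weak solution on a triangulation yields the sparse-precision GMRF approximation used for computation.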

    Model Specification and Prediction in Joint Modelling

    This thesis explores several methodological aspects of joint modelling of longitudinal outcomes and recurrent and terminal events, including variable selection, description, prediction, causal inference and model specification. The methods we discuss were motivated by the Community Ageing Research study (CARE75+) and the aim of investigating the relationships between frailty, falls and mortality. These outcomes have previously been analysed with marginal models, but not as joint outcomes. We propose a variable selection strategy to optimise prediction in joint models for longitudinal and time-to-event outcomes. This strategy combines penalised likelihood with the LASSO penalty and cross-validation to select the fixed effects that simultaneously optimise the mean squared error (MSE) and the Integrated Brier Score (IBS). Our simulation studies suggest that it is not always possible to optimise MSE and IBS simultaneously, but there appears to be a region defined by the constraints close to an optimal solution; in such a case a small compromise between MSE and IBS is required, depending on which outcome is the priority. Joint modelling has been an area of active research for description and prediction, but causal inference has received less attention. Using Directed Acyclic Graphs, we state our hypotheses about the paths between frailty, falls, mortality and confounders in order to formulate joint models that adjust for confounders. Via simulation studies we assessed the consequences of model misspecification, finding that even when the link function of the joint model and some features of the mean structure are not the correct ones, the fixed effects can still be correctly estimated.
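As background for the selection criteria above, a common form of the penalised objective and of the Integrated Brier Score is the following; these are standard definitions, not necessarily the thesis's exact notation:

\[
  \ell_{\text{pen}}(\theta) \;=\; \ell(\theta) \;-\; \lambda \sum_{j}\lvert\beta_{j}\rvert,
  \qquad
  \mathrm{IBS}(\tau) \;=\; \frac{1}{\tau}\int_{0}^{\tau} \mathrm{BS}(t)\,\mathrm{d}t,
\]

where ell is the joint-model log-likelihood, the LASSO penalty acts on the fixed effects beta_j, and BS(t) is the (censoring-weighted) Brier score of the predicted survival probabilities at time t; lambda is chosen by cross-validation, trading off the MSE of the longitudinal predictions against the IBS of the event predictions.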

    Autoencoder-based techniques for improved classification in settings with high dimensional and small sized data

    Neural network models have been widely tested and analysed using large, high-dimensional datasets. In real-world application problems, the available datasets are often limited in size for reasons related to the cost or difficulty of collecting the data. This limitation in the number of examples may challenge classification algorithms and degrade their performance. A motivating example for this kind of problem is predicting the health status of a tissue from its gene expression when the number of samples available to learn from is very small. Gene expression data has distinguishing characteristics that attract the machine learning research community. The high dimensionality of the data is one of the integral features that has to be considered when building predictive models: a single sample is described by thousands of gene expression values, compared to the benchmark images and texts commonly used for analysing existing models, which have only a few hundred features. Gene expression samples are also distributed unequally among the classes; in addition, they include noisy features which degrade the prediction accuracy of models. These characteristics give rise to the need for effective dimensionality reduction methods that are able to discover the complex relationships between the features, such as autoencoders. This thesis investigates the problem of predicting from small, high-dimensional datasets by introducing novel autoencoder-based techniques to increase the classification accuracy of the data. Two autoencoder-based methods, for generating synthetic data examples and synthetic representations of the data respectively, were introduced in the first stage of the study. Both of these methods are applicable to the testing phase of the autoencoder and proved successful in increasing the predictability of the data. Enhancing the autoencoder's ability to learn from small, imbalanced data was investigated in the second stage of the project, yielding techniques that improve the autoencoder's generated representations. Employing the radial basis activation mechanism used in radial basis function networks, which learn in a supervised manner, is a solution provided by this thesis to enhance the representations learned by unsupervised algorithms. This technique was later applied to stochastic variational autoencoders and showed promising results in learning discriminating representations from gene expression data. The contributions of this thesis can be described as a number of different methods, applicable to different stages (training and testing) and different autoencoder models (deterministic and stochastic), which individually allow for enhancing the predictability of small, high-dimensional datasets compared to well-known baseline methods.
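To make the radial-basis idea concrete, here is a minimal sketch of an autoencoder whose code layer is passed through a radial-basis activation with learnable centres. The library (PyTorch), the layer sizes, and the placement of the RBF layer are illustrative assumptions, not the thesis's exact architecture.

import torch
import torch.nn as nn

class RBFAutoencoder(nn.Module):
    """Autoencoder whose code layer is re-expressed through radial-basis units."""
    def __init__(self, n_features=2000, code_dim=32, n_centres=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 256), nn.ReLU(),
            nn.Linear(256, code_dim),
        )
        # Learnable RBF centres and (log) widths in code space.
        self.centres = nn.Parameter(torch.randn(n_centres, code_dim))
        self.log_sigma = nn.Parameter(torch.zeros(n_centres))
        self.decoder = nn.Sequential(
            nn.Linear(n_centres, 256), nn.ReLU(),
            nn.Linear(256, n_features),
        )

    def forward(self, x):
        z = self.encoder(x)                        # (batch, code_dim)
        d2 = torch.cdist(z, self.centres) ** 2     # squared distance to each centre
        phi = torch.exp(-d2 / (2 * torch.exp(self.log_sigma) ** 2))  # RBF activations
        return self.decoder(phi), phi              # reconstruction and representation

# Train with a reconstruction loss; 'phi' is the representation that would be
# fed to a downstream classifier.
model = RBFAutoencoder()
x = torch.randn(8, 2000)                           # 8 synthetic expression profiles
recon, phi = model(x)
loss = nn.functional.mse_loss(recon, x)
loss.backward()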

    Assessing causality in financial time series

    We develop new classes of semiparametric multivariate time series models based on Multi-Output Gaussian Processes and warped Multi-Output Gaussian Processes. These describe relationships between a current vector of observations and the lagged history of each marginal time series. We encode a serial dependence structure through mean and covariance functions and introduce a more complex dependence structure using copulas to couple each warped marginal Gaussian process. Within this class of models our primary goal is to detect causality and to study the interplay between the causal structure and the dependence structure. We do not, however, require a true representation of the data generating process; rather, we model structural hypotheses regarding how causality may have manifested in the observed vector-valued processes. Within our framework we test dependence with regard to the structures that are specified, and we can use testing for causality under different model assumptions as a way to explore the data and the potentially complex dependence relationships. To perform the testing we consider several families of causality tests and develop compound tests which first require estimation/calibration of the mean and covariance functions parametrising the nonparametric vector-valued time series. Our approach allows very general nonlinear dependence and causal relationships which are not often considered in classical parametric time series models, including causality in higher-order information and joint extreme dependence features. We provide a generic framework which can be applied to a variety of different problem classes and discuss a number of examples to illustrate the ideas developed. Throughout, we consider, without loss of generality, two multivariate time series denoted by X_t in R^d and Y_t in R^d', where one may assume, for instance, that these have been generated by observing partial realisations of a generalised diffusion process:
dX_t = mu_X(t, X_t^{-k}, Y_t^{-l}, Z_t^{-m}) dt + Sigma_X(t, X_t^{-k}, Y_t^{-l}, Z_t^{-m}) dW_t
dY_t = mu_Y(t, X_t^{-k}, Y_t^{-l}, Z_t^{-m}) dt + Sigma_Y(t, X_t^{-k}, Y_t^{-l}, Z_t^{-m}) dW'_t,
where Z_t, which may or may not be included, is some real process that we will call side information, and dW_t, dW'_t are two different Brownian motions, possibly with marginal serial correlation and/or instantaneous cross-correlation. All of these processes are only partially observed and may be sampled at irregular intervals. The form of drift and volatility described by the diffusion equations means that the processes X_t and Y_t can be conditionally dependent on each other, and this dependence can be introduced through both the drift and the volatility. Such generalised diffusion models can induce different types of extremal dependence between the marginal processes X_t and Y_t, depending on the forms of the drift and volatility functions. We propose a smooth stochastic process statistical model to capture the smooth variation of the partially observed time series represented by the data X_t, Y_t, Z_t, using multi-output warped Gaussian process models. In this work we are interested in partial observations of these processes, for which the partially observed time series of X_t and Y_t will have different types of extremal dependence characteristics. We wish to detect the presence or absence of statistical causality when such extremal dependence features may or may not obfuscate the ability to detect causality in nonlinear, partially observed time series models.
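The following is a minimal numpy sketch of the kind of data-generating mechanism described above: an Euler-Maruyama simulation of two coupled diffusions in which the drift of Y depends on the lagged path of X. The constant volatility, the single lag, and the specific drift forms and parameter values are illustrative assumptions, not the ones used in the thesis.

import numpy as np

def simulate_pair(T=500, dt=0.01, beta=0.8, lag=5, seed=0):
    """Euler-Maruyama simulation of two coupled univariate diffusions in which
    the drift of Y depends on the lagged path of X (X 'causes' Y), while X
    evolves autonomously."""
    rng = np.random.default_rng(seed)
    X, Y = np.zeros(T), np.zeros(T)
    sqdt = np.sqrt(dt)
    for t in range(1, T):
        x_lag = X[max(t - lag, 0)]
        # X: mean-reverting drift, independent of Y.
        X[t] = X[t-1] - 0.5 * X[t-1] * dt + 0.3 * sqdt * rng.standard_normal()
        # Y: drift pulled towards the lagged value of X.
        Y[t] = Y[t-1] + beta * (x_lag - Y[t-1]) * dt + 0.3 * sqdt * rng.standard_normal()
    return X, Y

X, Y = simulate_pair()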
The rationale for developing a semiparametric solution for modelling the partially observed time series is that, through the use of Gaussian process models, we may accommodate a wide variety of features in the hypotheses about the trends and volatility and, importantly, their possible causal structures, which can be formally tested in our framework. Furthermore, the use of warped Gaussian process models allows us to incorporate higher-order dependence such as extremal tail dependence features.
Statistical Causality. The notion of causality that lies at the centre of our research is the concept of statistical causality, based on comparing two predictive models. Quoting Wiener [1956]: "For two simultaneously measured signals, if we can predict the first signal better by using the past information from the second one than by using the information without it, then we call the second signal causal to the first one". The null hypothesis of no causal relationship from time series X_t to Y_t means that including the past of X_t does not improve the prediction of the future of Y_t. In its most general form this can be written as equality of the conditional distributions of Y_t, conditioning on either set of explanatory variables (here X_t^{-k}, Y_t^{-l}, Z_t^{-m} denote the past of the X_t, Y_t, Z_t time series up to lags k, l, m respectively):
H_0: p(Y_t | X_{t-1}^{-k}, Y_{t-1}^{-l}, Z_{t-1}^{-m}) = p(Y_t | Y_{t-1}^{-l}, Z_{t-1}^{-m})
H_1: p(Y_t | X_{t-1}^{-k}, Y_{t-1}^{-l}, Z_{t-1}^{-m}) ≠ p(Y_t | Y_{t-1}^{-l}, Z_{t-1}^{-m}).
The type of causal dependence described by statistical causality is a mechanism that operates at multiple lags over time, one which could have been triggered by a sequence of processes rather than an individual one. It can help to gain insight into both the cross-sectional and the temporal dynamics of the data analysed.
Warped Multi-Output Gaussian Processes. A Gaussian process is a stochastic process all of whose finite-dimensional distributions are Gaussian. Gaussian process models can accommodate a wide range of properties and are very attractive for their ease of implementation and optimisation, but they do not allow for higher-order dependence such as extremal tail dependence features. One way to generalise Gaussian process models so that higher-order dependence can be handled is to apply a transformation to the joint collection of Gaussian processes for each marginal time series model. We apply a mean-variance transformation that results in the transformed variables having multivariate skew-t distributions and being finite-dimensional realisations of a general multivariate skew-t process.
Motivation for the Model Choice. There are numerous advantages to using Gaussian processes, beginning with ease of optimisation, interpretability of hyperparameters, flexibility, and a rich family of covariance functions allowing for various model structures. Using a likelihood-ratio-type test with a GP is a very natural choice: estimating GP model parameters is typically done by maximising the likelihood, so this estimation can be incorporated into the compound version of the likelihood ratio test (the Generalised Likelihood Ratio Test, GLRT). From Gaussian variables, GPs inherit the property of being fully specified by the mean and the covariance, so testing for model equivalence inherently means testing for equivalence of the mean and covariance functions. A minimal sketch of such a GP-based compound test is given below.
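The sketch below illustrates the compound GLRT idea using scikit-learn's GP regression with an ARD RBF kernel as a stand-in for the thesis's model. The kernel choice, the lag construction, and the chi-squared reference distribution (one degree of freedom per added lag) are simplifying assumptions made for illustration; the thesis also considers permutation-based calibration.

import numpy as np
from scipy.stats import chi2
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def lagged(series, lags):
    """Design matrix of columns [s_{t-1}, ..., s_{t-lags}] for a univariate series."""
    return np.column_stack([series[lags - j:len(series) - j] for j in range(1, lags + 1)])

def gp_loglik(design, target):
    """Maximised marginal log-likelihood of a GP regression with an ARD RBF kernel."""
    kernel = RBF(length_scale=np.ones(design.shape[1])) + WhiteKernel(noise_level=0.1)
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(design, target)
    return gpr.log_marginal_likelihood_value_

def causality_glrt(X, Y, lags=5):
    """H0: the past of X does not improve prediction of Y beyond Y's own past."""
    target = Y[lags:]
    restricted = lagged(Y, lags)                          # Y's own past only
    full = np.hstack([restricted, lagged(X, lags)])       # plus the past of X
    stat = 2.0 * (gp_loglik(full, target) - gp_loglik(restricted, target))
    # Rough chi-squared reference: one extra ARD length-scale per added lag.
    return stat, chi2.sf(max(stat, 0.0), df=lags)

# Illustrative data in which X leads Y by one step.
rng = np.random.default_rng(1)
X = rng.standard_normal(300).cumsum() * 0.1
Y = np.concatenate(([0.0], 0.6 * X[:-1])) + 0.1 * rng.standard_normal(300)
stat, pval = causality_glrt(X, Y)
print(f"GLRT statistic = {stat:.2f}, approximate p-value = {pval:.3f}")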
Many popular kernels, however, do not have the ARD property, and using them in a likelihood-ratio-test setting gives no easy way to account for causal structures in the covariance. Consequently, it is the GLRT with an ARD GP that gives a uniformly most powerful test with unparalleled flexibility: a known asymptotic distribution under the null, explicit evaluation in closed form, and usefulness even for misspecified models. The proposed use of copula warping allows the introduction of additional dependence, in particular tail dependence, while keeping the likelihood in closed form.
Application. We provide a generic framework which can be applied to a wide range of problems, and which can be readily tailored or further extended. The illustrative examples included demonstrate how a range of data properties can be encoded in the model, and how they might affect the detection of causality. We present two real-data applications: to commodity futures data, and to inflation and interest rates. We show how the framework can be used in practice, and how it can be combined with, or enhance, more common approaches to analysing financial time series. Our observations are in line with financial interpretations, but they also offer additional insight and pose thought-provoking questions.
Structure of the thesis. This thesis presents the research as it evolved: starting from an overview of the causality methods already known, and demonstrating why they are unsatisfactory. Subsequently, a new approach is presented, a method based on Gaussian processes developed to address the drawbacks of the methods discussed in the first part. Afterwards, an extension is proposed to widen the range of dependence structures, as well as the marginal properties of the data, that can be incorporated. Chapter 1 introduces the topic of the thesis and reviews the relevant literature. Chapter 2 discusses the philosophical roots of the concept of statistical causality, as well as alternative notions of causality. After illustrating some of the varied ways of conceptually representing causality, we present four distinct ways of modelling statistical causality. Chapter 3 contains background on the models considered: Gaussian processes, copulas and selected distributions. Chapter 4 describes the inference procedures used: assessing hypothesis tests, the generalised likelihood ratio test, permutation tests, and the likelihood ratio test. The second part, New Perspectives on Causality Representation and Inference, presents the main contribution of our work. It starts with Chapter 5, containing the theoretical background for describing and testing causality with GP models. Chapter 6 extends the model from the previous chapter by introducing a mean-variance transformation that results in a warped GP model, which can describe causality in the presence of skewness and tail dependence. Chapter 7 describes how the synthetic data were simulated, details the algorithm for approximating the likelihood in the warped GP, and provides information on other relevant algorithms and the software used to implement our method. Chapter 8 presents an extensive experimental section, which aims to show, firstly, the good behaviour of the proposed procedures (model sensitivity and misspecification analysis); secondly, the good power of the test for a range of structures; and, thirdly, the interaction of causality and tail dependence. Applications to real-world data are described in Chapter 9, where time series for commodities and currency markets are analysed.
Finally, Chapter 10 presents the conclusions and directions for further development, and the Appendices provide supplementary material.