1,537 research outputs found

    On Variable Selections in High-dimensional Incomplete Data

    Modern statistics has entered the era of Big Data, in which data sets are too large, high-dimensional, incomplete and complex for most classical statistical methods. This analysis of Big Data first focuses on missing data. Drawing on the characteristics of medical high-throughput experiments, we compare three multiple imputation methods: multivariate imputation by chained equations (MICE), missing forest (missForest), and self-training selection (STS). A phenotypic data set of common lung disease is assessed. Because variable selection plays a pivotal role in improving the interpretability and predictability of the model, the subsequent analysis turns to it: taking the Lasso-Poisson model as an example, we illustrate the robust random Lasso method for variable selection in a meta-analysis of multiple data sets. The real data analysis shows that missForest and STS outperform MICE. The simulation results show that, although this method is as effective at selecting important variables as the random Lasso method, meta-analysis based on the random Lasso is better in terms of coefficient estimation and elimination of unimportant variables. In conclusion, we propose, for the first time, a missForest random Lasso (MFRL) method to complete the multiple imputation of high-dimensional data and robustly select important variables.

    Conditional Markov chain and its application in economic time series analysis

    Motivated by the Great Moderation in major U.S. macroeconomic time series, we formulate the regime switching problem through a conditional Markov chain. We model the long-run volatility change as a recurrent structure change, and short-run changes in the mean growth rate as regime switches. Both structure and regime are unobserved. The structure is assumed to be Markovian. Conditional on the structure, the regime is also Markovian, with a structure-dependent transition matrix. This formulation imposes interpretable restrictions on the Hamilton Markov switching model. Empirical studies show that this restricted model identifies both short-run regime switches and long-run structure changes well in the U.S. macroeconomic data. Keywords: Markov regime switching; conditional Markov chain
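The two-layer structure described in this abstract can be sketched as a small simulation. This is only an illustration under stated assumptions, not the authors' model: the two-state layout, every transition probability, the regime means, the structure volatilities, the Gaussian growth draw, and the choice to let the current structure pick the regime's transition matrix are all hypothetical.

```python
import random

# All numbers below are hypothetical and chosen only for illustration.
# Structure chain: 0 = high-volatility era, 1 = low-volatility era.
STRUCTURE_P = [[0.99, 0.01],
               [0.01, 0.99]]

# Regime chain: 0 = expansion, 1 = recession.
# One transition matrix per structure (structure-dependent transitions).
REGIME_P = {
    0: [[0.90, 0.10], [0.30, 0.70]],
    1: [[0.95, 0.05], [0.40, 0.60]],
}

MEAN = [0.8, -0.5]   # assumed mean growth rate in each regime
SIGMA = [1.0, 0.4]   # assumed volatility in each structure


def step(state, transition, rng):
    """Draw the next state of a two-state chain from the row of `state`."""
    return 0 if rng.random() < transition[state][0] else 1


def simulate(n, seed=0):
    """Simulate n periods; return a list of (structure, regime, growth)."""
    rng = random.Random(seed)
    structure, regime = 0, 0
    path = []
    for _ in range(n):
        # The structure evolves as its own Markov chain.
        structure = step(structure, STRUCTURE_P, rng)
        # Given the structure, the regime moves with that structure's matrix.
        regime = step(regime, REGIME_P[structure], rng)
        # Regime sets the mean, structure sets the volatility.
        growth = rng.gauss(MEAN[regime], SIGMA[structure])
        path.append((structure, regime, growth))
    return path


if __name__ == "__main__":
    for structure, regime, growth in simulate(10):
        print(structure, regime, round(growth, 3))
```

In a setting like the one the abstract describes, both latent chains would be unobserved and only the growth series would be available, so estimation would require a filtering step that this sketch does not attempt.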

    Automated Morphology Analysis of Nanoparticles

    The functional properties of nanoparticles depend strongly on the surface morphology of the particles, so precise measurements of a particle's morphology enable reliable characterization of the nanoparticle's properties. Obtaining these measurements requires image analysis of electron micrographs of nanoparticles. Today's labor-intensive image analysis of such micrographs is a significant bottleneck for efficient material characterization. The objective of this dissertation is to develop automated morphology analysis methods. Morphology analysis comprises three tasks: separating individual particles from an agglomerate of overlapping nano-objects (image segmentation); inferring each particle's missing contours (shape inference); and ultimately classifying the particles by shape based on their complete contours (shape classification). Two approaches are proposed in this dissertation: the divide-and-conquer approach and the convex shape analysis approach. The divide-and-conquer approach solves each task separately and takes less than one minute to complete the required analysis, even for the largest micrograph. However, its ability to separate overlapping particles is limited: it can split only touching particles. The convex shape analysis approach solves shape inference and classification simultaneously for better accuracy, but it requires more computation time, about ten minutes for the largest electron micrograph. At this small cost in time efficiency, the second approach achieves far better separation than the divide-and-conquer approach, and it handles the chain-linked structure of particle overlaps well. The capabilities of the two proposed methods cannot be matched by generic image processing and bio-imaging methods, because electron micrographs of nanoparticles have unique features, including special particle overlap structures and a large number of particles to be processed. Applying the proposed methods to real electron micrographs showed that both were more capable of extracting morphology information than the state-of-the-art methods. When nanoparticles do not have many overlaps, the divide-and-conquer approach performed adequately. When nanoparticles have many overlaps, forming chain-linked clusters, the convex shape analysis approach performed much better than the state-of-the-art alternatives in bio-imaging. The author believes that the capabilities of the proposed methods expedite the morphology characterization process of nanoparticles. The author further conjectures that, given their technical generality, the proposed methods could be a competent alternative to current methods for analyzing general overlapping convex-shaped objects other than nanoparticles.

    An EM-based identification algorithm for a class of hybrid systems with application to power electronics

    In this paper we present an identification algorithm for a class of continuous-time hybrid systems. In such systems, both continuous-time and discrete-time dynamics are involved. We apply the expectation-maximisation algorithm to obtain the maximum likelihood estimate of the parameters of a discrete-time model expressed in incremental form. The main advantage of this approach is that the continuous-time parameters can be recovered directly. The technique is particularly well suited to fast sampling rates. As an application, we focus on a standard identification problem in power electronics. In this field, our proposed algorithm is important because accurate modelling of power converters is required in high-performance applications and for fault diagnosis. As an illustrative example, and to verify the performance of our proposed algorithm, we apply our results to a flying capacitor multicell converter. © 2014 Taylor & Francis
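A small numerical sketch may help show why an incremental-form model suits fast sampling. It assumes a noise-free scalar system dx/dt = a·x sampled exactly with period `delta`; this toy system, the value of `a`, and the function name are illustrative assumptions, and the sketch does not implement the paper's EM algorithm or its hybrid-system model.

```python
import math


def incremental_coefficient(a, delta):
    """Coefficient a_delta in the incremental form
    (x[k+1] - x[k]) / delta = a_delta * x[k]
    for dx/dt = a * x sampled exactly with period delta."""
    shift_coefficient = math.exp(a * delta)  # x[k+1] = shift_coefficient * x[k]
    return (shift_coefficient - 1.0) / delta


if __name__ == "__main__":
    a = -2.0  # hypothetical continuous-time parameter
    for delta in (0.1, 0.01, 0.001):
        shift = math.exp(a * delta)
        print(delta, shift, incremental_coefficient(a, delta))
```

As the sampling period shrinks, the shift-form coefficient drifts towards 1 regardless of `a`, while the incremental-form coefficient approaches `a` itself, which is consistent with the abstract's remark that the continuous-time parameters can be read off directly.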

    Essays on asynchronous time series and related multidimensional data

    This thesis focusses on asynchronous time series and related multidimensional data: time-dependent measurements with varying publication delays. This class of data exists in a broad range of fields. In the social sciences, most official time series and repeated surveys are asynchronous in nature, since statistical offices need time to collect and aggregate raw data. In STEM, statistical offices are generally less relevant and most publication delays are caused by more exotic factors. For instance, series derived from technological networks are usually generated by a direct reference (digital or textual) to the past (e.g., publishing pictures of a trip taken a week ago that was also photographed and posted in real time by a friend). As a result, the study of data releases is key for developing accurate real-time models and finds applications in forecasting, policy and risk management.