42 research outputs found

    Change‐Point Detection on Solar Panel Performance Using Thresholded LASSO

    Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/135028/1/qre2077.pdf http://deepblue.lib.umich.edu/bitstream/2027.42/135028/2/qre2077_am.pd

    Estimation and Inference for High-Dimensional Gaussian Graphical Models with Structural Constraints.

    This work discusses several aspects of estimation and inference for high-dimensional Gaussian graphical models and consists of two main parts. The first part considers network-based pathway enrichment analysis based on incomplete network information. Pathway enrichment analysis has become a key tool for biomedical researchers to gain insight into the underlying biology of differentially expressed genes, proteins and metabolites. We propose a constrained network estimation framework that combines network estimation based on cell- and condition-specific high-dimensional Omics data with interaction information from existing databases. The resulting pathway topology information is subsequently used to provide a framework for simultaneous testing of differences in expression levels of pathway members, as well as their interactions. We study the asymptotic properties of the proposed network estimator and the test for pathway enrichment, investigate its small-sample performance in simulated experiments, and illustrate it on two cancer data sets. The second part of the thesis is devoted to reconstructing multiple graphical models simultaneously from high-dimensional data. We develop methodology that jointly estimates multiple Gaussian graphical models, assuming that there exists prior information on how they are structurally related. The proposed method consists of two steps. In the first, we employ neighborhood selection with a group lasso penalty to obtain estimated edge sets of the graphs. In the second, we estimate the nonzero entries in the inverse covariance matrices by maximizing the corresponding Gaussian likelihood. We establish the consistency of the proposed method for sparse high-dimensional Gaussian graphical models and illustrate its performance using simulation experiments. An application to a climate data set is also discussed.
    PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113495/1/mjing_1.pd
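
The first step described in the abstract above, neighborhood selection, can be illustrated with a much simpler stand-in: a single graph and a plain lasso penalty rather than the group lasso used for joint estimation in the thesis. The sketch below is an assumption-laden toy, not the author's method. The function names, the coordinate-descent solver, the "keep an edge only if both regressions select it" rule, the penalty value, and the simulated chain data are all hypothetical choices made for illustration.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of the L1 penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (b_j removed).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def neighborhood_selection(data, lam):
    """Lasso-regress each variable on the rest; keep edges chosen in both directions."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    Z = [[row[j] - means[j] for j in range(p)] for row in data]  # centered data
    neighbors = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        X = [[row[k] for k in others] for row in Z]
        y = [row[j] for row in Z]
        beta = lasso_cd(X, y, lam)
        neighbors.append({others[m] for m, b in enumerate(beta) if b != 0.0})
    return {(j, k) for j in range(p) for k in neighbors[j] if j < k and j in neighbors[k]}


def simulate_chain(n, seed=0):
    """Hypothetical toy data: a chain x0 - x1 - x2 plus an independent x3."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x0 = rng.gauss(0, 1)
        x1 = 0.8 * x0 + rng.gauss(0, 1)
        x2 = 0.8 * x1 + rng.gauss(0, 1)
        x3 = rng.gauss(0, 1)
        rows.append([x0, x1, x2, x3])
    return rows


edges = neighborhood_selection(simulate_chain(500), lam=0.2)
print(sorted(edges))
```

The second step in the abstract (maximizing the Gaussian likelihood over the selected nonzero entries) is not shown here. The penalty `lam` would normally be tuned, for example by cross-validation, rather than fixed by hand.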

    Soybean Response to Water: Trait Identification and Prediction

    The rising demand for soybean [Glycine max (L.) Merrill], considered alongside current climatic trends, accentuates the importance of improving soybean seed yield response per unit water (WP). To further our understanding of the quantitative WP trait, a multi-omic approach was implemented for improved trait identification and predictive modeling. Through the evaluation of two recombinant inbred line populations jointly totaling 439 lines subjected to contrasting irrigation treatments, informative agronomic, phenomic, and genomic associations were identified. Across both populations, WP was associated with lodging at maturity (r = -0.58, H = 0.86), canopy-to-air temperature differential at the V5 growth stage (r = -0.31, H = 0.39), the SR680 spectral index collected at the R5 growth stage (r = 0.62, H = 0.39), and a quantitative trait locus at approximately 30 centimorgans on chromosome 19 (r = 0.27). Through the integration of significant agronomic, phenomic, and genomic traits, predictive models of WP were developed across environments on an entry mean basis (r = 0.72, RMSE = 0.67 kg ha-1 mm-1) and on a per plot basis (r = 0.95, RMSE = 0.39 kg ha-1 mm-1) using machine learning algorithms. Our results highlight the value of integrating multiple dataset types to study and model quantitative traits. Through the application of our findings, soybean breeders can potentially deploy multi-omic selection models in early generation screening stages to increase the rate of genetic gain in relation to soybean WP. Advisor: George L. Grae
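
The two fit statistics quoted in the abstract above, the Pearson correlation r and the root mean squared error (RMSE) between observed and predicted WP, can be computed as in the following minimal sketch. The function names are hypothetical, and the observed and predicted values are made-up placeholders in the abstract's units, not data from the thesis.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the inputs."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Placeholder WP values (kg ha-1 mm-1), invented purely for illustration.
observed_wp = [5.1, 6.3, 4.8, 7.0, 5.9]
predicted_wp = [5.4, 6.0, 5.1, 6.6, 6.2]

print(pearson_r(observed_wp, predicted_wp))
print(rmse(observed_wp, predicted_wp))
```

The two statistics answer different questions: r measures how well the predictions rank and track the observations, while RMSE measures the typical size of the prediction error in the trait's own units.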

    Long-term neural and physiological phenotyping of a single human

    Psychiatric disorders are characterized by major fluctuations in psychological function over the course of weeks and months, but the dynamic characteristics of brain function over this timescale in healthy individuals are unknown. Here, as a proof of concept to address this question, we present the MyConnectome project. An intensive phenome-wide assessment of a single human was performed over a period of 18 months, including functional and structural brain connectivity using magnetic resonance imaging, psychological function and physical health, gene expression and metabolomics. A reproducible analysis workflow is provided, along with open access to the data and an online browser for results. We demonstrate dynamic changes in brain connectivity over the timescales of days to months, and relations between brain connectivity, gene expression and metabolites. This resource can serve as a testbed to study the joint dynamics of human brain and metabolic function over time, an approach that is critical for the development of precision medicine strategies for brain disorders.

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units, all located in Portugal, is established using Stochastic Frontier Analysis. This methodology discriminates between measurement error and systematic inefficiency in the estimation process, which makes it possible to investigate the main causes of inefficiency. Several suggestions for improving efficiency are offered for each hotel studied.

    Statistical downscaling of air quality models using Principal Fitted Components

    Statistical downscaling is a technique used to extract high-resolution information from regional-scale variables produced by Chemical Transport Models (CTMs). The aim of this thesis is to shed light on the advantages of statistical downscaling in improving the forecasting ability of air quality models. Many statistical downscaling methods in geophysics rely on dimension reduction techniques to reduce the spatial dimension of gridded model outputs without loss of essential spatial information. In this thesis we develop a new downscaling methodology that uses Principal Fitted Components (PFCs) to downscale an air quality model. The main advantage of employing PFCs in downscaling lies in the fact that PFCs represent space-time variations associated with a particular location through the use of inverse regression. This means that PFCs emphasize location-related regional information. We illustrate our proposed method by both simulation and application to ground-level ozone over the southeastern U.S. region, downscaling the Regional ChEmical TrAnsport Model (REAM). Both simulation and application results indicate that PFC downscaling appears to yield more accurate forecasts. Moreover, we address the fact that the covariance matrices used to compute PFCs might be unstable because of their relatively large dimension. This issue motivated us to regularize the covariance matrices by thresholding prior to computing the PFCs, and then to proceed with the downscaling using thresholded PFCs. We illustrate the modified downscaling approach by simulation and application to ground-level ozone. Simulation results suggest that employing thresholded PFCs improves the downscaling results; however, the application results do not agree with the simulation results. Finally, we extend our PFC downscaling method to downscale an ensemble of air quality models. We propose a new two-stage dimension reduction approach to reduce the dimension of an ensemble. The proposed methodology reduces the spatial dimension in each ensemble member, and then the reduced variables are reduced further across the ensemble models. We illustrate our proposed methodology by simulation and application to downscale ground-level ozone ensemble outputs in France. Both simulation and application results suggest that our proposed technique seems to show adequate predictive performance.
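
The covariance regularization mentioned in the abstract above can be illustrated with a minimal sketch of hard thresholding, in which small off-diagonal entries of a sample covariance matrix are set to zero. The abstract does not say which thresholding rule or threshold level the thesis uses, so the hard-threshold rule, the function names, and the toy numbers below are all assumptions made for illustration.

```python
def sample_covariance(rows):
    """Unbiased sample covariance matrix of a list of observations (rows)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [
            sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows) / (n - 1)
            for k in range(p)
        ]
        for j in range(p)
    ]


def hard_threshold(cov, t):
    """Zero out off-diagonal entries with absolute value below t; keep the diagonal."""
    p = len(cov)
    return [
        [cov[j][k] if j == k or abs(cov[j][k]) >= t else 0.0 for k in range(p)]
        for j in range(p)
    ]


# Toy observations with three variables, invented for illustration only.
rows = [
    [1.0, 2.1, 0.3],
    [2.0, 3.9, 0.1],
    [3.0, 6.2, 0.4],
    [4.0, 7.8, 0.2],
]
cov = sample_covariance(rows)
print(hard_threshold(cov, t=0.5))
```

Thresholding produces a sparser and often more stable covariance estimate when the dimension is large relative to the sample size. Note that a hard-thresholded matrix is not guaranteed to remain positive semidefinite.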

    Advanced Data Analytics for Data-rich Multistage Manufacturing Processes

    Nowadays, multistage manufacturing processes (MMPs) are usually equipped with complex sensing systems. They generate data with several unique characteristics: the output quality measurements from each stage are of different types; the comprehensive set of inputs (or process variables) has distinct degrees of influence over the process; the relationship between the inputs and outputs is sometimes ambiguous; and multiple types of faults occur repeatedly during operation. These characteristics lead to new challenges in the data analytics of MMPs. In this thesis, we conduct three studies to tackle those challenges. In the first study, we propose a feature ranking scheme that ranks the process features based on their relationship with the final product quality. Our ranking scheme is called sparse distance correlation (SpaDC); it satisfies the important diversity criteria from the engineering perspective and encourages the features that uniquely characterize the manufacturing process to be prioritized. The theoretical properties of SpaDC are studied. Simulations, as well as two real-case studies, are conducted to validate the method. In the second study, we propose a holistic modeling approach for MMPs, aiming at understanding how intermediate quality measurements of mixed profile outputs relate to sparse effective inputs. This model can identify the effective inputs and the output variation patterns, and establish connections between them. Specifically, this objective is achieved by formulating and solving an optimization problem that involves the effects of process inputs on the outputs across the entire MMP. The ADMM algorithm that solves this problem is highly parallelizable and thus can handle a large amount of data of mixed types obtained from MMPs. In the third study, a retrospective analysis method is proposed for multiple functional signals. This method simultaneously identifies when multiple events occur in the system and characterizes how they affect the multiple sensing signals. The problem is formulated using the dictionary learning method, and the solution is obtained by iteratively updating the event signatures and sequences using ADMM algorithms. Finally, potential extensions to general interconnected systems are discussed. Ph.D
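
The feature ranking scheme in the abstract above builds on distance correlation. As background only, the sketch below computes the ordinary sample distance correlation between two one-dimensional variables. It is not the sparse SpaDC variant proposed in the thesis, whose details the abstract does not give. The function names and the toy data are hypothetical.

```python
import math


def _centered_distances(x):
    """Pairwise absolute distances, double-centered by row, column and grand means."""
    n = len(x)
    d = [[abs(x[i] - x[j]) for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d]  # equals the column means, since d is symmetric
    grand = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation between two equal-length sequences of numbers."""
    n = len(x)
    A = _centered_distances(x)
    B = _centered_distances(y)
    dcov2 = sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(A[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    dvar_y = sum(B[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    if dvar_x * dvar_y == 0.0:
        return 0.0  # at least one variable is constant
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))


# Toy example: a feature with a purely nonlinear effect on the quality measure.
feature = [-2.0, -1.0, 0.0, 1.0, 2.0]
quality = [v**2 for v in feature]
print(distance_correlation(feature, quality))
```

Unlike the Pearson correlation, which is zero for this symmetric quadratic relationship, distance correlation is sensitive to nonlinear dependence. That property makes it a natural basis for ranking process features whose influence on quality need not be linear.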