
    Bovine oocytes in secondary follicles grow and acquire meiotic competence in severe combined immunodeficient mice

    A rigorous methodology is developed that addresses numerical and statistical issues arising when developing group contribution (GC) based property models, such as regression methods, optimization algorithms, performance statistics, outlier treatment, parameter identifiability, and uncertainty of the prediction. The methodology is evaluated through development of a GC method for the prediction of the heat of combustion (Δ<i>H</i><sub>c</sub><sup>o</sup>) of pure components. The results showed that robust regression leads to the best performance statistics for parameter estimation. The bootstrap method is found to be a valid alternative for calculating parameter estimation errors when the underlying distribution of the residuals is unknown. Many parameters (first-, second-, and third-order group contributions) are found to be unidentifiable from the typically available data, with large estimation error bounds and significant correlation. Because of these parameter identifiability issues, reporting the 95% confidence intervals of the predicted property values should be mandatory, as opposed to reporting only a single predicted value, which is currently the norm in the literature. Moreover, inclusion of higher-order groups (additional parameters) does not always improve the prediction accuracy of GC models; in some cases, it may even increase the prediction error (hence worse prediction accuracy). However, the additional parameters do not affect the calculated 95% confidence interval. Last but not least, the newly developed GC model for the heat of combustion (Δ<i>H</i><sub>c</sub><sup>o</sup>) yields predictions of high accuracy and quality (most data fall within the 95% confidence intervals) and, unlike other Δ<i>H</i><sub>c</sub><sup>o</sup> models reported in the literature, provides information on the uncertainty of each prediction
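The resampling idea behind the bootstrap error estimates can be sketched as follows: a minimal residual bootstrap for a linear model, in which the data, the model, and all dimensions are invented for illustration and stand in for the actual GC model of the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: property values y modeled as a linear function of
# group-occurrence counts X (rows: components, columns: groups).
n, k = 200, 5
X = rng.poisson(2.0, size=(n, k)).astype(float)
beta_true = rng.normal(0.0, 10.0, size=k)
y = X @ beta_true + rng.normal(0.0, 1.0, size=n)

# Ordinary least-squares fit and residuals.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# Residual bootstrap: resample residuals, refit, collect estimates.
B = 500
boot = np.empty((B, k))
for b in range(B):
    y_b = X @ beta_hat + rng.choice(resid, size=n, replace=True)
    boot[b], *_ = np.linalg.lstsq(X, y_b, rcond=None)

# Percentile 95% confidence intervals for each group contribution,
# reported alongside the point estimates rather than instead of them.
lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
```

No distributional assumption on the residuals is needed, which is the property that makes the bootstrap attractive when the residual distribution is unknown.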

    A Latent Factor Approach for Social Network Analysis

    Social network data consist of entities and relational information between pairs of entities. Observations in a social network are dyadic and interdependent; therefore, making appropriate statistical inferences from a network requires specifying these dependencies in a model. Previous studies suggested that latent factor models (LFMs) for social network data can simultaneously account for stochastic equivalence and transitivity, the two primary dependency patterns observed in real-world social networks. One particular LFM, the additive and multiplicative effects network model (AME), accounts for the heterogeneity of second-order dependencies at the actor level. However, current latent variable models do not consider the heterogeneity of third-order dependencies, such as actor-level transitivity. Failure to model third-order dependency heterogeneity may result in a worse fit to local network structures, which in turn may bias parameter inferences and degrade the goodness-of-fit and prediction performance of a model. Motivated by this gap in the literature, this dissertation proposes incorporating a correlation structure between the sender and receiver latent factors in the AME to account for the distribution of actor-level transitivity. The proposed model is compared with the existing AME in both simulation studies and real-world data analyses. Models are evaluated via multiple goodness-of-fit techniques, including mean squared error, parameter coverage rate, information criteria, receiver operating characteristic (ROC) curves based on K-fold cross-validation or full data, and posterior predictive checking. This work may also contribute to the literature on goodness-of-fit methods for network models, an area that has not yet been unified.
Both the simulation studies and the real-world data analyses showed that adding the correlation structure provides a better fit to network data as well as higher prediction accuracy. The proposed method performs as well as or similarly to the AME when the underlying correlation is zero, with regard to the mean squared error of tie probabilities and the widely applicable information criterion. The present study did not find any significant impact of the correlation term on the estimation of node-level covariate coefficients. Future work includes investigating more types of covariates, such as subgroup-related covariate effects
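The role of the proposed sender-receiver factor correlation can be sketched by simulating from an AME-style model. Everything here (the network size, latent dimension, effect variances, and the value of rho) is invented for illustration and is not the dissertation's actual specification.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 2          # actors, latent dimension (invented)
rho = 0.6             # assumed sender-receiver factor correlation

# Draw sender (u) and receiver (v) latent factors jointly, so that each
# actor's sender and receiver factors are correlated with coefficient rho.
cov = np.kron(np.array([[1.0, rho], [rho, 1.0]]), np.eye(d))
uv = rng.multivariate_normal(np.zeros(2 * d), cov, size=n)
u, v = uv[:, :d], uv[:, d:]

# Additive sender/receiver effects plus the multiplicative term u_i' v_j,
# as in an AME-style decomposition of the latent tie propensity.
a = rng.normal(0, 0.5, n)
b = rng.normal(0, 0.5, n)
eta = a[:, None] + b[None, :] + u @ v.T     # latent propensity
p = 1.0 / (1.0 + np.exp(-eta))              # tie probability (logit link)
np.fill_diagonal(p, 0.0)                    # no self-ties
Y = rng.binomial(1, p)                      # observed binary network
```

With rho > 0, actors who send many ties also tend to receive them through similar latent directions, which is the mechanism by which the correlation term shapes actor-level transitivity.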

    Assessing and accounting for correlation in RNA-seq data analysis

    RNA-sequencing (RNA-seq) technology is a high-throughput next-generation sequencing procedure. It allows researchers to measure gene transcript abundance at a lower cost and with a higher resolution. Advances in RNA-seq technology have promoted new methodological development in several branches of quantitative analysis for RNA-seq data. This dissertation comprises three papers on the analysis of RNA-seq data. We first introduce a method for detecting differentially expressed genes across experimental conditions with correlated RNA-seq data. We fit a general linear model to the transformed read counts of each gene and assume the error vector has a block-diagonal correlation matrix with unstructured blocks that account for within-gene correlations. To stabilize parameter estimation with limited replicates, we shrink the residual maximum likelihood estimator of the correlation parameters toward a mean-correlation locally-weighted scatterplot smoothing curve. The shrinkage weights are determined using a hierarchical model and estimated via parametric bootstrap. Because of the information sharing across genes in parameter estimation, the null distribution of the test statistic is unknown and mathematically intractable; we therefore approximate it through a parametric bootstrap strategy. Next, we focus on correlation estimation between genes. Gene co-expression correlation estimation is a fundamental step in gene co-expression network construction, and the correlation estimates can also serve as inputs to topological statistics that help analyze gene functions. We propose a new strategy for defining and estimating co-expression correlation. We introduce a motivating dataset with two factors and a split-plot experimental design, and we define two types of co-expression correlations that originate from two different sources.
We apply a linear mixed model to each gene pair; the correlations within the random effects and within the random errors represent the two types of correlations. Finally, we consider a basic topic in quantitative RNA-seq analysis: gene filtering. It is essential to remove genes with extremely low read counts before further analysis to avoid numerical problems and to obtain more stable estimates. Most differential expression and gene network analysis tools have embedded gene filtering functions. In general, these functions rely on a user-defined hard threshold for gene selection and fail to make full use of gene features, such as gene length and GC content. Several studies have shown that gene features have a significant impact on RNA-sequencing efficiency and thus should be considered in subsequent analysis. We propose fitting a two-component Gaussian mixture model to the transformed read counts of each sample, with all parameters treated as functions of GC content, and adopt a modified semiparametric expectation-maximization algorithm for parameter estimation. We perform a series of simulation studies and show that, in many cases, the proposed methods improve upon existing methods and are more robust
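The filtering idea in the last part can be illustrated with a plain (non-semiparametric) two-component Gaussian mixture fitted by EM. The data and parameter values below are invented, and the constant-parameter simplification drops the GC-content dependence of the actual method.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sample: log-transformed counts from a "silent" (low) and
# an "expressed" (high) gene population.
x = np.concatenate([rng.normal(0.0, 1.0, 600),   # component to filter out
                    rng.normal(5.0, 1.0, 400)])  # expressed component

# Plain EM for a two-component Gaussian mixture.
w = 0.5
mu = np.array([np.min(x), np.max(x)])
sd = np.array([1.0, 1.0])
for _ in range(200):
    # E-step: posterior probability of the high component for each gene.
    d0 = (1 - w) * np.exp(-0.5 * ((x - mu[0]) / sd[0]) ** 2) / sd[0]
    d1 = w * np.exp(-0.5 * ((x - mu[1]) / sd[1]) ** 2) / sd[1]
    r = d1 / (d0 + d1)
    # M-step: update the weight, means, and standard deviations.
    w = r.mean()
    mu = np.array([np.average(x, weights=1 - r), np.average(x, weights=r)])
    sd = np.sqrt(np.array([np.average((x - mu[0]) ** 2, weights=1 - r),
                           np.average((x - mu[1]) ** 2, weights=r)]))

keep = r > 0.5   # soft filtering rule instead of a hard count threshold
```

The soft rule classifies each gene by its posterior membership probability, which is what replaces a user-defined hard threshold.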

    Serial correlation in dynamic panel data models with weakly exogenous regressor and fixed effects

    This paper presents and compares two estimation methodologies for dynamic panel data models in the presence of serially correlated errors and weakly exogenous regressors. The first is the first-difference GMM estimator proposed by Arellano and Bond (1991), and the second is the transformed maximum likelihood estimator proposed by Hsiao, Pesaran, and Tahmiscioglu (2002). We consider the fixed-effects case with weakly exogenous regressors. The finite-sample properties of both estimation methodologies are analysed within a simulation experiment. Furthermore, we present an empirical example to assess the performance of both estimators on real data. JEL Classification: C23, J6
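The logic behind the first-difference estimator can be sketched in a stripped-down form. The simulation below uses a dynamic panel with fixed effects and applies the simplest instrumental-variable version, with a single instrument per period in the spirit of Anderson and Hsiao rather than the full Arellano-Bond moment set; all dimensions and parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, gamma = 2000, 6, 0.5   # individuals, periods, true AR coefficient

# Simulate y_it = gamma * y_{i,t-1} + alpha_i + eps_it with fixed effects.
alpha = rng.normal(0, 1, N)
y = np.zeros((N, T))
y[:, 0] = alpha + rng.normal(0, 1, N)
for t in range(1, T):
    y[:, t] = gamma * y[:, t - 1] + alpha + rng.normal(0, 1, N)

# First-differencing removes alpha_i but makes dy_{t-1} correlated with
# the differenced error, so OLS on differences is inconsistent.  The
# simplest fix instruments dy_{i,t-1} with the level y_{i,t-2}.
dy = np.diff(y, axis=1)                 # shape (N, T-1)
dep = dy[:, 2:].ravel()                 # dy_it for t = 3..T-1
lag = dy[:, 1:-1].ravel()               # dy_{i,t-1}
z = y[:, 1:T - 2].ravel()               # instrument y_{i,t-2}

gamma_iv = (z @ dep) / (z @ lag)        # just-identified IV estimate
```

The full GMM estimator stacks all available lags of y as instruments per period and weights the moments optimally, which improves efficiency over this single-instrument sketch.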

    Estimation of Dynamic Mixed Double Factors Model in High Dimensional Panel Data

    The purpose of this article is to develop dimension reduction techniques for panel data analysis when the number of individuals and indicators is large. We use Principal Component Analysis (PCA) to represent a large number of indicators by a small number of common factors in factor models. We propose the Dynamic Mixed Double Factor Model (DMDFM for short) to reflect cross-section and time-series correlation with an interactive factor structure. The DMDFM not only reduces the dimension of the indicators but also accounts for the mixed time-series and cross-section effects. Unlike other models, the mixed factor model has two types of common factors: the regressor factors reflect common trends and reduce the dimension, while the error-component factors reflect the differences and weak correlation across individuals. The results of Monte Carlo simulations show that the Generalized Method of Moments (GMM) estimators have good unbiasedness and consistency properties. The simulations also show that the DMDFM can effectively improve the prediction power of the models. Comment: 38 pages, 2 figures
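The PCA step of representing many indicators by a few common factors can be sketched as follows; this is a static toy example with invented dimensions, not the full dynamic DMDFM.

```python
import numpy as np

rng = np.random.default_rng(4)
T, p, r = 120, 40, 3   # periods, indicators, assumed number of factors

# Simulate indicators driven by r common factors plus idiosyncratic noise.
F = rng.normal(size=(T, r))                 # latent common factors
L = rng.normal(size=(p, r))                 # factor loadings
X = F @ L.T + 0.5 * rng.normal(size=(T, p))

# Principal components via the SVD of the centered data: the leading r
# right singular vectors play the role of estimated loadings, and the
# corresponding scores summarize the p indicators in r series.
Xc = X - X.mean(axis=0)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
F_hat = Xc @ Vt[:r].T                       # estimated factor series

# Share of total variance explained by the leading r components.
explained = (s[:r] ** 2).sum() / (s ** 2).sum()
```

When a few factors drive most indicators, the leading components absorb most of the variance, which is the sense in which PCA reduces the dimension of the indicator set.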

    Degeneracy of gravitational waveforms in the context of GW150914

    We study the degeneracy of theoretical gravitational waveforms for binary black hole mergers using an aligned-spin effective-one-body model. After appropriate truncation, bandpassing, and matching, we identify regions in the mass--spin parameter space containing waveforms similar to the template proposed for GW150914, with masses $m_1 = 36^{+5}_{-4}\,M_\odot$ and $m_2 = 29^{+4}_{-4}\,M_\odot$, using the cross-correlation coefficient as a measure of the similarity between waveforms. Remarkably high cross-correlations are found across broad regions of parameter space. The associated uncertainties considerably exceed those from LIGO's Bayesian analysis. We show that waveforms with greatly increased masses, such as $m_1 = 70\,M_\odot$ and $m_2 = 35\,M_\odot$, and strong anti-aligned spins ($\chi_1 = 0.95$ and $\chi_2 = -0.95$) yield almost the same signal-to-noise ratio in the strain data for GW150914. Comment: Accepted for publication in JCA
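The cross-correlation coefficient used as the similarity measure can be illustrated on toy signals; the chirp-like waveforms below are invented stand-ins, not effective-one-body waveforms.

```python
import numpy as np

# Maximum normalized cross-correlation over time shifts, used here as a
# similarity measure between two (hypothetical) waveform templates.
def max_cross_correlation(a, b):
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    c = np.correlate(a, b, mode="full") / len(a)
    return c.max()

t = np.linspace(0.0, 1.0, 4096)
# Toy stand-ins: sinusoids with slightly different frequency evolution.
w1 = np.sin(2 * np.pi * (30 * t + 40 * t ** 2))
w2 = np.sin(2 * np.pi * (32 * t + 38 * t ** 2))

similarity = max_cross_correlation(w1, w2)
```

Maximizing over the relative time shift corresponds to the matching step: two templates are called similar when some alignment produces a correlation close to one.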

    Quantum field tomography

    We introduce the concept of quantum field tomography, the efficient and reliable reconstruction of unknown quantum fields from correlation-function data. At the basis of the analysis is the concept of continuous matrix product states, a complete set of variational states capturing states in quantum field theory. We develop a practical method, making use of and extending tools from estimation theory used in the context of compressed sensing, such as Prony methods and matrix pencils, allowing us to faithfully reconstruct quantum field states from low-order correlation functions. In the absence of a phase reference, we highlight how specific higher-order correlation functions can still be predicted. We exemplify the functioning of the approach by reconstructing randomised continuous matrix product states from their correlation data and study the robustness of the reconstruction under different noise models. We also apply the method to data generated by simulations based on continuous matrix product states and the time-dependent variational principle. The presented approach is expected to open up a new window into experimentally studying continuous quantum systems, such as those encountered in experiments with ultra-cold atoms on top of atom chips. By virtue of the analogy with the input-output formalism in quantum optics, it also allows for studying open quantum systems. Comment: 31 pages, 5 figures, minor changes
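The Prony-type estimation mentioned above can be illustrated in its most basic form: recovering the bases of a sum of exponentials from equally spaced samples. This is a textbook sketch with invented data, far simpler than the matrix-pencil machinery applied to correlation functions in the paper.

```python
import numpy as np

# Prony's method: a signal that is a sum of p exponentials satisfies a
# linear prediction relation; the roots of the prediction polynomial are
# the exponential bases.
def prony_poles(f, p):
    n = len(f)
    # Linear prediction: f[m] = -sum_j a[j] * f[m-1-j] for m >= p.
    A = np.column_stack([f[p - j - 1:n - j - 1] for j in range(p)])
    a, *_ = np.linalg.lstsq(A, -f[p:], rcond=None)
    # Roots of z^p + a[0] z^(p-1) + ... + a[p-1] give the bases z_k.
    return np.roots(np.concatenate(([1.0], a)))

t = np.arange(40, dtype=float)
f = 2.0 * 0.9 ** t + 1.0 * 0.5 ** t    # two decaying exponentials
z = np.sort(prony_poles(f, 2).real)    # recovered bases, here 0.5 and 0.9
```

Correlation functions of the states considered in the paper admit such sum-of-exponentials structure, which is why Prony-type estimators can recover the underlying parameters from low-order correlation data.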