Bovine oocytes in secondary follicles grow and acquire meiotic competence in severe combined immunodeficient mice
A rigorous methodology is developed that addresses numerical and statistical issues arising when developing group contribution (GC) based property models: regression methods, optimization algorithms, performance statistics, outlier treatment, parameter identifiability, and prediction uncertainty. The methodology is evaluated through the development of a GC method for predicting the heat of combustion (Δ<i>H</i><sub>c</sub><sup>o</sup>) of pure components. The results show that robust regression leads to the best performance statistics for parameter estimation. The bootstrap method is found to be a valid alternative for calculating parameter estimation errors when the underlying distribution of the residuals is unknown.
Many parameters (first-, second-, and third-order group contributions) are found to be unidentifiable from the typically available data, with large estimation error bounds and significant correlation. Because of this poor parameter identifiability, reporting the 95% confidence intervals of the predicted property values should be mandatory, as opposed to reporting only single-value predictions, which is currently the norm in the literature. Moreover, including higher-order groups (additional parameters) does not always improve the prediction accuracy of GC models; in some cases, it may even increase the prediction error (hence worsen prediction accuracy). However, the additional parameters do not affect the calculated 95% confidence intervals. Last but not least, the newly developed GC model of the heat of combustion (Δ<i>H</i><sub>c</sub><sup>o</sup>) yields predictions of high accuracy and quality (most data fall within the 95% confidence intervals) and, compared with other Δ<i>H</i><sub>c</sub><sup>o</sup> models reported in the literature, provides additional information on the uncertainty of each prediction.
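The residual-bootstrap procedure for parameter-estimation errors described above can be sketched in a few lines. Everything here is illustrative: the design matrix, group counts, and parameter values are invented stand-ins for a real GC data set, and the fit is plain least squares rather than the robust regression the abstract recommends.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: linear group-contribution model y = X @ theta + noise,
# where each column of X counts occurrences of one structural group.
n_compounds, n_groups = 200, 5
X = rng.integers(0, 4, size=(n_compounds, n_groups)).astype(float)
theta_true = np.array([-400.0, -650.0, -110.0, -290.0, -50.0])  # kJ/mol, invented
y = X @ theta_true + rng.normal(0.0, 20.0, n_compounds)

def fit(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

theta_hat = fit(X, y)
residuals = y - X @ theta_hat

# Residual (fixed-X) bootstrap: resample residuals, refit, collect estimates.
B = 1000
boot = np.empty((B, n_groups))
for b in range(B):
    y_b = X @ theta_hat + rng.choice(residuals, size=n_compounds, replace=True)
    boot[b] = fit(X, y_b)

# Percentile 95% confidence intervals for each group-contribution parameter.
ci_low, ci_high = np.percentile(boot, [2.5, 97.5], axis=0)
for g in range(n_groups):
    print(f"group {g}: {theta_hat[g]:8.1f}  95% CI [{ci_low[g]:8.1f}, {ci_high[g]:8.1f}]")
```

The same bootstrap distribution can be propagated through the model to attach a 95% confidence interval to each predicted property value, as the abstract argues should be standard practice.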
Parameter and Volterra-kernel estimation of bilinear systems
It has been established that bilinear models occur frequently in nature and offer some important advantages from the standpoint of controllability, optimization, and modeling. The estimation of bilinear system models from measurements of input-output data is discussed. In the first approach, a parametric model of a discrete-time linear system is obtained by correlation analysis. The method is extended to bilinear systems using higher-order correlations. It is shown that for a pseudorandom binary input signal the computations in the estimation algorithm can be simplified. The estimates are asymptotically normal, unbiased, and consistent. The efficiency of the estimates is improved by a least-squares fit on a parametric model involving correlation functions. A recursive formulation is given which makes the algorithm attractive for on-line implementation. These methods are compared with maximum-likelihood and least-squares parameter estimation for a model of a nuclear fission process.
An experimental furnace to control the temperature of a sample is modeled. The power applied to the furnace and the rate of air flow inside the chamber are the control variables. Only one input is perturbed at a time with a pseudorandom binary sequence, and the linear and bilinear models of the process are obtained from the input-output measurements. The identification results are used to design a feedforward-feedback programmable controller for the system with constant air flow rates.
The second approach is to estimate the first- and second-order kernels in a Volterra series expansion of bilinear systems using correlation analysis. The kernels are estimated for a simulation model of a nuclear fission process. It is seen that the correlation method yields good estimates of the first-order kernel under noisy input-output measurements. However, the second-order kernel estimates are not satisfactory. A new approach to the estimation of the second- and higher-order kernels is then developed. The input-output relation of the bilinear system is represented by an integral equation. A Wiener-Hopf type equation is obtained by cross-correlation of the input and the output. An algorithm is given to estimate the unknown parameters in the bilinear operator. The estimation of the second-order kernel is significantly improved.
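The correlation-based estimation of a first-order kernel can be illustrated on a purely linear system: for a zero-mean white input with variance σ², the kernel satisfies h₁(τ) ≈ E[y(t)u(t−τ)]/σ². This is a minimal sketch under invented settings (a ±1 pseudorandom-style input, a decaying exponential kernel), not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented first-order kernel (impulse response) of the linear part.
h1_true = 0.8 ** np.arange(20)

# Zero-mean white input approximating a pseudorandom binary sequence (+/-1).
N = 20000
u = rng.choice([-1.0, 1.0], size=N)
y = np.convolve(u, h1_true)[:N] + rng.normal(0.0, 0.1, N)  # noisy output

# For white input with variance sigma^2, h1(tau) ~= E[y(t) u(t - tau)] / sigma^2.
sigma2 = np.var(u)  # equals 1 for a +/-1 sequence
h1_est = np.array([np.mean(y[tau:] * u[:N - tau]) for tau in range(20)]) / sigma2
print(np.round(h1_est[:5], 2))
```

The averaging over 20000 samples is what makes the estimate consistent; with short records the cross-correlation estimates become noticeably noisier, which is the regime where the least-squares refinement mentioned above pays off.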
A Latent Factor Approach for Social Network Analysis
Social network data consist of entities and relational information between pairs of entities. Observations in a social network are dyadic and interdependent; therefore, making appropriate statistical inferences from a network requires specifying these dependencies in a model. Previous studies suggested that latent factor models (LFMs) for social network data can simultaneously account for stochastic equivalence and transitivity, the two primary dependency patterns observed in real-world social networks. One particular LFM, the additive and multiplicative effects network model (AME), accounts for the heterogeneity of second-order dependencies at the actor level. However, current latent variable models have not considered the heterogeneity of third-order dependencies, for example actor-level transitivity. Failure to model third-order dependency heterogeneity may result in worse fits to local network structures, which in turn may bias parameter inferences and degrade the goodness-of-fit and prediction performance of a model.
Motivated by this gap in the literature, this dissertation proposes to incorporate a correlation structure between the sender and receiver latent factors in the AME to account for the distribution of actor-level transitivity. The proposed model is compared with the existing AME in both simulation studies and real-world data. Models are evaluated via multiple goodness-of-fit techniques, including mean squared error, parameter coverage rate, information criteria, the receiver operating characteristic (ROC) curve based on K-fold cross-validation or full data, and posterior predictive checking. This work may also contribute to the literature on goodness-of-fit methods for network models, an area that has not yet been unified.
Both the simulation studies and real-world data analyses showed that adding the correlation structure provides a better fit as well as higher prediction accuracy for network data. The proposed method performs equally or similarly to the AME when the underlying correlation is zero, with regard to the mean squared error of tie probabilities and the widely applicable information criterion. The present study did not find any significant impact of the correlation term on the estimation of node-level covariate coefficients. Future studies include investigating more types of covariates, for example subgroup-related covariate effects.
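One of the listed goodness-of-fit measures, the ROC-based evaluation, reduces to a rank statistic (the AUC) on predicted tie probabilities. A toy sketch follows, with an invented additive sender/receiver model whose correlated effects stand in for the full AME; the correlation parameter rho and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented generative sketch: tie probability driven by additive sender and
# receiver effects whose correlation (rho) is the kind of structure the
# dissertation adds to the AME.
n, rho = 50, 0.6
cov = np.array([[1.0, rho], [rho, 1.0]])
ab = rng.multivariate_normal([0.0, 0.0], cov, size=n)
a, b = ab[:, 0], ab[:, 1]  # sender effects a_i, receiver effects b_j

logits = -1.0 + a[:, None] + b[None, :]
np.fill_diagonal(logits, -np.inf)  # no self-ties
p = 1.0 / (1.0 + np.exp(-logits))
Y = (rng.random((n, n)) < p).astype(int)  # observed adjacency matrix

# Rank-based AUC of the tie probabilities as predictors of observed ties:
# the probability that a random present tie outranks a random absent one.
mask = ~np.eye(n, dtype=bool)
scores, labels = p[mask], Y[mask]
order = np.argsort(scores)
ranks = np.empty_like(order, dtype=float)
ranks[order] = np.arange(1, labels.size + 1)
n_pos, n_neg = labels.sum(), (1 - labels).sum()
auc = (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
print(f"AUC = {auc:.3f}")
```

In the K-fold variant mentioned in the text, held-out dyads replace the full dyad set, and the scores come from the fitted model rather than the generating probabilities.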
Assessing and accounting for correlation in RNA-seq data analysis
RNA-sequencing (RNA-seq) is a high-throughput next-generation sequencing procedure that allows researchers to measure gene transcript abundance at lower cost and with higher resolution.
Advances in RNA-seq technology promoted new methodological development in several branches of quantitative analysis for RNA-seq data. In this dissertation, we focus on several topics related to RNA-seq data analysis.
This dissertation is comprised of three papers on the analysis of RNA-seq data. We first introduce a method for detecting differentially expressed genes across different experimental conditions with correlated RNA-seq data. We fit a general linear model to the transformed read counts of each gene and assume the error vector has a block-diagonal correlation matrix with unstructured blocks that
account for within-gene correlations. To stabilize parameter estimation with limited replicates, we shrink the residual maximum likelihood estimator of the correlation parameters toward a mean-correlation locally weighted scatterplot smoothing curve. The shrinkage weights are determined using a hierarchical model and then estimated via parametric bootstrap. Because of the information sharing across genes in parameter estimation, the null distribution of the test statistic is unknown and mathematically intractable; thus, we approximate it through a parametric bootstrap strategy.
Next, we focus on correlation estimation between genes. Gene co-expression correlation estimation is a fundamental step in gene co-expression network construction. The correlation estimates can also be used as inputs to topological statistics that help analyze gene functions. We propose a new strategy for defining and estimating co-expression correlations. We introduce a motivating dataset with two factors and a split-plot experimental design. We define two types of co-expression correlations that originate from two different sources. We apply a linear mixed model to each gene pair; the correlations within random effects and within random errors represent the two types of correlations.
Finally, we consider a basic topic in quantitative RNA-seq analysis: gene filtering. It is essential to remove genes with extremely low read counts before further analysis to avoid numerical problems and to obtain more stable estimates. Most differential expression and gene network analysis tools have embedded gene filtering functions. In general, these functions rely on a user-defined hard threshold for gene selection and fail to make full use of gene features, such as gene length and GC content. Several studies have shown that gene features have a significant impact on RNA-sequencing efficiency and thus should be considered in subsequent analyses. We propose to fit a model involving a two-component Gaussian mixture to the transformed read counts of each sample and assume all parameters are functions of GC content. We adopt a modified semiparametric expectation-maximization algorithm for parameter estimation.
We perform a series of simulation studies and show that, in many cases, the proposed methods improve upon existing methods and are more robust.
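The two-component Gaussian mixture fit in the third paper can be illustrated with a plain EM algorithm. The GC-content dependence of the parameters and the semiparametric modification are omitted here, so this is only the textbook building block, on invented data.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative data: log-scale counts from a "silent" low component and an
# "expressed" high component (mixture weights and means are invented).
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 1.5, 700)])

# Initialize weights, means, and variances.
w, mu, var = np.array([0.5, 0.5]), np.array([x.min(), x.max()]), np.array([1.0, 1.0])

for _ in range(200):
    # E-step: posterior responsibility of each component for each point.
    dens = np.stack([w[k] / np.sqrt(2 * np.pi * var[k])
                     * np.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in range(2)])
    resp = dens / dens.sum(axis=0)
    # M-step: update weights, means, and variances from responsibilities.
    nk = resp.sum(axis=1)
    w = nk / x.size
    mu = (resp * x).sum(axis=1) / nk
    var = (resp * (x - mu[:, None]) ** 2).sum(axis=1) / nk

print(np.round(mu, 2), np.round(w, 2))
```

A fitted mixture of this form yields a posterior probability that each gene belongs to the low-count component, which can replace a hard filtering threshold.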
Serial correlation in dynamic panel data models with weakly exogenous regressor and fixed effects
This paper presents and compares two estimation methodologies for dynamic panel data models in the presence of serially correlated errors and weakly exogenous regressors. The first is the first-difference GMM estimator proposed by Arellano and Bond (1991), and the second is the transformed maximum likelihood estimator proposed by Hsiao, Pesaran, and Tahmiscioglu (2002). We consider the fixed-effects case with weakly exogenous regressors. The finite sample properties of both estimation methodologies are analysed within a simulation experiment. Furthermore, we present an empirical example to assess the performance of both estimators with real data. JEL Classification: C23, J6
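The core first-difference idea behind the Arellano-Bond estimator can be sketched on a simulated AR(1) panel: differencing removes the fixed effect, and the twice-lagged level serves as an instrument for the differenced lag. All simulation settings below are invented, and only the simplest single-instrument moment is used rather than the full GMM weighting.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated AR(1) panel with fixed effects: y_it = rho*y_{i,t-1} + alpha_i + eps_it.
N, T, rho = 2000, 8, 0.5
alpha = rng.normal(0.0, 1.0, N)
y = np.zeros((N, T))
y[:, 0] = alpha / (1 - rho) + rng.normal(0.0, 1.0, N)  # start near the long-run mean
for t in range(1, T):
    y[:, t] = rho * y[:, t - 1] + alpha + rng.normal(0.0, 1.0, N)

# First-differencing removes alpha_i: dy_t = rho * dy_{t-1} + deps_t.
dy = np.diff(y, axis=1)

# dy_{t-1} is correlated with deps_t, so OLS on the differences is biased;
# the simplest Arellano-Bond moment instruments dy_{t-1} with the level y_{t-2}.
num = (y[:, :T - 2] * dy[:, 1:]).sum()
den = (y[:, :T - 2] * dy[:, :-1]).sum()
rho_iv = num / den
print(f"IV estimate of rho: {rho_iv:.3f}")
```

The instrument's validity rests on serially uncorrelated errors, which is exactly why the serial correlation studied in the paper is problematic for this estimator.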
Estimation of Dynamic Mixed Double Factors Model in High Dimensional Panel Data
The purpose of this article is to develop dimension reduction techniques for panel data analysis when the number of individuals and indicators is large. We use the Principal Component Analysis (PCA) method to represent a large number of indicators by a small number of common factors in factor models. We propose the Dynamic Mixed Double Factor Model (DMDFM for short) to reflect cross-section and time-series correlation with an interactive factor structure. The DMDFM not only reduces the dimension of the indicators but also accounts for the mixed time-series and cross-section effects. Unlike other models, the mixed factor model has two styles of common factors: the regressor factors reflect the common trend and reduce the dimension, while the error-component factors reflect the differences and weak correlation among individuals. The results of Monte Carlo simulation show that the Generalized Method of Moments (GMM) estimators have good unbiasedness and consistency. The simulation also shows that the DMDFM can effectively improve the prediction power of the models.
Comment: 38 pages, 2 figures
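The PCA step of representing many indicators by a few common factors can be sketched via the singular value decomposition; the panel dimensions, factor count, and noise level below are invented for illustration and do not reflect the DMDFM's dynamic structure.

```python
import numpy as np

rng = np.random.default_rng(5)

# Invented panel: N observations x p indicators driven by r common factors.
N, p, r = 500, 40, 3
F = rng.normal(size=(N, r))   # latent common factors
Lam = rng.normal(size=(r, p)) # factor loadings
X = F @ Lam + 0.3 * rng.normal(size=(N, p))

# PCA via SVD of the centered data: the leading r right singular vectors give
# the estimated loadings, and the projections give estimated factor scores.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
factors_hat = U[:, :r] * s[:r]  # N x r estimated factor scores

# Explained-variance share of the leading r components.
share = (s[:r] ** 2).sum() / (s ** 2).sum()
print(f"variance explained by {r} factors: {share:.2%}")
```

The estimated factors are identified only up to a rotation, which is why factor models of this type typically impose normalization restrictions before interpreting loadings.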
Degeneracy of gravitational waveforms in the context of GW150914
We study the degeneracy of theoretical gravitational waveforms for binary black hole mergers using an aligned-spin effective-one-body model. After appropriate truncation, bandpassing, and matching, we identify regions in the mass--spin parameter space containing waveforms similar to the template proposed for GW150914, with masses and , using the cross-correlation coefficient as a measure of the similarity between waveforms. Remarkably high cross-correlations are found across broad regions of parameter space. The associated uncertainties considerably exceed those from LIGO's Bayesian analysis. We have shown that waveforms with greatly increased masses, such as and , and strong anti-aligned spins ( and ) yield almost the same signal-to-noise ratio in the strain data for GW150914.
Comment: Accepted for publication in JCA
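The cross-correlation coefficient used here as a waveform similarity measure is, in essence, a normalized inner product maximized over relative time shift. A toy sketch with invented chirp-like signals (simple stand-ins, not effective-one-body waveforms):

```python
import numpy as np

# Toy "waveforms": two chirp-like signals on a common time grid, standing in
# for a band-passed template and a candidate with slightly shifted parameters.
t = np.linspace(0.0, 1.0, 4096)
h1 = np.sin(2 * np.pi * (30 * t + 40 * t ** 2)) * np.exp(-((t - 0.7) / 0.2) ** 2)
h2 = np.sin(2 * np.pi * (32 * t + 38 * t ** 2)) * np.exp(-((t - 0.7) / 0.2) ** 2)

def match(a, b):
    """Normalized cross-correlation, maximized over relative time shift."""
    a = (a - a.mean()) / np.linalg.norm(a - a.mean())
    b = (b - b.mean()) / np.linalg.norm(b - b.mean())
    return np.max(np.abs(np.correlate(a, b, mode="full")))

print(f"match(h1, h1) = {match(h1, h1):.3f}")
print(f"match(h1, h2) = {match(h1, h2):.3f}")
```

By the Cauchy-Schwarz inequality the match is bounded by 1, attained only for time-shifted copies; broad regions of parameter space with matches near 1 are precisely the degeneracy the paper reports.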
Quantum field tomography
We introduce the concept of quantum field tomography, the efficient and reliable reconstruction of unknown quantum fields from data on correlation functions. At the basis of the analysis is the concept of continuous matrix product states, a complete set of variational states capturing states in quantum field theory. We develop a practical method, making use of and extending tools from estimation theory used in the context of compressed sensing, such as Prony methods and matrix pencils, that allows us to faithfully reconstruct quantum field states from low-order correlation functions. In the absence of a phase reference, we highlight how specific higher-order correlation functions can still be predicted. We exemplify the functioning of the approach by reconstructing randomised continuous matrix product states from their correlation data, and we study the robustness of the reconstruction under different noise models. We also apply the method to data generated by simulations based on continuous matrix product states and the time-dependent variational principle. The presented approach is expected to open up a new window into experimentally studying continuous quantum systems, such as those encountered in experiments with ultra-cold atoms on atom chips. By virtue of the analogy with the input-output formalism in quantum optics, it also allows for studying open quantum systems.
Comment: 31 pages, 5 figures, minor changes
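The Prony and matrix-pencil machinery mentioned above recovers exponential "poles" from uniformly sampled data such as a correlation function. A minimal noiseless sketch with two invented poles (the quantum-field setting, noise handling, and pencil-parameter choices of the actual method are omitted):

```python
import numpy as np

# Toy correlation-like signal: a sum of two damped exponentials sampled
# uniformly; Prony-type methods recover the poles z_k from such data.
n = np.arange(60)
z_true = np.array([0.9, 0.6])
signal = 0.7 * z_true[0] ** n + 0.3 * z_true[1] ** n

# Matrix pencil: build two shifted Hankel matrices from the samples; the
# nonzero eigenvalues of pinv(H0) @ H1 are the poles z_k.
L = 20  # pencil parameter (invented choice)
H = np.array([signal[i:i + L] for i in range(len(signal) - L)])
H0, H1 = H[:-1], H[1:]
# rcond truncates the numerically zero singular values of the rank-2 H0.
poles = np.linalg.eigvals(np.linalg.pinv(H0, rcond=1e-8) @ H1)
est = np.sort(poles.real)[::-1][:2]  # two dominant poles
print(np.round(est, 3))
```

With noisy data, the rank truncation in the pseudoinverse is what gives the matrix-pencil variant its robustness relative to classical Prony fitting, which is the property exploited in the reconstruction method.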