Likelihood Inference for Models with Unobservables: Another View
There have been controversies among statisticians on (i) what to model and (ii) how to make inferences from models with unobservables. One such controversy concerns the difference between estimation methods for marginal means, which need not have a probabilistic basis, and statistical models with unobservables, which do. Another concerns likelihood-based inference for statistical models with unobservables. This requires an extended-likelihood framework, and we show how one such extension, hierarchical likelihood, allows it to be done. Modeling of unobservables leads to rich classes of new probabilistic models from which likelihood-type inferences can be made naturally with hierarchical likelihood.

Comment: This paper is discussed in [arXiv:1010.0804], [arXiv:1010.0807] and [arXiv:1010.0810], with a rejoinder at [arXiv:1010.0814]. Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/09-STS277.
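For readers unfamiliar with the hierarchical likelihood, a minimal sketch in the usual Lee-Nelder notation (assumed here, not quoted from the paper): with observed data y, unobservables v and fixed parameters θ, the h-likelihood adds the log density of the unobservables to the ordinary conditional log likelihood.

```latex
% Minimal sketch of the hierarchical (h-)likelihood, assuming the usual
% notation: y = observed data, v = unobservables, \theta = fixed parameters.
\[
  h(\theta, v; y) = \log f_\theta(y \mid v) + \log f_\theta(v)
\]
% Fixed parameters are typically estimated from an adjusted profile of h,
% while the unobservables v are predicted by maximizing h jointly.
```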
Resolving the induction problem: Can we state with complete confidence via induction that the sun rises forever?
Induction is a form of reasoning from the particular example to the general
rule. However, establishing the truth of a general proposition is problematic,
because it is always possible for a conflicting observation to occur. This
problem is known as the induction problem. The sunrise problem is a
quintessential example of the induction problem, which was first introduced by
Laplace (1814). However, in Laplace's solution, a zero probability was assigned
to the proposition that the sun will rise forever, regardless of the number of
observations made. Therefore, it has often been stated that complete confidence
regarding a general proposition can never be attained via induction. In this
study, we attempted to overcome this skepticism by using a recently developed
theoretically consistent procedure. The findings demonstrate that through
induction, one can rationally gain complete confidence in propositions based on
scientific theory.
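For context on Laplace's zero-probability assignment (a hedged numerical sketch, not the paper's new procedure): under a uniform prior on the daily sunrise probability, n observed sunrises give a Beta(n+1, 1) posterior, and the probability of m further consecutive sunrises is (n+1)/(n+m+1), which vanishes as m grows.

```python
# Hedged sketch of Laplace's rule of succession (not the paper's procedure):
# with a uniform Beta(1, 1) prior on the daily sunrise probability p, observing
# n sunrises in a row gives the posterior Beta(n+1, 1), and the probability of
# m further consecutive sunrises is (n + 1) / (n + m + 1).
def prob_next_m_sunrises(n: int, m: int) -> float:
    return (n + 1) / (n + m + 1)

n = 2_000_000  # roughly 5,000+ years of daily sunrises (illustrative figure)
for m in (1, n, 100 * n, 10**12):
    print(m, prob_next_m_sunrises(n, m))
# As m grows without bound the probability tends to 0, which is Laplace's
# zero probability for the proposition "the sun rises forever".
```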
Estimation of multivariate normal mean and its application to mixed linear models
Let X = (x_1, x_2, ..., x_p)' be a multivariate normal random variable with mean vector θ in a space Θ and variance matrix I.

From Strawderman's (1971) class of estimators, we derive a minimax admissible estimator for θ. It has a relatively simple form when p is greater than or equal to five. We also extend Stein's (1973) technique to evaluate unbiased estimators of risks for discontinuous estimators. Then, we show the exact risks of a preliminary test estimator and of compromised or mixture estimators. We develop estimators that shrink towards some subspace of Θ and show the relationship between shrinkage functionals and variance component estimators in balanced mixed linear models. We also investigate the asymptotic behavior of shrinkage estimators. By choosing an appropriate subspace, we show that our estimator and ridge regression estimators achieve stability of prediction in a particular data example.

References:
Strawderman, W. E. 1971. Proper Bayes minimax estimators of the multivariate normal mean. The Annals of Mathematical Statistics 42: 385-388.
Stein, C. 1973. Estimation of the mean of a multivariate distribution. Proceedings of the Prague Symposium on Asymptotic Statistics: 345-387.
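For a concrete sense of the shrinkage estimators discussed, here is a hedged sketch of the classic James-Stein estimator toward the origin; it is not the dissertation's Strawderman-class or subspace-shrinkage estimator, only the standard textbook construction for X ~ N(θ, I).

```python
import numpy as np

def james_stein(x: np.ndarray) -> np.ndarray:
    """Classic James-Stein estimator of a multivariate normal mean with
    identity covariance; shrinks the single observation toward the origin.
    A hedged illustration, not the minimax admissible estimator of the abstract."""
    p = x.size
    if p < 3:
        return x.copy()                      # no shrinkage gain below p = 3
    shrink = 1.0 - (p - 2) / np.dot(x, x)    # 1 - (p - 2) / ||x||^2
    return shrink * x

rng = np.random.default_rng(0)
theta = np.zeros(10)
x = rng.normal(theta, 1.0)                   # one observation X ~ N(theta, I)
# Squared-error loss of the raw observation vs. the shrinkage estimate.
print(np.sum((x - theta) ** 2), np.sum((james_stein(x) - theta) ** 2))
```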
μ-Oxido-bis[bis(pentafluorophenolato)(η5-pentamethylcyclopentadienyl)titanium(IV)]
The dinuclear title complex, [Ti2(C10H15)2(C6F5O)4O], features two Ti(IV) atoms bridged by an O atom, which lies on an inversion centre. The Ti(IV) atom is bonded to an η5-pentamethylcyclopentadienyl ring, to two pentafluorophenolate anions and to the bridging O atom. The environment around the Ti(IV) atom can be considered as a distorted tetrahedron. The cyclopentadienyl ring is disordered over two sets of sites [site occupancy = 0.824 (8) for the major component].
Deep Neural Networks for Semiparametric Frailty Models via H-likelihood
For prediction of clustered time-to-event data, we propose a new deep neural network-based gamma frailty model (DNN-FM). An advantage of the proposed model
is that the joint maximization of the new h-likelihood provides maximum
likelihood estimators for fixed parameters and best unbiased predictors for
random frailties. Thus, the proposed DNN-FM is trained by using a negative
profiled h-likelihood as a loss function, constructed by profiling out the
non-parametric baseline hazard. Experimental studies show that the proposed method improves prediction performance over existing methods. A real-data analysis shows that including subject-specific frailties improves prediction relative to the DNN-based Cox model (DNN-Cox).
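As a hedged sketch of the kind of loss function involved, the snippet below computes a Breslow-form negative Cox partial likelihood in which the baseline hazard has been profiled out and a per-cluster log-frailty enters as an offset to the network output. Function names and the toy data are assumptions for illustration, not the paper's exact profiled h-likelihood loss.

```python
import numpy as np

def neg_log_partial_likelihood(time, event, score):
    """Negative Cox partial likelihood with the baseline hazard profiled out
    (Breslow form). `score` is each subject's log relative risk, e.g. a DNN
    output plus a per-cluster log-frailty offset (an assumption made here,
    not the paper's exact DNN-FM loss)."""
    order = np.argsort(-time)                   # sort by descending survival time
    s = np.asarray(score)[order]
    d = np.asarray(event)[order].astype(bool)
    log_risk_set = np.logaddexp.accumulate(s)   # log sum over {j : t_j >= t_i}
    return -(s[d] - log_risk_set[d]).sum()

# Toy usage: 4 subjects in 2 clusters with assumed log-frailty offsets.
time = np.array([2.0, 5.0, 3.0, 8.0])
event = np.array([1, 0, 1, 1])
dnn_out = np.array([0.3, -0.1, 0.7, 0.2])       # stand-in for network output
log_frailty = np.array([0.1, 0.1, -0.2, -0.2])  # per-cluster offsets
print(neg_log_partial_likelihood(time, event, dnn_out + log_frailty))
```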
Super-sparse principal component analyses for high-throughput genomic data
Background: Principal component analysis (PCA) has gained popularity as a method for the analysis of high-dimensional genomic data. However, it is often difficult to interpret the results because the principal components are linear combinations of all variables, and the coefficients (loadings) are typically nonzero. These nonzero values also reflect poor estimation of the true loading vectors; for example, for gene expression data, biologically we expect only a portion of the genes to be expressed in any tissue, and an even smaller fraction to be involved in a particular process. Sparse PCA methods have recently been introduced for reducing the number of nonzero coefficients, but these existing methods are not satisfactory for high-dimensional data applications because they still give too many nonzero coefficients.

Results: Here we propose a new PCA method that uses two innovations to produce an extremely sparse loading vector: (i) a random-effect model on the loadings that leads to an unbounded penalty at the origin and (ii) shrinkage of the singular values obtained from the singular value decomposition of the data matrix. We develop a stable computing algorithm by modifying the nonlinear iterative partial least squares (NIPALS) algorithm, and illustrate the method with an analysis of the NCI cancer dataset that contains 21,225 genes.

Conclusions: The new method has better performance than several existing methods, particularly in the estimation of the loading vectors.
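Below is a minimal NIPALS-style sketch for the leading component, using a plain soft-threshold on the loadings as a stand-in for the paper's random-effect penalty and singular-value shrinkage; the function name, threshold value and toy data are assumptions for illustration only.

```python
import numpy as np

def sparse_nipals_first_pc(X, threshold=0.1, n_iter=100, tol=1e-8):
    """NIPALS-style extraction of the first principal component with
    soft-thresholding of the loading vector. The soft-threshold is a simple
    stand-in for the unbounded-at-the-origin penalty described in the abstract."""
    X = X - X.mean(axis=0)                      # column-centre the data matrix
    t = X[:, 0].copy()                          # initial score vector
    for _ in range(n_iter):
        v = X.T @ t / (t @ t)                   # loadings from regression on scores
        v = np.sign(v) * np.maximum(np.abs(v) - threshold, 0.0)  # soft-threshold
        norm = np.linalg.norm(v)
        if norm == 0:
            break                               # everything shrunk to zero
        v /= norm
        t_new = X @ v                           # updated score vector
        if np.linalg.norm(t_new - t) < tol:
            t = t_new
            break
        t = t_new
    return t, v                                 # scores and sparse loading vector

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 200))                  # toy high-dimensional data
scores, loadings = sparse_nipals_first_pc(X, threshold=0.05)
print((loadings != 0).sum(), "nonzero loadings out of", loadings.size)
```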