Rediscovering a little known fact about the t-test and the F-test: Algebraic, Geometric, Distributional and Graphical Considerations
We discuss the role that the null hypothesis should play in the construction
of a test statistic used to make a decision about that hypothesis. To construct
the test statistic for a point null hypothesis about a binomial proportion, a
common recommendation is to act as if the null hypothesis is true. We argue
that, on the surface, the one-sample t-test of a point null hypothesis about a
Gaussian population mean does not appear to follow the recommendation. We show
how simple algebraic manipulations of the usual t-statistic lead to an
equivalent test procedure consistent with the recommendation. We provide
geometric intuition regarding this equivalence and we consider extensions to
testing nested hypotheses in Gaussian linear models. We discuss an application
to graphical residual diagnostics where the form of the test statistic makes a
practical difference. By examining the formulation of the test statistic from
multiple perspectives in this familiar example, we provide simple, concrete
illustrations of some important issues that can guide the formulation of
effective solutions to more complex statistical problems. Comment: 22 pages, 5 figures
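The algebraic step can be checked numerically. The sketch below is a minimal illustration that assumes a null-based statistic of the form r = n(x̄ - μ0)² / Σ(x_i - μ0)². Its denominator measures spread around μ0, as if H0 were true. The paper's exact statistic may be scaled differently, and the function names and data are hypothetical. The identity Σ(x_i - μ0)² = Σ(x_i - x̄)² + n(x̄ - μ0)² gives t² = (n - 1) r / (1 - r), which is strictly increasing in r. Rejecting for large |t| and rejecting for large r are therefore the same test.

```python
import math

def t_statistic(x, mu0):
    """Usual one-sample t-statistic: variance estimated around the sample mean."""
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    return (xbar - mu0) / math.sqrt(s2 / n)

def null_based_ratio(x, mu0):
    """Hypothetical null-based statistic: spread is measured around mu0, as if H0 held.

    r = n (xbar - mu0)^2 / sum((x_i - mu0)^2), which lies in [0, 1].
    """
    n = len(x)
    xbar = sum(x) / n
    shift = n * (xbar - mu0) ** 2
    total = sum((xi - mu0) ** 2 for xi in x)  # = sum((x_i - xbar)^2) + shift
    return shift / total

def t_from_ratio(r, n):
    """Strictly increasing map from r back to |t|: t^2 = (n - 1) r / (1 - r)."""
    return math.sqrt((n - 1) * r / (1 - r))

x = [5.1, 4.8, 5.6, 5.9, 5.3, 4.7, 5.5]  # made-up sample
mu0 = 5.0
print(abs(t_statistic(x, mu0)), t_from_ratio(null_based_ratio(x, mu0), len(x)))  # the two values agree
```

Because t² follows an F(1, n - 1) distribution under H0, the ratio r = t² / (t² + n - 1) follows a Beta(1/2, (n - 1)/2) distribution. Critical values can therefore be taken from either distribution.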
On the variations of the principal eigenvalue with respect to a parameter in growth-fragmentation models
We study the variations of the principal eigenvalue associated with a
growth-fragmentation-death equation with respect to a parameter acting on
growth and fragmentation. To this end, we use the probabilistic
individual-based interpretation of the model. We study the variations of the
survival probability of the stochastic model, using a generation-by-generation
approach. Then, making use of the link between the survival probability and the
principal eigenvalue established in a previous work, we deduce the variations
of the eigenvalue with respect to the parameter of the model.
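The generation-by-generation idea can be illustrated on a much simpler process. The sketch below uses a toy Galton-Watson branching process, not the authors' growth-fragmentation-death model. In this toy, each individual either dies without offspring with a hypothetical probability `d` or splits into two with probability `1 - d`. Let `q_k` be the probability that the line is extinct by generation `k`. Then `q_{k+1} = f(q_k)`, where `f` is the offspring generating function, and `q_k` increases to the extinction probability.

```python
def extinction_probability(offspring_pmf, generations=10_000, tol=1e-14):
    """Iterate q_{k+1} = f(q_k) from q_0 = 0, where f is the offspring pgf.

    offspring_pmf[k] is the probability of having exactly k offspring.
    """
    def pgf(s):
        return sum(p * s**k for k, p in enumerate(offspring_pmf))

    q = 0.0
    for _ in range(generations):
        q_next = pgf(q)  # P(extinct by the next generation)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q

def survival_probability(d):
    """Toy model (hypothetical): die w.p. d, split into two w.p. 1 - d."""
    return 1.0 - extinction_probability([d, 0.0, 1.0 - d])

for d in (0.1, 0.2, 0.3, 0.4, 0.6):
    print(d, survival_probability(d))
```

In this toy, the survival probability is (1 - 2d)/(1 - d) for d < 1/2 and zero otherwise. It is positive exactly when the mean number of offspring, 2(1 - d), exceeds one, and it decreases as `d` grows.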
Generalised Reichenbachian common cause systems
The principle of the common cause claims that if an improbable coincidence has occurred, there must exist a common cause. This is generally taken to mean that positive correlations between non-causally related events should disappear when conditioning on the action of some underlying common cause. The extended interpretation of the principle, by contrast, urges that common causes should be called for in order to explain positive deviations between the estimated correlation of two events and the expected value of their correlation. The aim of this paper is to provide the extended reading of the principle with a general probabilistic model, capturing the simultaneous action of a system of multiple common causes. To this end, two distinct models are elaborated, and the necessary and sufficient conditions for their existence are determined.
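The standard single-cause version of the screening-off idea can be checked with exact arithmetic. This is a minimal sketch. It tests only the classical Reichenbach conditions for one cause C: A and B are independent given C and given not-C, and C raises the probability of both A and B. It does not implement the paper's extended reading or its systems of multiple common causes. The parameter names `c`, `a1`, `a0`, `b1`, `b0` are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

def joint(c, a1, a0, b1, b0):
    """Joint pmf of events (A, B, C); A and B are independent given C and given not-C."""
    pmf = {}
    for a, b, cc in product((True, False), repeat=3):
        pc = c if cc else 1 - c
        pa = a1 if cc else a0
        pb = b1 if cc else b0
        pmf[(a, b, cc)] = pc * (pa if a else 1 - pa) * (pb if b else 1 - pb)
    return pmf

def prob(pmf, event):
    """Probability of the atoms (a, b, c) for which event(a, b, c) is true."""
    return sum(p for atom, p in pmf.items() if event(*atom))

def cond(pmf, event, c_value):
    """Conditional probability of event given C == c_value."""
    given = lambda a, b, c: c == c_value
    return prob(pmf, lambda a, b, c: event(a, b, c) and given(a, b, c)) / prob(pmf, given)

A = lambda a, b, c: a
B = lambda a, b, c: b
AB = lambda a, b, c: a and b

def is_reichenbachian_common_cause(pmf):
    """Screening off by C and by not-C, plus C raising the probability of A and of B."""
    screens_off = all(
        cond(pmf, AB, v) == cond(pmf, A, v) * cond(pmf, B, v) for v in (True, False)
    )
    raises_a = cond(pmf, A, True) > cond(pmf, A, False)
    raises_b = cond(pmf, B, True) > cond(pmf, B, False)
    return screens_off and raises_a and raises_b

def correlation(pmf):
    """Covariance of the indicators of A and B: P(AB) - P(A) P(B)."""
    return prob(pmf, AB) - prob(pmf, A) * prob(pmf, B)

pmf = joint(c=F(1, 2), a1=F(4, 5), a0=F(1, 5), b1=F(7, 10), b0=F(3, 10))
print(is_reichenbachian_common_cause(pmf), correlation(pmf))  # True 3/50
```

The positive correlation is no accident. Under these conditions, P(AB) - P(A)P(B) = P(C)(1 - P(C))(P(A|C) - P(A|not-C))(P(B|C) - P(B|not-C)), which is 1/4 · 3/5 · 2/5 = 3/50 for the numbers above.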
Marginal AMP Chain Graphs
We present a new family of models that is based on graphs that may have
undirected, directed and bidirected edges. We name these new models marginal
AMP (MAMP) chain graphs because each of them is Markov equivalent to some AMP
chain graph under marginalization of some of its nodes. However, MAMP chain
graphs subsume not only AMP chain graphs but also multivariate regression
chain graphs. We describe global and pairwise Markov properties for MAMP chain
graphs and prove their equivalence for compositional graphoids. We also
characterize when two MAMP chain graphs are Markov equivalent.
For Gaussian probability distributions, we also show that every MAMP chain
graph is Markov equivalent to some directed and acyclic graph with
deterministic nodes under marginalization and conditioning on some of its
nodes. This is important because it implies that the independence model
represented by a MAMP chain graph can be accounted for by some data generating
process that is partially observed and has selection bias. Finally, we modify
MAMP chain graphs so that they are closed under marginalization for Gaussian
probability distributions. This is a desirable feature because it guarantees
parsimonious models under marginalization. Comment: Changes from v1 to v2: Discussion section was extended. Changes from
v2 to v3: New Sections 3 and 5. Changes from v3 to v4: Example 4 added to
discussion section. Changes from v4 to v5: None. Changes from v5 to v6: Some
minor and major errors have been corrected. The latter include the
definitions of descending route and pairwise separation base, and the proofs
of Theorems 5 and
Randomizing world trade. I. A binary network analysis
The international trade network (ITN) has received renewed multidisciplinary interest due to recent advances in network theory. However, it is still unclear whether a network approach conveys additional, nontrivial information with respect to traditional international-economics analyses that describe world trade only in terms of local (first-order) properties. In this and in a companion paper, we employ a recently proposed randomization method to assess in detail the role that local properties have in shaping higher-order patterns of the ITN in all its possible representations (binary or weighted, directed or undirected, aggregated or disaggregated by commodity) and across several years. Here we show that, remarkably, the properties of all binary projections of the network can be completely traced back to the degree sequence, which is therefore maximally informative. Our results imply that explaining the observed degree sequence of the ITN, which has not received particular attention in economic theory, should instead become one of the main focuses of models of trade.
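The abstract does not name the randomization method, so the following is only an assumption. It treats the method as a maximum-entropy binary configuration model for an undirected network. In that model, each link exists independently with probability p_ij = x_i x_j / (1 + x_i x_j), and the hidden variables x_i are fitted by maximum likelihood so that every node's expected degree equals its observed degree. The sketch solves the likelihood equations with a plain fixed-point iteration. The function names and the degree sequence are hypothetical and are not taken from trade data.

```python
import numpy as np

def fit_binary_configuration_model(degrees, max_iter=10_000, tol=1e-12):
    """Solve k_i = sum_{j != i} x_i x_j / (1 + x_i x_j) by fixed-point iteration.

    Rearranged as x_i = k_i / sum_{j != i} x_j / (1 + x_i x_j).
    """
    k = np.asarray(degrees, dtype=float)
    x = k / np.sqrt(k.sum())  # sparse-graph initial guess, p_ij ~ k_i k_j / 2L
    for _ in range(max_iter):
        terms = x[None, :] / (1.0 + np.outer(x, x))  # entry [i, j] = x_j / (1 + x_i x_j)
        np.fill_diagonal(terms, 0.0)  # no self-loops
        x_new = k / terms.sum(axis=1)
        if np.max(np.abs(x_new - x)) < tol:
            return x_new
        x = x_new
    return x

def link_probabilities(x):
    """Matrix of p_ij = x_i x_j / (1 + x_i x_j) with a zero diagonal."""
    xx = np.outer(x, x)
    p = xx / (1.0 + xx)
    np.fill_diagonal(p, 0.0)
    return p

degrees = [1, 2, 2, 3, 3, 4, 2, 1]  # hypothetical degree sequence
p = link_probabilities(fit_binary_configuration_model(degrees))
print(p.sum(axis=1))  # expected degrees, matching the observed ones
```

Higher-order properties, such as average nearest-neighbour degree or clustering, can then be computed from `p` (or from graphs sampled from it) and compared with the observed network. Agreement indicates that the degree sequence alone accounts for those properties.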
ceylon: An R package for plotting the maps of Sri Lanka
The rapid evolution of computer science, data science, and artificial
intelligence has significantly transformed the use of data for
decision-making. Data visualisation plays a critical role in any work that
involves data, and visualising data on maps is common in many fields. Maps not
only transform raw data into visually comprehensible representations but also
convert complex spatial information into a simple, understandable form.
Locating the data files necessary for map creation can be challenging. A
centralised repository of shapefiles alleviates this difficulty and allows
users to discover geographic data efficiently. The ceylon R package is designed
to make simple feature data for Sri Lanka's administrative boundaries and its
rivers and streams accessible to a diverse range of R users. With
straightforward functionality, the package allows users to quickly plot and
explore the administrative boundaries and the rivers and streams of Sri Lanka.
The loss value of multilinear regression
A formula for the Euclidean distance between a point and a linear subspace is
presented. As a consequence, formulas are derived for the determinants of
positive semidefinite Hermitian matrices and for the loss value of multilinear
regression. Comment: 3 pages. arXiv admin note: text overlap with arXiv:1408.592
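The connection between distance and regression loss can be checked numerically. This is a minimal sketch, and it assumes the distance formula is the classical Gram-determinant identity. The paper's own formula may be stated differently. The identity says that if G(v_1, ..., v_k) is the Gram matrix of linearly independent vectors, then the squared distance from y to their span is det G(v_1, ..., v_k, y) / det G(v_1, ..., v_k). The loss of least-squares regression is the squared distance from y to the column space of X, so it can be computed as a ratio of determinants. The data below are random and purely illustrative.

```python
import numpy as np

def gram(*vectors):
    """Gram matrix of the given vectors (all pairwise inner products)."""
    V = np.column_stack(vectors)
    return V.T @ V

def distance_sq_to_span(y, X):
    """Squared distance from y to col(X): det G(x_1..x_k, y) / det G(x_1..x_k).

    Assumes the columns of X are linearly independent.
    """
    cols = [X[:, j] for j in range(X.shape[1])]
    return np.linalg.det(gram(*cols, y)) / np.linalg.det(gram(*cols))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))  # illustrative design matrix
y = rng.normal(size=20)       # illustrative response

loss_gram = distance_sq_to_span(y, X)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
loss_ls = np.sum((y - X @ beta) ** 2)  # residual sum of squares
print(loss_gram, loss_ls)  # the two values agree up to rounding
```

The identity follows from the Schur complement. The determinant of the augmented Gram matrix equals det(XᵀX) · (yᵀy - yᵀX(XᵀX)⁻¹Xᵀy), and the second factor is the residual sum of squares.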
Discussion of ‘Nonparametric generalized fiducial inference for survival functions under censoring’
The following discussion is inspired by the paper ‘Nonparametric generalized fiducial inference for survival functions under censoring’ by Cui and Hannig. It comments on the results and also indicates the paper's importance more generally in the context of fiducial inference. A two-page introduction to fiducial inference is given to provide context. Accepted version. Locked until 12.8.2020 due to copyright restrictions. This is a pre-copyedited, author-produced version of an article accepted for publication in Biometrika following peer review. The version of record is available online at: https://doi.org/10.1093/biomet/asz02
