A Distribution Free Method for General Risk Problems
In practical applications of the collective theory of risk one is very often confronted with the problem of making some kind of assumption about the form of the distribution functions underlying the frequency as well as the severity of claims. Lundberg's [6] and Cramér's [3] approaches are essentially based upon the hypothesis that the number of claims occurring in a certain period obeys the Poisson distribution, whereas for the conditional distribution of the amount claimed upon occurrence of such a claim the exponential distribution is very often used. Of course, by weighting the Poisson distributions (as e.g. done by Ammeter [1]) one enlarges the class of "frequency of claims" distributions considerably, but nevertheless there remains an uneasy feeling about artificial assumptions which are made merely for mathematical convenience and are not necessarily related to the practical problems to which the theory of risk is applied. It seems to me that, before applying the general model of the theory of risk, one should always ask the question: "How much information do we want from the mathematical model which describes the risk process?" The answer will be that in many practical cases it is sufficient to determine the mean and the variance of this process. Let me only mention rate making, experience control, refund problems and the detection of secular trends in a certain risk category. In all these cases the practical solutions seem to be sufficiently determined by mean and variance. Let us therefore attack the problem of determining the mean and variance of the risk process while trying to make as few assumptions as possible about the type of the underlying probability distributions. This approach is not original. De Finetti [5] has already proposed an approach to risk theory based only upon the knowledge of mean and variance. It is along his lines of thought, although in different mathematical form, that I wish to proceed.
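The abstract gives no formulas; for orientation, the standard compound-sum identities it relies on are E[S] = E[N]E[X] and Var[S] = E[N]Var[X] + Var[N](E[X])^2, where N is the claim count and X the single claim amount. A minimal R sketch (all names and numbers illustrative, not taken from the paper):

```r
# Mean and variance of aggregate claims S = X1 + ... + XN from the first two
# moments of the claim count N and the claim size X alone -- no distributional
# assumptions beyond independence of N and the individual claim sizes.
aggregate_moments <- function(mean_n, var_n, mean_x, var_x) {
  list(
    mean = mean_n * mean_x,
    var  = mean_n * var_x + var_n * mean_x^2
  )
}

# Illustrative numbers: 100 expected claims, overdispersed claim count,
# mean claim 2000 with standard deviation 3000.
aggregate_moments(mean_n = 100, var_n = 150, mean_x = 2000, var_x = 3000^2)
```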
Variable selection in high-dimensional linear models: partially faithful distributions and the PC-simple algorithm
We consider variable selection in high-dimensional linear models where the
number of covariates greatly exceeds the sample size. We introduce the new
concept of partial faithfulness and use it to infer associations between the
covariates and the response. Under partial faithfulness, we develop a
simplified version of the PC algorithm (Spirtes et al., 2000), the PC-simple
algorithm, which is computationally feasible even with thousands of covariates
and provides consistent variable selection under conditions on the random
design matrix that are of a different nature than coherence conditions for
penalty-based approaches like the Lasso. Simulations and application to real
data show that our method is competitive compared to penalty-based approaches.
We provide an efficient implementation of the algorithm in the R-package pcalg.
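A minimal usage sketch on simulated data: to my understanding the PC-simple variable selection described above is exposed in pcalg as pcSelect(); the function name, arguments and return value should be checked against the installed package version.

```r
# Sketch of PC-simple variable selection via pcalg::pcSelect().
# Function name and arguments reflect my reading of the package documentation.
library(pcalg)

set.seed(1)
n <- 200; p <- 50
x <- matrix(rnorm(n * p), n, p)          # random design matrix
y <- x[, 1] - 2 * x[, 2] + rnorm(n)      # only covariates 1 and 2 are active

fit <- pcSelect(y, x, alpha = 0.05)      # partial-correlation tests of
which(fit$G)                             # increasing order; fit$G flags the
                                         # covariates retained by PC-simple
```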
Credibility in the Regression case Revisited (A Late Tribute to Charles A. Hachemeister)
Many authors have observed that Hachemeister's Regression Model for Credibility, if applied to simple linear regression, leads to unsatisfactory credibility matrices: they typically 'mix up' the regression parameters and in particular lead to regression lines that seem 'out of range' compared with both the individual and the collective regression lines. We propose to amend these shortcomings by an appropriate definition of the regression parameters: intercept and slope. Contrary to standard practice, the intercept should however not be defined as the value at time zero but as the value of the regression line at the barycenter of time. With these definitions the regression parameters are uncorrelated in the collective and can be estimated separately by standard one-dimensional credibility techniques. A similar convenient reparametrization can also be achieved in the general regression case. The right choice of regression parameters is the one that turns the design matrix into an array with orthogonal columns.
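A small numerical sketch of the reparametrization (illustrative data, not from the paper): centering time at its barycenter makes the two design columns orthogonal, which is what allows intercept and slope to be credibility-weighted separately.

```r
# Redefine the intercept as the value of the regression line at the barycenter
# (mean) of the time points; the design columns then become orthogonal.
set.seed(7)
tt <- 1:10                                  # observation times (illustrative)
y  <- 3 + 0.5 * tt + rnorm(10, sd = 0.2)

X_raw      <- cbind(1, tt)                  # intercept defined at time zero
X_centered <- cbind(1, tt - mean(tt))       # intercept defined at the barycenter

crossprod(X_raw)        # off-diagonal element sum(tt) != 0: correlated columns
crossprod(X_centered)   # off-diagonal element 0: orthogonal columns

coef(lm(y ~ I(tt - mean(tt))))  # same slope; intercept = fitted value at mean(tt)
```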
Distributional Equivalence and Structure Learning for Bow-free Acyclic Path Diagrams
We consider the problem of structure learning for bow-free acyclic path
diagrams (BAPs). BAPs can be viewed as a generalization of linear Gaussian DAG
models that allow for certain hidden variables. We present a first method for
this problem using a greedy score-based search algorithm. We also prove some
necessary and some sufficient conditions for distributional equivalence of BAPs
which are used in an algorithmic approach to compute (nearly) equivalent
model structures. This allows us to infer lower bounds of causal effects. We
also present applications to real and simulated datasets using our publicly
available R-package.
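The abstract does not spell out the search procedure; purely as an illustration of what a greedy score-based structure search does, here is a generic hill-climbing skeleton in R. The score function, neighborhood generator and starting structure are hypothetical placeholders, not the authors' implementation.

```r
# Generic greedy hill climbing over model structures: repeatedly move to the
# best-scoring neighbor (e.g. a single edge addition, removal or reversal)
# until no neighbor improves the score.  score_fn() and neighbors() must be
# supplied by the caller; they are placeholders here.
greedy_search <- function(start, score_fn, neighbors, max_iter = 100) {
  current <- start
  current_score <- score_fn(current)
  for (i in seq_len(max_iter)) {
    cand <- neighbors(current)                     # list of candidate structures
    if (length(cand) == 0) break
    cand_scores <- vapply(cand, score_fn, numeric(1))
    best <- which.max(cand_scores)
    if (cand_scores[best] <= current_score) break  # local optimum reached
    current       <- cand[[best]]
    current_score <- cand_scores[best]
  }
  list(structure = current, score = current_score)
}
```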
Toward a unified theory of sparse dimensionality reduction in Euclidean space
Let $\Phi \in \mathbb{R}^{m \times n}$ be a sparse Johnson-Lindenstrauss transform [KN14] with $s$ non-zeroes per column. For a subset $T$ of the unit sphere and $\varepsilon \in (0, 1/2)$ given, we study settings for $m, s$ required to ensure $\mathbb{E}_{\Phi} \sup_{x \in T} \left| \|\Phi x\|_2^2 - 1 \right| < \varepsilon$, i.e. so that $\Phi$ preserves the norm of every $x \in T$ simultaneously and multiplicatively up to $1 + \varepsilon$. We introduce a new complexity parameter, which depends on the geometry of $T$, and show that it suffices to choose $s$ and $m$ such that this parameter is small. Our result is a sparse analog of Gordon's theorem, which was concerned with a dense $\Phi$ having i.i.d. Gaussian entries.
We qualitatively unify several results related to the Johnson-Lindenstrauss lemma, subspace embeddings, and
Fourier-based restricted isometries. Our work also implies new results in using
the sparse Johnson-Lindenstrauss transform in numerical linear algebra,
classical and model-based compressed sensing, manifold learning, and
results related to the Johnson-Lindenstrauss lemma, subspace embeddings, and
Fourier-based restricted isometries. Our work also implies new results in using
the sparse Johnson-Lindenstrauss transform in numerical linear algebra,
classical and model-based compressed sensing, manifold learning, and
constrained least squares problems such as the Lasso.
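A quick empirical sketch in R of the object being studied: a sparse JL matrix with $s$ non-zeros per column, and the worst multiplicative norm distortion it induces over a small set of unit vectors. Dimensions and the choice of test set are illustrative only, not the regimes analyzed in the paper.

```r
# Build a sparse Johnson-Lindenstrauss matrix with exactly s non-zero entries
# per column (each +-1/sqrt(s) in a uniformly chosen row), then measure the
# worst distortion of squared norms over 100 random unit vectors.
set.seed(42)
n <- 1000; m <- 200; s <- 8

Phi <- matrix(0, m, n)
for (j in 1:n) {
  rows <- sample.int(m, s)
  Phi[rows, j] <- sample(c(-1, 1), s, replace = TRUE) / sqrt(s)
}

T_set <- replicate(100, { v <- rnorm(n); v / sqrt(sum(v^2)) })   # unit vectors
distortion <- apply(T_set, 2, function(x) abs(sum((Phi %*% x)^2) - 1))
max(distortion)   # empirical sup over T of | ||Phi x||_2^2 - 1 |
```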
Further developments in the Erlang(n) risk process
For actuarial applications, we consider the Sparre-Andersen risk model when the interclaim times are Erlang(n) distributed. We first address the problem of solving an integro-differential equation that is satisfied by the survival probability and other probabilities, and present an alternative, improved method for solving such equations compared with the one presented by Li (2008).
This is done by considering the roots with positive real parts of the generalized Lundberg equation and establishing a one-to-one relation between them and the solutions of the integro-differential equation mentioned above.
We then apply these findings to the computation of the distribution of the maximum severity of ruin. This computation depends on the non-ruin probability and on the roots of the fundamental Lundberg equation.
We illustrate and give explicit formulae for Erlang(3) interclaim arrivals with exponentially distributed single claim amounts and for Erlang(2) interclaim times with Erlang(2) claim amounts.
Finally, in the presence of an interest force, we consider the problem of calculating the expected discounted dividends prior to ruin, finding an integro-differential equation that they satisfy and solving it. Numerical examples are also provided for illustration.
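As a simpler relative of the root-finding step behind the generalized Lundberg equation, the classical adjustment coefficient R of a Sparre-Andersen model solves E[e^{RX}] E[e^{-RcT}] = 1; for Erlang(3) interclaim times and exponential claims it can be located numerically as below. Parameter values are illustrative and this is not the paper's computation.

```r
# Classical adjustment coefficient for a Sparre-Andersen model with Erlang(3)
# interclaim times (stage rate lambda) and exponential claim sizes (rate beta):
# R > 0 solves  beta/(beta - r) * (lambda/(lambda + r*prem))^3 = 1.
lambda <- 3      # Erlang(3) stage rate, so E[T] = 3 / lambda = 1
beta   <- 1      # exponential claims with mean 1 / beta = 1
prem   <- 1.2    # premium rate c; prem * E[T] > E[X] ensures a positive root

lundberg <- function(r) beta / (beta - r) * (lambda / (lambda + r * prem))^3 - 1
uniroot(lundberg, lower = 1e-8, upper = beta - 1e-8)$root
```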
Causal Inference Using Graphical Models with the R Package pcalg
The pcalg package for R can be used for the following two purposes: causal structure learning and estimation of causal effects from observational data. In this document, we give a brief overview of the methodology and demonstrate the package's functionality in both toy examples and applications.
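A minimal sketch of the two workflows mentioned above, based on my recollection of the pcalg interface (function and dataset names should be verified against the package documentation): estimate a CPDAG with pc() and bound a total causal effect with ida().

```r
# Structure learning with pc() and causal-effect estimation with ida()
# on one of the Gaussian example datasets shipped with pcalg.
library(pcalg)

data(gmG)                                 # provides gmG8; gmG8$x is the data
X <- gmG8$x
suffStat <- list(C = cor(X), n = nrow(X))

pc.fit <- pc(suffStat, indepTest = gaussCItest,
             p = ncol(X), alpha = 0.01)   # estimated CPDAG

# Multiset of possible total causal effects of variable 1 on variable 6,
# one value per DAG in the estimated equivalence class.
ida(1, 6, cov(X), pc.fit@graph)
```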
Causal stability ranking
Genotypic causes of a phenotypic trait are typically determined via randomized controlled intervention experiments. Such experiments are often prohibitive with respect to duration and cost, and informative prioritization of experiments is desirable. We therefore consider predicting stable rankings of genes (covariates), according to their total causal effects on a phenotype (response), from observational data. Since causal effects are generally non-identifiable from observational data only, we use a method that can infer lower bounds for the total causal effect under some assumptions. We validated our method, which we call Causal Stability Ranking (CStaR), in two situations. First, we performed knock-out experiments with Arabidopsis thaliana according to a predicted ranking based on observational gene expression data, using flowering time as the phenotype of interest. Besides several known regulators of flowering time, we found almost half of the tested top-ranking mutants to have a significantly changed flowering time. Second, we compared CStaR to established regression-based methods on a gene expression dataset of Saccharomyces cerevisiae. We found that CStaR outperforms these established methods. Our method allows for efficient design and prioritization of future intervention experiments, and due to its generality it can be used for a broad spectrum of applications. Availability: The full table of ranked genes, all raw data and an example R script for CStaR are available from the Bioinformatics website. Contact: [email protected] Supplementary Information: Supplementary data are available at Bioinformatics online.
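A rough sketch of the underlying idea (hypothetical helper, not the authors' CStaR code): repeatedly subsample the data, lower-bound each covariate's total causal effect on the response with pc() + ida(), and rank covariates by how often they appear among the top q.

```r
# Stability-style ranking in the spirit of CStaR; illustration only.
library(pcalg)

cstar_sketch <- function(X, y, B = 20, q = 10, alpha = 0.01) {
  dat <- cbind(X, y)                      # response is the last column
  p   <- ncol(X)
  counts <- numeric(p)
  for (b in 1:B) {
    idx <- sample(nrow(dat), floor(nrow(dat) / 2))
    d   <- dat[idx, ]
    fit <- pc(list(C = cor(d), n = nrow(d)),
              indepTest = gaussCItest, p = ncol(d), alpha = alpha)
    # minimum absolute effect over the equivalence class = lower bound
    lb  <- sapply(1:p, function(j) min(abs(ida(j, p + 1, cov(d), fit@graph))))
    top <- order(lb, decreasing = TRUE)[1:q]
    counts[top] <- counts[top] + 1
  }
  order(counts, decreasing = TRUE)        # covariates ranked by stability
}
```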
Handwritten digit recognition by bio-inspired hierarchical networks
The human brain processes information, showing learning and prediction abilities, but the underlying neuronal mechanisms remain unknown. Recently, many studies have shown that neuronal networks are capable of both generalization and association of sensory inputs. In this paper, following a set of neurophysiological findings, we propose a learning framework with strong biological plausibility that mimics prominent functions of cortical circuitries. We developed the Inductive Conceptual Network (ICN), a hierarchical bio-inspired network able to learn invariant patterns through Variable-order Markov Models implemented in its nodes. The outputs of the top-most node of the ICN hierarchy, representing the highest input generalization, allow for automatic classification of inputs. We found that the ICN clustered MNIST images with an error of 5.73% and USPS images with an error of 12.56%.