
    A Distribution Free Method for General Risk Problems

    In practical applications of the collective theory of risk one is very often confronted with the problem of making some kind of assumption about the form of the distribution functions underlying the frequency as well as the severity of claims. Lundberg's [6] and Cramér's [3] approaches are essentially based upon the hypothesis that the number of claims occurring in a certain period obeys the Poisson distribution, whereas for the conditional distribution of the amount claimed upon occurrence of such a claim the exponential distribution is very often used. Of course, by weighting the Poisson distributions (as e.g. done by Ammeter [1]) one enlarges the class of "frequency of claims" distributions considerably, but nevertheless there remains an uneasy feeling about artificial assumptions which are made merely for mathematical convenience and are not necessarily related to the practical problems to which the theory of risk is applied. It seems to me that, before applying the general model of the theory of risk, one should always ask the question: "How much information do we want from the mathematical model which describes the risk process?" The answer will be that in many practical cases it is sufficient to determine the mean and the variance of this process. Let me mention only rate making, experience control, refund problems and the detection of secular trends in a certain risk category. In all these cases the practical solutions seem to be sufficiently determined by mean and variance. Let us therefore attack the problem of determining the mean and variance of the risk process while making as few assumptions as possible about the type of the underlying probability distributions. This approach is not original: De Finetti [5] has already proposed an approach to risk theory based only upon knowledge of mean and variance. It is along his lines of thought, although in different mathematical form, that I wish to proceed.
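
    A one-line illustration of the distribution-free viewpoint (a standard identity, not a formula quoted from the paper): for aggregate claims S = X_1 + ... + X_N, where the claim count N is independent of the i.i.d. claim sizes X_i, the laws of total expectation and total variance give, with no Poisson or exponential assumption,

        \mathbb{E}[S] = \mathbb{E}[N]\,\mathbb{E}[X], \qquad
        \operatorname{Var}[S] = \mathbb{E}[N]\,\operatorname{Var}[X] + \operatorname{Var}[N]\,\bigl(\mathbb{E}[X]\bigr)^2 .

    Mean and variance of the risk process are thus fixed by the first two moments of N and X alone, which is exactly the information the distribution-free approach works with.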

    Variable selection in high-dimensional linear models: partially faithful distributions and the PC-simple algorithm

    We consider variable selection in high-dimensional linear models where the number of covariates greatly exceeds the sample size. We introduce the new concept of partial faithfulness and use it to infer associations between the covariates and the response. Under partial faithfulness, we develop a simplified version of the PC algorithm (Spirtes et al., 2000), the PC-simple algorithm, which is computationally feasible even with thousands of covariates and provides consistent variable selection under conditions on the random design matrix that are of a different nature than coherence conditions for penalty-based approaches like the Lasso. Simulations and application to real data show that our method is competitive compared to penalty-based approaches. We provide an efficient implementation of the algorithm in the R-package pcalg.
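
    The screening idea behind PC-simple can be stated compactly: a covariate stays active only as long as its partial correlation with the response remains significantly nonzero given every small subset of the other active covariates. Below is a minimal Python sketch of the first two stages (conditioning sets of size 0 and 1), using Fisher's z-test; it illustrates the idea only and is not the authors' implementation (which, per the abstract, lives in the R-package pcalg). All names are mine.

        import numpy as np
        from scipy import stats

        def fisher_z_pval(r, n, k):
            # p-value for H0: (partial) correlation r = 0, given k conditioning variables
            r = float(np.clip(r, -0.999999, 0.999999))
            z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - k - 3)
            return 2 * stats.norm.sf(abs(z))

        def pc_simple_up_to_order1(X, y, alpha=0.05):
            """Orders 0 and 1 of the PC-simple screening idea (toy version)."""
            n, p = X.shape
            r0 = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
            # order 0: keep covariates marginally correlated with the response
            active = [j for j in range(p) if fisher_z_pval(r0[j], n, 0) < alpha]
            def pcor(j, k):   # partial correlation of X_j and y given X_k
                rjk = np.corrcoef(X[:, j], X[:, k])[0, 1]
                return (r0[j] - rjk * r0[k]) / np.sqrt((1 - rjk**2) * (1 - r0[k]**2))
            # order 1: drop j if conditioning on some other active k kills the association
            return [j for j in active
                    if all(fisher_z_pval(pcor(j, k), n, 1) < alpha
                           for k in active if k != j)]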

    Credibility in the Regression case Revisited (A Late Tribute to Charles A. Hachemeister)

    Many authors have observed that Hachemeister's regression model for credibility, if applied to simple linear regression, leads to unsatisfactory credibility matrices: they typically ‘mix up' the regression parameters and in particular lead to regression lines that seem ‘out of range' compared with both individual and collective regression lines. We propose to amend these shortcomings by an appropriate definition of the regression parameters, intercept and slope. Contrary to standard practice, the intercept should not be defined as the value at time zero but as the value of the regression line at the barycenter of time. With these definitions, regression parameters which are uncorrelated in the collective can be estimated separately by standard one-dimensional credibility techniques. A similar convenient reparametrization can also be achieved in the general regression case: the right choice of regression parameters is the one that turns the design matrix into an array with orthogonal columns.
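
    To see why centering at the barycenter of time decouples intercept and slope (notation is mine, not the paper's): with observation weights w_j at times t_j, define

        \bar t = \frac{\sum_j w_j t_j}{\sum_j w_j}, \qquad
        Y_j = \beta_0^{*} + \beta_1 (t_j - \bar t) + \varepsilon_j .

    Then \sum_j w_j (t_j - \bar t) = 0, so the two design-matrix columns, the constant 1 and t_j - \bar t, are orthogonal under the weights; the normal equations decouple, and the barycentric intercept \beta_0^{*} and the slope \beta_1 can each be estimated by one-dimensional credibility techniques, as the abstract describes.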

    Distributional Equivalence and Structure Learning for Bow-free Acyclic Path Diagrams

    We consider the problem of structure learning for bow-free acyclic path diagrams (BAPs). BAPs can be viewed as a generalization of linear Gaussian DAG models that allow for certain hidden variables. We present a first method for this problem using a greedy score-based search algorithm. We also prove some necessary and some sufficient conditions for distributional equivalence of BAPs, which are used in an algorithmic approach to compute (nearly) equivalent model structures. This allows us to infer lower bounds on causal effects. We also present applications to real and simulated datasets using our publicly available R-package.
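
    In its generic form, a greedy score-based structure search repeatedly makes the single edge change that most improves a model score. The Python skeleton below is such a generic hill climb, not the authors' BAP algorithm: score is a caller-supplied placeholder (e.g. a penalized Gaussian likelihood), and a real BAP search would additionally restrict moves to bow-free acyclic diagrams.

        import itertools

        def greedy_edge_search(n_nodes, score, max_iter=100):
            """Generic greedy hill climb over sets of directed edges.
            score(edges) -> float, higher is better (supplied by the caller)."""
            edges, best = set(), score(set())   # start from the empty graph
            for _ in range(max_iter):
                improved = False
                for u, v in itertools.permutations(range(n_nodes), 2):
                    cand = edges ^ {(u, v)}     # toggle one directed edge
                    s = score(cand)
                    if s > best:
                        edges, best, improved = cand, s, True
                if not improved:
                    break                       # local optimum reached
            return edges, best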

    Toward a unified theory of sparse dimensionality reduction in Euclidean space

    Let $\Phi \in \mathbb{R}^{m\times n}$ be a sparse Johnson-Lindenstrauss transform [KN14] with $s$ non-zeroes per column. For a subset $T$ of the unit sphere and given $\varepsilon \in (0,1/2)$, we study settings for $m, s$ required to ensure $\mathbb{E}_\Phi \sup_{x\in T} \left| \|\Phi x\|_2^2 - 1 \right| < \varepsilon$, i.e. so that $\Phi$ preserves the norm of every $x \in T$ simultaneously and multiplicatively up to $1+\varepsilon$. We introduce a new complexity parameter, which depends on the geometry of $T$, and show that it suffices to choose $s$ and $m$ such that this parameter is small. Our result is a sparse analog of Gordon's theorem, which was concerned with a dense $\Phi$ having i.i.d. Gaussian entries. We qualitatively unify several results related to the Johnson-Lindenstrauss lemma, subspace embeddings, and Fourier-based restricted isometries. Our work also implies new results in using the sparse Johnson-Lindenstrauss transform in numerical linear algebra, classical and model-based compressed sensing, manifold learning, and constrained least squares problems such as the Lasso.
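
    A minimal Python sketch of the construction the abstract refers to: each column of $\Phi$ gets $s$ entries of value $\pm 1/\sqrt{s}$ in $s$ distinct uniformly chosen rows. Parameter values below are for illustration only.

        import numpy as np

        rng = np.random.default_rng(0)

        def sparse_jl(m, n, s):
            """Sparse JL matrix: s nonzero entries of +-1/sqrt(s) per column."""
            Phi = np.zeros((m, n))
            for j in range(n):
                rows = rng.choice(m, size=s, replace=False)
                Phi[rows, j] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)
            return Phi

        m, n, s = 256, 1000, 8
        Phi = sparse_jl(m, n, s)
        x = rng.standard_normal(n)
        x /= np.linalg.norm(x)                      # a random unit vector
        print(abs(np.linalg.norm(Phi @ x)**2 - 1))  # distortion; should be small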

    Further developments in the Erlang(n) risk process

    For actuarial applications, we consider the Sparre–Andersen risk model when the interclaim times are Erlang(n) distributed. We first address the problem of solving an integro-differential equation that is satisfied by the survival probability and other probabilities, and present an alternative, improved method of solving such equations to the one presented by Li (2008). This is done by considering the roots with positive real parts of the generalized Lundberg's equation and establishing a one-to-one relation between them and the solutions of the integro-differential equation mentioned before. We then apply these findings to compute the distribution of the maximum severity of ruin, which depends on the non-ruin probability and on the roots of the fundamental Lundberg's equation. We illustrate and give explicit formulae for Erlang(3) interclaim arrivals with exponentially distributed single claim amounts, and for Erlang(2) interclaim times with Erlang(2) claim amounts. Finally, under a force of interest, we address the problem of calculating the expected discounted dividends prior to ruin, finding and solving an integro-differential equation that they satisfy. Numerical examples are provided for illustration.
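
    For orientation, in the notation commonly used for this model (mine, not necessarily the paper's): with Erlang(n) interclaim times of rate \lambda, premium rate c, force of interest \delta, and claim-size Laplace transform \tilde p, the generalized Lundberg equation reads

        \left( \frac{\lambda}{\lambda + \delta - c s} \right)^{\! n} \tilde p(s) = 1 ,

    whose n roots with positive real part are the quantities the abstract refers to. For exponential claims with rate \mu one has \tilde p(s) = \mu/(\mu + s), and the equation reduces to the polynomial (\lambda + \delta - c s)^n (\mu + s) = \lambda^n \mu of degree n + 1 in s.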

    Causal Inference Using Graphical Models with the R Package pcalg

    The pcalg package for R can be used for two purposes: causal structure learning and estimation of causal effects from observational data. In this document, we give a brief overview of the methodology and demonstrate the package's functionality in both toy examples and applications.

    Causal stability ranking

    Genotypic causes of a phenotypic trait are typically determined via randomized controlled intervention experiments. Such experiments are often prohibitive in duration and cost, so informative prioritization of experiments is desirable. We therefore consider predicting stable rankings of genes (covariates), according to their total causal effects on a phenotype (response), from observational data. Since causal effects are generally not identifiable from observational data alone, we use a method that can infer lower bounds for the total causal effect under some assumptions. We validated our method, which we call Causal Stability Ranking (CStaR), in two situations. First, we performed knock-out experiments with Arabidopsis thaliana according to a predicted ranking based on observational gene expression data, using flowering time as the phenotype of interest. Besides several known regulators of flowering time, we found almost half of the tested top-ranking mutants to have a significantly changed flowering time. Second, we compared CStaR to established regression-based methods on a gene expression dataset of Saccharomyces cerevisiae, and found that CStaR outperforms these methods. Our method allows for efficient design and prioritization of future intervention experiments, and due to its generality it can be used for a broad spectrum of applications. Availability: The full table of ranked genes, all raw data and an example R script for CStaR are available from the Bioinformatics website. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
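
    Schematically, a stability ranking of this kind can be produced by subsampling the data many times, ranking covariates by an estimated lower bound on their causal effect in each subsample, and aggregating how often each covariate reaches the top of the list. The Python sketch below is my schematic illustration, not the authors' code; effect_lower_bound is a placeholder for the actual causal-effect estimator.

        import numpy as np

        def stability_ranking(X, y, effect_lower_bound,
                              n_sub=100, frac=0.5, top=20, seed=0):
            """Rank covariates by how often they enter the top `top`
            (by causal-effect lower bound) across random subsamples."""
            rng = np.random.default_rng(seed)
            n, p = X.shape
            counts = np.zeros(p)
            for _ in range(n_sub):
                idx = rng.choice(n, size=int(frac * n), replace=False)
                bounds = effect_lower_bound(X[idx], y[idx])   # length-p array
                counts[np.argsort(bounds)[::-1][:top]] += 1
            return np.argsort(counts)[::-1]   # most stably top-ranked first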

    Handwritten digit recognition by bio-inspired hierarchical networks

    The human brain processes information with learning and prediction abilities whose underlying neuronal mechanisms remain unknown. Recent studies have shown that neuronal networks are capable of both generalization and association of sensory inputs. In this paper, following a set of neurophysiological findings, we propose a learning framework with strong biological plausibility that mimics prominent functions of cortical circuitries. We developed the Inductive Conceptual Network (ICN), a hierarchical bio-inspired network able to learn invariant patterns through variable-order Markov models implemented in its nodes. The outputs of the top-most node of the ICN hierarchy, representing the highest input generalization, allow for the automatic classification of inputs. We found that the ICN clustered MNIST images with an error of 5.73% and USPS images with an error of 12.56%.
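
    The node-level machinery named in the abstract, a variable-order Markov model, predicts each symbol from the longest previously seen context. A minimal Python sketch over symbol sequences (my simplification; the ICN's node implementation is more involved):

        from collections import defaultdict

        class VOMM:
            """Minimal variable-order Markov model: count next-symbol
            frequencies for every context up to max_order, then predict
            from the longest context seen in training."""
            def __init__(self, max_order=3):
                self.max_order = max_order
                self.counts = defaultdict(lambda: defaultdict(int))

            def train(self, seq):
                for i, sym in enumerate(seq):
                    for k in range(self.max_order + 1):
                        if i - k < 0:
                            break
                        self.counts[tuple(seq[i - k:i])][sym] += 1

            def predict(self, context):
                ctx = tuple(context[-self.max_order:])
                while ctx and ctx not in self.counts:
                    ctx = ctx[1:]          # back off to a shorter context
                nxt = self.counts.get(ctx, {})
                return max(nxt, key=nxt.get) if nxt else None

        m = VOMM(max_order=2)
        m.train("abracadabra")
        print(m.predict("abr"))   # -> 'a'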