    Interpolating Thin-Shell and Sharp Large-Deviation Estimates For Isotropic Log-Concave Measures

    Given an isotropic random vector $X$ with log-concave density in Euclidean space $\mathbb{R}^n$, we study the concentration properties of $|X|$ on all scales, both above and below its expectation. We show in particular that

    $$\mathbb{P}\big(\big||X| - \sqrt{n}\big| \geq t\sqrt{n}\big) \leq C \exp\big(-c\, n^{1/2} \min(t^3, t)\big) \qquad \forall t \geq 0,$$

    for some universal constants $c, C > 0$. This improves the best known deviation results on the thin-shell and mesoscopic scales, due to Fleury and Klartag respectively, and recovers the sharp large-deviation estimate of Paouris. Another new feature of our estimate is that it improves when $X$ is $\psi_\alpha$ ($\alpha \in (1,2]$), in precise agreement with Paouris' estimates. The upper bound we obtain on the thin-shell width $\sqrt{\operatorname{Var}(|X|)}$ is of order $n^{1/3}$, and improves to $n^{1/4}$ when $X$ is $\psi_2$. Our estimates thus continuously interpolate between a new best known thin-shell estimate and the sharp large-deviation estimate of Paouris. As a consequence, we deduce a new best known bound on the Cheeger isoperimetric constant appearing in a conjecture of Kannan, Lovász and Simonovits.

    Comment: 29 pages. The formulation is now general, estimating the deviation of a linear image of $X$, and the dependence on the $\psi_\alpha$ constant is explicit. Corrected typos and refined explanations. To appear in GAF
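
    The thin-shell width of order $n^{1/3}$ quoted above can be read off the deviation inequality above by integrating its tail. The derivation below is our own sketch, not taken from the paper; it uses only that inequality, the substitution $s = t\sqrt{n}$ (so that $n^{1/2}t^3 = s^3/n$ and $n^{1/2}t = s$), and the fact that $\operatorname{Var}(|X|) \le \mathbb{E}(|X| - \sqrt{n})^2$. The constant $C'$ is assumed to depend only on $c$ and $C$.

```latex
% Tail bound rewritten in terms of s = t\sqrt{n} (min(t^3,t) = t^3 exactly when s <= \sqrt{n}):
\mathbb{P}\big(\big||X| - \sqrt{n}\big| \ge s\big) \le
\begin{cases}
  C\,e^{-c\,s^3/n}, & 0 \le s \le \sqrt{n},\\
  C\,e^{-c\,s},     & s \ge \sqrt{n}.
\end{cases}

% Second moment via the tail integral; substitute s = n^{1/3}u in the first term:
\mathbb{E}\big(|X| - \sqrt{n}\big)^2
  = \int_0^\infty 2s\,\mathbb{P}\big(\big||X| - \sqrt{n}\big| \ge s\big)\,ds
  \le 2C\,n^{2/3}\int_0^\infty u\,e^{-c u^3}\,du + 2C\int_{\sqrt{n}}^\infty s\,e^{-c s}\,ds
  \le C'\,n^{2/3}.

% Since the variance is the smallest mean-squared deviation from a constant:
\sqrt{\operatorname{Var}(|X|)} \le \sqrt{C'}\;n^{1/3}.
```

    The second integral equals $2C e^{-c\sqrt{n}}(\sqrt{n}/c + 1/c^2)$, which is bounded uniformly in $n$, so the $n^{2/3}$ term dominates.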

    A new specification of generalized linear models for categorical data

    Regression models for categorical data are specified in heterogeneous ways. We propose to unify the specification of such models. This allows us to define the family of reference models for nominal data. We introduce the notion of reversible models for ordinal data, which distinguishes adjacent and cumulative models from sequential ones. Combining the proposed specification with the definitions of reference and reversible models and with various invariance properties leads to a new view of regression models for categorical data.

    Comment: 31 pages, 13 figures
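
    To make the ratio-based view concrete, here is a minimal Python sketch. It assumes the (r, F, Z) form used by these authors, in which a vector of J-1 probability ratios r(π) is linked to linear predictors through a cumulative distribution function F, so that r_j(π) = F(η_j) with η = Zβ. It also assumes the usual ratio definitions for the four families named above: reference π_j/(π_j + π_J), adjacent π_j/(π_j + π_{j+1}), cumulative π_1 + ... + π_j and sequential π_j/(π_j + ... + π_J). The helpers `ratios`, `logit` and `linear_predictors` are hypothetical, and the logistic choice of F is only one option.

```python
import math
from typing import Dict, List


def ratios(pi: List[float]) -> Dict[str, List[float]]:
    """Return the J-1 ratios r_j(pi) for four ratio families.

    pi is a probability vector over J categories. For the "reference"
    family the last category is used as reference (an assumption here).
    """
    J = len(pi)
    out: Dict[str, List[float]] = {
        "reference": [], "adjacent": [], "cumulative": [], "sequential": []
    }
    cumulative = 0.0
    for j in range(J - 1):
        cumulative += pi[j]
        out["reference"].append(pi[j] / (pi[j] + pi[J - 1]))  # pi_j / (pi_j + pi_J)
        out["adjacent"].append(pi[j] / (pi[j] + pi[j + 1]))   # pi_j / (pi_j + pi_{j+1})
        out["cumulative"].append(cumulative)                  # pi_1 + ... + pi_j
        out["sequential"].append(pi[j] / sum(pi[j:]))         # pi_j / (pi_j + ... + pi_J)
    return out


def logit(p: float) -> float:
    """Inverse of the logistic CDF, i.e. F^{-1} when F is logistic."""
    return math.log(p / (1.0 - p))


def linear_predictors(pi: List[float], family: str) -> List[float]:
    """eta_j = F^{-1}(r_j(pi)) for a logistic F (hypothetical helper)."""
    return [logit(r) for r in ratios(pi)[family]]
```

    For example, `ratios([0.2, 0.3, 0.5])["adjacent"]` gives 0.4 and 0.375 up to floating-point rounding. Swapping `logit` for another inverse CDF (probit, complementary log-log) changes F while keeping the same ratio r, which is the separation of concerns the (r, F, Z) triple is meant to express.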

    Partitioned conditional generalized linear models for categorical data

    In categorical data analysis, several regression models have been proposed for hierarchically structured response variables, e.g. the nested logit model. However, they have been formally defined for only two or three levels in the hierarchy. Here, we introduce the class of partitioned conditional generalized linear models (PCGLMs), defined for any number of levels. The hierarchical structure of these models is fully specified by a partition tree of categories. Thanks to the genericity of the (r,F,Z) specification, PCGLMs can handle nominal and ordinal, but also partially ordered, response variables.

    Comment: 25 pages, 13 figures
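
    How a partition tree determines the full response distribution can be sketched in a few lines of Python: the probability of each leaf category is the product of the conditional probabilities along its root-to-leaf path. This is a minimal sketch under stated assumptions. The tree encoding is hypothetical, and each internal node uses a multinomial logit (softmax) on fixed linear predictors `eta` as a stand-in for whichever (r, F, Z) model that node would carry in an actual PCGLM, where `eta` would also depend on covariates.

```python
import math
from typing import Any, Dict, List, Union

# A node is either a leaf category name (str) or a dict with
# "children" (list of nodes) and "eta" (one linear predictor per child).
Node = Union[str, Dict[str, Any]]


def softmax(eta: List[float]) -> List[float]:
    """Multinomial logit probabilities, shifted by max(eta) for stability."""
    m = max(eta)
    exps = [math.exp(e - m) for e in eta]
    total = sum(exps)
    return [e / total for e in exps]


def leaf_probabilities(node: Node, prob: float = 1.0) -> Dict[str, float]:
    """Multiply conditional probabilities along each root-to-leaf path."""
    if isinstance(node, str):
        return {node: prob}
    out: Dict[str, float] = {}
    for child, p in zip(node["children"], softmax(node["eta"])):
        out.update(leaf_probabilities(child, prob * p))
    return out


if __name__ == "__main__":
    # Hypothetical two-level tree: root splits {A, B} from C, then A from B.
    tree = {
        "eta": [0.0, 0.0],
        "children": [{"eta": [0.0, 0.0], "children": ["A", "B"]}, "C"],
    }
    print(leaf_probabilities(tree))  # {'A': 0.25, 'B': 0.25, 'C': 0.5}
```

    Because each node only needs to return a conditional distribution over its children, an ordinal model could be used at one node and a nominal one at another, which is the kind of mixing that lets a single tree describe partially ordered responses.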

    Thin-shell concentration for convex measures

    We prove that for $s < 0$, $s$-concave measures on $\mathbb{R}^n$ satisfy a thin-shell concentration similar to the log-concave one. This leads to a Berry–Esseen type estimate for their one-dimensional marginal distributions. We also establish sharp reverse Hölder inequalities for $s$-concave measures.
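
    For reference, the standard (Borell) definition of $s$-concavity is recalled below; the abstract does not spell it out, and the notation is ours. For $s < 0$, a measure $\mu$ on $\mathbb{R}^n$ is $s$-concave if the first inequality holds for all $\lambda \in [0,1]$ and all Borel sets $A, B$ with $\mu(A), \mu(B) > 0$. The limit on the right shows that $s \to 0$ recovers log-concavity; since the power mean is nondecreasing in $s$, every log-concave measure is $s$-concave for every $s < 0$, so the result covers a larger class than log-concave measures.

```latex
\mu\big(\lambda A + (1-\lambda) B\big) \;\ge\; \big(\lambda\,\mu(A)^{s} + (1-\lambda)\,\mu(B)^{s}\big)^{1/s},
\qquad
\lim_{s \to 0^-} \big(\lambda\,a^{s} + (1-\lambda)\,b^{s}\big)^{1/s} = a^{\lambda}\, b^{1-\lambda} \quad (a, b > 0).
```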

    High-Resolution Road Vehicle Collision Prediction for the City of Montreal

    Road accidents are an important issue in modern societies, responsible for millions of deaths and injuries worldwide every year. In Quebec alone, road accidents caused 359 deaths and 33 thousand injuries in 2018. In this paper, we show how open datasets of a city like Montreal, Canada, can be leveraged to create high-resolution accident prediction models using big data analytics. Compared to other studies in road accident prediction, our prediction resolution is much higher: our models predict the occurrence of an accident within a given hour, on road segments delimited by intersections. Such models could be used for road accident prevention, but also to identify key factors that can lead to a road accident and, consequently, to help elaborate new policies. We tested various machine learning methods to deal with the severe class imbalance inherent to accident prediction problems. In particular, we implemented in Apache Spark the Balanced Random Forest algorithm, a variant of the Random Forest algorithm. Interestingly, we found that in our case, Balanced Random Forest does not perform significantly better than Random Forest. Experimental results show that our model detects 85% of road vehicle collisions with a false positive rate of 13%. The examples identified as positive are likely to correspond to high-risk situations. In addition, we identify the most important predictors of vehicle collisions for the area of Montreal: the count of accidents on the same road segment during previous years, the temperature, the day of the year, the hour, and the visibility.
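
    The balancing idea behind Balanced Random Forest, as usually described following Chen, Liaw and Breiman, is that each tree is grown on a bootstrap sample from the minority class plus an equally sized sample, drawn with replacement, from the majority class. The plain-Python sketch below shows only that per-tree sampling step, together with the two rates quoted above (detection rate and false positive rate). It is not the authors' Apache Spark implementation; the function names are hypothetical, and labels are assumed to be 0/1 with 1 meaning a collision.

```python
import random
from typing import List, Sequence, Tuple


def balanced_bootstrap(labels: Sequence[int], rng: random.Random) -> List[int]:
    """Row indices for one tree of a Balanced Random Forest (labels in {0, 1}).

    Draw a bootstrap sample (with replacement) from the minority class, then
    draw the same number of rows, with replacement, from the majority class.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    k = len(minority)
    return rng.choices(minority, k=k) + rng.choices(majority, k=k)


def detection_and_false_positive_rate(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (TP / (TP + FN), FP / (FP + TN)) for 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)
```

    In a full forest, each tree would call `balanced_bootstrap` with the same generator, so every tree sees a different balanced sample while the forest as a whole still uses most of the majority class. The 85% and 13% figures in the abstract correspond to the two values returned by `detection_and_false_positive_rate`.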

    Parametric Modelling of Multivariate Count Data Using Probabilistic Graphical Models

    Multivariate count data are defined as the numbers of items of different categories obtained by sampling within a population whose individuals are grouped into categories. The analysis of multivariate count data is a recurrent and crucial issue in numerous modelling problems, particularly in biology and ecology (where the data can represent, for example, children counts associated with multitype branching processes), sociology and econometrics. We focus on: I) identifying categories that appear simultaneously or, on the contrary, that are mutually exclusive, which is achieved by identifying conditional independence relationships between the variables; II) building parsimonious parametric models consistent with these relationships; III) characterising and testing the effects of covariates on the joint distribution of the counts. To achieve these goals, we propose an approach based on probabilistic graphical models, and more specifically partially directed acyclic graphs.
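
    The link between a graph and a parsimonious count model can be illustrated by factorising the joint distribution along the graph. The Python sketch below is purely hypothetical: it uses a three-category example with the directed edge N1 → N2 and N3 independent of (N1, N2), a DAG being the special case of a partially directed acyclic graph with no undirected edges. The Poisson conditionals and their means are invented for illustration and need not match the parametric families used in the work.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Poisson probability of k, computed in log space for numerical stability."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def joint_pmf(n1: int, n2: int, n3: int) -> float:
    """Hypothetical count model factorised along N1 -> N2, with N3 isolated:

        P(n1, n2, n3) = P(n1) * P(n2 | n1) * P(n3)
    """
    return (
        poisson_pmf(n1, 2.0)                 # marginal of N1
        * poisson_pmf(n2, 1.0 + 0.5 * n1)    # N2 depends on its parent N1
        * poisson_pmf(n3, 1.5)               # N3 independent of (N1, N2)
    )
```

    The graph fixes which conditional factors exist, so the model needs one parameter for N1, two for N2 given N1 and one for N3, instead of a fully general joint distribution over all three counts.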