
1,390 research outputs found

Interpolating Thin-Shell and Sharp Large-Deviation Estimates For Isotropic Log-Concave Measures

Authors: Guédon Olivier; Milman Emanuel
Publication date: 01/01/2011

Given an isotropic random vector $X$ with log-concave density in Euclidean space $\mathbb{R}^n$, we study the concentration properties of $|X|$ on all scales, both above and below its expectation. We show in particular that $\mathbb{P}\left(\left| |X| - \sqrt{n} \right| \geq t \sqrt{n}\right) \leq C \exp(-c n^{1/2} \min(t^3, t))$ for all $t \geq 0$, for some universal constants $c, C > 0$. This improves the best known deviation results on the thin-shell and mesoscopic scales due to Fleury and Klartag, respectively, and recovers the sharp large-deviation estimate of Paouris. Another new feature of our estimate is that it improves when $X$ is $\psi_\alpha$ ($\alpha \in (1,2]$), in precise agreement with Paouris' estimates. The upper bound on the thin-shell width $\sqrt{\mathrm{Var}(|X|)}$ we obtain is of the order of $n^{1/3}$, and improves down to $n^{1/4}$ when $X$ is $\psi_2$. Our estimates thus continuously interpolate between a new best known thin-shell estimate and the sharp large-deviation estimate of Paouris. As a consequence, a new best known bound on the Cheeger isoperimetric constant appearing in a conjecture of Kannan-Lovász-Simonovits is deduced.

Comment: 29 pages - formulation is now general, estimating deviation of a linear image of $X$, and dependence on the $\psi_\alpha$ constant is explicit. Corrected typos and refined explanations. To appear in GAF
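The link between the deviation inequality and the $n^{1/3}$ thin-shell width can be made explicit with a short calculation. The following is only a sketch under the stated inequality, not a reproduction of the authors' argument; the substitution $s = t\sqrt{n}$ and the constants $C', C''$ are introduced here for illustration.

```latex
% For 0 <= t <= 1 we have min(t^3, t) = t^3; for t >= 1 it equals t.
% Writing s = t\sqrt{n}, the deviation inequality reads
\[
  \mathbb{P}\bigl( \bigl| |X| - \sqrt{n} \bigr| \geq s \bigr)
  \leq
  \begin{cases}
    C \exp(-c\, s^3 / n), & 0 \leq s \leq \sqrt{n}, \\
    C \exp(-c\, s),       & s \geq \sqrt{n}.
  \end{cases}
\]
% The variance is the smallest mean squared deviation from a constant, so
\[
  \mathrm{Var}(|X|)
  \leq \mathbb{E}\bigl( |X| - \sqrt{n} \bigr)^2
  = \int_0^\infty 2s \, \mathbb{P}\bigl( \bigl| |X| - \sqrt{n} \bigr| \geq s \bigr) \, ds .
\]
% Substituting s = n^{1/3} u in the first range and bounding the second
% range by an integral over the whole half-line gives
\[
  \mathrm{Var}(|X|)
  \leq 2C\, n^{2/3} \int_0^\infty u\, e^{-c u^3} \, du
     + 2C \int_0^\infty s\, e^{-c s} \, ds
  \leq C' n^{2/3} ,
\]
% and therefore
\[
  \sqrt{\mathrm{Var}(|X|)} \leq C'' n^{1/3} .
\]
```

The cubic exponent $t^3$ on small scales is what fixes the exponent $1/3$ here.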

arXiv.org e-Print Archive

HAL - UPEC / UPEM

A new specification of generalized linear models for categorical data

Authors: Guédon Yann; Peyhardi Jean; Trottier Catherine
Publication date: 01/01/2014

Regression models for categorical data are specified in heterogeneous ways. We propose to unify the specification of such models. This allows us to define the family of reference models for nominal data. We introduce the notion of reversible models for ordinal data, which distinguishes adjacent and cumulative models from sequential ones. The combination of the proposed specification with the definition of reference and reversible models and various invariance properties leads to a new view of regression models for categorical data.

Comment: 31 pages, 13 figures

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Agritrop

HAL-CIRAD

Partitioned conditional generalized linear models for categorical data

Authors: Guédon Yann; Peyhardi Jean; Trottier Catherine
Publication date: 01/01/2014

In categorical data analysis, several regression models have been proposed for hierarchically structured response variables, e.g. the nested logit model. However, they have been formally defined for only two or three levels in the hierarchy. Here, we introduce the class of partitioned conditional generalized linear models (PCGLMs), defined for any number of levels. The hierarchical structure of these models is fully specified by a partition tree of categories. Using the genericity of the (r,F,Z) specification, the PCGLM can handle nominal, ordinal, and also partially ordered response variables.

Comment: 25 pages, 13 figures

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Agritrop

HAL-CIRAD

Thin-shell concentration for convex measures

Authors: Fradelizi Matthieu; Guédon Olivier; Pajor Alain
Publication date: 01/01/2014

We prove that for $s<0$, $s$-concave measures on $\mathbb{R}^n$ satisfy a thin-shell concentration similar to the log-concave one. It leads to a Berry-Esseen type estimate for their one-dimensional marginal distributions. We also establish sharp reverse Hölder inequalities for $s$-concave measures.

arXiv.org e-Print Archive

Crossref

HAL - UPEC / UPEM

High-Resolution Road Vehicle Collision Prediction for the City of Montreal

Authors: Glatard Tristan; Guédon Timothée; Hébert Antoine; Jaumard Brigitte
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 11/11/2019

Road accidents are an important issue in modern societies, responsible for millions of deaths and injuries every year worldwide. In Quebec alone, road accidents were responsible for 359 deaths and 33 thousand injuries in 2018. In this paper, we show how one can leverage the open datasets of a city like Montreal, Canada, to create high-resolution accident prediction models using big data analytics. Compared to other studies in road accident prediction, we have a much higher prediction resolution, i.e., our models predict the occurrence of an accident within an hour, on road segments defined by intersections. Such models could be used in the context of road accident prevention, but also to identify key factors that can lead to a road accident and, consequently, to help elaborate new policies. We tested various machine learning methods to deal with the severe class imbalance inherent to accident prediction problems. In particular, we implemented the Balanced Random Forest algorithm, a variant of the Random Forest machine learning algorithm, in Apache Spark. Interestingly, we found that in our case, Balanced Random Forest does not perform significantly better than Random Forest. Experimental results show that 85% of road vehicle collisions are detected by our model, with a false positive rate of 13%. The examples identified as positive are likely to correspond to high-risk situations. In addition, we identify the most important predictors of vehicle collisions for the area of Montreal: the count of accidents on the same road segment during previous years, the temperature, the day of the year, the hour, and the visibility.
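The resampling idea usually associated with Balanced Random Forest, and the two metrics quoted in the abstract (detection rate and false positive rate), can be illustrated with a small standard-library sketch. This is not the authors' Apache Spark implementation: the function names `balanced_bootstrap` and `recall_and_fpr`, the toy labels, and the choice of drawing every class down to the minority-class size are assumptions made for illustration.

```python
import random
from collections import Counter


def balanced_bootstrap(labels, rng):
    """Draw the row indices for one tree (hypothetical helper).

    Every class is sampled with replacement down to the size of the
    smallest class, so each tree sees a balanced training set.
    """
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    n_minority = min(len(indices) for indices in by_class.values())
    sample = []
    for indices in by_class.values():
        sample.extend(rng.choices(indices, k=n_minority))
    return sample


def recall_and_fpr(y_true, y_pred):
    """Return (recall, false positive rate) for binary 0/1 labels.

    Assumes at least one positive and one negative in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


# Toy data with severe class imbalance: 5 positives among 1000 rows.
labels = [1] * 5 + [0] * 995
sample = balanced_bootstrap(labels, random.Random(0))
class_counts = Counter(labels[i] for i in sample)
```

In this reading, the "85% detected" figure corresponds to recall and the "13%" figure to the false positive rate, both computed on held-out data.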

arXiv.org e-Print Archive

Crossref

Parametric Modelling of Multivariate Count Data Using Probabilistic Graphical Models

Authors: Durand Jean-Baptiste; Fernique Pierre; Guédon Yann
Publication date: 13/09/2013

Multivariate count data are defined as the number of items of different categories issued from sampling within a population whose individuals are grouped into categories. The analysis of multivariate count data is a recurrent and crucial issue in numerous modelling problems, particularly in the fields of biology and ecology (where the data can represent, for example, children counts associated with multitype branching processes), sociology and econometrics. We focus on: I) identifying categories that appear simultaneously or, on the contrary, that are mutually exclusive, which is achieved by identifying conditional independence relationships between the variables; II) building parsimonious parametric models consistent with these relationships; III) characterising and testing the effects of covariates on the joint distribution of the counts. To achieve these goals, we propose an approach based on graphical probabilistic models, and more specifically partially directed acyclic graphs.

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL Descartes

HAL-CIRAD