
    On the conditions used to prove oracle results for the Lasso

    Oracle inequalities and variable selection properties for the Lasso in linear models have been established under a variety of different assumptions on the design matrix. We show in this paper how the different conditions and concepts relate to each other. The restricted eigenvalue condition (Bickel et al., 2009) or the slightly weaker compatibility condition (van de Geer, 2007) are sufficient for oracle results. We argue that both these conditions allow for a fairly general class of design matrices. Hence, optimality of the Lasso for prediction and estimation holds in more general situations than appears from coherence (Bunea et al., 2007b,c) or restricted isometry (Candes and Tao, 2005) assumptions. Comment: 33 pages, 1 figure

    Credibility in the Regression case Revisited (A Late Tribute to Charles A. Hachemeister)

    Many authors have observed that Hachemeister's Regression Model for Credibility, if applied to simple linear regression, leads to unsatisfactory credibility matrices: they typically 'mix up' the regression parameters and in particular lead to regression lines that seem 'out of range' compared with both individual and collective regression lines. We propose to amend these shortcomings by an appropriate definition of the regression parameters: intercept and slope. Contrary to standard practice, however, the intercept should not be defined as the value at time zero but as the value of the regression line at the barycenter of time. With these definitions, regression parameters which are uncorrelated in the collective can be estimated separately by standard one-dimensional credibility techniques. A similar convenient reparametrization can also be achieved in the general regression case. The good choice for the regression parameters is such as to turn the design matrix into an array with orthogonal columns

    Mathematics, Statistics and Data Science

    The process of extracting information from data has a long history (see, for example, [1]) stretching back over centuries. Because of the proliferation of data over the last few decades, and projections for its continued proliferation over coming decades, the term Data Science has emerged to describe the substantial current intellectual effort around research with the same overall goal, namely that of extracting information. The type of data currently available in all sorts of application domains is often massive in size, very heterogeneous and far from being collected under designed or controlled experimental conditions. Nonetheless, it contains information, often substantial information, and data science requires new interdisciplinary approaches to make maximal use of this information. Data alone is typically not that informative and (machine) learning from data needs conceptual frameworks. Mathematics and statistics are crucial for providing such conceptual frameworks. The frameworks enhance the understanding of fundamental phenomena, highlight limitations and provide a formalism for properly founded data analysis, information extraction and quantification of uncertainty, as well as for the analysis and development of algorithms that carry out these key tasks. In this personal commentary on data science and its relations to mathematics and statistics, we highlight three important aspects of the emerging field: Models, High-Dimensionality and Heterogeneity, and then conclude with a brief discussion of where the field is now and implications for the mathematical sciences

    Internationalisation des élites académiques suisses au 20ème siècle : convergences et contrastes

    Drawing on an original database of law and economics professors at Swiss universities across the entire 20th century, this article traces the various dynamics of internationalisation among these elites. Three main lessons emerge from our analyses. First, from a diachronic point of view, the 20th century can be divided into three historical phases: strong internationality of academic elites at the beginning of the century, a nationalisation or "relocalisation" following the First World War, then a "re-internationalisation" from the 1960s onward, accelerating since the 1980s. Second, economics professors, in terms of nationality or place of education, are more cosmopolitan and less locally anchored than their counterparts in law. Finally, the Germanic predominance among professors at Swiss universities at the beginning of the century, explained as much by an internationality of "excellence" as by one of "proximity", gives way, especially in economics, to a growing influence of the United States, revealing an erosion of the internationality of "proximity"

    De la sociologie de l'innovation à l'imagination sociologique : la théorie des champs à l'épreuve de la profession infirmière

    If we follow Saussure's perspective, in which the point of view creates the object, the "novelty" of a sociological object depends at least as much on renewing the approach to it as on its "intrinsic novelty". In this article, we take the path of such a renewal by putting an "old object" to the test with an "old approach". Going against a trend toward fragmenting the discipline into thematic sociologies, we show the heuristic value of treating a profession as a social space of differentiated positions that makes sense only once it is placed back in the field within which it sits. Largely feminised and partially dominated, the nursing profession is examined through a theory traditionally used to study masculine and dominant groups: Pierre Bourdieu's field theory

    Pivotal estimation in high-dimensional regression via linear programming

    We propose a new method of estimation in the high-dimensional linear regression model. It allows for very weak distributional assumptions, including heteroscedasticity, and does not require knowledge of the variance of the random errors. The method is based on linear programming only, so that its numerical implementation is faster than for previously known techniques using conic programs, and it allows one to deal with higher-dimensional models. We provide upper bounds for the estimation and prediction errors of the proposed estimator, showing that it achieves the same rate as in the more restrictive situation of fixed design and i.i.d. Gaussian errors with known variance. Following Gautier and Tsybakov (2011), we obtain the results under weaker sensitivity assumptions than the restricted eigenvalue or assimilated conditions

    Transformation des élites en Suisse


    Context Tree Selection: A Unifying View

    The present paper investigates non-asymptotic properties of two popular procedures of context tree (or Variable Length Markov Chains) estimation: Rissanen's algorithm Context and the Penalized Maximum Likelihood criterion. First showing how they are related, we prove finite horizon bounds for the probability of over- and under-estimation. Concerning overestimation, no boundedness or loss-of-memory conditions are required: the proof relies on new deviation inequalities for empirical probabilities of independent interest. The underestimation properties rely on loss-of-memory and separation conditions of the process. These results improve and generalize the bounds obtained previously. Context tree models have been introduced by Rissanen as a parsimonious generalization of Markov models. Since then, they have been widely used in applied probability and statistics

    ACS Applied Materials & Interfaces

    Key parameters that influence the specific energy of electrochemical double-layer capacitors (EDLCs) are the double-layer capacitance and the operating potential of the cell. The operating potential of the cell is generally limited by the electrochemical window of the electrolyte solution, that is, the range of applied voltages within which the electrolyte or solvent is not reduced or oxidized. Ionic liquids are of interest as electrolytes for EDLCs because they offer relatively wide potential windows. Here, we provide a systematic study of the influence of the physical properties of ionic liquid electrolytes on the electrochemical stability and electrochemical performance (double-layer capacitance, specific energy) of EDLCs that employ a mesoporous carbon model electrode with uniform, highly interconnected mesopores (3DOm carbon). Several ionic liquids with structurally diverse anions (tetrafluoroborate, trifluoromethanesulfonate, trifluoromethanesulfonimide) and cations (imidazolium, ammonium, pyridinium, piperidinium, and pyrrolidinium) were investigated. We show that the cation size has a significant effect on the electrolyte viscosity and conductivity, as well as the capacitance of EDLCs. Imidazolium- and pyridinium-based ionic liquids provide the highest cell capacitance, and ammonium-based ionic liquids offer potential windows much larger than imidazolium and pyridinium ionic liquids. Increasing the chain length of the alkyl substituents in 1-alkyl-3-methylimidazolium trifluoromethanesulfonimide does not widen the potential window of the ionic liquid. We identified the ionic liquids that maximize the specific energies of EDLCs through the combined effects of their potential windows and the double-layer capacitance. The highest specific energies are obtained with ionic liquid electrolytes that possess moderate electrochemical stability, small ionic volumes, low viscosity, and hence high conductivity, the best-performing ionic liquid tested being 1-ethyl-3-methylimidazolium bis(trifluoromethylsulfonyl)imide