    A general method for the statistical evaluation of typological distributions

    The distribution of linguistic structures in the world is the joint product of universal principles, inheritance from ancestor languages, language contact, social structures, and random fluctuation. This paper proposes a method for evaluating the relative significance of each factor, and in particular of universal principles, via regression modeling: statistical evidence for a universal principle is found if the odds that families show a skewed response (e.g. all or most members have postnominal relative clauses), as opposed to the opposite skew or no skew at all, are significantly higher under one condition (e.g. VO order) than under another, independently of other factors.
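
    As a rough illustration of the regression idea, the sketch below fits a logistic regression in which the response is whether a language family skews toward postnominal relative clauses and the predictor is VO order. The data and column names are invented for illustration and omit the paper's other factors (contact, social structure, etc.).

```python
# Minimal sketch of the family-skew regression; data are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

# One row per language family: vo_order = 1 if most members are VO;
# postnominal_skew = 1 if all/most members have postnominal relatives.
families = pd.DataFrame({
    "vo_order":         [1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0],
    "postnominal_skew": [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0],
})

# Logistic regression: are the odds of a postnominal skew significantly
# higher for VO families? A significant positive vo_order coefficient is
# the kind of evidence for a universal principle the paper describes.
fit = smf.logit("postnominal_skew ~ vo_order", data=families).fit()
print(fit.summary())
```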

    Construction and evaluation of classifiers for forensic document analysis

    In this study we illustrate a statistical approach to questioned document examination. Specifically, we consider the construction of three classifiers that predict the writer of a sample document based on categorical data. To evaluate these classifiers, we use a data set with a large number of writers and a small number of writing samples per writer. Since the resulting classifiers were found to have near-perfect accuracy under leave-one-out cross-validation, we propose a novel Bayesian-based cross-validation method for evaluating the classifiers. Comment: Published at http://dx.doi.org/10.1214/10-AOAS379 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
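
    The evaluation setup lends itself to a short sketch: leave-one-out cross-validation of a classifier on categorical features, with many writers and few samples per writer. The naive Bayes classifier and synthetic data below are generic stand-ins, not the paper's three classifiers or its Bayesian cross-validation method.

```python
# Minimal sketch: leave-one-out evaluation of a writer classifier on
# categorical features; classifier and data are generic stand-ins.
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.naive_bayes import CategoricalNB

rng = np.random.default_rng(0)
n_writers, samples_per_writer, n_features = 10, 3, 8

# Hypothetical samples: each row is one writing sample, each feature a
# categorical code (0-3), e.g. a letter-form category; y is the writer.
X = rng.integers(0, 4, size=(n_writers * samples_per_writer, n_features))
y = np.repeat(np.arange(n_writers), samples_per_writer)

# Leave-one-out: each sample is held out once and predicted from the rest.
# min_categories guards against categories unseen in a training fold.
clf = CategoricalNB(min_categories=4)
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print(f"LOO accuracy: {scores.mean():.2f}")
```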

    Statistical Foundations of Actuarial Learning and its Applications

    This open access book discusses the statistical modeling of insurance problems, a process which comprises data collection, data analysis and statistical model building to forecast insured events that may happen in the future. It presents the mathematical foundations behind these fundamental statistical concepts and how they can be applied in daily actuarial practice. Statistical modeling has a wide range of applications, and, depending on the application, the theoretical aspects may be weighted differently: here the main focus is on prediction rather than explanation. Starting with a presentation of state-of-the-art actuarial models, such as generalized linear models, the book then dives into modern machine learning tools such as neural networks and text recognition to improve predictive modeling with complex features. Providing practitioners with detailed guidance on how to apply machine learning methods to real-world data sets, and how to interpret the results without losing sight of the mathematical assumptions on which these methods are based, the book can serve as a modern basis for an actuarial education syllabus.
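
    A minimal example of the kind of state-of-the-art actuarial model the book begins with: a Poisson generalized linear model of claim counts, with policy exposure entering through the log link. The portfolio data and variable names below are hypothetical, not taken from the book.

```python
# Minimal sketch of a claim-frequency GLM; portfolio data are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 300
policies = pd.DataFrame({
    "age_group": rng.choice(["young", "mid", "senior"], size=n),
    "exposure": rng.uniform(0.25, 1.0, size=n),  # policy-years at risk
})
# Hypothetical claim counts, with a higher rate for young drivers.
rate = np.where(policies["age_group"] == "young", 0.30, 0.10)
policies["claims"] = rng.poisson(rate * policies["exposure"])

# Poisson GLM of claim counts; `exposure` enters as a log-offset, so the
# fitted coefficients act on the per-year claim frequency.
glm = smf.glm(
    "claims ~ age_group",
    data=policies,
    family=sm.families.Poisson(),
    exposure=policies["exposure"],
).fit()
print(glm.summary())
```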

    Analysis of SHRP2 Data to Understand Normal and Abnormal Driving Behavior in Work Zones

    This research project used the Second Strategic Highway Research Program (SHRP2) Naturalistic Driving Study (NDS) to improve highway safety by using statistical descriptions of normal driving behavior to identify abnormal driving behaviors in work zones. SHRP2 data used in these analyses included 50 safety-critical events (SCEs) from work zones and 444 baseline events selected on a matched case-control design.

    Principal components analysis (PCA) was used to summarize kinematic data into “normal” and “abnormal” driving. Each second of driving is described by one point in three-dimensional principal component (PC) space; an ellipse containing the bulk of baseline points is considered “normal” driving. Driving segments with out-of-ellipse points have a higher probability of being an SCE. Matched case-control analysis indicates that the specific individual and the traffic flow made approximately equal contributions to predicting out-of-ellipse driving.

    Structural Topics Modeling (STM) was used to analyze complex categorical data obtained from annotated videos. The STM method finds “words” representing categorical data variables that occur together in many events and describes these associations as “topics.” STM then associates topics with either baselines or SCEs. The STM produced 10 topics: 3 associated with SCEs, 5 associated with baselines, and 2 that were neutral. Distraction occurs in both baselines and SCEs.

    Both approaches identify the role of individual drivers in producing situations where SCEs might arise. A countermeasure could use the PC calculation to indicate impending issues or specific drivers who may have higher crash risk, but not to employ significant interventions such as automatically braking a vehicle with out-of-ellipse driving patterns. STM results suggest communication to drivers or placing compliant vehicles in the traffic stream would be effective. Finally, driver distraction in work zones should be discouraged.
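
    The ellipse criterion is easy to sketch: fit PCA to baseline kinematics, project each second of driving into three-dimensional PC space, and flag points whose Mahalanobis distance from the baseline cloud exceeds a cutoff. The data, the 95% coverage level, and the feature count below are assumptions for illustration; the report specifies only three PCs and an ellipse containing the bulk of baseline points.

```python
# Minimal sketch of the "normal-driving ellipse": PCA on baseline kinematics,
# then flag seconds far from the baseline cloud in 3-D PC space.
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
baseline = rng.normal(size=(5000, 6))          # hypothetical per-second kinematics
segment = rng.normal(scale=2.5, size=(60, 6))  # hypothetical driving segment

# Project both data sets onto 3 principal components fit to baseline driving.
pca = PCA(n_components=3).fit(baseline)
pc_base, pc_seg = pca.transform(baseline), pca.transform(segment)

# "Normal" ellipsoid: squared Mahalanobis distance under the baseline PC
# covariance, with a chi-square cutoff covering ~95% of baseline points.
center = pc_base.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(pc_base, rowvar=False))
cutoff = stats.chi2.ppf(0.95, df=3)

diff = pc_seg - center
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
print(f"out-of-ellipse seconds: {(d2 > cutoff).sum()} of {len(segment)}")
```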

    Beyond subjective and objective in statistics

    We argue that the words "objectivity" and "subjectivity" in statistics discourse are used in a mostly unhelpful way, and we propose to replace each of them with broader collections of attributes: objectivity replaced by transparency, consensus, impartiality, and correspondence to observable reality, and subjectivity replaced by awareness of multiple perspectives and context dependence. The advantage of these reformulations is that the replacement terms do not oppose each other. Instead of debating whether a given statistical method is subjective or objective (or normatively debating the relative merits of subjectivity and objectivity in statistical practice), we can recognize desirable attributes such as transparency and acknowledgment of multiple perspectives as complementary goals. We demonstrate the implications of our proposal with recent applied examples from pharmacology, election polling, and socioeconomic stratification. Comment: 35 pages.