    A general method for the statistical evaluation of typological distributions

    The distribution of linguistic structures in the world is the joint product of universal principles, inheritance from ancestor languages, language contact, social structures, and random fluctuation. This paper proposes a method for evaluating the relative significance of each factor, and in particular of universal principles, via regression modeling: statistical evidence for a universal principle is found if the odds that families show a skewed response (e.g. all or most members have postnominal relative clauses), as opposed to the opposite skew or no skew at all, are significantly higher under one condition (e.g. VO order) than under another, independently of other factors.
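
    As a rough illustration of the regression idea, the sketch below fits a logistic regression in which the response is whether a language family skews toward postnominal relative clauses and the predictor is VO order. The data and column names are invented for illustration and omit the paper's other factors (contact, social structure, etc.).

```python
# Minimal sketch of the family-skew regression; data are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

# One row per language family: vo_order = 1 if most members are VO;
# postnominal_skew = 1 if all/most members have postnominal relatives.
families = pd.DataFrame({
    "vo_order":         [1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0],
    "postnominal_skew": [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0],
})

# Logistic regression: are the odds of a postnominal skew significantly
# higher for VO families? A significant positive vo_order coefficient is
# the kind of evidence for a universal principle the paper describes.
fit = smf.logit("postnominal_skew ~ vo_order", data=families).fit()
print(fit.summary())
```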

    Construction and evaluation of classifiers for forensic document analysis

    In this study we illustrate a statistical approach to questioned document examination. Specifically, we consider the construction of three classifiers that predict the writer of a sample document based on categorical data. To evaluate these classifiers, we use a data set with a large number of writers and a small number of writing samples per writer. Since the resulting classifiers were found to have near-perfect accuracy under leave-one-out cross-validation, we propose a novel Bayesian-based cross-validation method for evaluating the classifiers. Comment: Published at http://dx.doi.org/10.1214/10-AOAS379 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
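
    The evaluation setup lends itself to a short sketch: leave-one-out cross-validation of a classifier on categorical features, with many writers and few samples per writer. The naive Bayes classifier and synthetic data below are generic stand-ins, not the paper's three classifiers or its Bayesian cross-validation method.

```python
# Minimal sketch: leave-one-out evaluation of a writer classifier on
# categorical features; classifier and data are generic stand-ins.
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.naive_bayes import CategoricalNB

rng = np.random.default_rng(0)
n_writers, samples_per_writer, n_features = 10, 3, 8

# Hypothetical samples: each row is one writing sample, each feature a
# categorical code (0-3), e.g. a letter-form category; y is the writer.
X = rng.integers(0, 4, size=(n_writers * samples_per_writer, n_features))
y = np.repeat(np.arange(n_writers), samples_per_writer)

# Leave-one-out: each sample is held out once and predicted from the rest.
# min_categories guards against categories unseen in a training fold.
clf = CategoricalNB(min_categories=4)
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print(f"LOO accuracy: {scores.mean():.2f}")
```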

    Statistical Foundations of Actuarial Learning and its Applications

    This open access book discusses the statistical modeling of insurance problems, a process which comprises data collection, data analysis and statistical model building to forecast insured events that may happen in the future. It presents the mathematical foundations behind these fundamental statistical concepts and how they can be applied in daily actuarial practice. Statistical modeling has a wide range of applications, and, depending on the application, the theoretical aspects may be weighted differently: here the main focus is on prediction rather than explanation. Starting with a presentation of state-of-the-art actuarial models, such as generalized linear models, the book then dives into modern machine learning tools such as neural networks and text recognition to improve predictive modeling with complex features. Providing practitioners with detailed guidance on how to apply machine learning methods to real-world data sets, and how to interpret the results without losing sight of the mathematical assumptions on which these methods are based, the book can serve as a modern basis for an actuarial education syllabus.
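
    A minimal example of the kind of state-of-the-art actuarial model the book begins with: a Poisson generalized linear model of claim counts, with policy exposure entering through the log link. The portfolio data and variable names below are hypothetical, not taken from the book.

```python
# Minimal sketch of a claim-frequency GLM; portfolio data are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 300
policies = pd.DataFrame({
    "age_group": rng.choice(["young", "mid", "senior"], size=n),
    "exposure": rng.uniform(0.25, 1.0, size=n),  # policy-years at risk
})
# Hypothetical claim counts, with a higher rate for young drivers.
rate = np.where(policies["age_group"] == "young", 0.30, 0.10)
policies["claims"] = rng.poisson(rate * policies["exposure"])

# Poisson GLM of claim counts; `exposure` enters as a log-offset, so the
# fitted coefficients act on the per-year claim frequency.
glm = smf.glm(
    "claims ~ age_group",
    data=policies,
    family=sm.families.Poisson(),
    exposure=policies["exposure"],
).fit()
print(glm.summary())
```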

    Analysis of SHRP2 Data to Understand Normal and Abnormal Driving Behavior in Work Zones

    This research project used the Second Strategic Highway Research Program (SHRP2) Naturalistic Driving Study (NDS) to improve highway safety by using statistical descriptions of normal driving behavior to identify abnormal driving behaviors in work zones. SHRP2 data used in these analyses included 50 safety-critical events (SCEs) from work zones and 444 baseline events selected on a matched case-control design.

    Principal components analysis (PCA) was used to summarize kinematic data into “normal” and “abnormal” driving. Each second of driving is described by one point in three-dimensional principal component (PC) space; an ellipse containing the bulk of baseline points is considered “normal” driving. Driving segments with out-of-ellipse points have a higher probability of being an SCE. Matched case-control analysis indicates that the specific individual and the traffic flow made approximately equal contributions to predicting out-of-ellipse driving.

    Structural Topics Modeling (STM) was used to analyze complex categorical data obtained from annotated videos. The STM method finds “words” representing categorical data variables that occur together in many events and describes these associations as “topics.” STM then associates topics with either baselines or SCEs. The STM produced 10 topics: 3 associated with SCEs, 5 associated with baselines, and 2 that were neutral. Distraction occurs in both baselines and SCEs.

    Both approaches identify the role of individual drivers in producing situations where SCEs might arise. A countermeasure could use the PC calculation to indicate impending issues or specific drivers who may have higher crash risk, but not to employ significant interventions such as automatically braking a vehicle with out-of-ellipse driving patterns. STM results suggest communication to drivers or placing compliant vehicles in the traffic stream would be effective. Finally, driver distraction in work zones should be discouraged.
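
    The ellipse criterion is easy to sketch: fit PCA to baseline kinematics, project each second of driving into three-dimensional PC space, and flag points whose Mahalanobis distance from the baseline cloud exceeds a cutoff. The data, the 95% coverage level, and the feature count below are assumptions for illustration; the report specifies only three PCs and an ellipse containing the bulk of baseline points.

```python
# Minimal sketch of the "normal-driving ellipse": PCA on baseline kinematics,
# then flag seconds far from the baseline cloud in 3-D PC space.
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
baseline = rng.normal(size=(5000, 6))          # hypothetical per-second kinematics
segment = rng.normal(scale=2.5, size=(60, 6))  # hypothetical driving segment

# Project both data sets onto 3 principal components fit to baseline driving.
pca = PCA(n_components=3).fit(baseline)
pc_base, pc_seg = pca.transform(baseline), pca.transform(segment)

# "Normal" ellipsoid: squared Mahalanobis distance under the baseline PC
# covariance, with a chi-square cutoff covering ~95% of baseline points.
center = pc_base.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(pc_base, rowvar=False))
cutoff = stats.chi2.ppf(0.95, df=3)

diff = pc_seg - center
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
print(f"out-of-ellipse seconds: {(d2 > cutoff).sum()} of {len(segment)}")
```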

    Beyond subjective and objective in statistics

    We argue that the words "objectivity" and "subjectivity" in statistics discourse are used in a mostly unhelpful way, and we propose to replace each of them with broader collections of attributes: objectivity replaced by transparency, consensus, impartiality, and correspondence to observable reality, and subjectivity replaced by awareness of multiple perspectives and context dependence. The advantage of these reformulations is that the replacement terms do not oppose each other. Instead of debating whether a given statistical method is subjective or objective (or normatively debating the relative merits of subjectivity and objectivity in statistical practice), we can recognize desirable attributes such as transparency and acknowledgment of multiple perspectives as complementary goals. We demonstrate the implications of our proposal with recent applied examples from pharmacology, election polling, and socioeconomic stratification. Comment: 35 pages.