    Estimating of bootstrap confidence intervals for freight transport matrices

    Freight transport studies require, as a preliminary step, a survey to be conducted on a sample of the universe of agents, vehicles and/or companies of the transportation system. The statistical reliability of the data determines the goodness of the outcomes and conclusions that can be inferred from the analyses and models generated. The methodology contained herein, based on bootstrapping techniques, allows us to generate confidence intervals for the origin-destination pairs defined by each cell of the matrix derived from a freight transport survey. To address this study, a data set from a statistically reliable freight transport study conducted in Spain at the level of multi-province inter-regions has been used. Funding: Public Road Agency of the Andalusian Regional Government (AOP-JA, Spain, Project G-GI3000/IDII); EU FEDE
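
    A minimal sketch of the cell-wise bootstrap idea described above, not the paper's exact procedure: survey records are resampled with replacement, the origin-destination (O-D) matrix is rebuilt from each resample, and percentile confidence intervals are taken cell by cell. The column names, the toy survey table and the tonnage variable are illustrative assumptions.

    import numpy as np
    import pandas as pd

    def od_matrix(records: pd.DataFrame) -> pd.DataFrame:
        """Aggregate surveyed freight records into an O-D matrix of total tonnes."""
        return records.pivot_table(index="origin", columns="destination",
                                   values="tonnes", aggfunc="sum", fill_value=0.0)

    def bootstrap_od_cis(records: pd.DataFrame, n_boot: int = 1000,
                         alpha: float = 0.05, seed: int = 0):
        """Percentile bootstrap confidence intervals for every cell of the O-D matrix."""
        rng = np.random.default_rng(seed)
        base = od_matrix(records)
        draws = np.empty((n_boot, *base.shape))
        for b in range(n_boot):
            # Resample survey records with replacement.
            idx = rng.integers(0, len(records), size=len(records))
            resample = records.iloc[idx]
            # Reindex so every bootstrap matrix has the same cells as the original.
            m = od_matrix(resample).reindex(index=base.index, columns=base.columns,
                                            fill_value=0.0)
            draws[b] = m.to_numpy()
        lower = np.quantile(draws, alpha / 2, axis=0)
        upper = np.quantile(draws, 1 - alpha / 2, axis=0)
        return (base,
                pd.DataFrame(lower, index=base.index, columns=base.columns),
                pd.DataFrame(upper, index=base.index, columns=base.columns))

    # Illustrative usage with a toy survey table (not data from the study).
    records = pd.DataFrame({
        "origin":      ["A", "A", "B", "B", "A", "B"],
        "destination": ["B", "C", "A", "C", "B", "A"],
        "tonnes":      [10.0, 4.0, 7.0, 3.0, 12.0, 5.0],
    })
    estimate, lo, hi = bootstrap_od_cis(records, n_boot=500)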

    Confidence Intervals for Unobserved Events

    Consider a finite sample from an unknown distribution over a countable alphabet. Unobserved events are alphabet symbols which do not appear in the sample. Estimating the probabilities of unobserved events is a basic problem in statistics and related fields, which has been extensively studied in the context of point estimation. In this work we introduce a novel interval estimation scheme for unobserved events. Our proposed framework applies selective inference, as we construct confidence intervals (CIs) for the desired set of parameters. Interestingly, we show that the obtained CIs are dimension-free, as they do not grow with the alphabet size. Further, we show that these CIs are (almost) tight, in the sense that they cannot be further improved without violating the prescribed coverage rate. We demonstrate the performance of our proposed scheme in synthetic and real-world experiments, showing a significant improvement over the alternatives. Finally, we apply our proposed scheme to large alphabet modeling. We introduce a novel simultaneous CI scheme for large alphabet distributions which outperforms currently known methods while maintaining the prescribed coverage rate.
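
    For context only, since the abstract does not spell out the selective-inference construction: a classical single-parameter baseline for an event never observed in n independent trials is the exact Clopper-Pearson upper bound for zero successes, 1 - alpha**(1/n), whose 95% version is the familiar "rule of three" approximation 3/n. The sketch below illustrates that baseline, not the paper's proposed CI scheme.

    def upper_bound_unobserved(n: int, alpha: float = 0.05) -> float:
        """Upper (1 - alpha) confidence bound for p when 0 successes were seen in n trials."""
        # Exact Clopper-Pearson bound for zero observed successes.
        return 1.0 - alpha ** (1.0 / n)

    for n in (30, 100, 1000):
        # Compare the exact bound with the rule-of-three approximation 3/n.
        print(n, round(upper_bound_unobserved(n), 5), round(3.0 / n, 5))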

    How many clones need to be sequenced from a single forensic or ancient DNA sample in order to determine a reliable consensus sequence?

    Forensic and ancient DNA (aDNA) extracts are mixtures of endogenous aDNA, existing in a more or less damaged state, and contaminant DNA. To obtain the true aDNA sequence, it is not sufficient to generate a single direct sequence of the mixture, even where the authentic aDNA is the most abundant (e.g. 25% or more) in the component mixture. Only bacterial cloning can elucidate the components of this mixture. We calculate the number of clones that need to be sampled (for various mixture ratios) in order to be confident (at various levels of confidence) that the major component has been identified. We demonstrate that to be >95% confident of identifying the most abundant sequence present at 70% in the ancient sample, 20 clones must be sampled. We make recommendations and offer a free-access web-based program, which constructs the most reliable consensus sequence from the user's input clone sequences and analyses the confidence limits for each nucleotide position and for the whole consensus sequence. Accepted authentication methods must be employed in order to assess the authenticity and endogeneity of the resulting consensus sequences (e.g. quantification and replication by another laboratory, blind testing, amelogenin sex versus morphological sex, the effective use of controls, etc.) and determine whether they are indeed aDNA.
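
    A minimal sketch of the sampling argument under a simplified two-component model (endogenous sequence versus everything else), not the authors' full per-nucleotide analysis or web tool: if the authentic sequence makes up a fraction p of the extract and n clones are sequenced independently, a majority-rule consensus is correct whenever that sequence appears in a strict majority of clones.

    from scipy.stats import binom

    def prob_correct_consensus(n_clones: int, p: float) -> float:
        """P(the authentic sequence is carried by a strict majority of n sampled clones)."""
        # Strict majority: more than floor(n/2) of the clones carry the authentic sequence.
        return float(binom.sf(n_clones // 2, n_clones, p))

    # The abstract's scenario: authentic sequence at 70% of the mixture, 20 clones.
    print(prob_correct_consensus(20, 0.70))  # roughly 0.95 under this toy model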

    Statistical Inference on Optimal Points to Evaluate Multi-State Classification Systems

    In decision making, an optimal point represents the settings for which a classification system should be operated to achieve maximum performance. Clearly, these optimal points are of great importance in classification theory. Not only is the selection of the optimal point of interest, but quantifying the uncertainty in the optimal point and its performance is also important. The Youden index is a metric currently employed for selection and performance quantification of optimal points for classification system families. The Youden index quantifies the correct classification rates of a classification system, and its confidence interval quantifies the uncertainty in this measurement. This metric currently focuses on two or three classes, and only allows for the utility of correct classifications and the cost of total misclassifications to be considered. An alternative to this metric for three or more classes is a cost function which considers the sum of incorrect classification rates. This new metric is preferable as it can include class prevalences and costs associated with every classification. In multi-class settings, this informs better decisions and inferences on optimal points. The work in this dissertation develops theory and methods for confidence intervals on a metric based on misclassification rates, Bayes Cost, and where possible, the thresholds found for an optimal point using Bayes Cost. Hypothesis tests for Bayes Cost are also developed to test a classification system's performance or compare systems, with an emphasis on classification systems involving three or more classes. Performance of the newly proposed methods is demonstrated with simulation.
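
    A minimal sketch of the contrast drawn above, not the dissertation's implementation: for a three-class system that thresholds a single diagnostic score (class 1 below t1, class 2 between t1 and t2, class 3 above t2), compute per-class correct-classification rates, a summed-correct-rate metric in the spirit of a three-class Youden index, and a Bayes-Cost-style metric that weights each misclassification by class prevalence and a user-supplied cost. The Gaussian class-conditional scores, prevalences and costs are illustrative assumptions.

    import numpy as np
    from scipy.stats import norm

    # Illustrative class-conditional score distributions for classes 1..3.
    classes = [norm(0.0, 1.0), norm(1.5, 1.0), norm(3.0, 1.0)]
    prev = np.array([0.5, 0.3, 0.2])          # assumed class prevalences
    cost = np.array([[0, 1, 2],               # cost[i, j]: cost of calling class i+1 as class j+1
                     [1, 0, 1],
                     [2, 1, 0]])

    def confusion(t1: float, t2: float) -> np.ndarray:
        """P(decide class j | true class i) for decision regions (-inf,t1), [t1,t2), [t2,inf)."""
        rows = []
        for d in classes:
            rows.append([d.cdf(t1), d.cdf(t2) - d.cdf(t1), 1.0 - d.cdf(t2)])
        return np.array(rows)

    def youden_like(t1: float, t2: float) -> float:
        # Sum of per-class correct-classification rates minus 1 (one common 3-class generalization).
        return float(np.trace(confusion(t1, t2)) - 1.0)

    def bayes_cost(t1: float, t2: float) -> float:
        # Prevalence- and cost-weighted sum of misclassification rates (diagonal costs are zero).
        return float(np.sum(prev[:, None] * cost * confusion(t1, t2)))

    # Grid search for illustrative "optimal points" under each metric.
    grid = np.linspace(-1.0, 4.0, 51)
    pairs = [(a, b) for a in grid for b in grid if a < b]
    best_y = max(pairs, key=lambda p: youden_like(*p))
    best_bc = min(pairs, key=lambda p: bayes_cost(*p))
    print("Youden-like optimum:", best_y, "Bayes-Cost optimum:", best_bc)

    Note that the two criteria generally select different threshold pairs once prevalences and unequal costs enter the picture, which is the motivation the abstract gives for preferring the cost-based metric in multi-class settings.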

    Multilevel modelling of refusal and noncontact nonresponse in household surveys: evidence from six UK government surveys

    This paper analyses household unit nonresponse and interviewer effects in six major UK government surveys using a multilevel multinomial modelling approach. The models are guided by current conceptual frameworks and theories of survey participation. One key feature of the analysis is the investigation of survey-dependent and survey-independent effects of household and interviewer characteristics, providing an empirical exploration of leverage-salience theory. The analysis is based on the 2001 UK Census Link Study, a unique data source containing an unusually rich set of auxiliary variables, linking the response outcome of six surveys to census data, interviewer observation data and interviewer information, available for respondents and nonrespondents.