
    Hypothesis testing in a generic nesting framework with general population distributions

    Nested parameter spaces, in either the null or the alternative hypothesis, offer a guarantee of improved test performance; however, in the existing literature on order-restricted inference they have usually not been studied in detail. Phi-divergence measures provide a flexible tool for building meaningful test statistics, which usually contain the likelihood-ratio test statistic as a special case. The existing literature on hypothesis testing with inequality constraints using phi-divergence measures is centered on very specific models with multinomial sampling. The contribution of this paper is to extend and unify the existing work substantially: new families of test statistics are presented that are valid for nested parameter spaces containing either equality or inequality constraints, and general distributions for either single or multiple populations are considered.
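The way a divergence family nests the likelihood-ratio statistic as a special case can be illustrated with SciPy's Cressie-Read power-divergence family (a minimal sketch with made-up multinomial counts, not the paper's own test statistics):

```python
import numpy as np
from scipy.stats import power_divergence

observed = np.array([16, 18, 16, 14, 12, 12])   # hypothetical multinomial counts
expected = np.full(6, observed.sum() / 6)       # uniform null hypothesis

# lambda_ = 0 gives the log-likelihood-ratio (G) statistic,
# lambda_ = 1 gives Pearson's chi-squared: two members of one divergence family.
g_stat, g_p = power_divergence(observed, expected, lambda_=0)
chi_stat, chi_p = power_divergence(observed, expected, lambda_=1)

# Pearson's statistic agrees with the direct formula sum((O - E)^2 / E).
manual_chi = ((observed - expected) ** 2 / expected).sum()
print(g_stat, chi_stat, manual_chi)
```

Varying `lambda_` sweeps through the family while the asymptotic chi-squared reference distribution stays the same under the classical (unconstrained) nesting.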

    Stratified Staged Trees: Modelling, Software and Applications

    The thesis is focused on Probabilistic Graphical Models (PGMs), which are a rich framework for encoding probability distributions over complex domains. In particular, joint multivariate distributions over large numbers of interacting random variables can be investigated through PGMs, and conditional independence statements can be succinctly represented graphically. These representations sit at the intersection of statistics and computer science, relying on concepts mainly from probability theory, graph algorithms and machine learning. They are applied in a wide variety of fields, such as medical diagnosis, image understanding, speech recognition, natural language processing, and many more. Over the years the theory and methodology have been developed and extended in a multitude of directions. In particular, this thesis studies different aspects of new classes of PGMs called Staged Trees and Chain Event Graphs (CEGs). In some sense, Staged Trees are a generalization of Bayesian Networks (BNs). Indeed, BNs provide a transparent graphical tool for defining a complex process in terms of conditional independence structures. Despite their strengths in reducing the dimensionality of the joint probability distribution of the statistical model and in providing a transparent framework for causal inference, BNs are not optimal PGMs in all situations. The biggest problems with their usage occur when the event space is not a simple product of the sample spaces of the random variables of interest, and when conditional independence statements hold only for certain values of the variables, that is, when there are context-specific conditional independence structures. Some extensions of the BN framework have been proposed to handle these issues: context-specific BNs, Bayesian Multinets and Similarity Networks (Geiger and Heckerman, 1996). 
These adopt a hypothesis variable to encode the context-specific statements over a particular set of random variables. For each value taken by the hypothesis variable, the graphical modeller constructs a particular BN model called a local network; the collection of these local networks constitutes a Bayesian Multinet. It has been shown that Chain Event Graph (CEG) models encompass all discrete BN models, and the discrete variants described above, as a special subclass, and that they are also richer than Probabilistic Decision Graphs, whose semantics is actually somewhat distinct. Unlike most of their competitors, CEGs can capture all (including context-specific) conditional independences in a single graph, obtained by a coalescence over the vertices of an appropriately constructed probability tree, called a Staged Tree. CEGs have been developed for categorical variables and have been used for cohort studies, causal analysis and case-control studies. The user's toolbox for efficiently and effectively performing uncertainty reasoning with CEGs further includes methods for inference and probability propagation, the exploration of equivalence classes and robustness studies. The main contributions of this thesis to the literature on Staged Trees concern Stratified Staged Trees, with a keen eye on applications; a few observations on non-Stratified Staged Trees are made in the last part of the thesis. A core output of the thesis is an R software package which efficiently implements a host of functions for learning and estimating Staged Trees from data, relying on likelihood principles. Structural learning algorithms are also developed, based on distances or divergences between pairs of categorical probability distributions and on the clustering of probability distributions into a fixed number of stages for each stratum of the tree. 
A new class of Directed Acyclic Graphs is also introduced, named Asymmetric-labeled DAGs (ALDAGs), which give a BN representation of a given Staged Tree. The ALDAG is a minimal DAG such that the statistical model embedded in the Staged Tree is contained in the one associated with the ALDAG. This is possible thanks to the use of colored edges, where each color indicates a different type of conditional dependence: total, context-specific, partial or local. Staged Trees are also adopted in this thesis as a statistical tool for classification purposes. Staged Tree Classifiers are introduced, which exhibit predictive accuracy comparable to state-of-the-art machine learning algorithms such as neural networks and random forests. Finally, algorithms for obtaining an ordering of variables for the construction of the Staged Tree are designed.
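The staging idea can be sketched in a few lines (a hypothetical toy model, not the thesis's R package): tree vertices whose next-variable conditional distributions coincide are merged into one stage, which is exactly how a context-specific independence becomes visible in the colored tree.

```python
# Toy staged tree over binary (X, Z) with next variable Y. The conditional
# distributions below are invented: when X = 1, P(Y=1 | X, Z) does not
# depend on Z -- a context-specific conditional independence.
from itertools import product

p_y1 = {(0, 0): 0.2, (0, 1): 0.7, (1, 0): 0.5, (1, 1): 0.5}

# Group tree vertices (contexts) with identical conditionals into stages.
stages = {}
for ctx in product((0, 1), repeat=2):
    stages.setdefault(p_y1[ctx], []).append(ctx)

# Each stage would receive one color in the staged tree / CEG.
for dist, contexts in stages.items():
    print(f"P(Y=1|context) = {dist}: contexts {contexts}")
```

Here the contexts (1, 0) and (1, 1) fall into the same stage, while a BN over (X, Z, Y) could not express this without extra machinery.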

    Genome evolution in Prochlorococcus and marine Synechococcus


    The influence of estimator attitude on project cost reliability

    The reliability of project estimates depends on a number of factors that can be classed as exogenous or endogenous to the estimator. The exogenous factors comprise information, environment, technology, methods and processes, which are external to the estimator. The endogenous factors reflect personal characteristics associated with the estimator and consist of aspects such as judgement, preferences and personality. Construction's effort to improve the estimating function has addressed both the practice and the process of delivering the estimate. Much of the effort, however, has been directed at aspects of estimating that fall under the exogenous factors, including the use of technology to improve both the accuracy of computation and the speed of generating the estimate. Notwithstanding the progressive improvement in estimating achieved by addressing such exogenous factors, most project-oriented industries still suffer from unreliable estimates. Although unreliable estimating is a worldwide phenomenon, its effect is much more striking in many developing economies. Understanding the root causes of the persistence of unreliable estimates therefore calls for a focus on factors other than the exogenous ones on which most improvement and development efforts have focused. The study which formed the basis of this thesis adopts the position that any improvements in reliability, beyond what exogenous-based developments have achieved so far, lie in the contribution that estimators can make by addressing their endogenous factors. For that position to be valid, the study showed that the personality characteristics of various estimators produce different levels of reliability. Three endogenous factors, experience, qualification, and personality archetype (or trait), were employed to explore their relationships with estimating reliability. 
A quantitative research approach was adopted for the investigation, as the nature of the evidence required was primarily objective, to substantiate the argument that different levels of particular endogenous factors produce different levels of estimating reliability. Data for the study were obtained from Ghana. Two categories of sample data were collected through stratification of the population, followed by systematic sampling. The two samples were a control group, comprising estimators with ten or more years' experience, and an observed (or study) group, made up of estimators with less than ten years' experience. An instrument based on a self-reporting protocol was developed and used to elicit data from both groups. (Continues...)
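The sampling design described above can be sketched as follows (entirely hypothetical data; the thesis's instrument and population are not reproduced here): stratify by years of experience, then draw a systematic sample within each stratum.

```python
import random

random.seed(0)
# Hypothetical population of estimators with random years of experience.
population = [{"id": i, "years": random.randint(1, 30)} for i in range(200)]

# Stratification: control group (>= 10 years) vs observed/study group (< 10).
control_stratum = [p for p in population if p["years"] >= 10]
study_stratum = [p for p in population if p["years"] < 10]

def systematic_sample(stratum, n):
    """Take every k-th element after a random start, k = len(stratum) // n."""
    k = max(1, len(stratum) // n)
    start = random.randrange(k)
    return stratum[start::k][:n]

control_sample = systematic_sample(control_stratum, 20)
study_sample = systematic_sample(study_stratum, 20)
print(len(control_sample), len(study_sample))
```

Systematic sampling after stratification keeps each sample spread evenly across its stratum, which is the stated rationale for combining the two methods.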

    Data exploration with learning metrics

    A crucial problem in exploratory analysis of data is that it is difficult for computational methods to focus on interesting aspects of the data. Traditional methods of unsupervised learning cannot differentiate between interesting and uninteresting variation, and hence may model, visualize, or cluster parts of the data that are not interesting to the analyst. This wastes the computational power of the methods and may mislead the analyst. In this thesis, a principle called "learning metrics" is used to develop visualization and clustering methods that automatically focus on the interesting aspects, based on auxiliary labels supplied with the data samples. The principle yields non-Euclidean (Riemannian) metrics that are data-driven, widely applicable, versatile, invariant to many transformations, and in part invariant to noise. Learning metric methods are introduced for five tasks: nonlinear visualization by Self-Organizing Maps and Multidimensional Scaling, linear projection, and clustering of discrete data and multinomial distributions. The resulting methods either explicitly estimate distances in the Riemannian metric or optimize a tailored cost function that is implicitly related to such a metric. The methods have rigorous theoretical relationships to information geometry and probabilistic modeling, and are empirically shown to yield good practical results in exploratory and information retrieval tasks.
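The core idea of a learning metric can be sketched with an assumed logistic auxiliary model p(c|x), for which the Fisher information matrix has the closed form J(x) = p(1-p) w wT (a minimal illustration under that assumption, not the thesis's implementation):

```python
import numpy as np

# Hypothetical class-relevant direction of an assumed logistic model p(c|x).
w = np.array([2.0, 0.1])

def fisher_metric(x):
    """Fisher information of p(c|x) for a logistic model: p(1-p) * w w^T."""
    p = 1.0 / (1.0 + np.exp(-w @ x))
    return p * (1.0 - p) * np.outer(w, w)

def local_dist(x, dx):
    """Squared learning-metric length of a small displacement dx at x."""
    return dx @ fisher_metric(x) @ dx

x = np.zeros(2)
# A step along w changes p(c|x) a lot; a near-orthogonal step barely changes
# it, so the metric stretches the first direction and shrinks the second.
print(local_dist(x, np.array([0.1, 0.0])), local_dist(x, np.array([0.0, 0.1])))
```

Distances measured this way grow only in directions that change the auxiliary (label) distribution, which is what lets visualization and clustering ignore label-irrelevant variation.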

    Design Development Test and Evaluation (DDT and E) Considerations for Safe and Reliable Human Rated Spacecraft Systems

    A team directed by the NASA Engineering and Safety Center (NESC) collected methodologies for how best to develop safe and reliable human-rated systems and how to identify the drivers that provide the basis for assessing safety and reliability. The team also identified techniques, methodologies, and best practices to assure that NASA can develop safe and reliable human-rated systems. The results are drawn from a wide variety of resources, from experts involved with the space program since its inception to the best practices espoused in contemporary engineering doctrine. This report focuses on safety and reliability considerations and does not duplicate or update any existing references; nor is it intended to replace existing standards and policy.