    Analysis of a Voting Method for Ranking Network Centrality Measures on a Node-aligned Multiplex Network

    Identifying relevant actors using information gleaned from multiple networks is a key goal within the context of human aspects of military operations. The application of a voting theory methodology for determining nodes of critical importance, in ranked order of importance, for a node-aligned multiplex network is demonstrated. Both statistical and qualitative analyses of the differences in ranking outcomes under this methodology are provided. As a corollary, a multilayer network reduction algorithm is investigated within the context of the proposed ranking methodology. The application of the methodology detailed in this thesis will allow meaningful rankings of relevant actors to be produced on a multiplex network.
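
The abstract above does not name the voting rule or the centrality measures used, so the following is only a minimal sketch under stated assumptions: degree centrality is computed per layer, each layer's ranking is treated as one ballot, and the ballots are combined with a Borda count. The layer names, the toy edges and the tie-breaking by node name are all hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges, nodes):
    """Count the neighbours of each node in one undirected layer."""
    degree = {node: 0 for node in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def borda_aggregate(score_tables):
    """Combine several score tables (one ballot each) with a Borda count.

    Each ballot ranks nodes by descending score; a node in position p
    (0-based) among n nodes earns n - 1 - p points. Ties are broken by
    node name, which is a simplification chosen for this sketch.
    """
    points = defaultdict(int)
    for scores in score_tables:
        ranked = sorted(scores, key=lambda node: (-scores[node], node))
        n = len(ranked)
        for position, node in enumerate(ranked):
            points[node] += n - 1 - position
    # Final ranking: most Borda points first, name as tie-breaker.
    return sorted(points, key=lambda node: (-points[node], node))


# Toy node-aligned multiplex: the same four actors appear in both layers.
nodes = ["a", "b", "c", "d"]
layers = {
    "communication": [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")],
    "kinship": [("b", "c"), ("b", "d"), ("c", "d")],
}
ballots = [degree_centrality(edges, nodes) for edges in layers.values()]
ranking = borda_aggregate(ballots)
print(ranking)
```

Other positional or pairwise voting rules could be swapped in for `borda_aggregate` without changing the rest of the sketch, which is one way to compare ranking outcomes across rules.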

    Weighting methods for variance heterogeneity in phenotypic and genomic data analysis for crop breeding

    In plant breeding programmes, multi-environment trials (MET) form the backbone for phenotypic selection, genomic selection (GS) and genome-wide association studies (GWAS). Efficient analysis of MET is fundamental to getting accurate results from phenotypic selection, GS and GWAS. Conversely, inefficient analysis of MET data may lead to biased ranking of genotype means in phenotypic data analysis, low accuracy of GS and wrong identification of quantitative trait loci (QTL) in GWAS. A combined analysis of MET is performed using either single-stage or stage-wise (two-stage) approaches based on the linear mixed model framework. While single-stage analysis is a fully efficient approach, MET data can also be suitably analyzed using stage-wise methods. MET data often show within-trial and between-trial variance heterogeneity, which contradicts the homogeneity-of-variance assumption of linear models and requires correction. In addition, it is well documented that spatial correlations are inherent to most field trials. Appropriate remedial techniques for variance heterogeneity and proper accounting for spatial correlation are useful for improving the accuracy and efficiency of MET analysis. Chapter 2 studies methods for simultaneous handling of within-trial variance heterogeneity and within-trial spatial correlation. This study is based on three maize trials from Ethiopia. To stabilize the variance, the Box-Cox transformation was considered. The results show that, while the Box-Cox transformation was suitable for stabilizing the variance, it makes results difficult to report on the original scale. As an alternative, variance models, i.e. power-of-the-mean (POM) and exponential models, were used to address the variance heterogeneity. Unlike the Box-Cox method, the variance models considered in this study succeeded in dealing simultaneously with both spatial correlation and heterogeneity of variance.
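
For orientation, the Box-Cox transformation and the two variance functions mentioned above are often written as follows. This uses a common textbook parameterization, which is an assumption here: the abstract does not give the thesis's exact formulation, and the symbols $\lambda$, $\theta$, $\sigma^2$ and $\mu_i$ are introduced only for illustration.

```latex
% Box-Cox transformation of a positive response y with parameter \lambda
y^{(\lambda)} =
\begin{cases}
  \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[6pt]
  \log y, & \lambda = 0.
\end{cases}

% Power-of-the-mean (POM) and exponential variance functions for the
% residual e_i of a plot with expected value \mu_i
\operatorname{Var}(e_i) = \sigma^2\,\lvert\mu_i\rvert^{2\theta}
\qquad\text{and}\qquad
\operatorname{Var}(e_i) = \sigma^2 \exp\!\left(2\theta\,\mu_i\right).
```

Because the variance functions model heterogeneity on the original scale, estimates need no back-transformation, which is the reporting difficulty noted for Box-Cox.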
For analysis of MET data, two-stage analysis is often favored in practice over single-stage analysis because of its shorter computation time and its ability to easily account for the specifics of each trial (variance heterogeneity, spatial correlation, etc.). Stage-wise analyses are approximate in that they cannot fully reproduce a single-stage analysis, because the variance-covariance matrix of adjusted means from the first-stage analysis is sometimes ignored and sometimes approximated, and the approximation may not be efficient. The discrepancy between single-stage and two-stage results increases when the variance between trials is heterogeneous. In stage-wise analysis, one of the major challenges is how to account for heterogeneous variance between trials at the second stage. To account for it, a weighted mixed model approach is used for the second-stage analysis. The weights are derived from the variances and covariances of adjusted means from the first-stage analysis. In Chapter 3 we compared single-stage analysis and two-stage analysis. A new fully efficient weighting matrix and a diagonal weighting matrix are used for weighting in the second stage. The methods are explored using two different types of maize datasets. The results indicate that single-stage analysis and two-stage analysis give nearly identical results, provided that the full information on all effect estimates and their associated estimated variances and covariances is carried forward from the first to the second stage. GWAS and GS analysis can be conducted using a single-stage or a stage-wise approach. The computational demand for GWAS and GS increases compared to purely phenotypic analysis because of the addition of marker data. Usually researchers compute genotype means from phenotypic MET data in stage-wise analysis (with or without weighting) and then forward these means to GWAS or GS analysis, often without any weighting.
In Chapter 4, weighted and unweighted stage-wise analyses are compared for GWAS and GS using phenotypic and genotypic maize data. Fully efficient weighting and diagonal weighting are used. Results show that weighted analysis is preferred over unweighted analysis for both GS and GWAS. In conclusion, stage-wise analysis is a suitable approach for practical analysis of MET, GS and GWAS. Single-stage and two-stage analyses of MET yield very similar results. Stage-wise analysis can be nearly as efficient as single-stage analysis when using optimal weighting, i.e., fully efficient weighting. Spatial variation and within-trial variance heterogeneity are common in MET data. This study illustrated that both can be resolved simultaneously using a weighting approach for the variance heterogeneity and spatial modeling for the spatial variation. Finally, besides applying weighting in the analysis of phenotypic MET data, it is recommended to use weighting in the actual GS and GWAS analysis stage.
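
The idea of diagonal weighting in the second stage can be sketched in a few lines. This is a deliberately simplified illustration, not the mixed model of the thesis: it assumes each genotype has one adjusted mean and one estimated variance per trial, ignores trial main effects, genotype-by-trial interaction and all covariances, and uses hypothetical numbers. A fully efficient weighting would instead carry the full variance-covariance matrix of the adjusted means forward.

```python
def weighted_genotype_means(adjusted_means, variances):
    """Second-stage estimate per genotype using diagonal weights.

    adjusted_means[g] and variances[g] are lists with one entry per
    trial: the first-stage adjusted mean and its estimated variance.
    Each weight is the reciprocal of that variance, so imprecise
    trials contribute less to the combined estimate.
    """
    result = {}
    for genotype, means in adjusted_means.items():
        weights = [1.0 / v for v in variances[genotype]]
        total = sum(weights)
        estimate = sum(w * m for w, m in zip(weights, means)) / total
        # Store the combined mean and the variance of that mean.
        result[genotype] = (estimate, 1.0 / total)
    return result


# Hypothetical first-stage output for two genotypes in two trials.
adjusted_means = {"G1": [5.0, 7.0], "G2": [6.0, 6.5]}
variances = {"G1": [1.0, 4.0], "G2": [0.5, 0.5]}
stage_two = weighted_genotype_means(adjusted_means, variances)
for genotype, (mean, variance) in stage_two.items():
    print(genotype, round(mean, 3), round(variance, 3))
```

When all variances are equal the weights cancel and the estimate reduces to the unweighted mean, so weighting only matters when trials differ in precision.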

    Recent Developments in the Econometrics of Program Evaluation

    Many empirical questions in economics and other social sciences depend on causal effects of programs or policies. In the last two decades much research has been done on the econometric and statistical analysis of the effects of such programs or treatments. This recent theoretical literature has built on, and combined features of, earlier work in both the statistics and econometrics literatures. It has by now reached a level of maturity that makes it an important tool in many areas of empirical research in economics, including labor economics, public finance, development economics, industrial organization and other areas of empirical micro-economics. In this review we discuss some of the recent developments. We focus primarily on practical issues for empirical researchers, as well as provide a historical overview of the area and give references to more technical research. Keywords: program evaluation, causality, unconfoundedness, Rubin Causal Model, potential outcomes, instrumental variables

    New strategies to detect and understand genotype-by-environment interactions and QTL-by-environment interactions

    Dissertation submitted for the degree of Doctor in Statistics and Risk Management, specialty in Statistics. Genotype-by-environment interaction (GEI) is frequent in multi-environment trials, and represents differential responses of genotypes across environments. With the development of molecular markers and mapping techniques, researchers can go one step further and analyse the whole genome to detect specific locations of genes which influence a quantitative trait such as yield. These locations are called quantitative trait loci (QTL), and when these QTLs have different expression across environments we talk about QTL-by-environment interactions (QEI), which are the basis of GEI. A good understanding of these interactions enables researchers to select better genotypes across different environmental conditions and, consequently, to improve crops in developed and developing countries. In this thesis I intend to present new strategies to improve the detection and understanding of QTLs, especially those exhibiting QEI in the context of multi-environment trials, by using and providing open source software. The first part of this thesis presents a comparison between two of the most used methods to analyse and to structure GEI: the joint regression analysis (JRA) and the additive main effects and multiplicative interaction (AMMI) model. This comparison is made in terms of "robustness" with different incidence rates of missing values, and in terms of dominant/winner genotypes. In the following chapters, two- and three-stage approaches are presented in which the AMMI model is used to gain accuracy in the phenotypic data, and its scores are used to order the environments to find ecological or biological patterns. The first approach (two stages) is appropriate when the error variance is constant across environments, whereas the second (three stages) is more general and accounts for differences in the error variances by using the proposed weighted AMMI model (WAMMI).
The final part of the thesis illustrates a strategy to simulate and to model GEI and QEI in complex traits, with the example of yield, based on a number of physiological parameters that are purely genotype dependent. This is done by using an eco-physiological genotype-to-phenotype model with seven parameters defined with a simple QTL basis. Funding: Fundação para a Ciência e Tecnologia, SFRH/BD/35994/2007; project N N310 447838, supported by the Ministry of Science and Higher Education, Poland.
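
The AMMI model named in the abstract above combines additive genotype and environment main effects with a low-rank multiplicative description of their interaction. The sketch below shows the standard unweighted form on a complete table of means, assuming NumPy is available; the function name `ammi_fit` and the toy yields are hypothetical, and neither the weighted variant (WAMMI) nor missing values are handled.

```python
import numpy as np


def ammi_fit(yields, n_terms):
    """Fit an AMMI model to a genotype-by-environment table of means.

    Rows are genotypes, columns are environments. Additive main effects
    are removed first; the interaction residuals are then approximated
    by the leading n_terms components of their singular value
    decomposition. Returns the fitted table and all singular values.
    """
    yields = np.asarray(yields, dtype=float)
    grand_mean = yields.mean()
    genotype_effects = yields.mean(axis=1) - grand_mean
    environment_effects = yields.mean(axis=0) - grand_mean
    additive = (grand_mean
                + genotype_effects[:, None]
                + environment_effects[None, :])
    interaction = yields - additive  # doubly centred residuals
    u, s, vt = np.linalg.svd(interaction, full_matrices=False)
    multiplicative = (u[:, :n_terms] * s[:n_terms]) @ vt[:n_terms, :]
    return additive + multiplicative, s


# Hypothetical mean yields: 3 genotypes (rows) in 3 environments (columns).
table = [[4.0, 5.0, 6.5],
         [3.5, 5.5, 6.0],
         [5.0, 4.5, 7.0]]
fitted_additive, singular_values = ammi_fit(table, n_terms=0)
fitted_ammi1, _ = ammi_fit(table, n_terms=1)
fitted_full, _ = ammi_fit(table, n_terms=2)
print(np.round(singular_values, 4))
```

With `n_terms=0` the fit is purely additive. Because the interaction residuals are centred over both rows and columns, a 3-by-3 table has at most two non-zero interaction components, so `n_terms=2` reproduces the table exactly.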

    STATISTICAL METHODS FOR ENVIRONMENTAL EXPOSURE DATA SUBJECT TO DETECTION LIMITS

    In this dissertation, we develop unified and efficient nonparametric statistical methods for estimating and comparing environmental exposure distributions in the presence of detection limits. In the first part, we propose a kernel-smoothed nonparametric estimator for the exposure distribution without imposing any independence assumption between the exposure level and the detection limit. We show that the proposed estimator is consistent and asymptotically normal. Simulation studies demonstrate that the proposed estimator performs well in practical situations. A colon cancer study is provided for illustration. In the second part, we develop a class of test statistics to compare exposure distributions between two groups by using the integrated weighted difference in the kernel-smoothed estimator proposed in the first part. We study the conditions on the weight function under which the test statistics are stable, i.e. the asymptotic variances are finite. Simulation studies demonstrate that the proposed tests preserve type I error rates regardless of whether the distributions of the detection limit in the two groups differ, and that they are more efficient than current methods in certain situations. A colon cancer study is provided for illustration. In the third part, we extend the estimation and testing methods developed in parts one and two to survey data by incorporating sampling weights. The results of several simulation studies are reported to demonstrate the performance of the proposed methods. The jackknife method is used for variance estimation to account for complex sample designs.
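
To make the jackknife step concrete, here is a minimal delete-one sketch. It swaps in a much simpler statistic than the dissertation's kernel-smoothed estimator: a survey-weighted mean of fully observed values. It also assumes independent observations, so the strata and clusters of a complex design (where whole sampling units would be deleted) and the detection limits themselves are not handled. The data and weights are hypothetical.

```python
def weighted_mean(values, weights):
    """Survey-weighted mean of the observed values."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def jackknife_variance(values, weights):
    """Delete-one jackknife variance of the weighted mean.

    Each replicate recomputes the statistic with one observation (and
    its weight) removed; the spread of the replicates, scaled by
    (n - 1) / n, estimates the variance of the full-sample statistic.
    """
    n = len(values)
    replicates = [
        weighted_mean(values[:i] + values[i + 1:],
                      weights[:i] + weights[i + 1:])
        for i in range(n)
    ]
    centre = sum(replicates) / n
    return (n - 1) / n * sum((r - centre) ** 2 for r in replicates)


# Hypothetical exposure measurements with equal sampling weights.
exposures = [1.2, 0.8, 2.5, 1.9, 3.1]
sampling_weights = [1.0, 1.0, 1.0, 1.0, 1.0]
variance = jackknife_variance(exposures, sampling_weights)
print(variance)
```

With equal weights the result coincides with the familiar sample variance divided by the sample size, which is a convenient sanity check before moving to unequal weights.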