63 research outputs found

    Exact Approaches for Bias Detection and Avoidance with Small, Sparse, or Correlated Categorical Data

    Every day, traditional statistical methodologies are used worldwide to study a variety of topics and provide insight into countless subjects. Each technique is based on a distinct set of assumptions to ensure valid results. Additionally, many statistical approaches rely on large-sample behavior and may collapse or degenerate in the presence of small, sparse, or correlated data. This dissertation details several advancements to detect these conditions, avoid their consequences, and analyze data in a different way to yield trustworthy results. One of the most commonly used modeling techniques for outcomes with only two possible categorical values (e.g. live/die, pass/fail, better/worse, etc.) is logistic regression. While some potential complications with this approach are widely known, many investigators are unaware that their particular data do not meet the foundational assumptions, since these are not easy to verify. We have developed a routine for determining whether a researcher should be concerned about potential bias in logistic regression results, so that they can take steps to mitigate the bias or use a different procedure altogether to model the data. Correlated data may arise from common situations such as multi-site medical studies, research on family units, or investigations of student achievement within classrooms. In these circumstances the associations between cluster members must be included in any statistical analysis testing the hypothesis of a connection between two variables in order for results to be valid. Previously, investigators had to choose between a method intended for small or sparse data that assumes independence between observations and a method that allows for correlation between observations but requires large samples to be reliable. We present a new method that allows small, clustered samples to be assessed for a relationship between a two-level predictor (e.g. treatment/control) and a categorical outcome (e.g. low/medium/high)

    A D-vine copula mixed model for joint meta-analysis and comparison of diagnostic tests

    For a particular disease, there may be two diagnostic tests developed, where each of the tests is subject to several studies. A quadrivariate generalised linear mixed model (GLMM) has recently been proposed to jointly meta-analyse and compare two diagnostic tests. We propose a D-vine copula mixed model for joint meta-analysis and comparison of two diagnostic tests. Our general model includes the quadrivariate GLMM as a special case and can also operate on the original scale of sensitivities and specificities. The method allows the direct calculation of sensitivity and specificity for each test, as well as the parameters of the summary receiver operating characteristic (SROC) curve, along with a comparison between the SROCs of each test. Our methodology is demonstrated with an extensive simulation study and illustrated by meta-analysing two examples where two tests for the diagnosis of a particular disease are compared. Our study suggests that there can be an improvement on the GLMM in fit to data, since our model can also provide tail dependencies and asymmetries

    Introduction to methods for analysis of combined individual and aggregate social science data

    The lecture notes from this workshop provide an introduction to a new class of multilevel models – termed hierarchical related regressions (HRR) – for estimating individual-level associations using a combination of aggregate (group level) and individual-level data. HRR differs from other methods by enabling analysts to model individual and aggregate data simultaneously, while including information on the dependent variable at the aggregate level (e.g. constituency election results), and data from aggregation units not available at the individual level (e.g. census data from all constituencies or output areas in the country). The workshop will also discuss HRR as a method of improving ecological inference (analyses that aim to make inference on the relationship between individual-level quantities using aggregate data). The HRR models combine features of standard ecological regression models for aggregate data and multilevel models for clustered individual-level data, and have been shown to reduce bias and improve precision in many situations

    Analysis and Diagnostics of Categorical Variables with Multiple Outcomes

    Surveys often contain qualitative variables for which respondents may select any number of the outcome categories. For instance, for the question "What type of contraceptive have you used?" with possible responses (oral, condom, lubricated condom, spermicide, and diaphragm), respondents would be instructed to select as many of the J = 5 outcomes as apply. This situation is known as multiple responses and outcomes are referred to as items. This thesis discusses several approaches to analysing such data. For stratified multiple response data, we consider three ways of defining the common odds ratio, a summarising measure for the conditional association between a row variable and the multiple response variable, given a stratification variable. For each stratum, we define the odds ratio in terms of: 1 item and 2 rows, 2 items and 2 rows, and 2 items and 1 row. Then we consider two estimation approaches for the common odds ratio and its (co)variance estimators for these types of odds ratios. The model-based approach treats the J items as a J-dimensional binary response and then uses logit models directly for the marginal distribution of each item by applying the generalised estimating equation (GEE) (Liang and Zeger 1986) method. The non-model-based approach uses Mantel-Haenszel (MH) type estimators. The model-based (or marginal model) approach is still applicable for more than two explanatory variables. Preisser and Qaqish (1996) proposed regression diagnostics for GEE. Another model fitting approach is the homogeneous linear predictor model (HLP) based on maximum likelihood (ML) introduced by Lang (2005). We investigate deletion diagnostics such as the Cook distance and DBETA for multiple response data using HLP models (Lang 2005), which have not been considered yet, and propose a simple "delete=replace" method as an alternative approach for deletion. Methods are compared with the GEE approach. 
We also discuss the modelling of a repeated multiple response variable, a categorical variable for which subjects can select any number of categories on repeated occasions. Multiple responses have been considered in the literature by various authors; however, repeated multiple responses have not been considered yet. Approaches include the marginal model approach using the GEE and HLP methods, and generalised linear mixed models (GLMM). For the GEE method, we also consider possible correlation structures and propose a groupwise correlation estimation method yielding more efficient parameter estimates if the correlation structure is indeed different for different groups, which is confirmed by a simulation study. Ordered categorical variables occur in many applications and can be seen as a special case of multiple responses. The proportional odds model, which uses logits of cumulative probabilities, is currently the most popular model. We consider two approaches focusing on the mis-specification of a covariate. The binary approach considers the proportional odds model as J-1 logistic regression models and applies the cumulative residual process introduced by Arbogast and Lin (2005) for logistic regression. The multivariate approach views the proportional odds model as a member of the class of multivariate generalised linear models (MGLM), where the response variable is a vector of indicator responses
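
To make the "Mantel-Haenszel (MH) type estimators" mentioned in this abstract concrete, here is a minimal sketch of the classical MH common odds ratio for stratified 2 × 2 tables. This is the standard textbook estimator, not the thesis's extension to multiple response data; the function name, the table layout (a, b, c, d), and the example counts are illustrative assumptions.

```python
from typing import Iterable, Tuple

# One stratum is a 2x2 table given as (a, b, c, d) -- an assumed layout:
#   row 1: a = item selected, b = item not selected
#   row 2: c = item selected, d = item not selected
Table = Tuple[float, float, float, float]


def mantel_haenszel_or(strata: Iterable[Table]) -> float:
    """Classical MH common odds ratio: sum(a*d/n) / sum(b*c/n) over strata."""
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("MH estimate undefined: sum of b*c/n is zero")
    return num / den


# Hypothetical counts for two strata, for illustration only.
example_strata = [(10, 20, 5, 40), (8, 12, 6, 24)]
mh_or = mantel_haenszel_or(example_strata)
```

With a single stratum the estimator reduces to the ordinary sample odds ratio a·d / (b·c); with several strata it is a weighted combination of the stratum-specific odds ratios.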

    Systematics and Biogeography of Orthaea Klotzsch (Ericaceae: Vaccinieae)

    In the first chapter a study of the distribution patterns of the neotropical Vaccinieae (Ericaceae) is presented. Five areas of endemism were recovered: Central America, northern Chocó, southern Chocó, eastern Ecuador, and Yungas. Divergence time estimates indicate that the Andean clade of Vaccinieae migrated to South America during the Late Oligocene or Early Miocene (28.9-17.84 MA), and most of the subsequent diversification took place during the Tertiary. The Yungas was the first Andean area to be colonized, and several dispersals towards the north expanded their distribution range. Both areas of endemism and dispersal events were influenced by geological processes, such as the rise of the Andes, the western Andean Portal, and the establishment of the Central America isthmus. The second chapter is a phylogenetic analysis of Orthaea Klotzsch (Vaccinieae: Ericaceae) based on molecular and morphological data. Orthaea is polyphyletic, with members evolving independently in several lineages within the tribe Vaccinieae. Currently accepted species of Orthaea s.l. were recovered within four clades: Guiana Shield, Empedoclesia, Thibaudia p.p., and Cavendishia + Orthaea p.p. clades. Most of the Andean species of Orthaea s.l. are closely related to Cavendishia; however, relationships within this clade were not strongly supported. Combined analyses of molecular and morphological data only provided support and uncovered synapomorphies for the Guiana Shield and Empedoclesia clades. Although a core Orthaea s.s. clade was identified, no nomenclatural changes are here proposed for the remaining species of Orthaea s.l. (20 spp.), except for those previously classified in Empedoclesia, a genus that needs to be reinstated. The third chapter is a taxonomic monograph of Orthaea s.s., as delimited in the combined phylogenetic analysis mentioned above. Fifteen species were studied, including two new species (O. eteocles N. R. Salinas and O. fissiflora N. R. Salinas & Pedraza) and a new synonym (O. glandulifera Luteyn = O. oedipus Luteyn)

    Odds ratio for a single 2 × 2 table with correlated binomials for two margins

    Bivariate binomial distribution, Log-odds ratio, McNemar’s test, Normal approximation,

    From coherence in theory to coherence in practice : a stock-take of the written, tested and taught National Curriculum Statement for Mathematics (NCSM) at Further Education and Training (FET) level in South Africa.

    Initiatives in many countries to improve learner performance in mathematics in poor communities have been described as largely unsuccessful, mainly due to their cursory treatment of curriculum alignment. Empirical evidence has shown that in high-achieving countries the notion of coherence was strongly anchored in cognitively demanding mathematics programs. The view that underpins this study is that a cognitively demanding and coherent mathematics curriculum has the potential to level the playing field for poor and less privileged learners. In South Africa beyond 1994, little has been done to understand the potential of such a coherent curriculum in the context of the NCSM. This study examined the levels of cognitive demand and alignment between the written, tested and taught NCSM. The study adopted Critical Theory as its underlying paradigm and used a multiple case study approach. Wilson and Bertenthal’s (2005) dimensions of curriculum coherence provided the theoretical framework, while Webb’s (2002) categorical coherence criterion together with Porter’s (2004) Cognitive Demand tools were used to analyse curriculum and assessment documents. Classroom observations of lesson sequences were analysed following Businskas’ (2008) model of forms of mathematical connections, since connections of different types form the bases for high cognitive demand (Porter, 2002). The results indicated that higher order cognitive skills and processes are emphasized consistently in the new curriculum documents. However, in the 2008 examination papers, the first examinations of the new FET curriculum, lower order cognitive skills and processes appeared to be emphasized, a finding supported by Umalusi (2009) and Edwards (2010). Classroom observations pointed to teachers focusing more on rote learning of both concepts and procedures and less on procedural and conceptual understanding. 
Given the widespread evidence of the tested curriculum impacting on the taught curriculum, this study suggests that this lack of alignment between the advocated curriculum on the one hand, and the tested and taught curricula on the other, needs to be investigated further, for it endangers the teaching and learning of higher order cognitive skills and processes in FET mathematics classrooms for the poor and less privileged. Broader evidence suggests that this would work against efforts towards supporting the upward mobility of poor children in the labour market