Aspects of categorical data analysis.
Thesis (M.Sc.)-University of Natal, Durban, 1998. The purpose of this study is to investigate and understand data which are grouped into categories. At the outset, the study presents a review of early research contributions and controversies surrounding categorical data analysis. A contingency table is described as sparse when many of its cells have small frequencies. Previous research findings showed that incorrect results were obtained in the analysis of sparse tables. Hence, attention is focussed on the effect of sparseness on modelling and analysis of categorical data in this dissertation.
Cressie and Read (1984) suggested a versatile alternative, the power-divergence statistic, to statistics proposed in the past. This study includes a detailed discussion of the power-divergence goodness-of-fit statistic, with areas of interest covering a review of the minimum power-divergence estimation method and the evaluation of model fit. The effects of sparseness are also investigated for the power-divergence statistic. Comparative reviews on the accuracy, efficiency and performance of the power-divergence family of statistics under large and small sample cases are presented. Statistical applications on the power-divergence statistic have been conducted in SAS (Statistical Analysis Software). Further findings on the effect of small expected frequencies on the accuracy of the X² test are presented from the studies of Tate and Hyer (1973) and Lawal and Upton (1976).
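As an illustration of the family discussed above, the following is a minimal sketch of the power-divergence statistic in its standard form, 2/(λ(λ+1)) Σ O[(O/E)^λ − 1]. It is written in Python rather than the SAS used in the thesis, and the function name `power_divergence` and the toy counts are assumptions made for the example only. With λ = 1 the statistic reduces to Pearson's X², and the limit λ → 0 gives the likelihood-ratio statistic G².

```python
import math


def power_divergence(observed, expected, lam):
    """Power-divergence statistic for one set of cell counts.

    Hypothetical helper for illustration; `observed` and `expected`
    are equal-length sequences of counts with the same total.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")

    pairs = list(zip(observed, expected))

    if abs(lam) < 1e-12:
        # Limit lam -> 0: G^2 = 2 * sum O * log(O / E), taking 0 * log(0) = 0.
        return 2.0 * sum(o * math.log(o / e) for o, e in pairs if o > 0)

    if abs(lam + 1.0) < 1e-12:
        # Limit lam -> -1: 2 * sum E * log(E / O); undefined for empty cells.
        if any(o <= 0 for o, _ in pairs):
            raise ValueError("lam = -1 requires all observed counts > 0")
        return 2.0 * sum(e * math.log(e / o) for o, e in pairs)

    total = 0.0
    for o, e in pairs:
        if o == 0:
            if lam < -1.0:
                raise ValueError("lam < -1 requires all observed counts > 0")
            continue  # the term O * ((O/E)**lam - 1) tends to 0 as O -> 0
        total += o * ((o / e) ** lam - 1.0)
    return 2.0 * total / (lam * (lam + 1.0))


# Toy counts (not from the thesis): three cells, equal expected frequencies.
observed = [10, 20, 30]
expected = [20, 20, 20]
for lam in (1.0, 2.0 / 3.0, 0.0):
    print(lam, power_divergence(observed, expected, lam))
```

The value λ = 2/3 is included in the loop because it is the member of the family that Cressie and Read recommended as a compromise between X² and G².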
Other goodness-of-fit statistics which bear relevance to the sparse multinomial case are discussed. They include Zelterman's (1987) D² goodness-of-fit statistic, Simonoff's (1982, 1983) goodness-of-fit statistics as well as Koehler and Larntz's tests for log-linear models. On addressing contradictions for the sparse sample case under asymptotic conditions and an increase in sample size, discussions are provided on Simonoff's use of nonparametric techniques to find the variances as well as his adoption of the jackknife and bootstrap technique
The Graphical Representation of Structured Multivariate Data
During the past two decades or so, graphical representations have been used increasingly for the examination, summarisation and communication of statistical data. Many graphical techniques exist for exploratory data analysis (i.e., for deciding which model it is appropriate to fit to the data) and a number of graphical diagnostic techniques exist for checking the appropriateness of a fitted model. However, very few techniques exist for the representation of the fitted model itself. This thesis is concerned with the development of some new and existing graphical representation techniques for the communication and interpretation of fitted statistical models.
The first part of this thesis takes the form of a general overview of the use in statistics of graphical representations for exploratory data analysis and diagnostic model checking. In relation to the concern of this thesis, particular consideration is given to the few graphical techniques which already exist for the representation of fitted models. A number of novel two-dimensional approaches are then proposed which go partway towards providing a graphical representation of the main effects and interaction terms for fitted models. This leads on to a description of conditional independence graphs, and consideration of the suitability of conditional independence graphs as a technique for the representation of fitted models. Conditional independence graphs are then developed further in accordance with the research aims.
Since it becomes apparent that it is not possible to use any of the approaches taken in order to develop a simple two-dimensional pen-and-paper technique for the unambiguous graphical representation of all fitted statistical models, an interactive computer package based on the conditional independence graph approach is developed for the construction, communication and interpretation of graphical representations for fitted statistical models. This package, called the "Conditional Independence Graph Enhancer" (CIGE), does provide unambiguous graphical representations for all fitted statistical models considered
A Psychometric Investigation of a Mathematics Placement Test at a Science, Technology, Engineering, and Mathematics (STEM) Gifted Residential High School
Educational institutions, at all levels, must justify their use of placement testing and confront questions of its impact on students’ educational outcomes to assure all stakeholders that students are being enrolled in courses appropriate to their ability in order to maximize their chances of success (Linn, 1994; Mattern & Packman, 2009; McFate & Olmsted III, 1999; Norman, Medhanie, Harwell, Anderson, & Post, 2011; Wiggins, 1989). The aims of this research were to (1) provide evidence of Content Validity, (2) provide evidence of Construct Validity and Internal Consistency Reliability, (3) examine the item characteristics and potential bias of the items between males and females, and (4) provide evidence of Criterion-Related Validity by investigating the ability of the mathematics placement test scores to predict future performance in an initial mathematics course.
Students’ admissions portfolios and scores from the mathematics placement test were used to examine the aims of this research. Content Validity was evidenced through the use of a card-sorting task by internal and external subject matter experts. Results from Multidimensional Scaling and Hierarchical Cluster Analysis revealed a congruence of approximately 63 percent between the two group configurations. Next, an Exploratory Factor Analysis was used to investigate the underlying factor structure of the mathematics placement test. Findings indicated a three factor structure of PreCalculus, Geometry, and Algebra 1, with moderate correlations between factors.
Thirdly, an item analysis was conducted to explore the item parameters (i.e., item difficulty, and item discrimination) and to test for gender biases. Results from the item analysis suggested that the Algebra 1 and Geometry items were generally easy for the population of interest, while the PreCalculus items presented more of a challenge. Furthermore, the mathematics placement test was optimized by removing eleven items from the Algebra 1 factor and two items from the PreCalculus factor. All Internal Consistency Reliability estimates remained strong and ranged from .736 to .950.
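The item difficulty and internal consistency quantities mentioned above can be sketched as follows. The abstract does not say which reliability coefficient was used, so Cronbach's alpha is an assumption here, as are the function names and the toy 0/1 response matrix; item difficulty is taken in the classical sense of the proportion of examinees answering correctly.

```python
from statistics import variance


def item_difficulty(scores):
    """Proportion of examinees answering each item correctly (0/1 scoring)."""
    n_examinees = len(scores)
    n_items = len(scores[0])
    return [sum(row[j] for row in scores) / n_examinees for j in range(n_items)]


def cronbach_alpha(scores):
    """Cronbach's alpha from a list of examinee rows of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[j] for row in scores]) for j in range(n_items)]
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1.0 - sum(item_vars) / total_var)


# Toy responses (hypothetical): 4 examinees by 3 items, 1 = correct.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(item_difficulty(responses))
print(cronbach_alpha(responses))
```

In a workflow like the one described, removing an item and recomputing alpha on the remaining columns is one way to check that reliability stays acceptable after a test is shortened.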
Finally, Hierarchical Multiple Linear Regressions were used to examine the relationship between students’ total and factor scores from the mathematics placement test with students’ performance in their first semester mathematics course. Findings from the four Hierarchical Multiple Linear Regressions demonstrate that the total score students receive on the mathematics placement test predicts their achievement in their initial mathematics course, above and beyond the contributions of their demographic information and previous academic background. More specifically, the Algebra 1 Factor Score from the mathematics placement test was the strongest predictor of student success among the lower level mathematics courses (i.e., Mathematical Investigations I or II). Similarly, both the Algebra 1 and PreCalculus Factor Scores from the mathematics placement test were significant predictors of students’ grades in their first upper level mathematics course (i.e., Mathematical Investigations III or IV), providing evidence of Predictive Validity. The current mathematics placement test and procedures appear appropriate for the population of interest given the empirical evidence demonstrated in this research study regarding the psychometric properties of the exam.
The continued use of the revised mathematics placement test in the course placement decision-making process is advisable
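The "above and beyond" claim in a hierarchical regression is usually read off the change in R² when a second block of predictors is added to a first. The sketch below shows that calculation with a small ordinary-least-squares fit; the helper names, the two-block layout and the toy numbers are assumptions for illustration and are not taken from the study.

```python
def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(predictors, y):
    """R^2 of an OLS fit of y on the predictor columns plus an intercept."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def incremental_r2(block1, block2, y):
    """Return (R^2 of block 1, R^2 of blocks 1 and 2, change in R^2)."""
    r2_base = r_squared(block1, y)
    combined = [list(a) + list(b) for a, b in zip(block1, block2)]
    r2_full = r_squared(combined, y)
    return r2_base, r2_full, r2_full - r2_base


# Toy data (hypothetical): block 1 is a background indicator,
# block 2 is a placement score, y is a course grade.
background = [[0], [1], [0], [1]]
placement = [[1], [2], [3], [4]]
grade = [3, 5, 7, 9]
print(incremental_r2(background, placement, grade))
```

A full analysis would also test whether the change in R² is statistically significant, which this sketch leaves out.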
Training Manual on Advanced Analytical Tools for Social Science Research Vol.1
Applying appropriate analytical techniques forms the backbone of any research endeavor in agriculture, fisheries, and allied sciences. Without proper knowledge of statistical and econometric tools and software, and of how to derive inferences from them, it is not possible to draw relevant interpretations from an investigation. Hence, the importance of a well-designed data collection protocol, analysis, and interpretation should not be underestimated. Such inferences form the basis of sound policy planning and resource management. Technology advancements and the development of analytical software have made the data analysis process less laborious. A basic understanding of the application of advanced analytical tools and their interpretation increases the productivity and efficiency of social science researchers engaged in agriculture/animal/fisheries science research. Hence the Winter School on Advanced Analytical Tools for Social Science Research is designed to enhance the analytical skills of social science researchers from NARES by allowing them to familiarize themselves with advanced analytical procedures and their practical applications.
This Winter School is a step towards familiarizing researchers with recent analytical techniques in social science so that they can derive quality research outputs. The course is designed to acquaint the participants with areas such as exploratory data analysis, sampling techniques, data classificatory techniques, non-parametric methods, econometric analysis, and time series modeling. Lectures on GIS/spatial modeling, scaling techniques, data mining and big data analytics, machine learning techniques, and ecosystem evaluation are also included. The course is practically oriented, with a greater emphasis on interpreting the results. It employs a combination of lectures and exercises using statistical software
Predictability and constraints on the structure of ecological communities in the context of climate change
Ecologists must increasingly balance the need for accurate predictions about how ecosystems will be affected by climate change, against the fact that making such predictions at the ecosystem-level may be infeasible. Although information about responses of individual species to a changing environment is increasing, scaling such information to the community level is challenging. To date, predicting responses of ecological communities to climate change is constrained by limited theoretical and empirical knowledge about the response of communities and ecosystems to change. My dissertation addresses several knowledge gaps in our understanding of community structure under climate change. This research draws from a rich experimental tradition in the species-diverse model ecosystem of the US Pacific Northwest rocky intertidal to test ecological theory.
In Chapter 2, I assessed whether the response of multiple species of coralline algae to global change could be predicted from basic first principles of chemistry, physiology, and ecology. Given the rate of global change, and the time-consuming process of experimentally determining species responses to climate change, I hypothesized that species can be grouped using existing theory, either by their evolutionary relatedness or by their ecological traits, such that climate responses are similar within a group. Such a scheme would greatly reduce the number of experiments needed to characterize species climate vulnerability, requiring the characterization of the response of groups of species to climate change, rather than individual species. Using a suite of five co-occurring species of intertidal articulated coralline algae (Corallina vancouveriensis, Corallina officinalis, Bossiella plumosa, Bossiella orbigniana, and Calliarthron tuberculosum), I applied this framework to generate ten mutually exclusive hypotheses that could explain organismal response to ocean acidification, a consequence of global climate change that threatens marine calcifying species. I found that all species had similar responses to ocean acidification, and that responses were generally predicted by the body size of the individual.
Despite the power that such a framework provides in understanding group-level response to climate change, predicting community-level response requires knowledge of how organisms affect one another. In Chapter 3, I quantified species interactions in a series of removal experiments to estimate the reciprocal effects between a canopy-forming intertidal kelp (Saccharina sessilis) and a suite of understory species that persist beneath the kelp canopy. This experiment was replicated in different oceanographic conditions across a large latitudinal gradient, as a step towards understanding how interactions might change with climate change. However, the experiment demonstrated that interactions between the canopy and understory were consistent among different environmental conditions. Furthermore, the strongest effect was that of understory species, particularly articulated coralline turf algae, on the canopy species. The coralline turf algae both facilitated the recruitment of the canopy species and buffered the canopy from abiotic stress during its adult life stage. Combining experimental results and observational surveys, a hypothesized interaction network for these species was constructed, highlighting the importance of direct and indirect species interactions in promoting species coexistence.
A long-standing controversy in ecology is whether or not species interactions can be inferred from observational data, as opposed to from experimental tests. Although the rocky intertidal ecosystem is unique for its ease of experimental manipulation, quantifying species interactions experimentally is often difficult or impossible. As an alternative, many have turned to statistical methods to estimate species interactions from observational data, namely, from patterns in species pairwise co-occurrences. In Chapter 4, I examined these co-occurrence methods and their potential relationship to experimentally measured species interactions. I first used a suite of different co-occurrence methods to generate a set of predicted species interactions of macrophytes and invertebrates from observational surveys conducted in the rocky intertidal zone of Oregon. I then compared the predicted species interactions to the same pairwise species interactions determined experimentally and assembled from the literature. Overall, of the seven methods tested, each generated a different set of predicted species interactions from the same data, and all methods predicted interactions that did not match those in the experimental database. Thus, predicting species interactions from patterns in occurrence remains elusive. Importantly, much work remains to be done to understand the link between species co-occurrences and their actual interactions with one another on the landscape. A key limiting frontier in climate change ecology is determining the influence of species interactions on species distributions across the landscape, and the sensitivity of such interactions to changes in climate.
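To make the idea of a pairwise co-occurrence method concrete, the sketch below compares the observed number of sites shared by two species with the number expected if they were distributed independently, and also reports the Jaccard index. The abstract does not name the seven methods tested, so this is a generic illustration and not one of them; the function name, the species labels and the presence/absence data are hypothetical.

```python
from itertools import combinations


def cooccurrence_table(presence):
    """Summarise pairwise co-occurrence from presence/absence records.

    `presence` maps a species label to a list of 0/1 values, one per site.
    """
    names = sorted(presence)
    n_sites = len(presence[names[0]])
    if any(len(presence[name]) != n_sites for name in names):
        raise ValueError("all species need the same number of sites")

    results = {}
    for a, b in combinations(names, 2):
        pa, pb = presence[a], presence[b]
        both = sum(1 for x, y in zip(pa, pb) if x and y)
        either = sum(1 for x, y in zip(pa, pb) if x or y)
        results[(a, b)] = {
            "observed": both,
            # Expected shared sites if the two species occur independently.
            "expected": sum(pa) * sum(pb) / n_sites,
            "jaccard": both / either if either else 0.0,
        }
    return results


# Toy survey (hypothetical): three species across six sites.
survey = {
    "A": [1, 1, 1, 0, 0, 0],
    "B": [1, 1, 1, 0, 0, 0],
    "C": [0, 0, 0, 1, 1, 1],
}
for pair, stats in cooccurrence_table(survey).items():
    print(pair, stats)
```

A departure of the observed count from the expected count is only a pattern in the survey data; consistent with the finding above, it is not by itself evidence of an interaction between the two species.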
Finally, in Chapter 5, I used theory from the published literature and knowledge from my previous chapters to make predictions about the recovery of low rocky intertidal communities after a disturbance. The process of community development after disturbance has been studied in many ways, from the successional studies of the early 1900s, to modern community assembly theory. In recent years, a focus on the unpredictability of community assembly has emerged, paying particular attention to the role of historical contingency, or priority effects, in determining the recovery trajectory of a community. Priority effects occur when the arrival of a species after a disturbance inalterably changes the composition of the developing community, driving the assembly of widely different communities at a small spatial scale. I conducted a community assembly experiment in three different low intertidal zone community "types", each characterized by different dominant macrophyte species (Saccharina sessilis, Phyllospadix spp., and algal "turfs"). Replicating this experiment at six sites along the Oregon coast, I found that both regional and local dynamics constrain the recovery of communities after disturbance. Half of the time, the community returned to the state of the nearby community type. The remaining communities were influenced by priority effects that could be predicted based on 1) regional dynamics favoring some species over others, or 2) the timing of arrival of important facilitating species.
Overall, understanding the dynamic relationship between the persistence of diverse communities and a changing environment remains one of the challenges of our time. My dissertation highlights some of the challenges in predicting the future composition of communities under climate change, but also provides some ways forward. Integration of experimental, theoretical, and observational studies builds the scaffolding of prediction, whereby understanding the constraints on species physiology, the interactions among species, and community assembly can help frame the context in which predictions are made.
Keywords: Rocky Intertidal, Algae, Community Ecology, Ecology, Kelp, Climate Change Ecolog
Feature Screening of Ultrahigh Dimensional Feature Spaces With Applications in Interaction Screening
Data for which the number of predictors exponentially exceeds the number of observations is becoming increasingly prevalent in fields such as bioinformatics, medical imaging, computer vision, and social network analysis. One of the leading questions statisticians must answer when confronted with such “big data” is how to reduce a set of exponentially many predictors down to a set of a mere few predictors which have a truly causative effect on the response being modelled. This process is often referred to as feature screening. In this work we propose three new methods for feature screening. The first method we propose (TC-SIS) is specifically intended for use with data having both categorical response and predictors. The second method we propose (JCIS) is meant for feature screening for interactions between predictors. JCIS is rare among interaction screening methods in that it does not require first finding a set of causative main effects before screening for interactive effects. Our final method (GenCorr) is intended for use with data having a multivariate response. GenCorr is the only method for multivariate screening which can screen for both causative main effects and causative interactions. Each of these aforementioned methods will be shown to possess both theoretical robustness as well as empirical agility
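For readers unfamiliar with feature screening, the sketch below shows the general idea in its simplest marginal form: score each predictor on its own against the response and keep the highest-scoring few. It ranks by absolute Pearson correlation, in the spirit of sure independence screening; it is not an implementation of TC-SIS, JCIS or GenCorr, whose details are not given in the abstract. The function names and the toy data are assumptions for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no marginal signal
    return sxy / math.sqrt(sxx * syy)


def screen_features(columns, y, keep):
    """Rank predictors by |correlation with y| and return the top `keep`.

    `columns` maps a predictor name to its list of values.
    Returns (selected names, score for every predictor).
    """
    scores = {name: abs(pearson(col, y)) for name, col in columns.items()}
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:keep], scores


# Toy data (hypothetical): three predictors, five observations.
response = [1, 2, 3, 4, 5]
predictors = {
    "x1": [2, 4, 6, 8, 10],
    "x2": [1, -1, 1, -1, 1],
    "x3": [5, 3, 4, 1, 2],
}
selected, scores = screen_features(predictors, response, keep=2)
print(selected)
print(scores)
```

Marginal scores of this kind measure association only, and a predictor that matters solely through an interaction can score near zero, which is one motivation for dedicated interaction screening.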