74 research outputs found

    Advanced Data Analysis - Lecture Notes

    Lecture notes for Advanced Data Analysis (ADA1 Stat 427/527 and ADA2 Stat 428/528), Department of Mathematics and Statistics, University of New Mexico, Fall 2016-Spring 2017. Additional material, including RMarkdown templates for in-class and homework exercises, datasets, R code, and video lectures, is available on the course websites: https://statacumen.com/teaching/ada1 and https://statacumen.com/teaching/ada2 . Contents: I. ADA1: Software (0 Introduction to R, RStudio, and ggplot). II. ADA1: Summaries and displays, and one-, two-, and many-way tests of means (1 Summarizing and Displaying Data; 2 Estimation in One-Sample Problems; 3 Two-Sample Inferences; 4 Checking Assumptions; 5 One-Way Analysis of Variance). III. ADA1: Nonparametric, categorical, and regression methods (6 Nonparametric Methods; 7 Categorical Data Analysis; 8 Correlation and Regression). IV. ADA1: Additional topics (9 Introduction to the Bootstrap; 10 Power and Sample Size; 11 Data Cleaning). V. ADA2: Review of ADA1 (1 R statistical software and review). VI. ADA2: Introduction to multiple regression and model selection (2 Introduction to Multiple Linear Regression; 3 A Taste of Model Selection for Multiple Regression). VII. ADA2: Experimental design and observational studies (4 One Factor Designs and Extensions; 5 Paired Experiments and Randomized Block Experiments; 6 A Short Discussion of Observational Studies). VIII. ADA2: ANCOVA and logistic regression (7 Analysis of Covariance: Comparing Regression Lines; 8 Polynomial Regression; 9 Discussion of Response Models with Factors and Predictors; 10 Automated Model Selection for Multiple Regression; 11 Logistic Regression). IX. ADA2: Multivariate Methods (12 An Introduction to Multivariate Methods; 13 Principal Component Analysis; 14 Cluster Analysis; 15 Multivariate Analysis of Variance; 16 Discriminant Analysis; 17 Classification).

    Brain Biochemistry and Personality: A Magnetic Resonance Spectroscopy Study

    To investigate the biochemical correlates of normal personality we utilized proton magnetic resonance spectroscopy (1H-MRS). Our sample consisted of 60 subjects ranging in age from 18 to 32 (27 females). Personality was assessed with the NEO Five-Factor Inventory (NEO-FFI). We measured brain biochemistry within the precuneus, the cingulate cortex, and underlying white matter. We hypothesized that brain biochemistry within these regions would predict individual differences across major domains of personality functioning. Biochemical models were fit for all personality domains including Neuroticism, Extraversion, Openness, Agreeableness, and Conscientiousness. Our findings involved differing concentrations of Choline (Cho), Creatine (Cre), and N-acetylaspartate (NAA) in regions both within (i.e., posterior cingulate cortex) and white matter underlying (i.e., precuneus) the Default Mode Network (DMN). These results add to an emerging literature regarding personality neuroscience, and implicate biochemical integrity within the default mode network as constraining major personality domains within normal human subjects.

    Simulating High-Dimensional Multivariate Data using the bigsimr R Package

    It is critical to accurately simulate data when employing Monte Carlo techniques and evaluating statistical methodology. Measurements are often correlated and high dimensional in this era of big data, such as data obtained in high-throughput biomedical experiments. Due to the computational complexity and a lack of user-friendly software available to simulate these massive multivariate constructions, researchers resort to simulation designs that posit independence or perform arbitrary data transformations. To close this gap, we developed the Bigsimr Julia package with R and Python interfaces. This paper focuses on the R interface. These packages empower high-dimensional random vector simulation with arbitrary marginal distributions and dependency via a Pearson, Spearman, or Kendall correlation matrix. bigsimr contains high-performance features, including multi-core and graphical-processing-unit-accelerated algorithms to estimate correlation and compute the nearest correlation matrix. Monte Carlo studies quantify the accuracy and scalability of our approach, up to d = 10,000. We describe example workflows and apply to a high-dimensional data set -- RNA-sequencing data obtained from breast cancer tumor samples. Comment: 22 pages, 10 figures, https://cran.r-project.org/web/packages/bigsimr/index.htm
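One common way to simulate correlated variables with arbitrary marginals is a Gaussian copula: draw correlated standard normals, map them to uniforms with the normal CDF, then apply each marginal's inverse CDF. The sketch below illustrates that general idea in two dimensions using only the Python standard library. It is not the bigsimr API; the function names (`simulate_pair`, `exponential_quantile`, `spearman`) and the choice of exponential marginals are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def exponential_quantile(rate):
    """Inverse CDF of an exponential distribution (illustrative marginal)."""
    return lambda u: -math.log1p(-u) / rate


def simulate_pair(n, rho, quantile_x, quantile_y, seed=0):
    """Draw n pairs with the given marginals via a Gaussian copula.

    rho is the correlation of the latent normal pair, not of the output;
    quantile_x and quantile_y are inverse CDFs of the target marginals.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    eps = 1e-12  # keep uniforms strictly inside (0, 1)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Second normal built so that corr(z1, z2) = rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        u1 = min(max(std_normal.cdf(z1), eps), 1.0 - eps)
        u2 = min(max(std_normal.cdf(z2), eps), 1.0 - eps)
        pairs.append((quantile_x(u1), quantile_y(u2)))
    return pairs


def _ranks(values):
    """Ranks 0..n-1; ties are not handled (continuous data assumed)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def spearman(pairs):
    """Spearman rank correlation for tie-free data."""
    rx = _ranks([x for x, _ in pairs])
    ry = _ranks([y for _, y in pairs])
    mean = (len(pairs) - 1) / 2.0
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)
    return cov / var


if __name__ == "__main__":
    draws = simulate_pair(5000, 0.8, exponential_quantile(1.0),
                          exponential_quantile(2.0), seed=1)
    print("Spearman correlation of simulated pairs:", spearman(draws))
```

Because the marginal transforms are monotone, the rank correlation of the output equals that of the latent normals, which for a Gaussian copula is (6/π)·asin(ρ/2). This is why rank-based targets are convenient to match in this kind of scheme.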

    Lung Toxicity of Ambient Particulate Matter from Southeastern U.S. Sites with Different Contributing Sources: Relationships between Composition and Effects

    BACKGROUND: Exposure to air pollution and, more specifically, particulate matter (PM) is associated with adverse health effects. However, the specific PM characteristics responsible for biological effects have not been defined. OBJECTIVES: In this project we examined the composition, sources, and relative toxicity of samples of PM with aerodynamic diameter ≤2.5 μm (PM2.5) collected from sites within the Southeastern Aerosol Research and Characterization (SEARCH) air monitoring network during two seasons. These sites represent four areas with differing sources of PM2.5, including local urban versus regional sources, urban areas with different contributions of transportation and industrial sources, and a site influenced by Gulf of Mexico weather patterns. METHODS: We collected samples from each site during the winter and summer of 2004 for toxicity testing and for chemical analysis and chemical mass balance–based source apportionment. We also collected PM2.5 downwind of a series of prescribed forest burns. We assessed the toxicity of the samples by instillation into rat lungs and assessed general toxicity, acute cytotoxicity, and inflammation. Statistical dose–response modeling techniques were used to rank the relative toxicity and compare the seasonal differences at each site. Projection-to-latent-surfaces (PLS) techniques examined the relationships among sources, chemical composition, and toxicologic end points. RESULTS AND CONCLUSIONS: Urban sites with high contributions from vehicles and industry were most toxic.

    Comparison of Methods for Classifying Persistent Post-Concussive Symptoms in Children

    Pediatric mild traumatic brain injury (pmTBI) has received increased public scrutiny over the past decade, especially regarding children who experience persistent post-concussive symptoms (PPCS). However, several methods for defining PPCS exist in clinical and scientific literature, and even healthy children frequently exhibit non-specific, concussive-like symptoms. Inter-method agreement (six PPCS methods), observed misclassification rates, and other psychometric properties were examined in large cohorts of consecutively recruited adolescent patients with pmTBI (n = 162) 1 week and 4 months post-injury and in age/sex-matched healthy controls (HC; n = 117) at equivalent time intervals. Six published PPCS methods were stratified into Simple Change (e.g., International Statistical Classification of Diseases and Related Health Problems, 10th revision [ICD-10]) and Standardized Change (e.g., reliable change indices) algorithms. Among HC, test-retest reliability was fair to good across the 4-month assessment window, with evidence of bias (i.e., higher symptom ratings) during retrospective relative to other assessments. Misclassification rates among HC were higher (>30%) for Simple Change algorithms, with poor inter-rater reliability of symptom burden across HC and their parents. A 49% spread existed in terms of the proportion of pmTBI patients "diagnosed" with PPCS at 4 months, with superior inter-method agreement among standardized change algorithms. In conclusion, the self-reporting of symptom burden is only modestly reliable in typically developing adolescents over a 4-month period, with additional evidence for systematic bias in both adolescent and parental ratings. Significant variation existed for identifying pmTBI patients who had "recovered" (i.e., those who did not meet individual criteria for PPCS) from concussion across the six definitions, representing a considerable challenge for estimating the true incidence rate of PPCS in published literature. Although relatively straightforward to obtain, current findings question the utility of the most commonly used Simple Change scores for diagnosis of PPCS in clinical settings.
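For readers unfamiliar with the "reliable change indices" mentioned above, the sketch below shows the widely used Jacobson-Truax form: the change score divided by the standard error of the difference, which is derived from the baseline standard deviation and the test-retest reliability. The abstract does not state which formula or inputs the study used, so the function name, parameter names, and the 1.96 cutoff here are illustrative assumptions.

```python
import math


def reliable_change_index(baseline, follow_up, sd_baseline, test_retest_r):
    """Jacobson-Truax reliable change index (illustrative sketch).

    sd_baseline:   standard deviation of scores in a reference sample
    test_retest_r: test-retest reliability of the measure (0 <= r < 1)
    """
    # Standard error of measurement for a single score.
    sem = sd_baseline * math.sqrt(1.0 - test_retest_r)
    # Standard error of the difference between two scores.
    s_diff = math.sqrt(2.0) * sem
    return (follow_up - baseline) / s_diff


def is_reliable_change(rci, cutoff=1.96):
    """Flag change larger than expected from measurement error alone.

    The 1.96 cutoff is a common convention, assumed here.
    """
    return abs(rci) > cutoff
```

Unlike a simple difference score, this index scales the change by the measure's own unreliability, so a noisy symptom scale needs a larger raw change before it counts.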

    Choosing a Cluster Sampling Design for Lot Quality Assurance Sampling Surveys

    Lot quality assurance sampling (LQAS) surveys are commonly used for monitoring and evaluation in resource-limited settings. Recently several methods have been proposed to combine LQAS with cluster sampling for more timely and cost-effective data collection. For some of these methods, the standard binomial model can be used for constructing decision rules as the clustering can be ignored. For other designs, considered here, clustering is accommodated in the design phase. In this paper, we compare these latter cluster LQAS methodologies and provide recommendations for choosing a cluster LQAS design. We compare technical differences in the three methods and determine situations in which the choice of method results in a substantively different design. We consider two different aspects of the methods: the distributional assumptions and the clustering parameterization. Further, we provide software tools for implementing each method and clarify misconceptions about these designs in the literature. We illustrate the differences in these methods using vaccination and nutrition cluster LQAS surveys as example designs. The cluster methods are not sensitive to the distributional assumptions but can result in substantially different designs (sample sizes) depending on the clustering parameterization. However, none of the clustering parameterizations used in the existing methods appears to be consistent with the observed data, and, consequently, choice between the cluster LQAS methods is not straightforward. Further research should attempt to characterize clustering patterns in specific applications and provide suggestions for best-practice cluster LQAS designs on a setting-specific basis.
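As a point of reference for the "standard binomial model" mentioned above, the sketch below evaluates a decision rule when clustering is ignored: sample n units, classify the area as acceptable if at least d successes are observed, and compute the two misclassification risks at a lower and an upper coverage threshold. It does not implement any of the cluster designs compared in the paper. The function names are hypothetical, and which risk is labelled α and which β is an assumed convention.

```python
from math import comb


def prob_accept(n, d, p):
    """P(X >= d) for X ~ Binomial(n, p): one point on the OC curve."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(d, n + 1))


def risks(n, d, p_lower, p_upper):
    """Misclassification risks of the rule 'accept if X >= d'.

    alpha: probability of rejecting an area whose coverage is p_upper
    beta:  probability of accepting an area whose coverage is p_lower
    (labelling convention assumed for this sketch)
    """
    alpha = 1.0 - prob_accept(n, d, p_upper)
    beta = prob_accept(n, d, p_lower)
    return alpha, beta


if __name__ == "__main__":
    # Illustrative values only: n = 60, d = 50, thresholds 0.75 and 0.90.
    alpha, beta = risks(60, 50, 0.75, 0.90)
    print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

Evaluating `prob_accept` over a grid of p values traces the operating characteristic (OC) curve for the design. Cluster designs generally need larger samples than this binomial baseline to reach the same risks, because observations within a cluster are correlated.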

    Comparing the Pezzoli, Hund, and Hedt designs for σ = .1, α = β = .1, and m = 10.

    Top panel: Estimated σ and ρ as a function of p̂_j for 20 areas in Greenland et al. [12], with loess smooth overlaid.

    Bottom panel: Solid line represents OC curve for n = 60, d = 50 for p_l = .75, p_u = .9 design when σ and ρ are fixed at the mean value of σ̂ (left) and ρ̂ (right). Dashed line represents OC curve when σ and ρ vary over p according to the predicted loess smooth of σ̂ (left) and ρ̂ (right).