
    Advanced Data Analysis - Lecture Notes

    Lecture notes for Advanced Data Analysis (ADA1 Stat 427/527 and ADA2 Stat 428/528), Department of Mathematics and Statistics, University of New Mexico, Fall 2016-Spring 2017. Additional material, including RMarkdown templates for in-class and homework exercises, datasets, R code, and video lectures, is available on the course websites: https://statacumen.com/teaching/ada1 and https://statacumen.com/teaching/ada2 .
    Contents:
    I ADA1: Software - 0 Introduction to R, Rstudio, and ggplot
    II ADA1: Summaries and displays, and one-, two-, and many-way tests of means - 1 Summarizing and Displaying Data; 2 Estimation in One-Sample Problems; 3 Two-Sample Inferences; 4 Checking Assumptions; 5 One-Way Analysis of Variance
    III ADA1: Nonparametric, categorical, and regression methods - 6 Nonparametric Methods; 7 Categorical Data Analysis; 8 Correlation and Regression
    IV ADA1: Additional topics - 9 Introduction to the Bootstrap; 10 Power and Sample Size; 11 Data Cleaning
    V ADA2: Review of ADA1 - 1 R statistical software and review
    VI ADA2: Introduction to multiple regression and model selection - 2 Introduction to Multiple Linear Regression; 3 A Taste of Model Selection for Multiple Regression
    VII ADA2: Experimental design and observational studies - 4 One Factor Designs and Extensions; 5 Paired Experiments and Randomized Block Experiments; 6 A Short Discussion of Observational Studies
    VIII ADA2: ANCOVA and logistic regression - 7 Analysis of Covariance: Comparing Regression Lines; 8 Polynomial Regression; 9 Discussion of Response Models with Factors and Predictors; 10 Automated Model Selection for Multiple Regression; 11 Logistic Regression
    IX ADA2: Multivariate Methods - 12 An Introduction to Multivariate Methods; 13 Principal Component Analysis; 14 Cluster Analysis; 15 Multivariate Analysis of Variance; 16 Discriminant Analysis; 17 Classification

    In Situ and Satellite Measured Temperature Comparability

    Following the International Geophysical Year in the late 1950s, small meteorological rockets caught the interest of scientists as a potentially inexpensive method to obtain meteorological information (density, temperature, wind) above balloon-borne radiosonde altitudes. These small rocketsondes have served many important observational roles in studies of atmospheric structure and processes, enabling many new ideas about the atmosphere to emerge. Although no longer manufactured, a small residual inventory of meteorological rocketsondes exists for specific research projects. The value of data from meteorological rocketsondes is without question, but with their disappearance, data from many different satellites are filling the need, some able to resolve high-altitude temperatures quite well. However, the rocketsonde vertical profile is localized to the launch site, whereas satellites move several kilometers per second. The objective of this presentation is to compare in situ temperature data with remotely measured/retrieved temperature data. A number of U.S. missions utilizing passive falling sphere data, some as early as 1991, were conducted in polar, equatorial, and mid-latitude locations; we use these data to verify the comparability of retrieved temperatures from the satellites. An important aspect is that a single satellite profile compared to a falling sphere profile often does not agree, while high-density satellite measurements averaged over an area near the rocketsonde site seem to be in better agreement. Radiosonde temperature data are used in the analysis when appropriate.

    Individual differences in toddlers' social understanding and prosocial behavior: Disposition or socialization?

    We examined how individual differences in social understanding contribute to variability in early-appearing prosocial behavior. Moreover, potential sources of variability in social understanding were explored and examined as additional possible predictors of prosocial behavior. Using a multi-method approach with both observed and parent-report measures, 325 children aged 18-30 months were administered measures of social understanding (e.g., use of emotion words; self-understanding), prosocial behavior (in separate tasks measuring instrumental helping, empathic helping, and sharing, as well as parent-reported prosociality at home), temperament (fearfulness, shyness, and social fear), and parental socialization of prosocial behavior in the family. Individual differences in social understanding predicted variability in empathic helping and parent-reported prosociality, but not instrumental helping or sharing. Parental socialization of prosocial behavior was positively associated with toddlers' social understanding, prosocial behavior at home, and instrumental helping in the lab, and negatively associated with sharing (possibly reflecting parents' increased efforts to encourage children who were less likely to share). Further, socialization moderated the association between social understanding and prosocial behavior, such that social understanding was less predictive of prosocial behavior among children whose parents took a more active role in socializing their prosociality. None of the dimensions of temperament was associated with either social understanding or prosocial behavior. Parental socialization of prosocial behavior is thus an important source of variability in children's early prosociality, acting in concert with early differences in social understanding, with different patterns of influence for different subtypes of prosocial behavior.

    Comparison of Temperature Measurements in the Middle Atmosphere by Satellite with Profiles Obtained by Meteorological Rockets

    Measurements using the inflatable falling sphere technique have occasionally been used to obtain temperature results from density data and thereby provide comparison with temperature profiles obtained by satellite sounders in the mesosphere and stratosphere. To ensure density measurements within narrow time frames and close in space, the inflatable falling sphere is launched within seconds of the nearly overhead satellite pass. Sphere measurements can be used to validate remotely measured temperatures but also have the advantage of capturing small-scale atmospheric features. Even so, with the dearth of remaining falling spheres available (the manufacture of these systems has been discontinued), it may be time to consider whether the remote measurements are mature enough to stand alone. Three field studies are considered: one in 2003 from northern Sweden, and two in 2010 from the vicinity of Kwajalein Atoll in the South Pacific and from Barking Sands, Hawaii. All three sites are used to compare temperature retrievals between satellite and in situ falling spheres. The major satellite instruments employed are SABER, MLS, and AIRS. The comparisons indicate that remotely measured temperatures mimic the sphere temperature measurements quite well. The data also confirm that satellite retrievals, while not always at the exact location required for detailed studies in space and time, compare sufficiently well to be highly useful. Whereas the falling sphere provides a measurement at a specific location and time, satellites pass a given location only daily or less frequently. This report reveals that averaged satellite measurements can provide temperatures and densities comparable to those obtained from the falling sphere, thereby providing a reliable measure of global temperature.

    Evaluating performance of biomedical image retrieval systems - an overview of the medical image retrieval task at ImageCLEF 2004-2013

    Medical image retrieval and classification have been extremely active research topics over the past 15 years. Within the ImageCLEF benchmark in medical image retrieval and classification, a standard test bed was created that allows researchers to compare their approaches and ideas on increasingly large and varied data sets, including generated ground truth. This article describes the lessons learned in ten evaluation campaigns. A detailed analysis of the data also highlights the value of the resources created.

    Brain Biochemistry and Personality: A Magnetic Resonance Spectroscopy Study

    To investigate the biochemical correlates of normal personality, we utilized proton magnetic resonance spectroscopy (1H-MRS). Our sample consisted of 60 subjects ranging in age from 18 to 32 (27 females). Personality was assessed with the NEO Five-Factor Inventory (NEO-FFI). We measured brain biochemistry within the precuneus, the cingulate cortex, and underlying white matter. We hypothesized that brain biochemistry within these regions would predict individual differences across major domains of personality functioning. Biochemical models were fit for all personality domains: Neuroticism, Extraversion, Openness, Agreeableness, and Conscientiousness. Our findings involved differing concentrations of choline (Cho), creatine (Cre), and N-acetylaspartate (NAA) both in regions within the Default Mode Network (DMN; i.e., the posterior cingulate cortex) and in white matter underlying it (i.e., the precuneus). These results add to an emerging literature on personality neuroscience and implicate biochemical integrity within the default mode network as constraining major personality domains in normal human subjects.

    Simulating High-Dimensional Multivariate Data using the bigsimr R Package

    It is critical to accurately simulate data when employing Monte Carlo techniques and evaluating statistical methodology. Measurements are often correlated and high-dimensional in this era of big data, such as data obtained in high-throughput biomedical experiments. Due to the computational complexity and a lack of user-friendly software available to simulate these massive multivariate constructions, researchers resort to simulation designs that posit independence or perform arbitrary data transformations. To close this gap, we developed the Bigsimr Julia package with R and Python interfaces. This paper focuses on the R interface. These packages empower high-dimensional random vector simulation with arbitrary marginal distributions and dependency specified via a Pearson, Spearman, or Kendall correlation matrix. bigsimr contains high-performance features, including multi-core and graphics-processing-unit-accelerated algorithms to estimate correlation and compute the nearest correlation matrix. Monte Carlo studies quantify the accuracy and scalability of our approach up to d = 10,000. We describe example workflows and apply the package to a high-dimensional data set: RNA-sequencing data obtained from breast cancer tumor samples.
    Comment: 22 pages, 10 figures, https://cran.r-project.org/web/packages/bigsimr/index.htm
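
The construction described above can be sketched in miniature as a Gaussian copula (NORTA-style) simulation: draw correlated normals, map them to correlated uniforms with the normal CDF, then push the uniforms through each target marginal's quantile function. The sketch below is a Python/NumPy translation of that general idea, not the package's actual R/Julia implementation; bigsimr additionally adjusts the input correlation so the output matches the requested Pearson/Spearman/Kendall matrix, a step this sketch omits, and the margins shown are purely illustrative.

```python
import numpy as np
from scipy import stats

def simulate_norta(margins, corr, n, seed=0):
    """Gaussian-copula (NORTA-style) sketch: correlated normals are
    mapped to correlated uniforms, then to the target margins."""
    rng = np.random.default_rng(seed)
    d = len(margins)
    z = rng.multivariate_normal(np.zeros(d), corr, size=n)
    u = stats.norm.cdf(z)  # correlated uniforms on (0, 1)
    return np.column_stack([m.ppf(u[:, j]) for j, m in enumerate(margins)])

# Illustrative margins: a skewed gamma and a normal, correlated at 0.6
margins = [stats.gamma(a=2.0), stats.norm(loc=5.0, scale=2.0)]
corr = np.array([[1.0, 0.6],
                 [0.6, 1.0]])
x = simulate_norta(margins, corr, n=10_000)
print(x.shape)  # (10000, 2)
```

The dependence survives the marginal transforms only approximately in the Pearson sense; matching a target correlation matrix exactly is precisely the extra mapping a package like bigsimr performs.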

    Comparison of Methods for Classifying Persistent Post-Concussive Symptoms in Children

    Pediatric mild traumatic brain injury (pmTBI) has received increased public scrutiny over the past decade, especially regarding children who experience persistent post-concussive symptoms (PPCS). However, several methods for defining PPCS exist in clinical and scientific literature, and even healthy children frequently exhibit non-specific, concussive-like symptoms. Inter-method agreement (six PPCS methods), observed misclassification rates, and other psychometric properties were examined in large cohorts of consecutively recruited adolescent patients with pmTBI (n = 162) 1 week and 4 months post-injury and in age/sex-matched healthy controls (HC; n = 117) at equivalent time intervals. Six published PPCS methods were stratified into Simple Change (e.g., International Statistical Classification of Diseases and Related Health Problems, 10th revision [ICD-10]) and Standardized Change (e.g., reliable change indices) algorithms. Among HC, test-retest reliability was fair to good across the 4-month assessment window, with evidence of bias (i.e., higher symptom ratings) during retrospective relative to other assessments. Misclassification rates among HC were higher (>30%) for Simple Change algorithms, with poor inter-rater reliability of symptom burden across HC and their parents. A 49% spread existed in terms of the proportion of pmTBI patients "diagnosed" with PPCS at 4 months, with superior inter-method agreement among standardized change algorithms. In conclusion, the self-reporting of symptom burden is only modestly reliable in typically developing adolescents over a 4-month period, with additional evidence for systematic bias in both adolescent and parental ratings. Significant variation existed for identifying pmTBI patients who had "recovered" (i.e., those who did not meet individual criteria for PPCS) from concussion across the six definitions, representing a considerable challenge for estimating the true incidence rate of PPCS in published literature. 
Although Simple Change scores are relatively straightforward to obtain, the current findings call into question the utility of these most commonly used scores for the diagnosis of PPCS in clinical settings.
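
For concreteness, the Standardized Change family referenced above typically builds on a reliable change index (RCI) in the style of Jacobson and Truax: the raw change in symptom ratings is divided by the standard error of the difference implied by the measure's test-retest reliability, and |RCI| > 1.96 flags change beyond plausible measurement error. A minimal sketch follows; the ratings, SD, and reliability values are hypothetical, not figures from this study.

```python
import math

def reliable_change_index(baseline, retest, sd_baseline, reliability):
    """Jacobson-Truax style RCI: change score scaled by the standard
    error of the difference between two administrations."""
    se_measurement = sd_baseline * math.sqrt(1.0 - reliability)
    se_difference = math.sqrt(2.0) * se_measurement
    return (retest - baseline) / se_difference

# Hypothetical symptom totals: 10 at baseline, 22 at retest,
# normative SD of 6, test-retest reliability of 0.80
rci = reliable_change_index(10, 22, 6.0, 0.80)
print(round(rci, 2))  # 3.16 -> exceeds 1.96, a "reliable" worsening
```

A Simple Change rule, by contrast, would threshold the raw difference (22 - 10 = 12) directly, which is why its false-positive rate depends so heavily on the measure's noise.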

    Lung Toxicity of Ambient Particulate Matter from Southeastern U.S. Sites with Different Contributing Sources: Relationships between Composition and Effects

    BACKGROUND: Exposure to air pollution and, more specifically, particulate matter (PM) is associated with adverse health effects. However, the specific PM characteristics responsible for biological effects have not been defined. OBJECTIVES: In this project we examined the composition, sources, and relative toxicity of samples of PM with aerodynamic diameter ≤2.5 μm (PM2.5) collected from sites within the Southeastern Aerosol Research and Characterization (SEARCH) air monitoring network during two seasons. These sites represent four areas with differing sources of PM2.5, including local urban versus regional sources, urban areas with different contributions of transportation and industrial sources, and a site influenced by Gulf of Mexico weather patterns. METHODS: We collected samples from each site during the winter and summer of 2004 for toxicity testing and for chemical analysis and chemical mass balance–based source apportionment. We also collected PM2.5 downwind of a series of prescribed forest burns. We assessed the toxicity of the samples by instillation into rat lungs and assessed general toxicity, acute cytotoxicity, and inflammation. Statistical dose–response modeling techniques were used to rank the relative toxicity and compare the seasonal differences at each site. Projection-to-latent-surfaces (PLS) techniques examined the relationships among sources, chemical composition, and toxicologic end points. RESULTS AND CONCLUSIONS: Urban sites with high contributions from vehicles and industry were most toxic.

    Kernel based methods for accelerated failure time model with ultra-high dimensional data

    Background: Most genomic data have ultra-high dimensions with more than 10,000 genes (probes). Regularization methods with L1 and Lp penalties have been extensively studied in survival analysis with high-dimensional genomic data. However, when the sample size n ≪ m (the number of genes), directly identifying a small subset of genes from ultra-high (m > 10,000) dimensional data is time-consuming and not computationally efficient. In current microarray analysis, common practice is to select a couple of thousand (or hundred) genes using univariate analysis or statistical tests, and then apply a LASSO-type penalty to further reduce the number of disease-associated genes. This two-step procedure may introduce bias and inaccuracy and lead us to miss biologically important genes.
    Results: The accelerated failure time (AFT) model is a linear regression model and a useful alternative to the Cox model for survival analysis. In this paper, we propose a nonlinear kernel-based AFT model and an efficient variable selection method with adaptive kernel ridge regression. Our proposed variable selection method is based on the kernel matrix and the dual problem with a much smaller n × n matrix. It is very efficient when the number of unknown variables (genes) is much larger than the number of samples. Moreover, the primal variables are explicitly updated and the sparsity in the solution is exploited.
    Conclusions: Our proposed methods can simultaneously identify survival-associated prognostic factors and predict survival outcomes with ultra-high-dimensional genomic data. We have demonstrated the performance of our methods with both simulated and real data. The proposed method performs superbly in limited computational studies.
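
The computational argument here, that the dual formulation only ever touches an n × n Gram matrix no matter how large m grows, can be illustrated with plain kernel ridge regression. This is just the generic dual solve, not the paper's adaptive variable-selection method; the RBF kernel and the lambda/gamma values are illustrative choices.

```python
import numpy as np

def kernel_ridge_dual(X, y, lam=1.0, gamma=1e-4):
    """Dual kernel ridge regression: alpha = (K + lam*I)^{-1} y.
    Only the n x n Gram matrix K is formed, never an m x m object."""
    sq = np.sum(X**2, axis=1)
    # RBF Gram matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    alpha = np.linalg.solve(K + lam * np.eye(X.shape[0]), y)
    return alpha, K

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10_000))  # n = 50 samples, m = 10,000 "genes"
y = X[:, 0] + 0.1 * rng.normal(size=50)
alpha, K = kernel_ridge_dual(X, y)
print(K.shape)  # (50, 50): the linear algebra scales with n, not m
```

Predictions for new samples likewise require only kernel evaluations against the n training points, which is what makes the n ≪ m regime tractable in the dual.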