25 research outputs found

    Permutation distribution clustering and structural equation model trees

    The primary goal of this thesis is to present novel methodologies for the exploratory analysis of psychological data sets that support researchers in informed theory development. Psychological data analysis bears a long tradition of confirming hypotheses generated prior to data collection. However, in practical research, the following two situations are commonly observed: In the first instance, there are no initial hypotheses about the data. In that case, there is no model available and one has to resort to uninformed methods to reveal structure in the data. In the second instance, existing models that reflect prior hypotheses need to be extended and improved, thereby altering and renewing hypotheses about the data and refining descriptions of the observed phenomena. This dissertation introduces a novel method for the exploratory analysis of psychological data sets for each of the two situations. Both methods focus on time series analysis, which is particularly interesting for the analysis of psychophysiological data and longitudinal data typically collected by developmental psychologists. Nonetheless, the methods are generally applicable and useful for other fields that analyze time series data, e.g., sociology, economics, neuroscience, and genetics. The first part of the dissertation proposes a clustering method for time series. A dissimilarity measure of time series based on the permutation distribution is developed. Employing this measure in a hierarchical scheme allows for a novel clustering method for time series based on their relative complexity: Permutation Distribution Clustering (PDC). Two methods for the determination of the number of distinct clusters are discussed based on a statistical and an information-theoretic criterion. Structural Equation Models (SEMs) constitute a versatile modeling technique, which is frequently employed in psychological research. 
The second part of the dissertation introduces an extension of SEMs to Structural Equation Modeling Trees (SEM Trees). SEM Trees describe partitions of a covariate space which explain differences in the model parameters. They can provide solutions in situations in which hypotheses in the form of a model exist but may potentially be refined by integrating other variables. By harnessing the full power of SEM, they represent a general data analysis technique that can be used for both time series and non-time series data. SEM Trees algorithmically refine initial models of the sample and thus support researchers in theory development. This thesis includes demonstrations of the methods on simulated as well as on real data sets, including applications of SEM Trees to longitudinal models of cognitive development and cross-sectional cognitive factor models, and applications of PDC on psychophysiological data, including electroencephalographic, electrocardiographic, and genetic data.
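For readers who want to experiment with the idea, the dissimilarity underlying PDC can be sketched in a few lines. This is an illustrative re-implementation, not the author's pdc package: each series is reduced to the relative frequencies of ordinal patterns in its delay embedding, and two series are compared via the squared Hellinger distance between those distributions (the embedding dimension and the choice of divergence are assumptions of this sketch).

```python
import numpy as np
from itertools import permutations

def permutation_distribution(x, m=3):
    """Relative frequencies of ordinal patterns of embedding dimension m."""
    counts = {p: 0 for p in permutations(range(m))}
    for i in range(len(x) - m + 1):
        # The ordinal pattern is the rank order of the m-window's values.
        counts[tuple(np.argsort(x[i:i + m]))] += 1
    freq = np.array(list(counts.values()), dtype=float)
    return freq / freq.sum()

def pdc_dissimilarity(x, y, m=3):
    """Squared Hellinger distance between two permutation distributions."""
    p, q = permutation_distribution(x, m), permutation_distribution(y, m)
    return float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) / 2)
```

Feeding the resulting pairwise dissimilarity matrix to any off-the-shelf hierarchical clustering routine reproduces the overall scheme described above: series with similar ordinal-pattern distributions, and hence similar complexity, end up in the same cluster.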

    Terminal Decline in Well-Being: The Role of Multi-Indicator Constellations of Physical Health and Psychosocial Correlates

    No full text
Well-being is often relatively stable across adulthood and old age, but typically exhibits pronounced deteriorations and vast individual differences in the terminal phase of life. However, the factors contributing to these differences are not well understood. Using up to 25-year annual longitudinal data obtained from 4,404 now-deceased participants of the nationwide German Socio-Economic Panel Study (SOEP; age at death: M = 73.2 years; SD = 14.3 years; 52% women), we explored the role of multi-indicator constellations of socio-demographic variables, physical health and burden factors, and psychosocial characteristics. Expanding earlier reports, Structural Equation Model Trees (SEM Trees) allowed us to identify profiles of variables that were associated with differences in the shape of late-life well-being trajectories. Physical health factors were found to play a major role for well-being decline, but in interaction with psychosocial characteristics such as social participation. To illustrate, for people with low social participation, disability emerged as the strongest correlate of differences in late-life well-being trajectories. However, for people with high social participation, whether or not an individual had spent considerable time in the hospital differentiated high vs. low and stable vs. declining late-life well-being. We corroborated these results with Variable Importance measures derived from a set of resampled SEM Trees (so-called SEM forests) that provide robust and comparative indicators of the total interactive effects of variables for differential late-life well-being. We discuss benefits and limitations of our approach and consider our findings in the context of other reports about protective factors against terminal decline in well-being.
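For readers curious about the mechanics behind such trees: at each node, a SEM Tree compares the fit of a single model for the whole sample against separate models fitted to each side of a candidate covariate split, and keeps the split with the largest likelihood-ratio improvement. A minimal sketch of that split search, substituting a simple Gaussian mean-variance model for a full SEM (the univariate model and function names are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def gauss_loglik(y):
    """Maximized Gaussian log-likelihood of y (ML mean and variance);
    a stand-in for a fitted SEM in this sketch."""
    n = len(y)
    return -0.5 * n * (np.log(2 * np.pi * y.var()) + 1)

def best_split(y, covariate):
    """Return (threshold, LR statistic) for the covariate split that most
    improves fit over the unsplit model, as in a single SEM Tree node."""
    base = gauss_loglik(y)
    best_t, best_lr = None, 0.0
    for t in np.unique(covariate)[:-1]:
        left, right = y[covariate <= t], y[covariate > t]
        if len(left) < 2 or len(right) < 2:
            continue  # skip splits that leave too few cases for estimation
        lr = 2 * (gauss_loglik(left) + gauss_loglik(right) - base)
        if lr > best_lr:
            best_t, best_lr = t, lr
    return best_t, best_lr
```

Applying this search recursively to each resulting subsample, over all available covariates, yields the tree; refitting it on resampled data and aggregating how much each covariate contributes across trees yields the forest-based variable importance measures used above.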

    Automated Reproducibility Testing in R Markdown

    No full text
Computational results are considered _reproducible_ if the same computation on the same data yields the same results when performed on a different computer or on the same computer later in time. Reproducibility is a prerequisite for replicable, robust, and transparent research in digital environments. Various approaches have been suggested to increase the chances of reproducibility. Many of them rely on R Markdown as a language to dynamically generate reproducible research assets (e.g., reports, posters, or presentations). However, a simple way to detect non-reproducibility, that is, unwanted changes in these assets over time, is still missing. We introduce the R package `reproducibleRchunks`, which provides a new type of code chunk in R Markdown documents that automatically stores metadata about the original computational results and verifies later reproduction attempts. With a minimal change to users' workflows, we hope that this approach increases the transparency and trustworthiness of digital research assets.
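The package's core mechanism, fingerprinting a chunk's results on the first run and verifying the fingerprint on later runs, can be imitated in any language. A minimal sketch in Python, under the assumption that results are JSON-serializable (the function names and storage format here are illustrative, not the package's actual API):

```python
import hashlib
import json

def fingerprint(result):
    """Stable digest of a computational result via canonical JSON."""
    payload = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def verify(result, stored_digest):
    """Does a recomputed result reproduce the stored fingerprint?"""
    return fingerprint(result) == stored_digest

# First run: compute the result and store its fingerprint as metadata.
stored = fingerprint({"mean": 2.5, "n": 100})
```

On a later run, `verify(recomputed, stored)` flags any drift in the results. In practice, floating-point results usually need to be rounded to a fixed precision before hashing; otherwise harmless bit-level differences across platforms would be reported as non-reproductions.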

    A Reproducible Data Analysis Workflow with R Markdown, Git, Make, and Docker

    No full text
In this tutorial, we describe a workflow to ensure long-term reproducibility of R-based data analyses. The workflow leverages established tools and practices from software engineering. It combines the benefits of various open-source software tools including R Markdown, Git, Make, and Docker, whose interplay ensures seamless integration of version management, dynamic report generation conforming to various journal styles, and full cross-platform and long-term computational reproducibility. The workflow ensures meeting the primary goals that 1) the reporting of statistical results is consistent with the actual statistical results (dynamic report generation), 2) the analysis exactly reproduces at a later point in time even if the computing platform or software is changed (computational reproducibility), and 3) changes at any time (during development and post-publication) are tracked, tagged, and documented while earlier versions of both data and code remain accessible. While the research community increasingly recognizes dynamic document generation and version management as tools to ensure reproducibility, we demonstrate with practical examples that these alone are not sufficient to ensure long-term computational reproducibility. Combining containerization, dependence management, version management, and dynamic document generation, the proposed workflow increases scientific productivity by facilitating later reproducibility and reuse of code and data.

    Variable Selection in Structural Equation Models with Regularized MIMIC Models

    No full text
Methodological innovations have allowed researchers to consider increasingly sophisticated statistical models that are better in line with the complexities of real-world behavioural data. However, despite these powerful new analytic approaches, sample sizes may not always be sufficiently large to deal with the increase in model complexity. This poses a difficult modeling scenario that entails large models with a comparably limited number of observations given the number of parameters (also known as the “small n, large p” problem). We here describe a particular strategy for overcoming this challenge, called regularization. Regularization, a method to penalize model complexity during estimation, has proven a viable option for estimating parameters in such small n, large p settings, but has so far mostly been used in linear regression models. Here we show how to integrate regularization within structural equation models, a popular analytic approach in psychology. We first describe the rationale behind regularization in regression contexts and how it can be extended to regularized structural equation modeling (Jacobucci, Grimm, & McArdle, 2016). Our approach is evaluated through a simulation study, showing that regularized SEM outperforms traditional SEM estimation methods in situations with a large number of predictors or when the sample size is small. We illustrate the power of this approach in an N=627 example from the Cam-CAN study, modeling the neural determinants of visual short-term memory. We discuss practical aspects of modeling empirical data and provide a step-by-step online tutorial.
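The penalty at the heart of this approach can be made concrete with a small stand-alone example. Lasso regularization adds a term lam * sum(|b_j|) to the fit function, which shrinks negligible coefficients exactly to zero and thereby performs variable selection. The sketch below applies that penalty to a plain linear model via proximal gradient descent (ISTA) rather than to a full MIMIC model, so it illustrates only the selection behavior, not the SEM machinery:

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the l1 penalty: shrink toward zero, clip at zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=1000):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by proximal gradient."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n  # gradient of the smooth loss
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta
```

In a regularized MIMIC model, the same penalty is placed on the covariate-to-factor paths, and the penalty weight lam is typically tuned via an information criterion or cross-validation rather than fixed in advance.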

    Assessing rigor and impact of research software for hiring and promotion in psychology: A comment on Gärtner et al. (2022)

    No full text
Based on four principles of a more responsible research assessment in academic hiring and promotion processes, Gärtner, Leising, and Schönbrodt (2022) suggested an evaluation scheme for published manuscripts, reusable data sets, and research software. This commentary responds to the proposed indicators for the evaluation of research software contributions in academic hiring and promotion processes. Acknowledging the significance of research software as a critical component of modern science, we propose that an evaluation scheme must emphasize the two major dimensions of rigor and impact. Generally, we believe that research software should be recognized as valuable scientific output in academic hiring and promotion, with the hope that this incentivizes the development of more open and better research software.