Evaluating the effects of missing values and mixed data types on social sequence clustering using t-SNE visualization
The goal of this work is to investigate the impact of missing values on the clustering of joint categorical social sequences. Identifying patterns in sociodemographic longitudinal data is important in a number of social science settings. However, analytical operations such as clustering life course trajectories are challenging because of the categorical and multidimensional nature of the data, their mixed data types, and corruption by missing and inconsistent values. Data quality issues have previously been investigated for single-variable sequences. To understand their effects on multivariate sequence analysis, we combine a dataset with mixed data types and missing values, a dissimilarity measure designed for joint categorical sequence data, and dimensionality reduction methods in a systematic design of sequence clustering experiments. Given the categorical nature of our data, we employ an “edit” distance based on optimal matching. Because each data record has multiple variables of different types, we investigate the impact of mixing these variables in a single dissimilarity measure. Comparing variables with binary values to those with multiple nominal values, we find that missing data problems are harder to overcome in the nominal domain than in the binary domain. Additionally, alignment of leading missing values can introduce systematic biases into dissimilarity matrices and, in turn, produce both artificial clusters and unrealistic interpretations of the associated data domains. We demonstrate the use of t-distributed stochastic neighbor embedding (t-SNE) to visually guide the mitigation of such biases by tuning the missing value substitution cost parameter or by determining an optimal sequence span.
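The leading-missing-value bias can be illustrated with a plain optimal matching (OM) distance. The sketch below is a single-channel simplification, not the joint multichannel measure used in the paper. It rests on several assumptions of our own: an indel cost of 1, a constant substitution cost of 2 between distinct observed states, missing values coded as `None`, and a hypothetical `miss_cost` parameter for substituting a missing value with an observed state. The bias comes from treating missing-versus-missing as a zero-cost match, which lets two sequences that share only a leading gap look similar.

```python
from typing import Optional, Sequence

State = Optional[str]  # None marks a missing observation (assumed coding)


def substitution_cost(s: State, t: State, sub: float, miss_cost: float) -> float:
    """Cost of aligning state s with state t (illustrative cost scheme)."""
    if s == t:
        return 0.0        # identical states, including missing vs. missing
    if s is None or t is None:
        return miss_cost  # hypothetical missing-value substitution cost
    return sub            # constant cost between distinct observed states


def om_distance(x: Sequence[State], y: Sequence[State],
                indel: float = 1.0, sub: float = 2.0,
                miss_cost: float = 2.0) -> float:
    """Optimal matching distance: minimum total cost of indels and substitutions."""
    n, m = len(x), len(y)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel  # delete the first i states of x
    for j in range(1, m + 1):
        d[0][j] = j * indel  # insert the first j states of y
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,  # deletion
                d[i][j - 1] + indel,  # insertion
                d[i - 1][j - 1] + substitution_cost(x[i - 1], y[j - 1], sub, miss_cost),
            )
    return d[n][m]


# Toy sequences: a and b share only a leading gap; a and c share their observed ending.
a = [None, None, None, "A", "B"]
b = [None, None, None, "C", "D"]
c = ["A", "A", "A", "A", "B"]

print(om_distance(a, b), om_distance(a, c))  # 4.0 6.0
print(om_distance(a, b, miss_cost=1.0), om_distance(a, c, miss_cost=1.0))  # 4.0 3.0
```

With these default costs, the sequence that shares only a leading gap with `a` comes out closer than the one that shares its observed ending. Lowering `miss_cost` reverses that ordering. This is the kind of cost tuning the abstract describes, although the values here are purely illustrative.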
Sequence analysis: its past, present, and future
This article marks the occasion of Social Science Research’s 50th anniversary by reflecting on the progress of sequence analysis (SA) since its introduction into the social sciences four decades ago, focusing on the development of SA in the social sciences so far and on its potential future directions.
The application of SA in the social sciences, especially in life course research, has mushroomed in the last decade and a half. Using a life course analogy, we examine the birth of SA in the social sciences, its childhood (the first wave), its adolescence and young adulthood (the second wave), and its future mature adulthood.
The paper provides (1) a summary of the important SA research and the historical contexts in which Andrew Abbott developed SA, (2) a thorough review of the many methodological developments in visualization, complexity measures, dissimilarity measures, group analysis of dissimilarities, cluster analysis of dissimilarities, multidomain/multichannel SA, dyadic/polyadic SA, Markov chain SA, sequence life course analysis, sequence network analysis, SA in other social science research, and software for SA, and (3) reflections on some future directions of SA, including how SA can benefit and inform theory-making in the social sciences, the methods currently being developed, and some remaining challenges facing SA for which we do not yet have any solutions. It is our hope that the reader will take up these challenges and help us improve and grow SA into maturity.