A Visual Analytics Approach to Comparing Cohorts of Event Sequences
Sequences of timestamped events are currently being generated across nearly every domain of data analytics, from e-commerce web logging to electronic health records used by doctors and medical researchers. Every day, this data type is reviewed by humans who apply statistical tests, hoping to learn everything they can about how these processes work, why they break, and how they can be improved upon.
To further uncover how these processes work the way they do, researchers often compare two groups, or cohorts, of event sequences to find the differences and similarities between outcomes and processes. With temporal event sequence data, this task is complex because of the variety of ways single events and sequences of events can differ between the two cohorts of records: the structure of the event sequences (e.g., event order, co-occurring events, or frequencies of events), the attributes about the events and records (e.g., gender of a patient), or metrics about the timestamps themselves (e.g., duration of an event).
Running statistical tests to cover all these cases and determining which results are significant becomes cumbersome.
Current visual analytics tools for comparing groups of event sequences emphasize either a purely statistical or a purely visual approach to comparison. Visual analytics tools leverage humans' ability to easily see patterns and anomalies they were not expecting, but are limited by uncertainty in the findings. Statistical tools emphasize finding significant differences in the data, but often require researchers to have a concrete question and do not facilitate more general exploration of the data.
Combining visual analytics tools with statistical methods leverages the benefits of both approaches for quicker and easier insight discovery. Integrating statistics into a visualization tool presents many challenges on the frontend (e.g., displaying the results of many different metrics concisely) and in the backend (e.g., scalability challenges with running various metrics on multi-dimensional data at once). I begin by exploring the problem of comparing cohorts of event sequences and understanding the questions that analysts commonly ask in this task. From there, I demonstrate that combining automated statistics with an interactive user interface amplifies the benefits of both types of tools, thereby enabling analysts to conduct quicker and easier data exploration, hypothesis generation, and insight discovery. The direct contributions of this dissertation are: (1) a taxonomy of metrics for comparing cohorts of temporal event sequences, (2) a statistical framework for exploratory data analysis with a method I refer to as high-volume hypothesis testing (HVHT), (3) a family of visualizations and guidelines for interaction techniques that are useful for understanding and parsing the results, and (4) a user study, five long-term case studies, and five short-term case studies which demonstrate the utility and impact of these methods in various domains: four in the medical domain, one in web log analysis, two in education, and one each in social networks, sports analytics, and security.
My dissertation contributes an understanding of how cohorts of temporal event sequences are commonly compared and the difficulties associated with applying and parsing the results of these metrics. It also contributes a set of visualizations, algorithms, and design guidelines for balancing automated statistics with user-driven analysis to guide users to significant, distinguishing features between cohorts. This work opens avenues for future research in comparing two or more groups of temporal event sequences, in opening traditional machine learning and data mining techniques to user interaction, and in extending the principles found in this dissertation to data types beyond temporal event sequences.
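The abstract does not give implementation details for high-volume hypothesis testing (HVHT). As a rough illustration of the general idea only (screening many cohort-comparison metrics at once and correcting for multiple comparisons), here is a minimal sketch; the metric names, counts, two-proportion z-test, and Benjamini-Hochberg correction are assumptions for illustration, not the dissertation's actual method.

```python
import math

def two_proportion_p(k1, n1, k2, n2):
    """Two-sided z-test p-value for a difference between two proportions."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

def benjamini_hochberg(pvals, alpha=0.05):
    """Indices of hypotheses rejected at false-discovery rate alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = -1
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            cutoff = rank
    return {order[r] for r in range(cutoff)}

# Hypothetical per-metric event counts in two cohorts of 500 records each.
metrics = [("event A occurs", 120, 60), ("event B occurs", 80, 78),
           ("A precedes B", 200, 150), ("B repeats", 30, 29)]
pvals = [two_proportion_p(a, 500, b, 500) for _, a, b in metrics]
significant = benjamini_hochberg(pvals)
for i, (name, _, _) in enumerate(metrics):
    print(name, "->", "significant" if i in significant else "not significant")
```

The point of the correction step is exactly the scalability concern the abstract raises: when many metrics are tested at once, some will look significant by chance unless the family of tests is adjusted as a whole.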
The role of the individual in the coming era of process-based therapy
For decades the development of evidence-based therapy has been based on experimental tests of protocols designed to impact psychiatric syndromes. As this paradigm weakens, a more process-based therapy approach is rising in its place, focused on how best to target and change core biopsychosocial processes in specific situations for given goals with given clients. This is an inherently more idiographic question than has normally been at issue in evidence-based therapy over the last few decades. In this article we explore methods of assessment and analysis that can integrate idiographic and nomothetic approaches in a process-based era.
The relationships among task complexity, content sequence, and instructional effectiveness in procedural learning.
Two questions were investigated: (1) Is the timing of the opportunity for the learner to integrate procedural content at the application level related to performance on tasks of high complexity? (2) Is the timing of the opportunity for the learner to integrate procedural content at the application level related to performance on tasks of low complexity?

The content used in both treatment conditions consisted of the procedures involved in checking accounts. Following a task analysis, the content was sequenced according to the two treatment conditions. Two teachers delivered both sets of instruction once. Following completion of five one-hour training sessions, test instruments were administered to assess performance on tasks of low and high complexity.

A two (Teacher 1, Teacher 2) by two (OCI Sequence, TCI Sequence) factorial analysis of variance (ANOVA) was used to analyze the performance measures. For both simple and complex tasks, the ANOVA showed no significant difference that could be attributed to content sequence.

The rationale for this study was based on the concept of assimilation-to-schema. This theory predicts that learning effectiveness will be increased by providing a complete but general version of the content prior to providing the specifics of the content. Application of this learning theory can result in a general-to-detailed content sequence, which can be contrasted with a parts-to-whole sequence, in which a complete version of the content follows presentation of all its parts. A general-to-detailed sequence can be said to provide ongoing content integration, while a parts-to-whole sequence can be said to provide terminal content integration.

This study was designed to investigate the relationship between content sequence, as it contributes to content integration, and procedural learning. Given that content sequence is fundamental to any intentional learning situation, the relationship between organization and eventual integration of the content is of primary concern. Nowhere is the concern more evident than in procedural learning, where failure to integrate a single step into an overall procedure can result in an inability to correctly or completely apply a procedure or set of procedures.

The subjects for this study (N = 103) were drawn from a population of middle school students. One treatment condition was instruction on content sequenced to provide ongoing content integration at the application level (OCI Sequence). The other treatment condition was instruction on content sequenced to provide content integration upon completion, or termination, of instruction (TCI Sequence).
Understanding User Behaviour through Action Sequences: from the Usual to the Unusual
Action sequences, where atomic user actions are represented in a labelled, timestamped form, are becoming a fundamental data asset in the inspection and monitoring of user behaviour in digital systems. Although the analysis of such sequences is highly critical to the investigation of activities in cyber security applications, existing solutions fail to provide a comprehensive understanding due to the complex semantic and temporal characteristics of these data. This paper presents a visual analytics approach that aims to facilitate a user-involved, multi-faceted decision-making process during the identification and investigation of "unusual" action sequences. We first report the results of the task analysis and domain characterisation process. Then we describe the components of our multi-level analysis approach, which comprises constraint-based sequential pattern mining, semantic distance based clustering, and multi-scalar visualisations of users and their sequences. Finally, we demonstrate the applicability of our approach through a case study that involves tasks requiring effective decision-making by a group of domain experts. Although our solution here is tightly informed by a user-centred, domain-focused design process, we present findings and techniques that are transferable to other applications where the analysis of such sequences is of interest.
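The paper's mining component (constraint-based sequential pattern mining) is not specified in the abstract; as a toy illustration of the underlying idea, the sketch below counts ordered action pairs across hypothetical session logs and keeps those meeting a minimum support threshold. The action names and the pair-only simplification are assumptions, not the paper's algorithm.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(sequences, min_support):
    """Find ordered action pairs (a occurs before b) whose support,
    i.e. the number of sequences containing the pair, meets min_support."""
    support = Counter()
    for seq in sequences:
        # Deduplicate per sequence so each sequence contributes support 1.
        seen = {(seq[i], seq[j]) for i, j in combinations(range(len(seq)), 2)}
        support.update(seen)
    return {pair: c for pair, c in support.items() if c >= min_support}

# Hypothetical action sequences from user sessions.
logs = [
    ["login", "browse", "download", "logout"],
    ["login", "browse", "logout"],
    ["login", "download", "logout"],
]
print(frequent_pairs(logs, min_support=3))
```

Patterns that are frequent across most users can then serve as a baseline of "usual" behaviour, so that sequences matching few frequent patterns surface as candidates for the "unusual" category the paper investigates.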
Doctor of Philosophy dissertation

A broad range of applications capture dynamic data at an unprecedented scale. Independent of the application area, finding intuitive ways to understand the dynamic aspects of these increasingly large data sets remains an interesting and, to some extent, unsolved research problem. Generically, dynamic data sets can be described by some, often hierarchical, notion of feature of interest that exists at each moment in time, and those features evolve across time. Consequently, exploring the evolution of these features is considered to be one natural way of studying these data sets. Usually, this process entails the ability to: 1) define and extract features from each time step in the data set; 2) find their correspondences over time; and 3) analyze their evolution across time. However, due to the large data sizes, visualizing the evolution of features in a comprehensible manner and performing interactive changes are challenging. Furthermore, feature evolution details are often unmanageably large and complex, making it difficult to identify the temporal trends in the underlying data. Additionally, many existing approaches develop these components in a specialized and standalone manner, thus failing to address the general task of understanding feature evolution across time. This dissertation demonstrates that interactive exploration of feature evolution can be achieved in a non-domain-specific manner so that it can be applied across a wide variety of application domains. In particular, a novel generic visualization and analysis environment is introduced that couples a multiresolution unified spatiotemporal representation of features with progressive layout and visualization strategies for studying feature evolution across time. This flexible framework enables on-the-fly changes to feature definitions, their correspondences, and other arbitrary attributes while providing an interactive view of the resulting feature evolution details. Furthermore, to reduce the visual complexity within the feature evolution details, several subselection-based and localized, per-feature parameter value-based strategies are also enabled. The utility and generality of this framework are demonstrated using several large-scale dynamic data sets.
Disentangling the representativeness heuristic from the availability heuristic
Doctoral thesis, Psychology (Social Cognition), Universidade de Lisboa, Faculdade de Psicologia, 2015.

Most judgments, predictions, and decisions rely on simplifying reasoning heuristics, such as the representativeness and availability heuristics. The representativeness heuristic relies on a judgment of similarity between a categorical prototype and a target; the availability heuristic relies on the accessibility of instances. A crucial assumption of the Heuristics and Biases research program (Tversky & Kahneman, 1974) was that systematic and characteristic biases were unmistakably associated with each heuristic. Unfortunately, the same biases can often be explained by different heuristics (e.g., Anderson, 1990; Gigerenzer, 1991). This problem is particularly striking in the case of availability and representativeness. The main goal of this dissertation is to conceptually clarify and empirically disentangle these heuristics, thus defining conditions for the use of one or the other. The dissertation explores three variables that have the potential to determine when people will use representativeness or availability: the level of construal, the computational speed of the heuristics, and directional motivation.

The first empirical chapter (Chapter II) explores whether the representativeness heuristic relies on more abstract information than the availability heuristic, using construal level theory (e.g., Trope & Liberman, 2000) as a framework to explore and manipulate different levels of abstraction. Chapters III and IV explore whether the representativeness heuristic takes longer to compute, using a paradigm involving predictions of binary random events in which both heuristics can be applied to the same judgment. The last empirical chapter (Chapter V) explores the role of directional motivation in the heuristic processes: the motivation to observe a certain outcome should affect people's representation of a target event and consequently lead to self-serving predictions. Directional motivation is thus discussed as a variable that could help determine the use of the representativeness or availability heuristic. The consequences of the proposed differences between representativeness and availability for psychological models of judgment and decision making are discussed.

Fundação para a Ciência e a Tecnologia (FCT), SFRH/BD/73378/201
A survey on online monitoring approaches of computer-based systems
This report surveys forms of online data collection that are in current use (and that are the subject of research to adapt them to changing technology and demands) and that can serve as inputs to the assessment of dependability and resilience, although they are not primarily meant for this use.
Learning Sentence-internal Temporal Relations
In this paper we propose a data-intensive approach for inferring sentence-internal temporal relations. Temporal inference is relevant for practical NLP applications which either extract or synthesize temporal information (e.g., summarisation, question answering). Our method bypasses the need for manual coding by exploiting the presence of markers like "after", which overtly signal a temporal relation. We first show that models trained on main and subordinate clauses connected with a temporal marker achieve good performance on a pseudo-disambiguation task simulating temporal inference (during testing the temporal marker is treated as unseen and the models must select the right marker from a set of possible candidates). Secondly, we assess whether the proposed approach holds promise for the semi-automatic creation of temporal annotations. Specifically, we use a model trained on noisy and approximate data (i.e., main and subordinate clauses) to predict intra-sentential relations present in TimeBank, a corpus annotated with rich temporal information. Our experiments compare and contrast several probabilistic models differing in their feature space, linguistic assumptions, and data requirements. We evaluate performance against gold standard corpora and also against human subjects.
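The paper's probabilistic models and feature spaces are not detailed in the abstract; as a toy illustration of the pseudo-disambiguation setup only, the sketch below trains a count-based model on hypothetical (main-clause verb, marker, subordinate-clause verb) triples and picks the most probable marker from a candidate set. The class name, training triples, and add-one smoothing are illustrative assumptions, not the paper's models.

```python
from collections import Counter, defaultdict

class MarkerModel:
    """Toy model: scores P(marker | main_verb, sub_verb) by counting,
    with add-one smoothing so unseen markers keep nonzero probability."""
    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, triples):
        for main_verb, marker, sub_verb in triples:
            self.counts[(main_verb, sub_verb)][marker] += 1

    def predict(self, main_verb, sub_verb, candidates):
        c = self.counts[(main_verb, sub_verb)]
        scores = {m: c[m] + 1 for m in candidates}  # add-one smoothing
        return max(scores, key=scores.get)

# Hypothetical training triples harvested from marker-connected clauses.
data = [("left", "after", "finished"), ("left", "after", "ate"),
        ("smiled", "when", "arrived"), ("left", "before", "started")]
model = MarkerModel()
model.train(data)
print(model.predict("left", "finished", ["after", "before", "when"]))
```

This mirrors the test regime described above: the marker is hidden at prediction time, and the model must recover it from the clause pair alone.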
Dynamic timelines: visualizing historical information in three dimensions
Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1995. Includes bibliographical references (leaves 48-50). By Robin Lee Kullberg.