2,442 research outputs found

    A Visual Analytics Approach to Comparing Cohorts of Event Sequences

    Sequences of timestamped events are currently generated across nearly every domain of data analytics, from e-commerce web logs to the electronic health records used by doctors and medical researchers. Every day, this data type is reviewed by humans who apply statistical tests, hoping to learn everything they can about how these processes work, why they break, and how they can be improved. To further uncover how these processes work the way they do, researchers often compare two groups, or cohorts, of event sequences to find the differences and similarities between outcomes and processes. With temporal event sequence data, this task is complex because of the variety of ways single events and sequences of events can differ between the two cohorts of records: the structure of the event sequences (e.g., event order, co-occurring events, or frequencies of events), the attributes of the events and records (e.g., the gender of a patient), or metrics about the timestamps themselves (e.g., the duration of an event). Running statistical tests to cover all these cases and determining which results are significant becomes cumbersome. Current visual analytics tools for comparing groups of event sequences emphasize either a purely statistical or a purely visual approach to comparison. Visual analytics tools leverage humans' ability to easily see patterns and anomalies they were not expecting, but are limited by uncertainty in their findings. Statistical tools emphasize finding significant differences in the data, but often require researchers to have a concrete question in mind and do not facilitate more general exploration of the data. Combining visual analytics tools with statistical methods leverages the benefits of both approaches for quicker and easier insight discovery. Integrating statistics into a visualization tool presents many challenges on the frontend (e.g., displaying the results of many different metrics concisely) and on the backend (e.g., scalability challenges in running various metrics on multi-dimensional data at once). I begin by exploring the problem of comparing cohorts of event sequences and understanding the questions that analysts commonly ask in this task. From there, I demonstrate that combining automated statistics with an interactive user interface amplifies the benefits of both types of tools, thereby enabling analysts to conduct quicker and easier data exploration, hypothesis generation, and insight discovery. The direct contributions of this dissertation are: (1) a taxonomy of metrics for comparing cohorts of temporal event sequences, (2) a statistical framework for exploratory data analysis with a method I refer to as high-volume hypothesis testing (HVHT), (3) a family of visualizations and guidelines for interaction techniques that are useful for understanding and parsing the results, and (4) a user study, five long-term case studies, and five short-term case studies which demonstrate the utility and impact of these methods in various domains: four in the medical domain, one in web log analysis, two in education, and one each in social networks, sports analytics, and security. My dissertation contributes an understanding of how cohorts of temporal event sequences are commonly compared and of the difficulties associated with applying and parsing the results of these metrics.
    It also contributes a set of visualizations, algorithms, and design guidelines for balancing automated statistics with user-driven analysis to guide users to significant, distinguishing features between cohorts. This work opens avenues for future research in comparing two or more groups of temporal event sequences, in opening traditional machine learning and data mining techniques to user interaction, and in extending the principles found in this dissertation to data types beyond temporal event sequences.
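    To make the HVHT idea concrete, the sketch below runs one hypothesis test per candidate metric across two toy cohorts and then ranks the raw p-values against Benjamini-Hochberg thresholds. The records, the metric choices, and the SciPy tests are illustrative assumptions made for this summary, not the dissertation's actual framework or API.

```python
# A minimal sketch of high-volume hypothesis testing (HVHT): run one
# statistical test per candidate metric comparing two cohorts of event
# sequences, then correct for multiple comparisons. The data layout and
# metric set are illustrative assumptions, not the dissertation's API.
from itertools import chain
from scipy.stats import mannwhitneyu, fisher_exact

# Each record is a time-sorted list of (timestamp, event_type) pairs.
cohort_a = [[(0, "admit"), (5, "med"), (9, "discharge")],
            [(0, "admit"), (2, "med"), (3, "med"), (12, "discharge")]]
cohort_b = [[(0, "admit"), (1, "lab"), (4, "discharge")],
            [(0, "admit"), (6, "lab"), (8, "med"), (10, "discharge")]]

def durations(cohort):
    """Timestamp metric: total duration of each record."""
    return [seq[-1][0] - seq[0][0] for seq in cohort]

def presence(cohort, ev):
    """Structural metric: does each record contain event `ev`?"""
    return [any(e == ev for _, e in seq) for seq in cohort]

results = []
# Compare durations with a rank-sum test.
_, p = mannwhitneyu(durations(cohort_a), durations(cohort_b))
results.append(("duration", p))
# One presence test per event type observed in either cohort.
for ev in sorted({e for _, e in chain(*chain(cohort_a, cohort_b))}):
    a, b = presence(cohort_a, ev), presence(cohort_b, ev)
    table = [[sum(a), len(a) - sum(a)], [sum(b), len(b) - sum(b)]]
    _, p = fisher_exact(table)
    results.append((f"has:{ev}", p))

# Benjamini-Hochberg: with many simultaneous hypotheses, control the
# false discovery rate rather than reading raw p-values directly.
m = len(results)
for rank, (name, p) in enumerate(sorted(results, key=lambda r: r[1]), 1):
    print(f"{name:12s} p={p:.3f}  BH threshold={0.05 * rank / m:.3f}")
```

    The correction step is the point the abstract raises: with dozens of structural, attribute, and timestamp metrics tested at once, raw p-values alone overstate how many differences are significant.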

    The role of the individual in the coming era of process-based therapy

    For decades, the development of evidence-based therapy has been based on experimental tests of protocols designed to impact psychiatric syndromes. As this paradigm weakens, a more process-based therapy approach is rising in its place, focused on how best to target and change core biopsychosocial processes in specific situations, for given goals, with given clients. This is an inherently more idiographic question than has normally been at issue in evidence-based therapy over the last few decades. In this article we explore methods of assessment and analysis that can integrate idiographic and nomothetic approaches in a process-based era.

    The relationships among task complexity, content sequence, and instructional effectiveness in procedural learning.

    Two questions were investigated: (1) Is the timing of the opportunity for the learner to integrate procedural content on the application level related to performance on tasks of high complexity? (2) Is the timing of the opportunity for the learner to integrate procedural content on the application level related to performance on tasks of low complexity?
    The content used in both treatment conditions was the procedures involved in checking accounts. Following a task analysis, the content was sequenced according to the two treatment conditions. Two teachers delivered both sets of instruction once. Following completion of five one-hour training sessions, test instruments were administered to assess performance on tasks of low complexity and high complexity.
    A two (Teacher 1, Teacher 2) by two (OCI Sequence, TCI Sequence) factorial analysis of variance (ANOVA) was used to analyze the performance measures. For both simple and complex tasks, the ANOVA showed no significant difference that could be attributed to content sequence.
    The rationale for this study was based on the concept of assimilation-to-schema. This theory predicts that learning effectiveness will be increased by providing a complete but general version of the content prior to providing the specifics of the content. Application of this learning theory can result in a general-to-detailed content sequence. This sequence can be contrasted with a parts-to-whole sequence, which provides a complete version of the content following presentation of all parts of the content. A general-to-detailed sequence can be said to provide ongoing content integration, while a parts-to-whole sequence can be said to provide terminal content integration.
    This study was designed to investigate relationships between content sequence, as it contributes to content integration, and procedural learning. Given that content sequence is fundamental to any intentional learning situation, the relationship between organization and eventual integration of the content is of primary concern. Nowhere is this concern more evident than in procedural learning, where the failure to integrate a single step into an overall procedure can result in an inability to correctly or completely apply a procedure or set of procedures.
    The subjects for this study (N = 103) were from a population of middle school students. One treatment condition was instruction on content sequenced to provide ongoing content integration on the application level (OCI Sequence). The other treatment condition was instruction on content sequenced to provide content integration upon completion or termination of instruction (TCI Sequence).
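    For readers unfamiliar with the design, the two-by-two factorial ANOVA reported above can be sketched in a few lines. The data below are fabricated under a null effect purely for illustration; only the design (two teachers crossed with the OCI and TCI sequences, one performance score) mirrors the study.

```python
# A sketch of a 2x2 (teacher x content sequence) factorial ANOVA. The
# scores are synthetic with no built-in effect; cell sizes approximate
# the study's N = 103, split across four cells.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)
n = 25  # students per cell (assumed for illustration)
df = pd.DataFrame({
    "teacher":  np.repeat(["T1", "T2"], 2 * n),
    "sequence": np.tile(np.repeat(["OCI", "TCI"], n), 2),
    "score":    rng.normal(70, 10, 4 * n),
})

# Fit score ~ teacher + sequence + their interaction, then report the
# Type II ANOVA table (F and p per factor).
model = ols("score ~ C(teacher) * C(sequence)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```

    A nonsignificant sequence effect in such a table is exactly the pattern the study reports for both the simple and the complex tasks.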

    Doctor of Philosophy

    A broad range of applications capture dynamic data at an unprecedented scale. Independent of the application area, finding intuitive ways to understand the dynamic aspects of these increasingly large data sets remains an interesting and, to some extent, unsolved research problem. Generically, dynamic data sets can be described by some, often hierarchical, notion of a feature of interest that exists at each moment in time, and those features evolve across time. Consequently, exploring the evolution of these features is one natural way of studying such data sets. Usually, this process entails the ability to: (1) define and extract features from each time step in the data set; (2) find their correspondences over time; and (3) analyze their evolution across time. However, due to the large data sizes, visualizing the evolution of features in a comprehensible manner and performing interactive changes are challenging. Furthermore, feature evolution details are often unmanageably large and complex, making it difficult to identify the temporal trends in the underlying data. Additionally, many existing approaches develop these components in a specialized and standalone manner, thus failing to address the general task of understanding feature evolution across time. This dissertation demonstrates that interactive exploration of feature evolution can be achieved in a non-domain-specific manner so that it can be applied across a wide variety of application domains. In particular, it introduces a novel, generic visualization and analysis environment that couples a multiresolution unified spatiotemporal representation of features with progressive layout and visualization strategies for studying feature evolution across time. This flexible framework enables on-the-fly changes to feature definitions, their correspondences, and other arbitrary attributes while providing an interactive view of the resulting feature evolution details. Furthermore, to reduce the visual complexity within the feature evolution details, several subselection-based and localized, per-feature parameter-value-based strategies are also enabled. The utility and generality of this framework are demonstrated using several large-scale dynamic data sets.
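    The three-step pipeline enumerated above is easy to illustrate in miniature. In the sketch below, features are thresholded connected components of a 2D scalar field, correspondences are computed by cell overlap between consecutive time steps, and the printed correspondence lists are the raw material of the evolution details; all three choices are simplifying stand-ins for the dissertation's multiresolution representation.

```python
# Steps of the generic pipeline: (1) extract features per time step,
# (2) match features across consecutive steps, (3) chain the matches
# into evolution details. Thresholded connected components and
# overlap-based matching are illustrative simplifications.
import numpy as np
from scipy import ndimage

def extract_features(field, threshold):
    """Step 1: features = connected components above `threshold`."""
    labels, n = ndimage.label(field > threshold)
    return [set(zip(*np.nonzero(labels == i))) for i in range(1, n + 1)]

def correspondences(feats_t, feats_t1, min_overlap=1):
    """Step 2: match features in consecutive steps by cell overlap."""
    return [(i, j) for i, a in enumerate(feats_t)
                   for j, b in enumerate(feats_t1)
                   if len(a & b) >= min_overlap]

# Synthetic time series: one blob drifting one cell per step.
steps = []
for t in range(4):
    field = np.zeros((8, 8))
    field[2:5, 1 + t:4 + t] = 1.0
    steps.append(extract_features(field, threshold=0.5))

# Step 3: per-step correspondence lists; splits and merges would show
# up here as one-to-many and many-to-one pairs.
for t in range(len(steps) - 1):
    print(f"t={t} -> t={t + 1}:", correspondences(steps[t], steps[t + 1]))
```

    Re-running the extraction with a different threshold and rebuilding the matches is, in miniature, the kind of on-the-fly change to feature definitions that the framework supports interactively.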

    Disentangling the representativeness heuristic from the availability heuristic

    Doctoral thesis, Psychology (Social Cognition), Universidade de Lisboa, Faculdade de Psicologia, 2015. Most judgments, predictions, and decisions rely on simplifying reasoning heuristics, such as the representativeness and availability heuristics. The representativeness heuristic relies on a judgment of similarity between a categorical prototype and a target. The availability heuristic relies on the accessibility of instances. A crucial assumption of the Heuristics and Biases research program (Tversky & Kahneman, 1974) was that systematic and characteristic biases were unmistakably associated with each heuristic. Unfortunately, the same biases can often be explained by different heuristics (e.g., Anderson, 1990; Gigerenzer, 1991). This problem is particularly striking in the case of availability and representativeness. The main goal of this dissertation is to conceptually clarify and empirically disentangle these heuristics, thus defining conditions for the use of one or the other. This dissertation explores three variables that have the potential to determine when people will use representativeness or availability: the level of construal, the computational speed of the heuristics, and directional motivation. The first empirical chapter (Chapter II) explores whether the representativeness heuristic relies on more abstract information than the availability heuristic, and uses construal level theory (e.g., Trope & Liberman, 2000) as a framework to explore and manipulate different levels of abstraction. Chapters III and IV explore whether the representativeness heuristic takes longer to compute, using a paradigm involving predictions of binary random events in which both heuristics can be applied to the same judgment. The last empirical chapter (Chapter V) explores the role of directional motivation in the heuristic processes. The motivation to observe a certain outcome should affect people's representation of a target event and consequently lead to self-serving predictions. The role of directional motivation is thus discussed as a variable that could be used to determine the use of the representativeness or the availability heuristic. The consequences of the proposed differences between representativeness and availability for psychological models of judgment and decision making are discussed. Funding: Fundação para a Ciência e a Tecnologia (FCT), SFRH/BD/73378/201

    Learning Sentence-internal Temporal Relations

    In this paper we propose a data-intensive approach for inferring sentence-internal temporal relations. Temporal inference is relevant for practical NLP applications which either extract or synthesize temporal information (e.g., summarisation, question answering). Our method bypasses the need for manual coding by exploiting the presence of markers like "after", which overtly signal a temporal relation. We first show that models trained on main and subordinate clauses connected with a temporal marker achieve good performance on a pseudo-disambiguation task simulating temporal inference (during testing, the temporal marker is treated as unseen and the models must select the right marker from a set of possible candidates). Secondly, we assess whether the proposed approach holds promise for the semi-automatic creation of temporal annotations. Specifically, we use a model trained on noisy and approximate data (i.e., main and subordinate clauses) to predict intra-sentential relations present in TimeBank, a corpus annotated with rich temporal information. Our experiments compare and contrast several probabilistic models differing in their feature space, linguistic assumptions, and data requirements. We evaluate performance against gold standard corpora and also against human subjects.
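    The pseudo-disambiguation protocol described above is straightforward to sketch: train a probabilistic model on clause pairs whose temporal marker is known, hide the marker at test time, and ask the model to pick it from a candidate set. The toy corpus, bag-of-words features, and naive Bayes scoring below are assumptions made for illustration; the paper's models use richer linguistic feature spaces, but the protocol is the same.

```python
# Pseudo-disambiguation for temporal markers: a naive Bayes model over
# clause words, trained on (main clause, subordinate clause, marker)
# triples, selects the held-out marker at test time. The corpus and
# features are toy assumptions for illustration.
import math
from collections import Counter, defaultdict

train = [
    ("she left the office", "she finished the report", "after"),
    ("he checked the locks", "he went to bed", "before"),
    ("they cheered", "the team scored", "when"),
    ("she reviewed her notes", "the exam started", "before"),
]
markers = sorted({m for _, _, m in train})

counts = defaultdict(Counter)  # per-marker word counts (both clauses)
prior = Counter()              # per-marker clause-pair counts
for main, sub, m in train:
    counts[m].update(main.split() + sub.split())
    prior[m] += 1
vocab = {w for c in counts.values() for w in c}

def score(marker, main, sub):
    """Log P(marker) plus add-one-smoothed log P(word | marker)."""
    total = sum(counts[marker].values())
    ll = math.log(prior[marker] / len(train))
    for w in main.split() + sub.split():
        ll += math.log((counts[marker][w] + 1) / (total + len(vocab)))
    return ll

# Test: the true marker is treated as unseen; the model must select it
# from the candidate set, as in the paper's evaluation setup.
main, sub = "he called his friend", "he finished the exam"
print("predicted marker:", max(markers, key=lambda m: score(m, main, sub)))
```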

    Dynamic timelines : visualizing historical information in three dimensions

    Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1995. Includes bibliographical references (leaves 48-50). By Robin Lee Kullberg.