2,052 research outputs found

    A Data Mining Toolbox for Collaborative Writing Processes

    Get PDF
    Collaborative writing (CW) is an essential skill in academia and industry. Providing support during the process of CW can be useful not only for achieving better quality documents, but also for improving the CW skills of the writers. In order to properly support collaborative writing, it is essential to understand how ideas and concepts are developed during the writing process, which consists of a series of steps of writing activities. These steps can be considered as sequence patterns comprising both time events and the semantics of the changes made during those steps. Two techniques can be combined to examine those patterns: process mining, which focuses on extracting process-related knowledge from event logs recorded by an information system; and semantic analysis, which focuses on extracting knowledge about what the student wrote or edited. This thesis contributes (i) techniques to automatically extract process models of collaborative writing processes and (ii) visualisations to describe aspects of collaborative writing. These two techniques form a data mining toolbox for collaborative writing by using process mining, probabilistic graphical models, and text mining. First, I created a framework, WriteProc, for investigating collaborative writing processes, integrated with the existing cloud computing writing tools in Google Docs. Secondly, I created new heuristic to extract the semantic nature of text edits that occur in the document revisions and automatically identify the corresponding writing activities. Thirdly, based on sequences of writing activities, I propose methods to discover the writing process models and transitional state diagrams using a process mining algorithm, Heuristics Miner, and Hidden Markov Models, respectively. Finally, I designed three types of visualisations and made contributions to their underlying techniques for analysing writing processes. All components of the toolbox are validated against annotated writing activities of real documents and a synthetic dataset. I also illustrate how the automatically discovered process models and visualisations are used in the process analysis with real documents written by groups of graduate students. I discuss how the analyses can be used to gain further insight into how students work and create their collaborative documents

    mARC: Memory by Association and Reinforcement of Contexts

    Full text link
    This paper introduces the memory by Association and Reinforcement of Contexts (mARC). mARC is a novel data modeling technology rooted in the second quantization formulation of quantum mechanics. It is an all-purpose incremental and unsupervised data storage and retrieval system which can be applied to all types of signal or data, structured or unstructured, textual or not. mARC can be applied to a wide range of information clas-sification and retrieval problems like e-Discovery or contextual navigation. It can also for-mulated in the artificial life framework a.k.a Conway "Game Of Life" Theory. In contrast to Conway approach, the objects evolve in a massively multidimensional space. In order to start evaluating the potential of mARC we have built a mARC-based Internet search en-gine demonstrator with contextual functionality. We compare the behavior of the mARC demonstrator with Google search both in terms of performance and relevance. In the study we find that the mARC search engine demonstrator outperforms Google search by an order of magnitude in response time while providing more relevant results for some classes of queries

    A Wikipedia Literature Review

    Full text link
    This paper was originally designed as a literature review for a doctoral dissertation focusing on Wikipedia. This exposition gives the structure of Wikipedia and the latest trends in Wikipedia research

    Dialogue as Data in Learning Analytics for Productive Educational Dialogue

    Get PDF
    This paper provides a novel, conceptually driven stance on the state of the contemporary analytic challenges faced in the treatment of dialogue as a form of data across on- and offline sites of learning. In prior research, preliminary steps have been taken to detect occurrences of such dialogue using automated analysis techniques. Such advances have the potential to foster effective dialogue using learning analytic techniques that scaffold, give feedback on, and provide pedagogic contexts promoting such dialogue. However, the translation of much prior learning science research to online contexts is complex, requiring the operationalization of constructs theorized in different contexts (often face-to-face), and based on different datasets and structures (often spoken dialogue). In this paper, we explore what could constitute the effective analysis of productive online dialogues, arguing that it requires consideration of three key facets of the dialogue: features indicative of productive dialogue; the unit of segmentation; and the interplay of features and segmentation with the temporal underpinning of learning contexts. The paper thus foregrounds key considerations regarding the analysis of dialogue data in emerging learning analytics environments, both for learning-science and for computationally oriented researchers

    Crowdsourcing for Engineering Design: Objective Evaluations and Subjective Preferences

    Full text link
    Crowdsourcing enables designers to reach out to large numbers of people who may not have been previously considered when designing a new product, listen to their input by aggregating their preferences and evaluations over potential designs, aiming to improve ``good'' and catch ``bad'' design decisions during the early-stage design process. This approach puts human designers--be they industrial designers, engineers, marketers, or executives--at the forefront, with computational crowdsourcing systems on the backend to aggregate subjective preferences (e.g., which next-generation Brand A design best competes stylistically with next-generation Brand B designs?) or objective evaluations (e.g., which military vehicle design has the best situational awareness?). These crowdsourcing aggregation systems are built using probabilistic approaches that account for the irrationality of human behavior (i.e., violations of reflexivity, symmetry, and transitivity), approximated by modern machine learning algorithms and optimization techniques as necessitated by the scale of data (millions of data points, hundreds of thousands of dimensions). This dissertation presents research findings suggesting the unsuitability of current off-the-shelf crowdsourcing aggregation algorithms for real engineering design tasks due to the sparsity of expertise in the crowd, and methods that mitigate this limitation by incorporating appropriate information for expertise prediction. Next, we introduce and interpret a number of new probabilistic models for crowdsourced design to provide large-scale preference prediction and full design space generation, building on statistical and machine learning techniques such as sampling methods, variational inference, and deep representation learning. Finally, we show how these models and algorithms can advance crowdsourcing systems by abstracting away the underlying appropriate yet unwieldy mathematics, to easier-to-use visual interfaces practical for engineering design companies and governmental agencies engaged in complex engineering systems design.PhDDesign ScienceUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/133438/1/aburnap_1.pd
    • …
    corecore