6 research outputs found

    Workflow analysis of data science code in public GitHub repositories

    Despite the ubiquity of data science, we are far from rigorously understanding how coding in data science is performed. Even though the scientific literature has hinted at the iterative and explorative nature of data science coding, we need further empirical evidence to understand this practice and its workflows in detail. Such understanding is critical to recognise the needs of data scientists and, for instance, inform tooling support. To obtain a deeper understanding of the iterative and explorative nature of data science coding, we analysed 470 Jupyter notebooks publicly available in GitHub repositories. We focused on the extent to which data scientists transition between different types of data science activities, or steps (such as data preprocessing and modelling), as well as the frequency and co-occurrence of such transitions. For our analysis, we developed a dataset with the help of five data science experts, who manually annotated the data science steps for each code cell within the aforementioned 470 notebooks. Using the first-order Markov chain model, we extracted the transitions and analysed the transition probabilities between the different steps. In addition to providing deeper insights into the implementation practices of data science coding, our results provide evidence that the steps in a data science workflow are indeed iterative and reveal specific patterns. We also evaluated the use of the annotated dataset to train machine-learning classifiers to predict the data science step(s) of a given code cell. We investigate the representativeness of the classification by comparing the workflow analysis applied to (a) the predicted data set and (b) the data set labelled by experts, finding an F1-score of about 71% for the 10-class data science step prediction problem
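    The transition analysis described in this abstract can be illustrated with a minimal sketch. It assumes each notebook is reduced to an ordered list of step labels, one per code cell; the label names and the two example sequences are hypothetical and are not taken from the annotated dataset. It shows one common way to estimate first-order Markov transition probabilities by counting, not necessarily the authors' implementation.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities.

    `sequences` is an iterable of label sequences (one per notebook).
    Returns a nested dict: probs[current][next] = P(next | current).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count each adjacent pair (current step -> next step).
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1

    probs = {}
    for current, next_counts in counts.items():
        total = sum(next_counts.values())
        # Normalise each row so the outgoing probabilities sum to 1.
        probs[current] = {nxt: n / total for nxt, n in next_counts.items()}
    return probs


# Hypothetical step labels for two notebooks (illustrative only).
notebooks = [
    ["load", "preprocess", "explore", "preprocess", "model", "evaluate"],
    ["load", "explore", "preprocess", "model", "evaluate", "model"],
]

probs = transition_probabilities(notebooks)
for step, row in probs.items():
    print(step, row)
```

    A transition from a later step back to an earlier one (here, "evaluate" to "model" or "explore" to "preprocess") is the kind of pattern that would indicate iteration in a workflow.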

    Augmenting Autobiographical Memory: An Approach Based on Cognitive Psychology

    This thesis investigates how an interactive software system can support a person in remembering their past experiences and information related to these experiences. It proposes design recommendations for augmented autobiographical memory systems derived from Cognitive Psychology research into human memory – a perspective missing from prior work. Based on these recommendations, a conceptual design of an augmented autobiographical memory system is developed that aims to support users in retrieving cues and factual information related to experiences as well as in reconstructing those experiences. The retrieval aspects of this design are operationalised in an interactive software system called the Digital Parrot. Three important factors in the design and implementation are the context of an experience, semantic information about items in the system and associations between items. Two user studies evaluated the design and implementation of the Digital Parrot. The first study focused on the system's usability. It showed that the participants could use the Digital Parrot to accurately answer questions about an example memory data set and revealed a number of usability issues in the Digital Parrot's user interface. The second study embodied a novel approach to evaluating systems of this type and tested how an improved version of the Digital Parrot supported the participants in remembering experiences after an extended time period of two years. The study found that the Digital Parrot allowed the participants to answer questions about their own past experiences more completely and more correctly than unaided memory and that it allowed them to answer questions for which the participants' established strategies to counteract memory failures were likely to be unsuccessful. In the studies, associations between items were the most helpful factor for accessing memory-related information. 
The inclusion of semantic information was found to be promising especially in combination with textual search. Context was used to access information by the participants in both studies less often than expected, which suggests the need for further research. Identifying how to appropriately augment autobiographical memory is an important goal given the increasing volume of information to which users are exposed. This thesis contributes to achievement of this goal by stating the problem in Cognitive Psychology terms and by making design recommendations for augmented autobiographical memory systems. The recommendations are confirmed by the design and implementation of such a system and by empirical evaluations using an evaluation method appropriate for the field

    Digital life stories: Semi-automatic (auto)biographies within lifelog collections

    Our life stories enable us to reflect upon and share our personal histories. Through emerging digital technologies the possibility of collecting life experiences digitally is increasingly feasible; consequently so is the potential to create a digital counterpart to our personal narratives. In this work, lifelogging tools are used to collect digital artifacts continuously and passively throughout our day. These include images, documents, emails and webpages accessed, as well as text messages and mobile activity. This range of data when brought together is known as a lifelog. Given the complexity, volume and multimodal nature of such collections, it is clear that there are significant challenges to be addressed in order to achieve coherent and meaningful digital narratives of the events from our life histories. This work investigates the construction of personal digital narratives from lifelog collections. It examines the underlying questions, issues and challenges relating to the construction of personal digital narratives from lifelogs. Fundamentally, it addresses how to organize and transform data sampled from an individual's day-to-day activities into a coherent narrative account. This enquiry is enabled by three 20-month long-term lifelogs collected by participants and produces a narrative system which enables the semi-automatic construction of digital stories from lifelog content. Inspired by probative studies conducted into current practices of curation, from which a set of fundamental requirements is established, this solution employs a 2-dimensional spatial framework for storytelling. It delivers integrated support for the structuring of lifelog content and its distillation into storyform through information retrieval approaches. We describe and contribute flexible algorithmic approaches to achieve both. Finally, this research inquiry yields qualitative and quantitative insights into such digital narratives and their generation, composition and construction.
The opportunities for such personal narrative accounts to enable recollection, reminiscence and reflection with the collection owners are established and their benefit in sharing past personal experiences is outlined. Finally, in a novel investigation with motivated third parties we demonstrate the opportunities such narrative accounts may have beyond the scope of the collection owner in personal, societal and cultural explorations, in artistic endeavours and as a generational heirloom.

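    Re-finding Tweets - Analyse der Personal-Information-Management-Praktik Re-finding im Kontext der Social-Media-Plattform Twitter [Re-finding Tweets: An Analysis of the Personal Information Management Practice of Re-finding in the Context of the Social Media Platform Twitter]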

    This thesis examines the information behaviour of social media users from the perspective of personal information management, focusing on re-finding behaviour, that is, finding again information that has already been seen. The social media platform Twitter serves as the object of study. The aim of the thesis is to observe, document, describe and interpret user behaviour when re-finding tweets and to develop design proposals that support Twitter users with this information need. The research strategy is a sequential mixed-methods design, which enables the successive collection and analysis of qualitative (subjective) and quantitative (objective) data in two large studies, a survey and a log study, and ultimately makes it possible to draw a holistic picture of re-finding behaviour on Twitter by combining and discussing the individual results. The thesis shows that users very frequently need to return to tweets they have already seen. Although Twitter focuses on real-time information, it has the character of an archive, since older messages are also frequently revisited and personal tweets have a longer life cycle than one would expect. Re-finding strategies, especially orienteering behaviour, that have already been identified in other personal information management contexts, such as email or the use of file managers, also occur when re-finding tweets. Re-finding can be a complex task that leaves users frustrated. In addition, users have difficulty judging whether tweets could be relevant in the future. Appropriately trained algorithms can support users in re-finding tweets.

    Integrating human communication strategies with project management for effective outcomes

    Project managers' email inboxes often contain hundreds of emails in which project-related conversations are captured. The conversations are written records of team members' feedback regarding activities and their experiences performing these activities. They may also contain problems, expectations, emotions and lexical patterns (PEEL). Identifying these elements of project communication from email text and using them for the purpose of project management is a complex process. From the review of the existing literature on email analysis and project communication we identified four significant shortcomings: (i) lack of communication features, (ii) limited communication metrics, (iii) no link of email analysis to project monitoring, and (iv) limited understanding of how knowledge from email analysis can help improve the functioning of a project. The study set out to address the four shortcomings, with the aim of meeting the need for a methodology that integrates knowledge from incoming email communication into project management practices. The research found that measurable characteristics of incoming communication, through observations of both factual (technical) and personal (human) factors, can generate significant insight into indicators for the state of project health, which in turn can be used to draw the project manager's attention to areas that worked well and areas that need consideration. In this study we developed a better understanding of various factors of incoming communication in projects by in-depth analysis of email communication from five projects with over a thousand emails. This included identification of multiple features embedded in emails, as well as coding and analysis of feature values for the purpose of identifying various measurable characteristics of incoming communication. This enabled implementation of communication metrics where "communication metrics" were linked to project "critical success factors".
We demonstrate that, by linking these two areas of research, the focus is on observations of actors, their activities and their experiences performing these activities. We were able to identify measurable characteristics of communication which could be used to provide significant insights into indicators for the state of project health. We used this approach to generate communication reports which assisted the managers in identifying areas that worked or were critical to the project progress. Our theoretical contribution relates to the "Email Feedback Analysis" (EFA) model used for processing project email communication in order to identify important elements of project activity useful for project managers, the insights into the effectiveness of communication within a project, and a metric for comparing communications across projects. Our model focuses on two types of information: information about team members' (actors') activities and experiences while performing those activities in the context of communication, and the same information in the context of project tasks. Our practical contributions relate to a framework and a vocabulary for the analysis of incoming communication, instructions on "how to code" incoming communication records in projects such as emails sent to project managers, the "ProCommFeedback" software that can be used to simplify and expedite the process of communication analysis, and communication reports. This research aims to make a significant contribution to conceptual understanding of the role that incoming communication plays in the context of project management, as well as to the practical implementation of linking knowledge from incoming email communication with project success for the purpose of project management.
Our approach has the potential to be highly beneficial for large projects with many teams and resources (locally or globally dispersed) where project managers do not have sufficient day-to-day contact with all their staff members to gauge their problems, feelings and emotions, which are a strong indicator of sound project progress. Doctor of Philosophy