3 research outputs found

    Workflow analysis of data science code in public GitHub repositories

    Despite the ubiquity of data science, we are far from rigorously understanding how coding in data science is performed. Even though the scientific literature has hinted at the iterative and explorative nature of data science coding, we need further empirical evidence to understand this practice and its workflows in detail. Such understanding is critical to recognise the needs of data scientists and, for instance, inform tooling support. To obtain a deeper understanding of the iterative and explorative nature of data science coding, we analysed 470 Jupyter notebooks publicly available in GitHub repositories. We focused on the extent to which data scientists transition between different types of data science activities, or steps (such as data preprocessing and modelling), as well as the frequency and co-occurrence of such transitions. For our analysis, we developed a dataset with the help of five data science experts, who manually annotated the data science steps for each code cell within these 470 notebooks. Using a first-order Markov chain model, we extracted the transitions and analysed the transition probabilities between the different steps. In addition to providing deeper insights into the implementation practices of data science coding, our results provide evidence that the steps in a data science workflow are indeed iterative and reveal specific patterns. We also evaluated the use of the annotated dataset to train machine-learning classifiers to predict the data science step(s) of a given code cell. We investigate the representativeness of the classification by comparing the workflow analysis applied to (a) the predicted data set and (b) the data set labelled by experts, finding an F1-score of about 71% for the 10-class data science step prediction problem.
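The first-order Markov chain analysis mentioned in this abstract can be illustrated with a minimal sketch. It assumes each notebook has already been reduced to a sequence of step labels, one per code cell. The label names and the two example sequences are hypothetical and are not taken from the paper's dataset; the function name `transition_probabilities` is likewise an assumption.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities from label sequences.

    Each sequence is a list of step labels in cell order. The estimate for
    P(next | current) is the count of current -> next transitions divided by
    the count of all transitions leaving `current`.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair every label with its successor inside the same notebook only.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1

    probs = {}
    for current, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        probs[current] = {nxt: n / total for nxt, n in nxt_counts.items()}
    return probs


# Hypothetical annotated notebooks (labels are illustrative, not the paper's).
notebooks = [
    ["load", "preprocess", "explore", "preprocess", "model", "evaluate"],
    ["load", "explore", "preprocess", "model", "evaluate", "model"],
]

probs = transition_probabilities(notebooks)
for step, row in sorted(probs.items()):
    print(step, row)
```

Backward transitions such as `evaluate` to `model` in the second sequence are the kind of pattern that would indicate an iterative workflow; with real annotations, the rows of this table would be the transition probabilities the abstract refers to.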

    Software Support of an Order Lifecycle Management

    This Master’s thesis deals with the analysis and optimization of the order lifecycle process at a company providing comprehensive ICT services. The first part describes the theoretical basis and the approaches used in the subsequent analysis and solution design. The second part analyzes the order lifecycle process in the context of other business processes and of the organizational and management structures that derive from the business strategy, and describes the existing ICT resources that support the process. The proposal part presents possible ways to optimize the process and its software support.

    An Approach to Workflow Modeling and Analysis

    Abstract: In this paper we present a new approach to workflow analysis. There are efforts to design and verify workflow models using both Activity diagrams and Petri nets. We model the workflow using Activity diagrams, convert the Activity diagrams to Petri nets, and use theoretical results on Petri nets to analyze the equivalent Petri nets and infer properties of the original workflow. We demonstrate the feasibility of this approach by developing an Eclipse plug-in that can be used to model workflows using Activity diagrams and then analyze these workflow models using Petri nets. Index Terms: workflow, activity diagrams, Petri nets, Eclipse, workflow analysis
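To illustrate what analyzing a workflow as a Petri net can look like, here is a minimal sketch that explores the reachable markings of a small net and reports deadlocks. It does not reproduce the paper's conversion from Activity diagrams or its Eclipse plug-in; the net, the place and transition names, and the functions `fire` and `reachable` are all hypothetical. The search terminates only for bounded nets such as this one.

```python
from collections import deque

# Hypothetical workflow net: transition name -> (input places, output places).
# It models a fork into two parallel tasks followed by a join.
TRANSITIONS = {
    "receive": (("start",), ("p1",)),
    "fork": (("p1",), ("p2", "p3")),
    "check": (("p2",), ("p4",)),
    "pack": (("p3",), ("p5",)),
    "join": (("p4", "p5"), ("end",)),
}


def fire(marking, inputs, outputs):
    """Return the marking after firing, or None if the transition is not enabled.

    A marking is a frozenset of (place, token_count) pairs with positive counts.
    """
    tokens = dict(marking)
    for place in inputs:
        if tokens.get(place, 0) < 1:
            return None
        tokens[place] -= 1
    for place in outputs:
        tokens[place] = tokens.get(place, 0) + 1
    return frozenset((p, n) for p, n in tokens.items() if n > 0)


def reachable(initial):
    """Breadth-first search over markings; also collect markings with no enabled transition."""
    seen = {initial}
    deadlocks = set()
    queue = deque([initial])
    while queue:
        marking = queue.popleft()
        successors = [fire(marking, i, o) for i, o in TRANSITIONS.values()]
        successors = [s for s in successors if s is not None]
        if not successors:
            deadlocks.add(marking)
        for s in successors:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return seen, deadlocks


initial = frozenset({("start", 1)})
final = frozenset({("end", 1)})
markings, deadlocks = reachable(initial)

print("final marking reachable:", final in markings)
print("only deadlock is the final marking:", deadlocks == {final})
```

If the final marking is reachable and is the only marking where nothing can fire, the workflow can always run to completion without getting stuck, which is one example of a property that can be inferred for the original workflow from its Petri net.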