12 research outputs found

    How to Design An Interactive System for Data Science: A Literature Review

    As part of an ongoing design science research project, we present a systematic literature review and classification of 214 papers scoping the work on Data Science (DS) in the fields of Information Systems and Human-Computer Interaction. The search was conducted on Web of Science, Science Direct and ACM Digital Library for papers about the design of IT artefacts for Data Science published from 1997 to 2017. Our work confirms a rich interdisciplinary field of inquiry and identifies promising research clusters, with examples. Moreover, we found few studies with concrete guidance on how to design a system for DS that targets broader technical and business user profiles and multi-domain applications. Because designing such a system is a multidimensional, creative and complex process, there is potential in developing hybrid methods of design theory and practice, which opens a variety of further work for researchers and practitioners.

    Helping Data Science Students Develop Task Modularity

    This paper explores the skills needed to be a data scientist. Specifically, we report on a mixed-method study of a project-based data science class, in which we evaluated how effectively students divided a project into appropriately sized modular tasks, which we termed task modularity. Our results suggest that while data science students can appreciate the value of task modularity, they struggle to achieve it effectively. As a first step, based on our study, we identified six task decomposition best practices. However, these best practices do not fully address the gap of how to enable data science students to use task modularity effectively. We note that while computer science and information systems programs typically teach modularity (e.g., the decomposition process and abstraction), there remains a need to identify a corresponding model for teaching modularity to data science students.

    Comparing Data Science Project Management Methodologies via a Controlled Experiment

    Data Science is an emerging field with a significant research focus on improving the techniques available to analyze data. However, there has been much less focus on how people should work together on a data science project. In this paper, we report on the results of an experiment comparing four different methodologies to manage and coordinate a data science project. We first introduce a model to compare different project management methodologies and then report on the results of our experiment. The results demonstrate that there are significant differences based on the methodology used, with an Agile Kanban methodology being the most effective and, surprisingly, an Agile Scrum methodology being the least effective.

    MIDST: an enhanced development environment that improves the maintainability of a data science analysis

    With the increasing ability to generate actionable insight from data, the field of data science has seen significant growth. As more teams develop data science solutions, the analytical code they write will need to be enhanced in the future, by an existing or a new team member. Thus, the importance of being able to easily maintain and enhance the code required for an analysis will increase. However, to date, there has been minimal research on the maintainability of an analysis done by a data science team. To help address this gap, data science maintainability was explored by (1) creating a data science maintainability model, (2) creating a new tool, called MIDST (Modular Interactive Data Science Tool), that aims to improve data science maintainability, and then (3) conducting a mixed-method experiment to evaluate MIDST. The new tool aims to improve the ability of a team member to update and rerun an existing data science analysis by providing a visual data flow view of the analysis within an integrated code and computational environment. Via an analysis of the quantitative and qualitative survey results, the experiment found that MIDST does help improve the maintainability of an analysis. Thus, this research demonstrates the importance of enhanced tools to help improve the maintainability of data science projects.

    Business Intelligence or Data Science? What Do Bibliographic Data Initially Tell Us? (Inteligência de negócios ou ciência de dados? O que dados bibliográficos inicialmente nos dizem?)

    Constant technological evolution has made it possible to generate data with greater volume, variety and velocity. This context may raise doubts about whether the traditional business intelligence model will persist and adapt to this new reality, or whether so-called data science, with its new and complementary forms of data analysis, will prevail. To shed light on the relationship between these two areas in the academic sphere, this study aimed to identify whether the academic output on business intelligence and on data science is related, considering characteristics of articles from each area. Based on a bibliographic study of journal articles indexed in the Scopus database, we analyzed whether the two areas overlap in some way, considering the following bibliographic characteristics: authors, keywords and citations. This made it possible to observe a slight overlap between the works in the areas studied and to point out questions for future studies.