7 research outputs found
Enhanced Multiscale Sampling of the Cel7A-Cellulose Interaction
A Modelling and Simulation Study to Understand the Enzymatic Conversion of Waste Cellulose into Biofuel
Introducing DASC-PM: A Data Science Process Model
Data-driven disciplines like data mining and knowledge management already provide process-based frameworks for data analysis projects, such as the well-known cross-industry standard process for data mining (CRISP-DM) or knowledge discovery in databases (KDD). Although the domain of data science addresses a much broader problem space, i.e., also considers economic, social, and ecological impacts of data-driven projects, a corresponding domain-specific process model is still missing. Consequently, based on a total of four identified meta requirements and 17 corresponding requirements that were collected from experts from theory and practice, this contribution proposes the empirically grounded data science process model (DASC-PM), a framework that maps a data science project as a four-step process model and contextualizes it among scientific procedures, various areas of application, IT infrastructures, and impacts. To illustrate the phase-oriented specification capabilities of the DASC-PM, we present example competence and role profiles for the analysis phase of a data science project.
DASC-PM v1.0: A Process Model for Data Science Projects (German original: DASC-PM v1.0: ein Vorgehensmodell für Data-Science-Projekte)
Data science has attracted considerable attention in many organizations in recent years. Often, however, it remains quite unclear how the discipline is to be distinguished from others, what is distinctive about the course of a data science project, and which competencies are required to carry one out. In the hope of making a small contribution to resolving this uncertainty, we worked from April 2019 to February 2020 in an open, virtual working group with representatives from theory and practice to prepare this document, which describes a process model for data science projects: the Data Science Process Model (DASC-PM). The aim was not to develop new approaches but rather to gather existing knowledge and structure it in a suitable form. The document should be understood as a consolidation of the experience of all participants in this working group.
Proposal of a Morphological Box for Characterizing Data Science Projects (German original: Vorschlag eines morphologischen Kastens zur Charakterisierung von Data-Science-Projekten)
Data science projects are typically interdisciplinary, address diverse problems from different domains, and are often marked by heterogeneous project characteristics. Efforts toward a uniform characterization of data science projects are particularly relevant when a decision has to be made about whether to carry them out, for example based on criteria such as resource requirements, data availability, or potential risks. To the best of the authors' knowledge, however, relevant approaches are so far lacking in both research and practice. This article takes a first step toward an approach for uniformly characterizing data science projects by proposing a morphological box that was derived in a three-step analysis based on a catalog of questions. It comprises seven dimensions with 32 characteristics and is illustrated using a case study from the field of predictive maintenance. The morphological box offers theoretical and practical potential for the structured comparison of data science projects and the definition of project portfolios, but makes no claim to completeness. It should therefore be regarded as a proposal and a starting point for further discourse. Open Access funding enabled and organized by Projekt DEAL.
DASC-PM v1.1: A Process Model for Data Science Projects
In February 2020, the first version of a comprehensive process model for data science projects appeared: the Data Science Process Model (DASC-PM). The positive feedback we have received indicates we were able to contribute to the discussion of data science activities that we were hoping for. Over the last two years, the DASC-PM has found its way into practice, book contributions (such as Alekozai et al., 2021), and scientific conferences (such as Schulz et al., 2020). We would like to sincerely thank all the readers who have shared their experiences with us and drawn our attention to the model’s strengths and potential improvements. Of course, special thanks go to those who actively participated in developing the model further. Without them, the path to this Version 1.1 would have been impossible. This version addresses feedback from theory and practice, as well as a few topics we feel strongly about. For example, we have made the document more legible by giving it a more compelling structure and shorter introductory texts. The model itself now more clearly defines the key areas and phases and their characteristics and shows how their interaction can look in various project configurations, including agile ones. We have examined all the terms used in the document with a critical eye and adjusted and standardized them where necessary. To that end, we have also addressed suggestions for a less formal visualization that is more plausible in practice, and—hopefully, at least—made both the document and the actual model more graphically appealing. Since the DASC-PM was created “by many for many,” we felt it was worthwhile to make the overall presentation of the model more accessible, even if it might be a little less scientifically precise. In terms of content, while developing Version 1.1, we focused on the “project order” phase. Important decisions are made and framework conditions are established at the beginning of data science activities. 
To that end, we are offering a more comprehensive description of the phase and a practical and applicable questionnaire as a concrete basis for both new and experienced users of data science. Just as in Version 1.0, the results should be seen as the aggregate experiences of all the participants of this working group. This English translation of the original German model makes it possible to use it in international projects, more easily supporting the interdisciplinarity that is intrinsic to data science. All the results presented in the DASC-PM are still mostly based on the feedback of a diverse working group and constitute a state of debate that is meant to serve as a stimulus and support but never claims to have the last word in the very active field of data science. We are pleased that this vitality will continue to motivate us to discuss and modify the DASC-PM and make it available to a wide audience. If you are interested in participating or want to be kept up-to-date about current developments of the model, contact us at the address given below. Elmshorn, Halle (Saale), Hamburg, Krefeld, Mönchengladbach and Stuttgart in June 2022 | The DASC-PM Core Team | Contact: [email protected] | Supported by the NORDAKADEMIE foundation