    Statistical methods for the identification of patient subgroups from high-throughput data

    For personalized medicine, the discovery of previously unknown molecular patient subgroups is of great importance for developing new tailored therapies. These subgroups (SG) can be investigated at different levels. For example, gene or protein expression data collected with microarrays or RNA-Seq measurements are analyzed. Various univariate and multivariate approaches for detecting patient subgroups in such high-dimensional data have already been proposed in the literature. In the first part of this thesis, an extensive simulation study is conducted to compare several such univariate methods. Furthermore, the suitability of the Fisher Sum (FS) score for detecting small SG in particular is demonstrated in simulations and in an application to real data. The corresponding results have already been discussed in [1]. The second part of the thesis deals with multivariate workflows for SG detection. The FSOL workflow [2] and its variant FSJ are presented and compared with an established method in simulations and application examples. The three-step workflow (in general, the FSx workflow) begins with the FS-based selection of variables that potentially indicate an SG; in the next step, these variables are grouped according to the SG samples they indicate. For this purpose, FSOL uses a similarity measure based on the Ordered List algorithm [3], whereas FSJ uses the Jaccard index. The final step nominates sample SG with respect to the variable groups formed. The Plaid algorithm (BC) [4], a biclustering method that is popular particularly for the analysis of gene expression data, serves as the reference workflow. The fourth multivariate method considered, FSBC, combines univariate FS selection with biclustering. The strong dimension reduction is intended to lower the variability of the biclustering results.
In the simulation study comparing the four methods, the influence of various dataset-specific parameters is examined, as is that of workflow-specific parameters for FSx. Biclustering achieves the best detection quality only when a large number of variables point to a subgroup. The preselection in FSBC considerably improves BC performance; however, SG with a smaller shift were still not detected. For most of the parameter constellations considered, FSJ yields better results than FSOL. The four methods are applied to two real datasets, and their detection quality is compared with respect to a known subgroup in each. For the bicluster-based approaches BC and FSBC, simple descriptive methods are used in an attempt to determine a consensus bicluster. Only with FS selection can indications of the sought SG be obtained in both examples. Overall, FSOL and FSJ are two workflow variants that are suited to uncovering indications of unknown SG in high-dimensional data. This holds in particular when the shift in the observed values of the subgroup is small or when only a small number of variables point to the subgroup. FSJ appears preferable to the FSOL variant: FSJ was mostly superior in the simulations, is intuitively understandable, especially for clinical users, and is also more resource-efficient and deterministic. The performance of the established biclustering method is markedly improved by the univariate preselection step. In contrast to the FSx workflows, however, it sometimes yields highly variable results that require further condensation.
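
The grouping step of the FSJ variant relies on the Jaccard index between the sample sets indicated by different variables. The following is a minimal sketch of that idea only, not the workflow described in the abstract: the names `jaccard` and `group_variables`, the threshold of 0.5, the single-linkage grouping rule, and the toy gene and sample labels are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index |a & b| / |a | b|; defined here as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_variables(indicated: dict, threshold: float = 0.5) -> list:
    """Group variables whose indicated sample sets are similar.

    Two variables are linked when their Jaccard index reaches the threshold;
    groups are the connected components of these links (an assumed rule).
    """
    parent = {v: v for v in indicated}

    def find(v):
        # Follow parent pointers to the representative of v's group.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in combinations(indicated, 2):
        if jaccard(indicated[u], indicated[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for v in indicated:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


# Hypothetical example: each variable maps to the samples it flags as SG members.
indicated = {
    "gene_A": {"s1", "s2", "s3"},
    "gene_B": {"s1", "s2", "s3", "s4"},
    "gene_C": {"s7", "s8"},
}
groups = group_variables(indicated, threshold=0.5)
```

In this toy input, `gene_A` and `gene_B` share three of four samples (Jaccard index 0.75) and end up in one group, while `gene_C` forms its own group. The procedure involves no random initialization, which matches the abstract's remark that FSJ is deterministic.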

    Influence of diagram layout and scrolling on understandability of BPMN processes: an eye tracking experiment with BPMN diagrams

    Business process modeling is an important activity for developing software systems, especially within digitization projects and when realizing digital business models. Specifying requirements and building executable workflows is often done by using BPMN 2.0 process models. Although there are several style guides available for BPMN, e.g., by Silver and Richard (BPMN method and style, vol 2, Cody-Cassidy Press, Aptos, 2009), there has been little empirical research into the consequences of the diagram layout. In particular, layouts that require scrolling have not been investigated yet. The aim of this research is to establish layout guidelines for business process modeling that help business process modelers to create more understandable business process diagrams. To establish the benefits and penalties of different layouts, a controlled eye tracking experiment was conducted, in which data from 21 professional software developers were used. Our results show that horizontal layouts are less demanding and that as many diagram elements as possible should be put on the initially visible screen area because such diagram elements are viewed more often and for longer. Additionally, diagram elements related to the reader’s task are read more often than those not relevant to the task. To support BPMN model comprehension, BPMN modelers should favor a horizontal layout and use a more complex snake or multi-line layout whenever a diagram is too large to fit on one page.

    When details are difficult to portray: enriching vision videos

    Creating a shared understanding of the project vision among all relevant stakeholders is vital to the requirements engineering process. One way to create such a shared understanding is through the use of vision videos that visualize the project vision at an early project stage. However, not all functional aspects can be presented. For example, the fact that an access code is valid for only a single use can be hard to visualize. One low-effort solution could be the insertion of short texts or short audio clips. In this work, our question is twofold: What effects do short pieces of additional information have in vision videos? What are suitable ways to add this information to vision videos? To answer these research questions, we investigated three different methods of inserting additional information into vision videos in an eye tracking study. We inserted short texts either below the scene or as overlays and also investigated the addition of short audio clips. These methods were evaluated in terms of participants’ video comprehension, visual effort, cognitive load and subjective preference. The results of our study show that the pieces of additional information improve vision comprehension, thereby supporting the creation of a shared understanding. All investigated methods lead to only marginal increases in the viewers’ cognitive load. Based on our results, we derive recommendations on how to insert additional information in vision videos.

    Evaluation of the biomarker candidate MFAP4 for non-invasive assessment of hepatic fibrosis in hepatitis C patients

    Background: The human microfibrillar-associated protein 4 (MFAP4) is localized to extracellular matrix fibers and plays a role in disease-related tissue remodeling. Previously, we identified MFAP4 as a serum biomarker candidate for hepatic fibrosis and cirrhosis in hepatitis C patients. The aim of the present study was to elucidate the potential of MFAP4 as a biomarker for hepatic fibrosis with a focus on the differentiation of no to moderate fibrosis (F0–F2) from severe fibrosis stages and cirrhosis (F3 and F4, Desmet-Scheuer scoring system). Methods: MFAP4 levels were measured using an AlphaLISA immunoassay in a retrospective study including n = 542 hepatitis C patients. We applied a univariate logistic regression model based on MFAP4 serum levels and furthermore derived a multivariate model that also included age and gender. Youden-optimal cutoffs for binary classification were determined for both models, once without restrictions and once with a lower limit of 80% sensitivity (correct classification of F3 and F4). To assess the generalization error, leave-one-out cross-validation (LOOCV) was performed. Results: MFAP4 levels were shown to differ between no to moderate fibrosis stages (F0–F2) and severe stages (F3 and F4) with high statistical significance (t test on log scale, p value < 2.2·10^{-16}). In the LOOCV, the univariate classification resulted in 85.8% sensitivity and 54.9% specificity, while the multivariate model yielded 81.3% sensitivity and 61.5% specificity (restricted approaches). Conclusions: We confirmed the applicability of MFAP4 as a novel serum biomarker for assessment of hepatic fibrosis and identification of high-risk patients with severe fibrosis stages in hepatitis C.
The combination of MFAP4 with existing tests might lead to a more accurate non-invasive diagnosis of hepatic fibrosis and allow cost-effective disease management in the era of new direct-acting antivirals.
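
The cutoff selection described in the Methods can be illustrated with a small sketch. It picks the cutoff that maximizes the Youden index J = sensitivity + specificity - 1, optionally considering only cutoffs that reach a minimum sensitivity. The function name `youden_cutoff`, the rule "positive if score >= cutoff", and the toy scores and labels are assumptions for illustration; they are not the study's data or code.

```python
def youden_cutoff(scores, labels, min_sensitivity=0.0):
    """Return (cutoff, sensitivity, specificity) maximizing the Youden index.

    A sample is classified as positive when its score is >= cutoff.
    Cutoffs with sensitivity below min_sensitivity are skipped.
    Ties are resolved in favor of the smallest cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        if sens < min_sensitivity:
            continue
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    if best is None:
        raise ValueError("no cutoff satisfies the sensitivity constraint")
    return best[1:]


# Hypothetical scores (e.g. model outputs) with 1 = severe stage, 0 = otherwise.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
labels = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1]

unrestricted = youden_cutoff(scores, labels)
restricted = youden_cutoff(scores, labels, min_sensitivity=0.8)
```

On this toy input, the unrestricted optimum lies at cutoff 11 with a sensitivity of 0.6, whereas requiring at least 80% sensitivity moves the cutoff to 7 and trades specificity for sensitivity. This mirrors the restricted and unrestricted approaches named in the abstract.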

    Towards a harmonized European surveillance for dietary and physical activity indicators in young and adult populations

    Background: The Policy Evaluation Network proposes a consolidated approach to measure comparable health indicators across European health surveillance systems to evaluate effectiveness of policy action. Methods: In a stepwise approach, questionnaire items used by the systems for measuring diet and physical activity data to describe health indicators were identified based on their validity, reliability, and suitability to monitor achievement of health recommendations. They were collated to unified questionnaire modules and discussed bilaterally with representatives of these systems to explore barriers and facilitators for implementation. Also, establishment of a methodological competence platform was proposed, in which the surveillance and monitoring systems agree on the priorities and common quality standards for the harmonization process and to coordinate the integration of questionnaire modules into existing systems. Results: In total, seven questionnaire modules were developed, of which two diet and two physical activity modules were proposed for implementation. Each module allows measurement of data reflecting only partial aspects of national and WHO recommendations related to diet and physical activity. Main barriers were the requirements of systems to monitor temporal trends and to minimize costs. Main facilitator for implementation was the systems’ use of questionnaire items that were comparable to the unified modules. Representatives agreed to participate in a methodological competence platform. Conclusion: We successfully took first steps in the realization of the roadmap towards a harmonization of European surveillance by introducing unified questionnaire modules allowing the collection of comparable health indicators and by initiating the establishment of a competence platform to guide this process.

    Age-Specific Quantification of Overweight/Obesity Risk Factors From Infancy to Adolescence and Differences by Educational Level of Parents

    Objectives: To explore the age-dependent associations between 26 risk factors and BMI in early life, and differences by parental educational level. Methods: Data of 10,310 children (24,155 measurements) aged 2–16 years participating in a multi-centre European cohort from 2007 to 2014 were utilized. Trajectories of overweight/obesity risk factors and their age-specific associations with BMI were estimated using polynomial mixed-effects models. Results: Exposure to most unfavourable factors was higher in the low/medium compared to the high education group, e.g., for PC/TV time (12.6 vs. 10.6 h/week). Trajectories of various risk factors markedly changed at an age of 9–11 years. Having a family history of obesity, maternal BMI, pregnancy weight gain and birth weight were positively associated with BMI trajectories throughout childhood/adolescence in both education groups; associations of behavioural factors with BMI were small. Parental unemployment and migrant background were positively associated with BMI in the low/medium education group. Conclusion: Associations of risk factors with BMI trajectories did not essentially differ by parental education except for social vulnerabilities. The age period of 9–11 years may be a sensitive period for adopting unfavourable behaviours.

    Identification and Characterization of Human Observational Studies in Nutritional Epidemiology on Gut Microbiomics for Joint Data Analysis

    In any research field, data access and data integration are major challenges that even large, well-established consortia face. Although data sharing initiatives are increasing, joint data analyses on nutrition and microbiomics in health and disease are still scarce. We aimed to identify observational studies with data on nutrition and gut microbiome composition from the Intestinal Microbiomics (INTIMIC) Knowledge Platform following the findable, accessible, interoperable, and reusable (FAIR) principles. An adapted template from the European Nutritional Phenotype Assessment and Data Sharing Initiative (ENPADASI) consortium was used to collect microbiome-specific information and other related factors. In total, 23 studies (17 longitudinal and 6 cross-sectional) were identified from Italy (7), Germany (6), the Netherlands (3), Spain (2), Belgium (1), and France (1) or multiple countries (3). Of these, 21 studies collected information on both dietary intake (24 h dietary recall, food frequency questionnaire (FFQ), or food records) and gut microbiome. All studies collected stool samples. The most often used sequencing platform was Illumina MiSeq, and the preferred hypervariable regions of the 16S rRNA gene were V3-V4 or V4. The combination of datasets will allow for sufficiently powered investigations to increase the knowledge and understanding of the relationship between food and gut microbiome in health and disease.