171 research outputs found

    From SPLs to Open, Compositional Platforms

    In this position paper we reflect on how software development in large organizations such as ours is slowly changing from being managed top-down, as is common in SPL organizations, towards something that increasingly resembles what happens in large open source organizations. Additionally, we highlight what this means in terms of organization and tooling.

    Benchmarking of differential abundance methods and development of bioinformatics and statistical tools for metagenomics data analysis

    Microbiome and metagenomics data analysis has been the main theme of my PhD programme. The main goal of this thesis is to move from the observed limitations of differential abundance analysis tools to a benchmark and a framework against which they can be measured and compared. As a secondary goal, the presentation of some case studies emphasises the need for sound exploratory and inferential statistical analysis of metabarcoding data. Firstly, I present two closely related studies in which differential abundance analysis methods play the main role. Differential abundance analysis is the principal approach for detecting differences in microbial community composition between sample groups and hence for understanding microbial community structures and the relationships between microbial compositions and the environment. I start by introducing a benchmarking study in which differential abundance analysis methods, including some from other domains (e.g., RNA-Seq and single-cell RNA-Seq), were applied to a collection of microbiome datasets to evaluate their performance. I then present a software package that I developed from the results of that research. The package, written in R, is currently available on Bioconductor (i.e., an open-source software platform for analysing and visualising biological data). It allows users to replicate the benchmarking of differential abundance analysis methods and evaluate their performance on their own datasets. Secondly, I highlight the challenges of microbiome data analysis by presenting two case studies on the human microbiome and its composition and dynamics in both healthy and disease states. In the first study, healthy volunteers were treated with a probiotic mixture, and changes in the gut microbiome were studied in conjunction with some psychological aspects. Careful data exploration, clustering, and linear mixed-effects regression models unveiled a strong subject-specific effect and the presence of different baseline bacteriotypes, which masked the overall probiotic effect. In the second study, I show how disease-related microbial biomarkers for eosinophilic oesophagitis (i.e., a chronic immune-mediated inflammatory disease of the oesophagus that causes dysphagia, food impaction, and oesophageal strictures) were identified from saliva samples. Despite the small sample size, it was possible to train a model that discriminates between cases and controls with decent accuracy. While still premature, this represents a promising step towards the non-invasive diagnosis of eosinophilic oesophagitis, which is currently possible only through oesophageal biopsy.

    A virtual factory for smart city service integration

    Doctoral thesis in Informatics (MAP-i). In the context of smart cities, governments are investing effort in creating public value through the development of digital public services (DPS) focusing on specific policy areas, such as transport. The main motivations for delivering DPS include reducing administrative burdens and costs, increasing the effectiveness and efficiency of government processes, and improving citizens' quality of life through enhanced services and simplified interactions with governments. To ensure effective planning and design of DPS in a given domain, governments face several challenges, such as the need for specialized tools to facilitate the effective planning and rapid development of DPS, the need for service integration tools, bearing high development costs, and ensuring that DPS conform with laws and regulations. These challenges are exacerbated by the fact that many public administrations develop tailored DPS, disregarding the fact that services share common functionality and business processes. To address these challenges, this thesis focuses on leveraging the similarities among DPS and on applying a Software Product Line (SPL) approach combined with formal methods for specifying service models and verifying their behavioural properties. In particular, the proposed solution introduces the concept of a virtual factory for the planning and rapid development of DPS in a given smart city domain. The virtual factory comprises a framework including software tools, guidelines, practices, models, and other artefacts that help engineers automate the development of a family of DPS and make it more efficient. In this work, the virtual factory is populated with tools that allow government officials and software developers to plan and design smart mobility services and to rapidly model DPS using SPLs and component-based development techniques. Specific contributions of the thesis include: 1) the concept of a virtual factory; 2) a taxonomy for planning and designing smart mobility services; 3) an ontology to fix a common vocabulary for a specific family of DPS; 4) a compositional formalism to model SPLs, to serve as a specification language for DPS; and 5) a variable semantics for a coordination language to simplify the coordination of services in the context of SPLs. This work was funded by FCT – Foundation for Science and Technology, the Portuguese Ministry of Science, Technology and Higher Education, through the Operational Programme for Human Capital (POCH). Grant reference: PD/BD/52238/201

    Statistical Methods for Integrative Analysis, Subgroup Identification, and Variable Selection Using Cancer Genomic Data

    In recent years, comprehensive cancer genomics platforms, such as The Cancer Genome Atlas (TCGA), have provided access to an enormous amount of high-throughput genomic data for each patient, including gene expression, DNA copy number alteration, DNA methylation, and somatic mutation. Most existing analysis approaches focus only on gene-level analysis and suffer from limited interpretability and low reproducibility of findings. Additionally, with the increasing availability of modern compositional data, including immune cellular fraction data and high-dimensional zero-inflated microbiome data, variable selection techniques for compositional data have become of great interest because they allow inference of key immune cell types (immunology data) and key microbial species (microbiome data) associated with the development and progression of various diseases. In the first dissertation aim, we address these challenges by developing a Bayesian sparse latent factor model for pathway-guided integrative genomic data analysis. Specifically, we constructed a unified framework to simultaneously identify cancer patient subgroups (clustering) and key molecular markers (variable selection) based on the joint analysis of continuous, binary, and count data. In addition, we applied Polya-Gamma mixtures of normals for binary and count data to enable exact and fully automatic posterior sampling. Moreover, pathway information was used to improve accuracy and robustness in the identification of cancer patient subgroups and key molecular features. In the second dissertation aim, we developed the R package InGRiD, a comprehensive software tool for pathway-guided integrative genomic data analysis. We further implemented the statistical model developed in Aim 1 and provide it as part of this software. The third dissertation aim exploits variable selection in compositional data analysis with application to immunology data and microbiome data. 
Specifically, we identified key immune cell types by applying a stepwise pairwise log-ratio procedure to the immune cellular fraction data, and selected key species in the microbiome data using a zero-inflated Wilcoxon rank sum test. These approaches account for key characteristics of these data types, such as compositionality (i.e., sum-to-one), zero inflation, and high dimensionality, among others. The proposed methods were developed and evaluated on: 1) large-scale, high-dimensional, and multi-modal datasets from the TCGA database, including gene expression, DNA copy number alteration, and somatic mutation data (Aim 1); 2) cellular fraction data induced from the Colorectal Adenocarcinoma TCGA Pan-Cancer study (Aim 3); 3) high-dimensional zero-inflated microbiome data from studies of colorectal cancer (Aim 3).
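
    The sum-to-one constraint mentioned in this abstract is the usual reason for analysing ratios between components instead of raw fractions. The following is only an illustrative sketch of computing pairwise log-ratios for a single sample. It is not the stepwise selection procedure of the thesis, and the cell-type names, the `closure` and `pairwise_log_ratios` helpers, and the pseudocount used to handle zeros are all assumptions made for the example.

```python
import math
from itertools import combinations


def closure(parts):
    """Rescale non-negative parts so that they sum to one."""
    total = sum(parts.values())
    if total <= 0:
        raise ValueError("composition must have a positive total")
    return {name: value / total for name, value in parts.items()}


def pairwise_log_ratios(parts, pseudocount=1e-6):
    """Return log(x_a / x_b) for every unordered pair of components.

    The pseudocount is an assumption of this sketch (not taken from the
    abstract); it only keeps log(0) from occurring for zero components.
    """
    shifted = closure({k: v + pseudocount for k, v in parts.items()})
    return {
        (a, b): math.log(shifted[a] / shifted[b])
        for a, b in combinations(sorted(shifted), 2)
    }


# Hypothetical immune cell fractions for one sample (made-up values).
sample = {"T_cell": 30.0, "B_cell": 10.0, "NK_cell": 0.0}
ratios = pairwise_log_ratios(sample)
```

    Because each log-ratio depends only on the relative size of two components, rescaling the whole sample leaves it unchanged when no pseudocount is applied, which is what makes log-ratios a natural unit for compositional data.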

    Conceptual Variability Management in Software Families with Multiple Contributors

    To offer customisable software, there are currently two main concepts: software product lines, which allow product customisation based on a fixed set of variability, and software ecosystems, which allow open product customisation based on a common platform. Offering a software family that enables external developers to supply software artefacts means offering a common platform as part of an ecosystem and sacrificing variability control. Keeping full variability control means offering a customisable product as a product line, but without support for external contributors. This thesis proposes a third concept of variable software: partly open software families. They combine a customisable platform similar to product lines with controlled openness similar to ecosystems. As a major contribution of this thesis, a variability modelling concept is proposed as part of variability management for these partly open software families. This modelling concept is based on feature models and extends them to support open variability modelling by means of interfaces, structural interface specifications, and the inclusion of semantic information. Additionally, the introduction of rights management allows multiple contributors to work with the model. This is required to enable external developers to use the model for concrete extension development. The feasibility of the proposed model is evaluated using a prototype modelling tool and a case study based on a car infotainment system.

    Unified GUI adaptation in Dynamic Software Product Lines

    In the modern world of mobile computing and ubiquitous technology, society is able to interact with technology in new and fascinating ways. To help provide an improved user experience, mobile software should be able to adapt itself to suit the user. By monitoring context information based on the environment and user, the application can better meet the dynamic requirements of the user. Similarly, programs can require different static changes to suit static requirements. This program commonality and variability can benefit from the use of Software Product Line Engineering, which reuses artefacts over a set of similar programs, called a Software Product Line (SPL). Historically, SPLs have been limited to handling static compile-time adaptations. Dynamic Software Product Lines (DSPLs), however, allow the program configuration to change at runtime, so that compile-time and runtime adaptation can be developed in a single unified approach. While DSPLs currently provide methods for dealing with program logic adaptations, variability in the Graphical User Interface (GUI) has largely been neglected. As a result, different approaches are required depending on when GUI adaptation is to be applied. The main goal of this work is to extend a unified representation of variability to the GUI, whereby GUI adaptation can be applied at compile time and at runtime. In this thesis, an approach to handling GUI adaptation within DSPLs is presented, providing a unified representation of GUI variability. The approach is based on Feature-Oriented Programming (FOP), enabling developers to implement GUI adaptation along with program logic in feature modules. This approach is applied to Document-Oriented GUIs, also known as GUI description languages. In addition to GUI unification, we present an approach to unifying context and feature modelling, and to handling context dynamically at runtime, as features of the DSPL. 
This unification can allow for more dynamic and self-aware context acquisition. To validate our approach, we implemented tool support and middleware prototypes. These artefacts are then tested using a combination of scenarios and scalability tests. This combination first helps demonstrate the versatility and relevance of the different aspects of the approach. It further brings insight into how the approach scales with DSPL size.

    Linking the effects of helminth infection, diet and the gut microbiota with human whole-blood signatures

    Helminth infection and dietary intake can affect the intestinal microbiota, as well as the immune system. Here we analyzed the relationship between fecal microbiota and blood profiles of indigenous Malaysians, referred to locally as Orang Asli, in comparison to urban participants from the capital city of Malaysia, Kuala Lumpur. We found that helminth infections had a larger effect on gut microbial composition than did dietary intake or blood profiles. Trichuris trichiura infection intensity also had the strongest association with blood transcriptional profiles. By characterizing paired longitudinal samples collected before and after deworming treatment, we determined that changes in serum zinc and iron levels among the Orang Asli were driven by changes in helminth infection status, independent of dietary metal intake. Serum zinc and iron levels were associated with changes in the abundance of several microbial taxa. Hence, there is considerable interplay between helminths, micronutrients, and the microbiota in the regulation of immune responses in humans.

    A modular metamodel and refactoring rules to achieve software product line interoperability.

    Emerging application domains, such as cyber-physical systems, edge computing, or Industry 4.0, present high variability in software and hardware infrastructures. However, no single variability modeling language supports all language extensions required by these application domains (i.e., attributes, group cardinalities, clonables, complex constraints). This limitation is an open challenge that should be tackled by the software engineering field, and specifically by the software product line (SPL) community. A possible solution could be to define a completely new language, but this has a high cost in terms of adoption time and development of new tools. A more viable alternative is the definition of refactoring and specialization rules that allow interoperability between existing variability languages. However, with this approach, these rules cannot be reused across languages because each language uses a different set of modeling concepts and a different concrete syntax. Our approach relies on a modular and extensible metamodel that defines a common abstract syntax for existing variability modeling extensions. We map existing feature modeling languages in the SPL community to our common abstract syntax. Using our abstract syntax, we define refactoring rules at the language construct level that help to achieve interoperability between variability modeling languages. Work supported by the projects MEDEA RTI2018-099213-B-I00, IRIS PID2021-122812OB-I00 (co-financed by FEDER funds), Rhea P18-FR-1081 (MCI/AEI/FEDER, UE), LEIA UMA18-FEDERIA-157, and DAEMON H2020-101017109. // Funding for open access: Universidad de Málaga / CBUA

    Insights on the bacterial composition of Parmigiano Reggiano Natural Whey Starter by a culture-dependent and 16S rRNA metabarcoding portrait

    Natural whey starters (NWS) are undefined bacterial communities produced daily from the whey of the previous cheese-making round by application of high temperature. As a result, in any dairy plant, NWS are continuously evolving, undefined mixtures of several strains and/or species of lactic acid bacteria, whose composition and performance strongly depend on the selective pressure acting during incubation. While NWS are critical for ensuring consistency in the cheese-making process, little is known about their composition, functional features, and plant-to-plant fluctuations. Here, we integrated 16S rRNA metabarcoding and culture-dependent methods to profile the bacterial communities of 10 NWS sampled in the production area of Parmigiano Reggiano cheese. 16S rRNA metabarcoding analysis revealed two main NWS community types, namely NWS type-H and NWS type-D. Lactobacillus helveticus was more abundant in NWS type-H, whilst Lactobacillus delbrueckii/St. thermophilus were more abundant in NWS type-D. Based on the prediction of metagenome functions, NWS type-H samples were enriched in functional pathways related to galactose catabolism and purine metabolism, while NWS type-D samples were enriched in pathways related to aromatic and branched-chain amino acid biosynthesis, which yields flavor compound precursors. Culture-dependent approaches revealed low cultivability of individual colonies as axenic cultures and high genetic diversity in the pool of cultivable survivors. Co-culturing experiments showed that fermentative performance decreases as the bacterial complexity of the inoculum is reduced, suggesting that biotic interactions and cross-feeding relationships could take place in NWS communities, assuring phenotypic robustness. Even though our data cannot directly predict these ecological interactions, this study provides the basis for experiments targeted at understanding how the selective regime affects composition, bacterial interaction, and fermentative performance in NWS.