185 research outputs found

    Towards a network entropy formalism for gene expression data analysis

    It is widely recognized that studies based on single-gene differential expression statistical analyses are often too simplistic, since they do not consider the complexity of the underlying biological system, which rests mainly on specific interactions between genes (at the transcriptomic, proteomic and metabolomic levels). For this reason, novel approaches are sought that exploit network-based methods, in a Systems Biology perspective, capable of integrating single-probe measurements with biological information at a whole-genome scale. We describe a method, based on Statistical Mechanics and Network Theory, that goes in this direction, combining what is currently known about gene-gene interactions at the protein level with high-throughput mRNA data obtained in different experimental conditions. We provide a framework for a phenomenological interpretation of the Entropy of a Network Ensemble based on an Information Theory approach.

    Systems Biology approaches to cancer: towards new therapeutical strategies and personalized approaches

    Network approaches are ubiquitous, from social and ecological systems to complex biological processes. In our recently published work we used the network framework for a Systems Medicine approach to multiple cancer types, in order to highlight similarities and differences that can be exploited to extend existing therapeutic strategies. These approaches shed new light on oncological processes, and also allow us to pose “old” questions (like the search for novel drug targets) in a “new” way.

    Weighted Multiplex Networks

    One of the most important challenges in network science is to quantify the information encoded in complex network structures. Disentangling randomness from organizational principles is even more demanding when networks have a multiplex nature. Multiplex networks are multilayer systems of N nodes that can be linked in multiple interacting and co-evolving layers. In these networks, relevant information might not be captured if the single layers were analyzed separately. Here we demonstrate that such partial analysis of layers fails to capture significant correlations between weights and topology of complex multiplex networks. To this end, we study two weighted multiplex co-authorship and citation networks involving the authors included in the American Physical Society. We show that in these networks weights are strongly correlated with multiplex structure, and provide empirical evidence in favor of the advantage of studying weighted measures of multiplex networks, such as multistrength and the inverse multiparticipation ratio. Finally, we introduce a theoretical framework based on the entropy of multiplex ensembles to quantify the information stored in multiplex networks that would remain undetected if the single layers were analyzed in isolation. Comment: 22 pages, 10 figures

    Functional models for large-scale gene regulation networks: realism and fiction

    High-throughput experiments are shedding light on the topology of large regulatory networks and at the same time on their functional states, namely the states of activation of the nodes (for example transcript or protein levels) in different conditions, times and environments. We now possess a certain amount of information about these two levels of description, stored in libraries, databases and ontologies. A current challenge is to bridge the gap between topology and function, i.e. developing quantitative models aimed at characterizing the expression patterns of large sets of genes. However, approaches that work well for small networks become impossible to master at large scales, mainly because parameters proliferate. In this review we discuss the state of the art of large-scale functional network models, addressing the issue of what can be considered realistic and what the main limitations may be. We also show some directions for future work, trying to set the goals that future models should try to achieve. Finally, we emphasize the possible benefits for the understanding of biological mechanisms underlying complex multifactorial diseases, and for the development of novel strategies for the description and the treatment of such pathologies. Comment: to appear on Mol. BioSyst. 200

    Networks from gene expression time series: characterization of correlation patterns

    This paper describes characteristic features of networks reconstructed from gene expression time series data. Several null models are considered in order to discriminate between information embedded in the network that is related to real data, and features that are due to the method used for network reconstruction (time correlation). Comment: 10 pages, 3 BMP figures, 1 Table. To appear in Int. J. Bif. Chaos, July 2007, Volume 17, Issue

    Network Entropy measures applied to different systemic perturbations of cell basal state

    NOTE: includes supplementary material. We characterize different cell states, related to cancer and ageing phenotypes, by a measure of entropy of network ensembles, integrating gene expression values and protein interaction networks. The entropy measure estimates the parameter space available to the network ensemble, which can be interpreted as the level of plasticity of the system for high entropy values (the ability to change its internal parameters, e.g. in response to environmental stimuli), or as a fine tuning of the parameters (that restricts the range of possible parameter values) in the opposite case. This approach can be applied at different scales, from whole cell to single biological functions, by defining appropriate subnetworks based on a priori biological knowledge, thus allowing a deeper understanding of the cell processes involved. In our analysis we used specific network features (degree sequence, subnetwork structure and distance between gene profiles) to obtain information at different biological scales, providing a novel point of view for the integration of experimental transcriptomic data and a priori biological knowledge, but the entropy measure can also highlight other aspects of the biological systems studied depending on the constraints introduced in the model (e.g. community structures).
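The plasticity versus fine-tuning reading of the entropy can be illustrated with a minimal sketch. The abstract does not state the formula used, so the code below assumes the simplest possible ensemble, in which every link is present independently with its own probability; the function name `ensemble_entropy` and the example probabilities are hypothetical and are not taken from the paper.

```python
import math

def ensemble_entropy(link_probs):
    """Shannon entropy (in nats) of an ensemble of networks whose links
    are independent, each present with its own probability p.
    Assumption: independent links; this is not the paper's exact model."""
    total = 0.0
    for p in link_probs:
        # Links with p == 0 or p == 1 are fully determined and add nothing.
        if 0.0 < p < 1.0:
            total -= p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
    return total

# Hypothetical link probabilities for two three-link systems.
plastic = ensemble_entropy([0.5, 0.5, 0.5])    # every link maximally uncertain
tuned = ensemble_entropy([0.99, 0.01, 0.99])   # links almost fixed
```

Under this assumption the first system has the larger entropy, matching the reading of high entropy as a wide range of compatible configurations and low entropy as tightly constrained parameters.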

    Stochastic analysis of a miRNA-protein toggle switch

    Within systems biology there is an increasing interest in the stochastic behavior of genetic and biochemical reaction networks. An appropriate stochastic description is provided by the chemical master equation, which represents a continuous time Markov chain (CTMC). In this paper we consider the stochastic properties of a toggle switch, involving a protein compound (E2Fs and Myc) and a miRNA cluster (miR-17-92), known to control the eukaryotic cell cycle and possibly involved in oncogenesis, recently proposed in the literature within a deterministic framework. Due to the inherent stochasticity of biochemical processes and the small number of molecules involved, the stochastic approach should be more correct in describing the real system: we study the agreement between the two approaches by exploring the system parameter space. We address the problem by proposing a simplified version of the model that allows analytical treatment, and by performing numerical simulations for the full model. We observed optimal agreement between the stochastic and the deterministic description of the circuit in a large range of parameters, but some substantial differences arise in at least two cases: (1) when the deterministic system is in the proximity of a transition from a monostable to a bistable configuration, and (2) when bistability (in the deterministic system) is "masked" in the stochastic system by the distribution tails. The approach provides interesting estimates of the optimal number of molecules involved in the toggle switch. Our discussion of the strengths, potential and weaknesses of the chemical master equation in systems biology, and of the differences with respect to deterministic modeling, is leveraged in order to provide useful advice for both the bioinformatician and the theoretical scientist. Giampieri E.; Remondini D.; de Oliveira L.; Castellani G.; Lió P.
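As a rough illustration of how a chemical master equation is sampled numerically, the sketch below runs Gillespie's stochastic simulation algorithm on a one-species birth-death process and compares the time-averaged copy number with the deterministic steady state. This is not the toggle-switch model of the paper: the reaction scheme, the rate values `k` and `gamma`, and all function names are assumptions chosen only to keep the example short.

```python
import random

def gillespie_birth_death(k, gamma, n0, t_end, seed=0):
    """Sample one trajectory of: 0 -> X at rate k, X -> 0 at rate gamma * n."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        birth = k              # propensity of the production reaction
        death = gamma * n      # propensity of the degradation reaction
        total = birth + death
        t += rng.expovariate(total)   # exponential waiting time to next event
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * total < birth:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts

def time_average(times, counts, t_end):
    """Average copy number, weighting each state by how long it was held."""
    acc = 0.0
    for i, n in enumerate(counts):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        acc += n * (t_next - times[i])
    return acc / t_end

# Hypothetical rates: the deterministic steady state is k / gamma = 50.
times, counts = gillespie_birth_death(k=50.0, gamma=1.0, n0=50, t_end=200.0)
mean_n = time_average(times, counts, 200.0)
```

For this linear system the stochastic mean stays close to the deterministic value; the disagreements described in the abstract concern nonlinear, bistable circuits, which this toy example does not reproduce.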

    Editorial: Integrating Whole Genome Sequencing Into Source Attribution and Risk Assessment of Foodborne Bacterial Pathogens

    Source attribution and microbial risk assessment have proved to be crucial to identify and prioritize food safety interventions so as to effectively control the burden of human illnesses (Cassini et al., 2016; Mughini-Gras et al., 2018a, 2019). By comparing human cases and pathogen occurrences in selected animal, food, and environmental sources, microbial subtyping approaches were successfully applied to pinpoint the most important sources of Salmonella, Campylobacter, Shiga toxin-producing Escherichia coli, and Listeria monocytogenes (Hald et al., 2004; Mullner et al., 2009a,b; Barco et al., 2013; Nielsen et al., 2017; Mughini-Gras et al., 2018b; Cody et al., 2019). Microbial risk assessment has been applied to assess known or potential adverse health effects resulting from human exposure to food-borne hazards. Through a scientific structured approach (FAO and WHO, 2021), microbial risk assessment helps to identify and quantify the risk represented by specific foods and the critical points in these foods' production chains for microbial control (Cassini et al., 2016; FAO and WHO, 2021). For both source attribution and risk assessment, one key challenge has been to define the hazard in question: is the whole foodborne pathogen species a hazard, or only some of its subtypes? In this regard the choice of the subtyping method becomes crucial. In recent years, Whole Genome Sequencing (WGS) has represented a major benefit for more targeted approaches, no longer focused on the species/genus level but at the level of subtypes (Franz et al., 2016; Fritsch et al., 2018; EFSA Panel on Biological Hazards, 2019). Besides WGS, metagenomics has shown potential in source attribution. In particular, this approach was useful in attributing the source of environmental contamination by comparing the abundances of source-specific genetic markers (i.e., resistome) in different reservoirs (Gupta et al., 2019).
Therefore, this special issue focuses on traditional and novel source attribution approaches applied to molecular, WGS, and metagenomic data as well as on a fine-tuned genetic characterization of foodborne pathogens useful for hazard identification and characterization. In particular, one study compares the outputs of a modified Hald model, which was applied to different subtyping input data of S. enterica Typhimurium and its monophasic variant (Arnold et al.), whereas two studies proposed a novel network approach and a method based on the core-genome genetic distance to attribute human infections of S. enterica Typhimurium monophasic variant and S. enterica Derby using WGS as input data (Merlotti et al.; Sévellec et al.). Another study by Duarte et al. included the relative abundance of antimicrobial resistance (AMR) associated genes (resistome) as metagenomic input data in an AMR source attribution study. Finally, two studies focused on the molecular and genomic characterization of human isolates of Campylobacter jejuni and C. coli from China and of Listeria monocytogenes isolates collected from ready-to-eat meat and processing environment from Poland (Zhang et al.; Kurpas et al.). Arnold et al. performed a source attribution study including the genomes of S. enterica Typhimurium and its monophasic variant of 596 human sources and 327 animal sources from England and Wales between 2014 and 2016. Data from Seven Loci Multi Locus Sequence Typing (7-loci MLST), core-genome MLST (cg-MLST), and SNP calling were compared as input data. By applying a modified Hald model, 60% of human genomes were attributed to pork. Comparing different input data, results highlighted MLST as the method with the lowest fit and the lowest discriminatory power. Merlotti et al. applied a network approach to 351 human and animal genomes of S. enterica Typhimurium and its monophasic variant collected from 2013 to 2014.
Three datasets of whole-genome MLST (wgMLST), cgMLST, and SNPs were used as input data. Genomes were clustered based on their genetic similarities. Interestingly, a higher percentage of cluster coherence was reported for animal sources in comparison to country and year of isolation, suggesting animal sources as the major driver of cluster formation. The approach proved effective in attributing up to 97.2% of human genomes to animal sources represented in the dataset. Among these genomes, the majority (84%) was attributed to pigs/pork. No significant differences were highlighted by comparing the three different input datasets. Core genome analysis was the approach applied by Sévellec et al. to attribute human sporadic cases of S. enterica Derby that occurred in France in 2014–2015 to non-human reservoirs. The authors analyzed 299 S. enterica Derby genomes corresponding to all S. enterica Derby sporadic human cases registered in the time frame, along with 141 non-human genomes. Within the non-human genomes, three main genomic lineages were detected in France: ST39-ST40 and ST682 associated with pork and ST71 associated with poultry. Within human genomes, 94% of S. enterica Derby clustered within the three genetic groups associated with pork, identifying this animal reservoir as the major contributor of S. enterica Derby to sporadic human cases in France. Relative abundance of antimicrobial resistance genes in shotgun metagenomic data was chosen in an antimicrobial resistance source attribution study by Duarte et al. Starting from the assumption that fecal resistomes are source related, the authors compared the resistomes of pooled fecal samples of pigs, broilers, turkeys, and veal calves with the resistomes of individual fecal samples of humans occupationally exposed to livestock production. Five supervised random forest models were applied to a total of 479 observations.
Among the four livestock species, the results indicated that pigs have the resistome composition closest to the composition of the human resistome, suggesting that occupational exposure to AMR determinants was higher among workers exposed to pigs than workers of broiler farms. Zhang et al. characterized genetic diversity and antimicrobial resistance of 236 Campylobacter jejuni and C. coli isolates collected from 2,945 individual stool samples of hospitalized patients with diarrhea in Beijing from 2017 to 2018. MLST results confirmed the high genetic diversity among isolates as well as CC21 as the most common clonal complex of C. jejuni in diarrhea patients in China. Clonal complex CC828 was the most frequently identified among C. coli isolates. Regarding antimicrobial resistance, rates higher than 88% were identified for the antimicrobials nalidixic acid, ciprofloxacin, and tetracycline. Last but not least, Kurpas et al. genetically characterized 48 L. monocytogenes isolates of PCR-serogroup IIb and IVb collected from ready-to-eat food and food processing environments. Additionally, the authors compared them with public genomes collected from humans in Poland. Among food isolates, 65% belonged to CC1, CC2, and CC6, already described as hypervirulent strains in humans. The clonal complex CC5 was also identified, mostly collected from food processing environments and belonging to PCR-serogroup IIb. Genomes of this clonal complex showed mutations in the inlA gene and a deletion of 144 bp in the inlB gene, suggesting that they are hypovirulent. Based on these studies, we conclude that the application of NGS data, in particular in source attribution models, shows great potential. The results are improved by becoming more specific and to the point, which is considered very valuable for the decision support process. Integrations with phenotypic tests will continue to be essential for confirmation of NGS predicted outcomes.

    Source Attribution of Human Campylobacteriosis Using Whole-Genome Sequencing Data and Network Analysis

    Campylobacter spp. are a leading and increasing cause of gastrointestinal infections worldwide. Source attribution, which apportions human infection cases to different animal species and food reservoirs, has been instrumental in control- and evidence-based intervention efforts. The rapid increase in whole-genome sequencing data provides an opportunity for higher-resolution source attribution models. Important challenges, including the high dimension and complex structure of WGS data, have inspired concerted research efforts to develop new models. We propose network analysis models as an accurate, high-resolution source attribution approach for the sources of human campylobacteriosis. A weighted network analysis approach was used in this study for source attribution comparing different WGS data inputs. The compared model inputs consisted of cgMLST and wgMLST distance matrices from 717 human and 717 animal isolates from cattle, chickens, dogs, ducks, pigs and turkeys. SNP distance matrices from 720 human and 720 animal isolates were also used. The data were collected from 2015 to 2017 in Denmark, with the animal sources consisting of domestic isolates and imports from 7 European countries. Clusters consisted of network nodes representing respective genomes and links representing distances between genomes. Based on the results, animal sources were the main driving factor for cluster formation, followed by type of species and sampling year. The coherence source clustering (CSC) values based on animal sources were 78%, 81% and 78% for cgMLST, wgMLST and SNP, respectively. The CSC values based on Campylobacter species were 78%, 79% and 69% for cgMLST, wgMLST and SNP, respectively. Including human isolates in the network resulted in 88%, 77% and 88% of the total human isolates being clustered with the different animal sources for cgMLST, wgMLST and SNP, respectively. Between 12% and 23% of human isolates were not attributed to any animal source.
Most of the human genomes were attributed to chickens from Denmark, with an average attribution percentage of 52.8%, 52.2% and 51.2% for cgMLST, wgMLST and SNP distance matrices respectively, while ducks from Denmark showed the least attribution of 0% for all three distance matrices. The best-performing model was the one using the wgMLST distance matrix as input data, which had a CSC value of 81%. Results from our study show that the weighted network-based approach for source attribution is reliable and can be used as an alternative method for source attribution considering the high performance of the model. The model is also robust across the different Campylobacter species, animal sources and WGS data types used as input.