
    Statistical clustering of temporal networks through a dynamic stochastic block model

    Statistical node clustering in discrete-time dynamic networks is an emerging field that raises many challenges. Here, we explore statistical properties and frequentist inference in a model that combines a stochastic block model (SBM) for its static part with independent Markov chains for the evolution of the node groups through time. We model binary data as well as weighted dynamic random graphs (with discrete or continuous edge values). Our approach, motivated by the importance of controlling for label-switching issues across the different time steps, focuses on detecting groups characterized by a stable within-group connectivity behavior. We study the identifiability of the model parameters and propose an inference procedure based on a variational expectation-maximization algorithm, as well as a model selection criterion to select the number of groups. We carefully discuss our initialization strategy, which plays an important role in the method, and compare our procedure with existing ones on synthetic datasets. We also illustrate our approach on dynamic contact networks: one of encounters among high school students and two others of animal interactions. An implementation of the method is available as an R package called dynsbm.
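
    To make the generative model concrete, the following is a minimal Python sketch of simulating binary snapshots from a dynamic SBM as described above: each node's group evolves through its own independent Markov chain, and, given the groups, edges in each snapshot are independent Bernoulli draws from a group-to-group connectivity matrix. It is an illustration under stated assumptions, not the dynsbm implementation: the function `simulate_dynamic_sbm` and its parameters are hypothetical, the connectivity matrix is assumed constant over time, and only binary (not weighted) undirected edges are simulated. Inference (the variational EM, initialization, and model selection) is not sketched.

```python
import random


def simulate_dynamic_sbm(n, T, init_probs, transition, connectivity, seed=0):
    """Simulate T binary, undirected snapshots from a dynamic SBM (sketch).

    n            -- number of nodes
    T            -- number of time steps (snapshots)
    init_probs   -- init_probs[q]: probability that a node starts in group q
    transition   -- transition[q][l]: probability of moving from group q to l
    connectivity -- connectivity[q][l]: edge probability between groups q and l,
                    assumed here to be the same at every time step
    Returns (groups, snapshots): groups[t][i] is node i's group at time t,
    snapshots[t] is the adjacency matrix (list of lists) at time t.
    """
    rng = random.Random(seed)
    labels = range(len(init_probs))
    # Initial memberships are drawn independently for each node.
    z = [rng.choices(labels, weights=init_probs)[0] for _ in range(n)]
    groups, snapshots = [], []
    for t in range(T):
        if t > 0:
            # Each node moves according to its own independent Markov chain.
            z = [rng.choices(labels, weights=transition[q])[0] for q in z]
        groups.append(list(z))
        # Given the memberships, edges are independent Bernoulli draws
        # (undirected graph, no self-loops).
        adj = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                if rng.random() < connectivity[z[i]][z[j]]:
                    adj[i][j] = adj[j][i] = 1
        snapshots.append(adj)
    return groups, snapshots
```

    For example, simulate_dynamic_sbm(30, 4, [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.8, 0.05], [0.05, 0.3]]) produces four snapshots of a two-group network. The strongly diagonal transition matrix makes most nodes keep their group from one snapshot to the next, while the off-diagonal entries control how often nodes switch groups.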

    Spatially-constrained clustering of ecological networks

    Spatial ecological networks are widely used to model interactions between georeferenced biological entities (e.g., populations or communities). The analysis of such data often follows a two-step approach in which groups of similar biological entities are first identified and the spatial information is used afterwards to improve the ecological interpretation. We develop an integrative approach to retrieve groups of nodes that are both geographically close and ecologically similar. Our model-based, spatially-constrained method embeds the geographical information within a regularization framework by adding constraints to the maximum likelihood estimation of the parameters. A simulation study and the analysis of real data demonstrate that our approach is able to detect complex spatial patterns that are ecologically meaningful. The model-based framework allows us to include external information (e.g., geographic proximities, covariates) in the analysis of ecological networks and appears to be an appealing alternative for analyzing such data.

    Strategies for online inference of model-based clustering in large and growing networks

    In this paper we adapt online estimation strategies to perform model-based clustering on large networks. Our work focuses on two algorithms, the first based on the SAEM algorithm and the second on variational methods. These two strategies are compared with existing approaches on simulated and real data. We use the method to decipher the connection structure of the political websphere during the 2008 US political campaign. We show that our online EM-based algorithms offer a good trade-off between precision and speed when estimating parameters for mixture distributions in the context of random graphs. Comment: Published in at http://dx.doi.org/10.1214/10-AOAS359 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Nine Quick Tips for Analyzing Network Data

    These tips provide a quick, concise guide for beginners in the analysis of network data.

    Navigating in a sea of repeats in RNA-seq without drowning

    The main challenge in de novo assembly of NGS data is certainly to deal with repeats that are longer than the reads. This is particularly true for RNA-seq data, since coverage information cannot be used to flag repeated sequences, of which transposable elements are one of the main examples. Most transcriptome assemblers are based on de Bruijn graphs and have no clear and explicit model for repeats in RNA-seq data, relying instead on heuristics to deal with them. The results of this work are twofold. First, we introduce a formal model for representing high-copy-number repeats in RNA-seq data and exploit its properties to infer a combinatorial characteristic of repeat-associated subgraphs. We show that the problem of identifying a subgraph with this characteristic in a de Bruijn graph is NP-complete. Second, we show that in the specific case of a local assembly of alternative splicing (AS) events, we can implicitly avoid such subgraphs. In particular, we designed and implemented an algorithm to efficiently identify AS events that are not included in repeated regions. Finally, we validate our results using synthetic data. We also give an indication of the usefulness of our method on real data.
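
    Why repeats confuse de Bruijn graph assemblers can be illustrated with a minimal Python sketch. It builds a node-centric de Bruijn graph whose nodes are (k-1)-mers and whose edges are the k-mers found in the reads, then flags nodes where paths merge (several predecessors) or branch (several successors). This is a generic textbook construction, not the authors' formal repeat model or their AS-event algorithm; the function names are hypothetical, and reverse complements, sequencing errors, and coverage are ignored. Any segment at least k bases long that occurs in two different contexts collapses into a single shared path.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Node-centric de Bruijn graph: nodes are (k-1)-mers, and every k-mer
    in a read adds an edge from its prefix to its suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """(k-1)-mers with more than one distinct successor."""
    return {node for node, succs in graph.items() if len(succs) > 1}


def merging_nodes(graph):
    """(k-1)-mers with more than one distinct predecessor."""
    preds = defaultdict(set)
    for node, succs in graph.items():
        for s in succs:
            preds[s].add(node)
    return {node for node, p in preds.items() if len(p) > 1}


# Two reads sharing the segment GATCC in different flanking contexts.
reads = ["CTAGGATCCAT", "TGCAGATCCGA"]
g = de_bruijn_graph(reads, k=4)
print(sorted(merging_nodes(g)))    # ['GAT']
print(sorted(branching_nodes(g)))  # ['TCC']
```

    The shared segment becomes the single path GAT -> ATC -> TCC, entered from two contexts and left towards two contexts, so the graph alone does not say which left flank belongs with which right flank. High-copy-number repeats multiply such junctions, which is what makes the repeat-associated subgraphs discussed above hard to handle.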

    Core-periphery dynamics in a plant-pollinator network

    Mutualistic networks are highly dynamic, characterized by high temporal turnover of species and interactions. Yet, we have a limited understanding of how the internal structure of these networks and the roles species play in them vary through time. We used 6 years of observation data and a novel statistical method (dynamic stochastic block models) to assess how network structure and species' structural positions within the network change across subseasons of the flowering season and across years in a quantitative plant–pollinator network from a dryland ecosystem in Argentina. Our analyses revealed a core–periphery structure that persisted across subseasons and years. Yet, the structural position of species as core or peripheral was highly dynamic: virtually all species that were at the core in some subseasons were also peripheral in other subseasons, while many other species always remained peripheral. Our results improve our understanding of the dynamics of mutualistic networks and have important implications for ecosystem management and conservation.
    Affiliation: Miele, Vincent. Centre National de la Recherche Scientifique; France.
    Affiliation: Ramos Jiliberto, Rodrigo. Universidad Mayor; Chile.
    Affiliation: Vazquez, Diego P. Universidad Nacional de Cuyo. Facultad de Ciencias Exactas y Naturales; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza. Instituto Argentino de Investigaciones de las Zonas Áridas. Provincia de Mendoza. Instituto Argentino de Investigaciones de las Zonas Áridas. Universidad Nacional de Cuyo. Instituto Argentino de Investigaciones de las Zonas Áridas; Argentina.

    Revealing the hidden structure of dynamic ecological networks

    Recent technological advances and long-term data studies provide interaction data that can be modelled through dynamic networks, i.e., a sequence of different snapshots of an evolving ecological network. Most often time is the parameter along which these networks evolve, but any other one-dimensional gradient (temperature, altitude, depth, humidity, ...) could be considered. Here we propose a statistical tool to analyse the underlying structure of these networks and follow its evolution dynamics (either in time or along any other one-dimensional factor). It consists of extracting the main features of these networks and summarising them into a high-level view. We analyse a dynamic animal contact network and a seasonal food web, and in both cases we show that our approach allows for the identification of a backbone organisation as well as interesting temporal variations at the individual level. Our method, implemented in the R package dynsbm, can handle the largest ecological datasets and is a versatile and promising tool for ecologists who study dynamic interactions.

    QuantiFERON-TB gold in-tube implementation for latent tuberculosis diagnosis in a public health clinic: a cost-effectiveness analysis

    BACKGROUND: The tuberculin skin test (TST) has limitations for latent tuberculosis infection (LTBI) diagnosis in low-prevalence settings. Previously, all TST-positive individuals referred from the community to the Baltimore City Health Department (BCHD) were offered LTBI treatment after active TB was excluded. In 2010, BCHD introduced adjunctive QuantiFERON-TB Gold In-Tube (QFT-GIT) testing for TST-positive referrals. We evaluated the costs and cost-effectiveness of this new diagnostic algorithm. METHODS: A decision-analysis model compared the strategy of treating all TST-positive referrals versus treating only those with positive results on adjunctive QFT-GIT testing. Costs were collected at BCHD, and incremental cost-effectiveness ratios (ICERs) were used to report on cost-effectiveness. RESULTS: QFT-GIT testing at BCHD cost $43.51 per test. Implementation of QFT-GIT testing was associated with an ICER of $1,202 per quality-adjusted life-year gained and was considered highly cost-effective. In sensitivity analysis, the QFT-GIT strategy became cost-saving if QFT-GIT sensitivity increased above 92% or if fewer than 3.5% of individuals with LTBI progressed to active TB disease. CONCLUSIONS: LTBI screening with TST in low-prevalence settings may lead to overtreatment and increased expenditures. In this public health clinic, additional QFT-GIT testing of individuals referred for a positive TST was cost-effective.
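
    The ICER reported above compares the two strategies of the decision model. In its standard textbook form (the symbols are illustrative; the abstract does not report the per-strategy costs or QALY totals), with C the expected cost per person and E the expected effectiveness in quality-adjusted life-years (QALYs), QFT the strategy of treating only referrals with a positive adjunctive QFT-GIT result, and TST the strategy of treating all TST-positive referrals:

```latex
\mathrm{ICER} = \frac{C_{\mathrm{QFT}} - C_{\mathrm{TST}}}{E_{\mathrm{QFT}} - E_{\mathrm{TST}}}
\quad \left[\text{cost per QALY gained}\right]
```

    A small positive ratio, such as the reported $1,202 per QALY gained, means the extra spending buys health gains cheaply; the strategy becomes cost-saving when the numerator turns negative, i.e. when the QFT-GIT strategy costs less than treating all TST-positive referrals.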

    Deep learning for species identification of modern and fossil rodent molars

    Reliable identification of species is a key step to assess biodiversity. In fossil and archaeological contexts, genetic identification often remains difficult or even impossible, and morphological criteria are the only window on past biodiversity. Methods of numerical taxonomy based on geometric morphometrics provide reliable identifications at the specific and even intraspecific levels, but they remain relatively time-consuming and require expertise on the group under study. Here, we explore an alternative based on computer vision and machine learning. The identification of three rodent species based on pictures of their molar tooth row constituted the case study. We focused on the first upper molar in order to transfer the model elaborated on modern, genetically identified specimens to isolated fossil teeth. A pipeline based on deep neural networks automatically cropped the first molar from the pictures and returned a prediction regarding species identification. The deep-learning approach performed as well as geometric morphometrics and, given an extensive reference dataset including fossil teeth, it was able to successfully identify teeth from an archaeological deposit that was not included in the training dataset. This is a proof of concept that such methods could allow fast and reliable identification of extensive amounts of fossil remains, often left unstudied in archaeological deposits for lack of time and expertise. Deep-learning methods may thus provide new insights into the biodiversity dynamics of the last 10,000 years, including the role of humans in extinction or recent evolution.

    Playing hide and seek with repeats in local and global de novo transcriptome assembly of short RNA-seq reads

    Background: The main challenge in de novo genome assembly of DNA-seq data is certainly to deal with repeats that are longer than the reads. In de novo transcriptome assembly of RNA-seq reads, on the other hand, this problem has so far been underestimated. Even though there are fewer and shorter repeated sequences in transcriptomics, they do create ambiguities and confuse assemblers if not addressed properly. Most transcriptome assemblers of short reads are based on de Bruijn graphs (DBG) and have no clear and explicit model for repeats in RNA-seq data, relying instead on heuristics to deal with them. Results: The results of this work are threefold. First, we introduce a formal model for representing high-copy-number and low-divergence repeats in RNA-seq data and exploit its properties to infer a combinatorial characteristic of repeat-associated subgraphs. We show that the problem of identifying such subgraphs in a DBG is NP-complete. Second, we show that in the specific case of local assembly of alternative splicing (AS) events, we can implicitly avoid such subgraphs, and we present an efficient algorithm to enumerate AS events that are not included in repeats. Using simulated data, we show that this strategy is significantly more sensitive and precise than the previous version of KisSplice (Sacomoto et al. in WABI, pp 99–111, 1), Trinity (Grabherr et al. in Nat Biotechnol 29(7):644–652, 2), and Oases (Schulz et al. in Bioinformatics 28(8):1086–1092, 3) for the specific task of calling AS events. Third, we turn our focus to full-length transcriptome assembly, and we show that exploring the topology of DBGs can improve de novo transcriptome evaluation methods. Based on the observation that repeats create complicated regions in a DBG, and that assemblers traversing these regions can infer erroneous transcripts, we propose a measure to flag transcripts traversing such troublesome regions, thereby giving a confidence level for each transcript. The originality of our work compared to other transcriptome evaluation methods is that we use only the topology of the DBG, and neither read nor coverage information. We show that our simple method gives better results than Rsem-Eval (Li et al. in Genome Biol 15(12):553, 4) and TransRate (Smith-Unna et al. in Genome Res 26(8):1134–1144, 5) on both real and simulated datasets for detecting chimeras, and is therefore able to capture assembly errors missed by these methods.