524 research outputs found

    On the Universality of Jordan Centers for Estimating Infection Sources in Tree Networks

    Finding the infection sources in a network when we only know the network topology and infected nodes, but not the rates of infection, is a challenging combinatorial problem, and it is even more difficult in practice where the underlying infection spreading model is usually unknown a priori. In this paper, we are interested in finding a source estimator that is applicable to various spreading models, including the Susceptible-Infected (SI), Susceptible-Infected-Recovered (SIR), Susceptible-Infected-Recovered-Infected (SIRI), and Susceptible-Infected-Susceptible (SIS) models. We show that under the SI, SIR and SIRI spreading models and with mild technical assumptions, the Jordan center is the infection source associated with the most likely infection path in a tree network with a single infection source. This conclusion applies for a wide range of spreading parameters, while it holds for regular trees under the SIS model with homogeneous infection and recovery rates. Since the Jordan center does not depend on the infection, recovery and reinfection rates, it can be regarded as a universal source estimator. We also consider the case where there are k>1 infection sources, generalize the Jordan center definition to a k-Jordan center set, and show that this is an optimal infection source set estimator in a tree network for the SI model. Simulation results on various general synthetic networks and real-world networks suggest that Jordan center-based estimators consistently outperform the betweenness, closeness, distance, degree, eigenvector, and PageRank centrality-based heuristics, even if the network is not a tree.
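    As a rough illustration of the estimator named in this abstract, the sketch below computes a Jordan center under the usual graph-theoretic definition: a node whose largest hop distance to any infected node is smallest. It needs only the topology and the infected set, not the spreading rates. The function names, the unweighted connected graph, and the toy tree are assumptions made for this example; they are not taken from the paper.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, start):
    """Hop distance from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def jordan_centers(adj, infected):
    """Nodes minimizing the maximum distance to any infected node.

    Assumes a connected graph (hypothetical helper, not the paper's code).
    """
    eccentricity = {v: 0 for v in adj}
    for source in infected:
        dist = bfs_distances(adj, source)
        for v in adj:
            eccentricity[v] = max(eccentricity[v], dist[v])
    best = min(eccentricity.values())
    return sorted(v for v, e in eccentricity.items() if e == best)


# Toy tree: a path 0-1-2-3-4 with an extra leaf 5 attached to node 2.
tree = build_adjacency([(0, 1), (1, 2), (2, 3), (3, 4), (2, 5)])
print(jordan_centers(tree, {1, 2, 3, 5}))
```

    In this toy tree, node 2 is within one hop of every infected node, so it is the only Jordan center. Ties are possible in general, which is why the function returns a list.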

    Infection Spreading and Source Identification: A Hide and Seek Game

    The goal of an infection source node (e.g., a rumor or computer virus source) in a network is to spread its infection to as many nodes as possible, while remaining hidden from the network administrator. On the other hand, the network administrator aims to identify the source node based on knowledge of which nodes have been infected. We model the infection spreading and source identification problem as a strategic game, where the infection source and the network administrator are the two players. As the Jordan center estimator is a minimax source estimator that has been shown to be robust in recent works, we assume that the network administrator utilizes a source estimation strategy that can probe any nodes within a given radius of the Jordan center. Given any estimation strategy, we design a best-response infection strategy for the source. Given any infection strategy, we design a best-response estimation strategy for the network administrator. We derive conditions under which a Nash equilibrium of the strategic game exists. Simulations in both synthetic and real-world networks demonstrate that our proposed infection strategy infects more nodes while maintaining the same safety margin between the true source node and the Jordan center source estimator.

    On the Properties of Gromov Matrices and their Applications in Network Inference

    The spanning tree heuristic is a commonly adopted procedure in network inference and estimation. It allows one to generalize an inference method developed for trees, which is usually based on a statistically rigorous approach, to a heuristic procedure for general graphs by (usually randomly) choosing a spanning tree in the graph to apply the approach developed for trees. However, there is an intractable number of spanning trees in a dense graph. In this paper, we represent a weighted tree with a matrix, which we call a Gromov matrix. We propose a method that constructs a family of Gromov matrices using convex combinations, which can be used for inference and estimation instead of a randomly selected spanning tree. This procedure increases the size of the candidate set and hence enhances the performance of the classical spanning tree heuristic. On the other hand, our new scheme is based on simple algebraic constructions using matrices, and hence is still computationally tractable. We discuss some applications in network inference and estimation to demonstrate the usefulness of the proposed method.

    Statistical methods for certain large, complex data challenges

    Big data concerns large-volume, complex, growing data sets, and it provides us opportunities as well as challenges. This thesis focuses on statistical methods for several specific large, complex data challenges, each involving representation of data with complex format, utilization of complicated information, and/or intensive computational cost. The first problem we work on is hypothesis testing for multilayer network data, motivated by an example in computational biology. We show how to represent the complex structure of a multilayer network as a single data point within the space of supra-Laplacians and then develop a central limit theorem and hypothesis testing theories for multilayer networks in that space. We develop both global and local testing strategies for mean comparison and investigate sample size requirements. The methods were applied to the motivating computational biology example and compared with the classic Gene Set Enrichment Analysis (GSEA). More biological insights are found in this comparison. The second problem is the source detection problem in epidemiology, which is one of the most important issues for control of epidemics. Ideally, we want to locate the sources based on all historical data. However, this is often infeasible, because the historical data is complex, high-dimensional and cannot be fully observed. Epidemiologists have recognized the crucial role of human mobility as an important proxy to a complete history, but little in the literature to date uses this information for source detection. We recast the source detection problem as identifying a relevant mixture component in a multivariate Gaussian mixture model. Human mobility within a stochastic PDE model is used to calibrate the parameters. The capability of our method is demonstrated in the context of the 2000-2002 cholera outbreak in the KwaZulu-Natal province. The third problem is about multivariate time series imputation, which is a classic problem in statistics. To address the common problem of low signal-to-noise ratio in high-dimensional multivariate time series, we propose models based on state-space models which provide more precise inference of missing values by clustering multivariate time series components in a nonparametric way. The models are suitable for large-scale time series due to their efficient parameter estimation.
    2019-05-15T00:00:00

    From pathway to regulon in Arabidopsis

    Combined bioinformatic approaches, using genomic and transcriptomic data, are applied to investigate the fatty acid biosynthesis pathway at the molecular level and in the context of the systems biology of Arabidopsis. Fatty acids are essential components of all known bacterial and eukaryotic cells, with a critical role as energy reserves and as the metabolic precursors for biological membranes. The pathway for fatty acid synthesis seems to be conserved across all living systems. Acetyl-CoA carboxylase, a member of a superfamily of biotin-dependent enzymes, catalyzes the first committed step of the fatty acid biosynthesis pathway. A phylogenetic study exposed complex and intertwined evolutionary histories of this family, with multiple domain fusions and rearrangements. As revealed by meta-analysis of a wide array of Arabidopsis transcriptomic data, fatty acid biosynthesis is transcriptionally regulated, and this regulation extends not only across all pathway reactions but also to some substrate- and cofactor-producing reactions, thus defining a major transcriptionally co-regulated pathway. Meta-analysis of the transcriptome is extended to find groups of coexpressed genes (also called modules, or regulons) in the Arabidopsis genome. Major functionally coherent gene groups were identified. These comprise development, information processing, defense, and metabolism, as well as tissue- and organelle-specific processes.

    Burmese pythons in Florida: A synthesis of biology, impacts, and management tools

    Burmese pythons (Python molurus bivittatus) are native to southeastern Asia; however, there is an established invasive population inhabiting much of southern Florida throughout the Greater Everglades Ecosystem. Pythons have severely impacted native species and ecosystems in Florida and represent one of the most intractable invasive-species management issues across the globe. The difficulty stems from a unique combination of inaccessible habitat and the cryptic and resilient nature of pythons that thrive in the subtropical environment of southern Florida, rendering them extremely challenging to detect. Here we provide a comprehensive review and synthesis of the science relevant to managing invasive Burmese pythons. We describe existing control tools and review challenges to productive research, identifying key knowledge gaps that would improve future research and decision making for python control. (119 pp

    Genomics and epidemiology of SARS-CoV-2 in Brazil

    As of the 24th January 2021, it is estimated that the coronavirus disease 2019 (COVID-19) pandemic, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has led to over 350 million reported cases and over 5.6 million deaths worldwide. Brazil has the third highest case count, over 24 million, and the second highest death count, over 623,000. In this thesis, I apply genomic and epidemiological approaches to describe and understand SARS-CoV-2 importation, transmission, spread, evolution and response during the first year of the COVID-19 pandemic in Brazil. Chapter 2 provides an overview of the early importation, spread and response. I start by identifying the probable air routes for SARS-CoV-2 importation into Brazil. I also provide a description of the first SARS-CoV-2 cases reported in Latin America, followed by epidemiological estimates of the basic reproduction number for the most affected Brazilian states. This chapter ends with a description of the implementation and easing of non-pharmaceutical interventions (NPIs) in 72.3% of the Brazilian municipalities. In Chapter 3, I couple genomic insights obtained from a novel representative dataset of 427 SARS-CoV-2 genomes from Brazil with human mobility data to describe SARS-CoV-2 importation and genomic diversity, reconstruct SARS-CoV-2 nationwide spatial spread and investigate the impact of NPIs implemented in Brazil. Chapter 4 covers the application of genomic epidemiology approaches to the identification and description of new SARS-CoV-2 variants of concern (VOCs). I describe the first two cases of the Alpha VOC in Brazil and provide a genomic characterization of the first cases of the Gamma VOC in Manaus, north Brazil. Finally, I apply epidemiological and genomic approaches to uncover the dynamics of hospital-associated transmission in the largest hospital complex in Latin America. Chapter 5 shows evidence for SARS-CoV-2 within-hospital transmission to be higher in non-COVID-19 hospitals.

    Crop Disease Detection Using Remote Sensing Image Analysis

    Pest and crop disease threats are often estimated by complex changes in crops and the applied agricultural practices that result mainly from the increasing food demand and climate change at the global level. In an attempt to explore high-end and sustainable solutions for both pest and crop disease management, remote sensing technologies have been employed, taking advantage of possible changes deriving from relative alterations in the metabolic activity of infected crops, which in turn are highly associated with crop spectral reflectance properties. Recent developments applied to high-resolution data acquired with remote sensing tools offer the opportunity of mapping the infected field areas in the form of patchy land areas, or those areas that are susceptible to diseases. This makes it easier to discriminate between healthy and diseased crops, providing an additional tool for crop monitoring. The current book brings together recent research work comprising innovative applications that involve novel remote sensing approaches and their applications oriented to crop disease detection. The book provides an in-depth view of the developments in remote sensing and explores its potential to assess health status in crops.