On the Universality of Jordan Centers for Estimating Infection Sources in Tree Networks
Finding the infection sources in a network when we only know the network
topology and infected nodes, but not the rates of infection, is a challenging
combinatorial problem, and it is even more difficult in practice where the
underlying infection spreading model is usually unknown a priori. In this
paper, we are interested in finding a source estimator that is applicable to
various spreading models, including the Susceptible-Infected (SI),
Susceptible-Infected-Recovered (SIR), Susceptible-Infected-Recovered-Infected
(SIRI), and Susceptible-Infected-Susceptible (SIS) models. We show that under
the SI, SIR and SIRI spreading models and with mild technical assumptions, the
Jordan center is the infection source associated with the most likely infection
path in a tree network with a single infection source. This conclusion applies
for a wide range of spreading parameters, while it holds for regular trees
under the SIS model with homogeneous infection and recovery rates. Since the
Jordan center does not depend on the infection, recovery and reinfection rates,
it can be regarded as a universal source estimator. We also consider the case
where there are k>1 infection sources, generalize the Jordan center definition
to a k-Jordan center set, and show that this is an optimal infection source set
estimator in a tree network for the SI model. Simulation results on various
general synthetic networks and real world networks suggest that Jordan
center-based estimators consistently outperform the betweenness, closeness,
distance, degree, eigenvector, and pagerank centrality based heuristics, even
if the network is not a tree.
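The Jordan center estimator described in this abstract can be illustrated with a minimal sketch. It assumes the usual definition of a Jordan center as a node that minimizes the maximum hop distance to any infected node, on a connected, unweighted graph. The helper names (`build_adjacency`, `bfs_distances`, `jordan_centers`) and the toy tree are hypothetical and are not taken from the paper.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, start):
    """Hop distances from `start` to every node reachable from it."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def jordan_centers(adj, infected):
    """Return (centers, radius): the nodes whose largest distance to any
    infected node is smallest, and that smallest value.

    Assumes the graph is connected, so every infected node is reachable.
    """
    infected = set(infected)
    best, centers = None, []
    for node in adj:
        dist = bfs_distances(adj, node)
        worst = max(dist[i] for i in infected)
        if best is None or worst < best:
            best, centers = worst, [node]
        elif worst == best:
            centers.append(node)
    return sorted(centers), best


# Toy tree (hypothetical): node 6 is uninfected, all other nodes are infected.
edges = [(0, 1), (1, 2), (2, 3), (2, 4), (4, 5), (3, 6)]
adj = build_adjacency(edges)
centers, radius = jordan_centers(adj, infected={0, 1, 2, 3, 4, 5})
print(centers, radius)  # prints [2] 2
```

The sketch uses only the topology and the infected set, with no infection, recovery or reinfection rates, which is the property the abstract points to when it calls the estimator universal. Running one breadth-first search per node is quadratic in the network size, so this form is suited to small examples only.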
Infection Spreading and Source Identification: A Hide and Seek Game
The goal of an infection source node (e.g., a rumor or computer virus source)
in a network is to spread its infection to as many nodes as possible, while
remaining hidden from the network administrator. On the other hand, the network
administrator aims to identify the source node based on knowledge of which
nodes have been infected. We model the infection spreading and source
identification problem as a strategic game, where the infection source and the
network administrator are the two players. As the Jordan center estimator is a
minimax source estimator that has been shown to be robust in recent works, we
assume that the network administrator utilizes a source estimation strategy
that can probe any nodes within a given radius of the Jordan center. Given any
estimation strategy, we design a best-response infection strategy for the
source. Given any infection strategy, we design a best-response estimation
strategy for the network administrator. We derive conditions under which a Nash
equilibrium of the strategic game exists. Simulations in both synthetic and
real-world networks demonstrate that our proposed infection strategy infects
more nodes while maintaining the same safety margin between the true source
node and the Jordan center source estimator.
On the Properties of Gromov Matrices and their Applications in Network Inference
The spanning tree heuristic is a commonly adopted procedure in network
inference and estimation. It allows one to generalize an inference method
developed for trees, which is usually based on a statistically rigorous
approach, to a heuristic procedure for general graphs by (usually randomly)
choosing a spanning tree in the graph to apply the approach developed for
trees. However, there are an intractable number of spanning trees in a dense
graph. In this paper, we represent a weighted tree with a matrix, which we call
a Gromov matrix. We propose a method that constructs a family of Gromov
matrices using convex combinations, which can be used for inference and
estimation instead of a randomly selected spanning tree. This procedure
increases the size of the candidate set and hence enhances the performance of
the classical spanning tree heuristic. On the other hand, our new scheme is
based on simple algebraic constructions using matrices, and hence is still
computationally tractable. We discuss some applications in network inference
and estimation to demonstrate the usefulness of the proposed method.
Statistical methods for certain large, complex data challenges
Big data concerns large-volume, complex, growing data sets, and it provides us opportunities as well as challenges. This thesis focuses on statistical methods for several specific large, complex data challenges - each involving representation of data with complex format, utilization of complicated information, and/or intensive computational cost.
The first problem we work on is hypothesis testing for multilayer network data, motivated by an example in computational biology. We show how to represent the complex structure of a multilayer network as a single data point within the space of supra-Laplacians and then develop a central limit theorem and hypothesis testing theory for multilayer networks in that space. We develop both global and local testing strategies for mean comparison and investigate sample size requirements. The methods were applied to the motivating computational biology example and compared with the classic Gene Set Enrichment Analysis (GSEA). The comparison yields additional biological insights.
The second problem is the source detection problem in epidemiology, which is one of the most important issues for control of epidemics. Ideally, we want to locate the sources based on all history data. However, this is often infeasible, because the history data is complex, high-dimensional and cannot be fully observed. Epidemiologists have recognized the crucial role of human mobility as an important proxy to a complete history, but little in the literature to date uses this information for source detection. We recast the source detection problem as identifying a relevant mixture component in a multivariate Gaussian mixture model. Human mobility within a stochastic PDE model is used to calibrate the parameters. The capability of our method is demonstrated in the context of the 2000-2002 cholera outbreak in the KwaZulu-Natal province.
The third problem is about multivariate time series imputation, which is a classic problem in statistics. To address the common problem of low signal-to-noise ratio in high-dimensional multivariate time series, we propose models based on state-space models which provide more precise inference of missing values by clustering multivariate time series components in a nonparametric way. The models are suitable for large-scale time series due to their efficient parameter estimation.
2019-05-15
From pathway to regulon in Arabidopsis
Combined bioinformatic approaches, using genomic and transcriptomic data, are applied to investigate the fatty acid biosynthesis pathway at the molecular level and in the context of the systems biology of Arabidopsis. Fatty acids are essential components of all known bacterial and eukaryotic cells, with critical roles as energy reserves and as metabolic precursors for biological membranes. The pathway for fatty acid synthesis seems to be conserved across all living systems. Acetyl-CoA carboxylase, a member of a superfamily of biotin-dependent enzymes, catalyzes the first committed step of the fatty acid biosynthesis pathway. A phylogenetic study exposed complex and intertwined evolutionary histories of this family, with multiple domain fusions and rearrangements. As revealed by meta-analysis of a wide array of Arabidopsis transcriptomic data, fatty acid biosynthesis is transcriptionally regulated, and this regulation extends not only across all pathway reactions but also to some substrate- and cofactor-producing reactions, thus defining a major transcriptionally co-regulated pathway. Meta-analysis of the transcriptome is extended to find groups of coexpressed genes (also called modules, or regulons) in the Arabidopsis genome. Major functionally coherent gene groups were identified. These comprise development, information processing, defense, and metabolism, as well as tissue- and organelle-specific processes.
Burmese pythons in Florida: A synthesis of biology, impacts, and management tools
Burmese pythons (Python molurus bivittatus) are native to southeastern Asia; however, there is an established invasive population inhabiting much of southern Florida throughout the Greater Everglades Ecosystem. Pythons have severely impacted native species and ecosystems in Florida and represent one of the most intractable invasive-species management issues across the globe. The difficulty stems from a unique combination of inaccessible habitat and the cryptic and resilient nature of pythons that thrive in the subtropical environment of southern Florida, rendering them extremely challenging to detect. Here we provide a comprehensive review and synthesis of the science relevant to managing invasive Burmese pythons. We describe existing control tools and review challenges to productive research, identifying key knowledge gaps that would improve future research and decision making for python control. (119 pp
Genomics and epidemiology of SARS-CoV-2 in Brazil
As of the 24th January 2021, it is estimated that the coronavirus disease 2019 (COVID-19) pandemic, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has led to over 350 million reported cases and over 5.6 million deaths worldwide. Brazil has the third highest case count, over 24 million, and the second highest death count, over 623,000. In this thesis, I apply genomic and epidemiological approaches to describe and understand SARS-CoV-2 importation, transmission, spread, evolution and response during the first year of the COVID-19 pandemic in Brazil.
Chapter 2 provides an overview of the early importation, spread and response. I start by identifying the probable air routes for SARS-CoV-2 importation into Brazil. I also provide a description of the first SARS-CoV-2 cases reported in Latin America, followed by epidemiological estimates of the basic reproduction number for the most affected Brazilian states. This chapter ends with a description of the implementation and easing of non-pharmaceutical interventions (NPIs) in 72.3% of the Brazilian municipalities.
In Chapter 3, I couple genomic insights obtained from a novel representative dataset of 427 SARS-CoV-2 genomes from Brazil with human mobility data to describe SARS-CoV-2 importation and genomic diversity, reconstruct SARS-CoV-2 nationwide spatial spread and investigate the impact of NPIs implemented in Brazil.
Chapter 4 covers the application of genomic epidemiology approaches to the identification and description of new SARS-CoV-2 variants of concern (VOCs). I describe the first two cases of the Alpha VOC in Brazil and provide a genomic characterization of the first cases of the Gamma VOC in Manaus, north Brazil.
Finally, I apply epidemiological and genomic approaches to uncover the dynamics of hospital-associated transmission in the largest hospital complex in Latin America. Chapter 5 shows evidence for SARS-CoV-2 within-hospital transmission to be higher in non-COVID-19 hospitals.
Crop Disease Detection Using Remote Sensing Image Analysis
Pest and crop disease threats are often estimated by complex changes in crops and the applied agricultural practices that result mainly from increasing food demand and climate change at the global level. In an attempt to explore high-end and sustainable solutions for both pest and crop disease management, remote sensing technologies have been employed, taking advantage of changes in the metabolic activity of infected crops, which in turn are highly associated with crop spectral reflectance properties. Recent developments applied to high-resolution data acquired with remote sensing tools offer the opportunity to map infected field areas, in the form of patchy land areas, or areas that are susceptible to diseases. This makes it easier to discriminate between healthy and diseased crops, providing an additional tool for crop monitoring. The current book brings together recent research work comprising innovative applications that involve novel remote sensing approaches oriented to crop disease detection. The book provides an in-depth view of the developments in remote sensing and explores its potential to assess health status in crops.
- …