Biogeographic Study of Human Gut-Associated CrAssphage Suggests Impacts From Industrialization and Recent Expansion
CrAssphage (cross-assembly phage) is a bacteriophage that was first discovered in human gut metagenomic data. CrAssphage belongs to a diverse family of crAss-like bacteriophages thought to infect gut commensal bacteria belonging to Bacteroides species. However, little is known about the biogeography of crAssphage or whether certain strains are associated with specific human populations. In this study, we screened publicly available human gut metagenomic data from 3,341 samples for the presence of crAssphage sensu stricto (NC_024711.1). We found that crAssphage prevalence is low in traditional, hunter-gatherer populations, such as the Hadza from Tanzania and the Matses from Peru, compared with industrialized, urban populations. Statistical comparisons showed no association of crAssphage prevalence with variables such as age, sex, body mass index, and health status. Phylogenetic analyses show that crAssphage strains reconstructed from the same individual over multiple time points cluster together, whereas strains from individuals in the same study population do not always cluster together. Some evidence of clustering is seen at the level of broadly defined geographic regions; however, the relative positions of these clusters within the crAssphage phylogeny are not well supported. We hypothesize that this lack of strong biogeographic structuring is suggestive of an expansion event within crAssphage. Using a Bayesian dating approach, we estimate that this expansion occurred fairly recently. Overall, we conclude that crAssphage presence is associated with an industrialized lifestyle and that the absence of strong biogeographic structuring within global crAssphage strains is likely due to a recent population expansion within this bacteriophage.
This study was supported by a grant from the National Institutes of Health (https://www.nih.gov/), NIH R01 GM089886, awarded to C.M.L., C.W., and K.S.
Open Access fees paid for in whole or in part by the University of Oklahoma Libraries.
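The abstract does not say which statistical test was used to compare crAssphage prevalence between populations. As a hedged illustration only, the sketch below compares prevalence in two groups with a two-sided Fisher's exact test on a 2×2 table (rows: populations; columns: crAssphage present / absent). The helper names are hypothetical, and the counts in the usage lines are invented for illustration, not data from the study.

```python
from math import comb


def prevalence(present: int, total: int) -> float:
    """Fraction of samples in which crAssphage was detected."""
    return present / total


def _table_prob(a: int, row1: int, col1: int, n: int) -> float:
    # Hypergeometric probability that the top-left cell equals `a`
    # when all row and column totals are held fixed.
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = _table_prob(a, row1, col1, n)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    p = sum(
        q
        for q in (_table_prob(x, row1, col1, n) for x in range(lo, hi + 1))
        if q <= p_obs * (1 + 1e-7)  # small tolerance for floating-point ties
    )
    return min(1.0, p)


# Hypothetical counts: 40 of 100 industrialized vs 2 of 30 traditional samples positive.
p_industrial = prevalence(40, 100)
p_traditional = prevalence(2, 30)
p_value = fisher_exact_two_sided(40, 60, 2, 28)
```

In practice, SciPy's `scipy.stats.fisher_exact` performs the same test; the pure-Python version only makes the hypergeometric reasoning explicit.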
Advances in knowledge discovery and data mining Part II
19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II
Estimating the rate of intersubtype recombination in early HIV-1 group M strains
West Central Africa has been implicated as the epicenter of the HIV-1 epidemic, and almost all group M subtypes can be found there. Previous analysis of early HIV-1 group M sequences from Kinshasa in the Democratic Republic of Congo, formerly Zaire, revealed that isolates from a number of individuals fall in different positions in phylogenetic trees constructed from sequences from opposite ends of the genome as a result of recombination between viruses of different subtypes. Here, we use discrete ancestral trait mapping to develop a procedure for quantifying HIV-1 group M intersubtype recombination across phylogenies, using individuals' gag (p17) and env (gp41) subtypes. The method was applied to previously described HIV-1 group M sequences from samples obtained in Kinshasa early in the global radiation of HIV. Nine different p17 and gp41 intersubtype recombinant combinations were present in the data set. The mean number of excess ancestral subtype transitions (NEST) required to map individuals' p17 subtypes onto the gp41 phylogenies, compared with the number required to map them onto the p17 phylogenies, and vice versa, indicated that excess subtype transitions occurred at a rate of approximately 7 × 10⁻³ to 8 × 10⁻³ per lineage per year as a result of intersubtype recombination. Our results imply that intersubtype recombination may have occurred in approximately 20% of lineages evolving over a period of 30 years and confirm intersubtype recombination as a substantial force in generating HIV-1 group M diversity.
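The idea of counting excess subtype transitions can be sketched on a toy example. This is not the authors' procedure: it swaps in Fitch parsimony (the minimum number of state changes on a rooted binary tree) for their discrete ancestral trait mapping, and the trees, individuals and subtype labels below are entirely hypothetical. The last lines show one back-of-envelope way the reported rate relates to the roughly 20% figure, assuming excess transitions arise as a Poisson process: the chance of at least one in 30 years is 1 - exp(-r·t), about 19% to 21%.

```python
from math import exp
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf name or a (left, right) pair


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch parsimony: candidate ancestral states and minimum number of changes."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No shared subtype: one extra subtype transition is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1


def excess_transitions(own_tree: Tree, other_tree: Tree, states: Dict[str, str]) -> int:
    """Extra changes needed to map one gene's subtypes onto the other gene's tree."""
    return fitch(other_tree, states)[1] - fitch(own_tree, states)[1]


# Hypothetical data: individual "x" is subtype A in p17 but subtype D in gp41.
p17 = {"u": "A", "w": "A", "x": "A", "y": "D", "z": "D", "v": "D"}
gp41 = {**p17, "x": "D"}
p17_tree: Tree = (("u", ("w", "x")), ("y", ("z", "v")))
gp41_tree: Tree = (("u", "w"), ("y", ("x", ("z", "v"))))

nest_p17 = excess_transitions(p17_tree, gp41_tree, p17)    # p17 subtypes on the gp41 tree
nest_gp41 = excess_transitions(gp41_tree, p17_tree, gp41)  # gp41 subtypes on the p17 tree

# Reported rate (7e-3 to 8e-3 per lineage per year) over 30 years, Poisson assumption.
share_low = 1 - exp(-7e-3 * 30)
share_high = 1 - exp(-8e-3 * 30)
```

Converting excess counts into a per-lineage-per-year rate additionally requires the total time spanned by the tree branches; that step is omitted here.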
Tracking the Temporal-Evolution of Supernova Bubbles in Numerical Simulations
The study of low-dimensional, noisy manifolds embedded in a higher-dimensional space has been extremely useful in many applications, from the chemical analysis of multi-phase flows to simulations of galactic mergers. Building a probabilistic model of the manifolds has helped to describe their essential properties and how they vary in space. However, when the manifold evolves through time, joint spatio-temporal modelling is needed to fully understand its nature. We propose a first-order Markovian process that propagates the spatial probabilistic model of a manifold at a fixed time to its adjacent temporal stages. The proposed methodology is demonstrated using a particle simulation of an interacting dwarf galaxy to describe the evolution of a cavity generated by a supernova.
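The first-order Markov idea can be illustrated with a far simpler model than the authors' manifold model. In the hedged sketch below, each simulation snapshot is summarised only by the centroid of its particles, and that centroid is propagated with a random-walk Kalman filter, so the estimate at time t depends only on the estimate at t-1 and the new snapshot. The centroid summary, the variances `q` and `r`, and the function names are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def centroid(points: Sequence[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def track_centroid(snapshots: Sequence[Sequence[Point]], q: float, r: float) -> List[Point]:
    """Random-walk Kalman filter on the centroid, applied per coordinate.

    q: assumed variance of the centroid's movement between consecutive snapshots.
    r: assumed positional noise variance of a single particle.
    """
    estimates: List[Point] = []
    mean: List[float] = []
    var = 0.0
    for points in snapshots:
        obs = centroid(points)
        obs_var = r / len(points)  # variance of a mean over n particles
        if not estimates:
            mean, var = list(obs), obs_var  # first snapshot: no temporal prior yet
        else:
            pred_var = var + q  # predict: the centroid may take a random step
            gain = pred_var / (pred_var + obs_var)
            mean = [m + gain * (o - m) for m, o in zip(mean, obs)]  # update with new data
            var = (1 - gain) * pred_var
        estimates.append((mean[0], mean[1]))
    return estimates
```

Because only the previous estimate and its variance are carried forward, the state at each step summarises all earlier snapshots, which is what makes the process first-order Markovian.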
Phylodynamic modelling of foot-and-mouth disease virus sequence data
The under-reporting of cases of infectious diseases is a substantial impediment to their control and management in both epidemic and endemic contexts. Information about infectious disease dynamics can be recovered from sequence data using time-varying coalescent approaches, and phylodynamic models have been developed to reconstruct changes in the number of infected hosts through time. In this study, I have demonstrated the general concordance between empirically observed epidemiological incidence data and viral demography inferred through analysis of foot-and-mouth disease virus VP1 coding sequences belonging to the CATHAY topotype over large temporal and spatial scales. However, a more precise and robust relationship between the effective population size
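Time-varying coalescent estimation can be illustrated with its simplest form, the classic skyline plot of Pybus, Rambaut and Harvey (2000); the abstract does not say this is the model used in the study. For an interval of length t_i during which k_i lineages coexist, the classic skyline estimates the product of effective population size and generation time as C(k_i, 2)·t_i. The sketch assumes all tips were sampled at the same time (an isochronous tree); the node ages are hypothetical.

```python
from math import comb
from typing import List, Sequence, Tuple


def intervals_from_times(n_tips: int, coalescent_times: Sequence[float]) -> List[Tuple[int, float]]:
    """(lineages present, duration) for each coalescent interval of an isochronous tree.

    coalescent_times are node ages measured back from the common sampling time 0.
    """
    times = sorted(coalescent_times)
    if len(times) != n_tips - 1:
        raise ValueError("a binary tree with n tips has n - 1 coalescent events")
    intervals = []
    previous = 0.0
    for i, t in enumerate(times):
        k = n_tips - i  # lineages present before the i-th coalescence
        intervals.append((k, t - previous))
        previous = t
    return intervals


def classic_skyline(intervals: Sequence[Tuple[int, float]]) -> List[float]:
    """Estimate Ne x generation time per interval as C(k, 2) x interval length."""
    return [comb(k, 2) * duration for k, duration in intervals]


# Hypothetical node ages (years before sampling) for a 4-tip tree.
ivals = intervals_from_times(4, [0.5, 1.5, 3.5])
ne_tau = classic_skyline(ivals)
```

Plotting these stepwise values back through time gives a demographic history; smoothed relatives such as the generalised and Bayesian skyline reduce the high variance of this per-interval estimator.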
Evolutionary analysis of rapidly evolving RNA viruses
Recent advances in sequencing technology and computing power mean that we are in
an unprecedented position to analyse large viral sequence datasets using state-of-the-art
methods, with the aim of better understanding pathogen evolution and
epidemiology. This thesis concerns the evolutionary analysis of rapidly evolving
RNA viruses, with a focus on avian influenza and the use of Bayesian methodologies
which account for uncertainty in the evolutionary process. As avian influenza
viruses present an epidemiological and economic threat on a global scale, knowledge
of how they are circulating and evolving is of substantial public health importance.
In the first part of this thesis I consider avian influenza viruses of haemagglutinin
(HA) subtype H7, which, along with H5, is one of only two subtypes for which highly
pathogenic influenza has been found. I conduct a comprehensive phylogenetic
analysis of available H7 HA sequences to reveal global evolutionary relationships,
which can help to target influenza surveillance in birds and facilitate the early
detection of potential pandemic strains. I provide evidence for the continued
distinction between American and Eurasian sequences, and suggest that the most
likely route for the introduction of highly pathogenic H5N1 avian influenza to North
America would be through the smuggling of caged birds.
I proceed to apply novel methods to better understand the evolution of avian
influenza. Firstly, I use an extension of stochastic mutational mapping methods to
estimate the dN/dS ratio of H7 HA on different neuraminidase (NA) subtype
backgrounds. I find dN/dS to be higher on the N2 NA background than on N1, N3 or
N7 NA backgrounds, due to differences in selective pressure. Secondly, I investigate
reassortment, which generates novel influenza strains and precedes human influenza
pandemics. The rate at which reassortment occurs has been difficult to assess, and I
take a novel approach to quantifying reassortment across phylogenies using discrete
trait mapping methods. I also use discrete trait mapping to investigate inter-subtype
recombination in early HIV-1 in Kinshasa, the epicentre of the HIV-1 group M
epidemic. In the final section of the thesis, I describe a method whereby
epidemiological parameters may be inferred from viral sequence data isolated from
different infected individuals in a population. To conclude, I discuss the findings of
this thesis in the context of other evolutionary and epidemiological studies, suggest
future directions for avian influenza research and highlight scenarios in which the
methods described in this thesis might find further application.
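Stochastic mutational mapping yields counts of nonsynonymous and synonymous changes along a phylogeny, but turning those counts into dN/dS also requires the numbers of nonsynonymous and synonymous sites. The sketch below counts sites with the Nei-Gojobori approach on a single reference sequence under the standard genetic code and takes the change counts as inputs. Ignoring mutations to stop codons, using one reference sequence and omitting a multiple-hit correction are simplifying assumptions, and the function names are hypothetical.

```python
from itertools import product
from typing import Tuple

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_sites(codon: str) -> Tuple[float, float]:
    """Synonymous and nonsynonymous sites of one sense codon."""
    aa = CODE[codon]
    syn_sites = 0.0
    for pos in range(3):
        syn = nonsyn = 0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = CODE[codon[:pos] + base + codon[pos + 1:]]
            if mutant == "*":
                continue  # assumption: mutations to stop codons are not counted
            if mutant == aa:
                syn += 1
            else:
                nonsyn += 1
        if syn + nonsyn:
            syn_sites += syn / (syn + nonsyn)
    return syn_sites, 3.0 - syn_sites


def sequence_sites(seq: str) -> Tuple[float, float]:
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    s = n = 0.0
    for i in range(0, len(seq), 3):
        cs, cn = codon_sites(seq[i:i + 3])
        s += cs
        n += cn
    return s, n


def dn_ds(seq: str, nonsyn_changes: float, syn_changes: float) -> float:
    """Nonsynonymous over synonymous changes per site (no multiple-hit correction)."""
    s_sites, n_sites = sequence_sites(seq)
    return (nonsyn_changes / n_sites) / (syn_changes / s_sites)
```

With mapped counts from trees of H7 HA sequences on different NA backgrounds, the same ratio could be computed separately for each background.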
Complex Query Operators on Modern Parallel Architectures
Identifying interesting objects in a large data collection is a fundamental problem for multi-criteria decision-making applications. In relational database management systems (RDBMS), the most popular complex query operators for this type of problem are the Top-K selection operator and the Skyline operator. Top-K selection retrieves the k highest-ranking tuples from a given relation, as determined by a user-defined aggregation function. Skyline selection retrieves the tuples whose attributes offer Pareto-optimal trade-offs in a given relation. Efficient Top-K query processing entails minimizing tuple evaluations through elaborate processing schemes combined with sophisticated data structures that enable early termination. Skyline query evaluation relies on processing strategies geared towards early termination and incomparable tuple pruning. The rapid increase in memory capacity and decreasing memory costs have been the main drivers behind the development of main-memory database systems. Although moving query processing in-memory has created many opportunities to reduce query latency, realizing these improvements has been very challenging because of the growing gap between processor and main-memory speeds. Addressing this limitation has been made easier by the rapid proliferation of multi-core and many-core architectures. However, their use in real systems has been hindered by the lack of suitable parallel algorithms that focus on algorithmic efficiency. In this thesis, we study the Top-K and Skyline selection operators in depth, in the context of emerging parallel architectures. Our ultimate goal is to provide practical guidelines for developing work-efficient algorithms suitable for parallel main-memory processing. We concentrate on multi-core (CPU), many-core (GPU), and processing-in-memory (PIM) architectures, developing solutions optimized for high throughput and low latency. The first part of this thesis focuses on Top-K selection, presenting the details of the early-termination algorithms we developed for parallel architectures and various types of accelerators (GPU and PIM). The second part of this thesis concentrates on Skyline selection and the development of a massively parallel, load-balanced algorithm for PIM architectures. Our work consolidates performance results across different parallel architectures using synthetic and real data, with varying query parameters and distributions, for both problems. The experimental results demonstrate throughput and query latency that are several orders of magnitude better, validating the effectiveness of our proposed solutions for the Top-K and Skyline selection operators.
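Both operators can be illustrated with their textbook sequential forms: Fagin's Threshold Algorithm (TA) for Top-K with early termination, and the block-nested-loops (BNL) skyline. These are not the parallel CPU, GPU or PIM algorithms developed in the thesis. The sketch assumes sum aggregation with higher-is-better scores for Top-K, smaller-is-better attributes for Skyline, an unbounded in-memory BNL window, and toy data.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def threshold_topk(columns: Sequence[Dict[str, float]], k: int) -> Tuple[List[Tuple[float, str]], int]:
    """Threshold Algorithm with sum aggregation; every column scores the same objects.

    Returns the top-k (score, id) pairs and the sorted-access depth reached.
    """
    # Sorted access: each attribute list in descending score order.
    lists = [sorted(col.items(), key=lambda kv: kv[1], reverse=True) for col in columns]
    seen = set()
    heap: List[Tuple[float, str]] = []  # min-heap holding the current top-k
    depth = 0
    while depth < len(lists[0]):
        threshold = 0.0
        for lst in lists:
            obj, score = lst[depth]
            threshold += score  # upper bound on the score of any unseen object
            if obj in seen:
                continue
            seen.add(obj)
            total = sum(col[obj] for col in columns)  # random access to all attributes
            if len(heap) < k:
                heapq.heappush(heap, (total, obj))
            elif total > heap[0][0]:
                heapq.heapreplace(heap, (total, obj))
        depth += 1
        if len(heap) == k and heap[0][0] >= threshold:
            break  # early termination: no unseen object can beat the k-th best
    return sorted(heap, reverse=True), depth


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: no worse in every attribute and strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Block-nested-loops skyline with an unbounded in-memory window."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated, so prune it
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


cols = [
    {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1},
    {"a": 0.8, "b": 0.2, "c": 0.9, "d": 0.1},
]
top2, depth = threshold_topk(cols, 2)
sky = skyline([(1, 5), (2, 2), (3, 1), (4, 4), (2, 3)])
```

In this toy run TA stops after reading three of the four rows, because the second-best aggregate already meets the threshold; on large relations that saving is what early termination buys, and it is the sequential behaviour that parallel variants must preserve while distributing the work.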
Unsupervised Intrusion Detection with Cross-Domain Artificial Intelligence Methods
Cybercrime is a major concern for corporations, business owners, governments and citizens, and it continues to grow in spite of increasing investment in security and fraud prevention. The main challenges in this research field are detecting unknown attacks and reducing the false positive rate. The aim of this research was to address both problems by leveraging four artificial intelligence techniques.
The first technique is a novel unsupervised learning method based on skip-gram modeling. It was designed, developed and tested against a public dataset containing common intrusion patterns, and it achieved high accuracy and a low false positive rate without prior knowledge of attack patterns.
The second technique is a novel unsupervised learning method based on topic modeling. It was applied to three related domains (network attacks, payment fraud, IoT malware traffic) and achieved high accuracy in all three, even though the malicious activity differs significantly from one domain to another.
The third technique is a novel unsupervised learning method based on deep autoencoders, with feature selection performed by a supervised method, random forest. The results showed that this technique can outperform other similar techniques.
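Autoencoder-based detection scores each record by how badly a model trained on (mostly benign) traffic reconstructs it. The sketch below shows only that scoring idea, under heavy simplifying assumptions: a linear autoencoder with a single hidden unit instead of a deep network, no random-forest feature selection, synthetic three-feature records, and a threshold set at the largest training error. All names and data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _encode(x: Sequence[float], w_enc: Sequence[float]) -> float:
    return sum(w * xi for w, xi in zip(w_enc, x))


def reconstruction_error(x: Sequence[float], w_enc: Sequence[float], w_dec: Sequence[float]) -> float:
    """Squared error between a record and its reconstruction through the bottleneck."""
    h = _encode(x, w_enc)
    return sum((h * wd - xi) ** 2 for wd, xi in zip(w_dec, x))


def train_autoencoder(data: Sequence[Sequence[float]], lr: float = 0.05,
                      epochs: int = 1000, seed: int = 0) -> Tuple[Vector, Vector]:
    """Linear autoencoder with one hidden unit, trained by full-batch gradient descent."""
    rng = random.Random(seed)
    d, n = len(data[0]), len(data)
    w_enc = [rng.uniform(-0.5, 0.5) for _ in range(d)]
    w_dec = [rng.uniform(-0.5, 0.5) for _ in range(d)]
    for _ in range(epochs):
        g_enc, g_dec = [0.0] * d, [0.0] * d
        for x in data:
            h = _encode(x, w_enc)
            err = [h * wd - xi for wd, xi in zip(w_dec, x)]
            dh = sum(e * wd for e, wd in zip(err, w_dec))  # gradient w.r.t. the hidden unit
            for i in range(d):
                g_dec[i] += h * err[i] / n
                g_enc[i] += dh * x[i] / n
        w_enc = [w - lr * g for w, g in zip(w_enc, g_enc)]
        w_dec = [w - lr * g for w, g in zip(w_dec, g_dec)]
    return w_enc, w_dec


# Hypothetical unlabeled records: "normal" traffic lies close to one direction.
rng = random.Random(1)
normal = []
for _ in range(100):
    t = rng.uniform(-1, 1)
    normal.append([t + rng.gauss(0, 0.01), 2 * t + rng.gauss(0, 0.01), -t + rng.gauss(0, 0.01)])

w_enc, w_dec = train_autoencoder(normal)
train_errors = [reconstruction_error(x, w_enc, w_dec) for x in normal]
threshold = max(train_errors)  # assumption: flag anything worse than the worst training record
suspicious = [4.0, -2.0, 0.0]  # lies off the direction learned from normal traffic
is_anomaly = reconstruction_error(suspicious, w_enc, w_dec) > threshold
```

No attack labels are used: the model only learns what normal records look like, which is why this family of methods can, in principle, flag previously unseen attacks.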
The fourth technique is based on an MLP neural network and is applied to alert reduction in fraud prevention. This method automates manual reviews previously performed by human experts without significantly impacting accuracy.