    Evaluation of phylogenetic reconstruction methods using bacterial whole genomes: a simulation based study

    Background: Phylogenetic reconstruction is a necessary first step in many analyses which use whole genome sequence data from bacterial populations. There are many available methods to infer phylogenies, and these have various advantages and disadvantages, but few unbiased comparisons of the range of approaches have been made. Methods: We simulated data from a defined "true tree" using a realistic evolutionary model. We built phylogenies from these data using a range of methods, and compared reconstructed trees to the true tree using two measures, noting the computational time needed for different phylogenetic reconstructions. We also used real data from Streptococcus pneumoniae alignments to compare individual core gene trees to a core genome tree. Results: We found that, as expected, maximum likelihood trees from good quality alignments were the most accurate, but also the most computationally intensive. Using less accurate phylogenetic reconstruction methods, we were able to obtain results of comparable accuracy; we found that approximate results can rapidly be obtained using genetic-distance-based methods. In real data we found that highly conserved core genes, such as those involved in translation, gave an inaccurate tree topology, whereas genes involved in recombination events gave inaccurate branch lengths. We also show a tree-of-trees, relating the results of different phylogenetic reconstructions to each other. Conclusions: We recommend three approaches, depending on requirements for accuracy and computational time. Quicker approaches that do not perform full maximum likelihood optimisation may be useful for many analyses requiring a phylogeny, as generating a high quality input alignment is likely to be the major limiting factor of accurate tree topology. We have publicly released our simulated data and code to enable further comparisons.
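
The abstract does not name the two measures used to compare reconstructed trees with the true tree. As an illustration only, the sketch below computes a Robinson-Foulds-style distance, a common topology measure, for rooted trees written as nested tuples. The function names `leaf_sets` and `rf_distance`, the tree encoding, and the choice of measure are all assumptions, not the authors' code.

```python
def leaf_sets(tree, out):
    """Collect the leaf set of every internal node (clade) of a rooted tree.

    A tree is either a leaf name (str) or a tuple of subtrees.
    Returns the leaf set of `tree` and adds each clade to `out`.
    """
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.add(leaves)
    return leaves


def rf_distance(tree_a, tree_b):
    """Number of clades found in exactly one of the two rooted trees."""
    clades_a, clades_b = set(), set()
    leaves_a = leaf_sets(tree_a, clades_a)
    leaves_b = leaf_sets(tree_b, clades_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must share the same leaf labels")
    return len(clades_a ^ clades_b)


# Hypothetical four-taxon example: the reconstruction misplaces B and C.
true_tree = ((("A", "B"), "C"), "D")
reconstructed = ((("A", "C"), "B"), "D")
```

A distance of zero means the two topologies contain the same clades. Branch-length accuracy, which the abstract also discusses, would need a separate measure.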

    Healthcare-associated outbreak of meticillin-resistant Staphylococcus aureus bacteraemia: role of a cryptic variant of an epidemic clone

    BACKGROUND: New strains of meticillin-resistant Staphylococcus aureus (MRSA) may be associated with changes in rates of disease or clinical presentation. Conventional typing techniques may not detect new clonal variants that underlie changes in epidemiology or clinical phenotype. AIM: To investigate the role of clonal variants of MRSA in an outbreak of MRSA bacteraemia at a hospital in England. METHODS: Bacteraemia isolates of the major UK lineages (EMRSA-15 and -16) from before and after the outbreak were analysed by whole-genome sequencing in the context of epidemiological and clinical data. For comparison, EMRSA-15 and -16 isolates from another hospital in England were sequenced. A clonal variant of EMRSA-16 was identified at the outbreak hospital and a molecular signature test designed to distinguish variant isolates among further EMRSA-16 strains. FINDINGS: By whole-genome sequencing, EMRSA-16 isolates during the outbreak showed strikingly low genetic diversity (P < 1 × 10^-6, Monte Carlo test), compared with EMRSA-15 and EMRSA-16 isolates from before the outbreak or the comparator hospital, demonstrating the emergence of a clonal variant. The variant was indistinguishable from the ancestral strain by conventional typing. This clonal variant accounted for 64/72 (89%) of EMRSA-16 bacteraemia isolates at the outbreak hospital from 2006. CONCLUSIONS: Evolutionary changes in epidemic MRSA strains not detected by conventional typing may be associated with changes in disease epidemiology. Rapid and affordable technologies for whole-genome sequencing are becoming available with the potential to identify and track the emergence of variants of highly clonal organisms.
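
The abstract reports a Monte Carlo test for low genetic diversity but does not describe its resampling scheme. One plausible form is sketched below: the mean pairwise distance among outbreak isolates is compared with that of random same-sized subsets drawn from the comparator isolates. The null model, the Hamming distance, the function names, and the toy sequences are assumptions made for illustration.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_distance(isolates, dist=hamming):
    """Average distance over all unordered pairs of isolates."""
    pairs = list(combinations(isolates, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def monte_carlo_low_diversity(outbreak, comparator, n_iter=999, seed=0):
    """One-sided Monte Carlo p-value for unusually LOW diversity.

    Assumed null: the outbreak isolates are a random subset of the
    comparator population (needs len(comparator) >= len(outbreak)).
    """
    rng = random.Random(seed)
    comparator = list(comparator)
    observed = mean_pairwise_distance(outbreak)
    as_low = 0
    for _ in range(n_iter):
        subset = rng.sample(comparator, len(outbreak))
        if mean_pairwise_distance(subset) <= observed:
            as_low += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (1 + as_low) / (1 + n_iter)


# Toy data, not from the study: a near-clonal group and a diverse pool.
outbreak_isolates = ["AAAA", "AAAA", "AAAT"]
comparator_isolates = ["ACGT", "TTTT", "GGGG", "CCCC", "AGAG", "TCTC"]
p_value = monte_carlo_low_diversity(outbreak_isolates, comparator_isolates)
```

With the add-one correction the smallest reportable p-value is 1/(n_iter + 1), so a bound such as P < 1 × 10^-6 would require at least a million resamples under this scheme.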

    Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images

    Beyond sample curation and basic pathologic characterization, the digitized H&E-stained images of TCGA samples remain underutilized. To highlight this resource, we present mappings of tumor-infiltrating lymphocytes (TILs) based on H&E images from 13 TCGA tumor types. These TIL maps are derived through computational staining using a convolutional neural network trained to classify patches of images. Affinity propagation revealed local spatial structure in TIL patterns and correlation with overall survival. TIL map structural patterns were grouped using standard histopathological parameters. These patterns are enriched in particular T cell subpopulations derived from molecular measures. TIL densities and spatial structure were differentially enriched among tumor types, immune subtypes, and tumor molecular subtypes, implying that spatial infiltrate state could reflect particular tumor cell aberration states. Obtaining spatial lymphocytic patterns linked to the rich genomic characterization of TCGA samples demonstrates one use for the TCGA image archives with insights into the tumor-immune microenvironment.

    Regionalization of landscape pattern indices using multivariate cluster analysis

    This project was funded by the Government of Canada through the Mountain Pine Beetle Program, a six-year, $40 million program administered by Natural Resources Canada, Canadian Forest Service. Additional information on the Mountain Pine Beetle Program may be found at: http://mpb.cfs.nrcan.gc.ca. Regionalization, or the grouping of objects in space, is a useful tool for organizing, visualizing, and synthesizing the information contained in multivariate spatial data. Landscape pattern indices can be used to quantify the spatial pattern (composition and configuration) of land cover features. Observable patterns can be linked to underlying processes affecting the generation of landscape patterns (e.g., forest harvesting). The objective of this research is to develop an approach for investigating the spatial distribution of forest pattern across a study area where forest harvesting, other anthropogenic activities, and topography are all influencing forest pattern. We generate spatial pattern regions (SPR) that describe forest pattern with a regionalization approach. Analysis is performed using a 2006 land cover dataset covering the Prince George and Quesnel Forest Districts, 5.5 million ha of primarily forested land base situated within the interior plateau of British Columbia, Canada. Multivariate cluster analysis (with the CLARA algorithm) is used to group landscape objects containing forest pattern information into SPR. Of the six generated SPR, the second cluster (SPR2) is the most prevalent, covering 22% of the study area. On average, landscapes in SPR2 comprise 55.5% forest cover and contain the highest number of patches and forest/non-forest joins, indicating highly fragmented landscapes. Regionalization of landscape pattern metrics provides a useful approach for examining the spatial distribution of forest pattern. Where forest patterns are associated with positive or negative environmental conditions, SPR can be used to identify similar regions for conservation or management activities. Postprint. Peer reviewed.

    ALMA and VLA Observations: Evidence for Ongoing Low-mass Star Formation near Sgr A*

    Using the VLA, we recently detected a large number of protoplanetary disk (proplyd) candidates lying within a couple of light years of the massive black hole Sgr A*. The bow-shock appearance of proplyd candidates points toward the young massive stars located near Sgr A*. Similar to Orion proplyds, the strong UV radiation from the cluster of massive stars at the Galactic center is expected to photoevaporate and photoionize the circumstellar disks around young, low-mass stars, thus allowing detection of the ionized outflows from the photoionized layer surrounding cool and dense gaseous disks. To confirm this picture, ALMA observations detect millimeter emission at 226 GHz from five proplyd candidates that had been detected at 44 and 34 GHz with the VLA. We present the derived disk masses for four sources as a function of the assumed dust temperature. The mass of protoplanetary disks from cool dust emission ranges between 0.03 and 0.05 solar masses. These estimates are consistent with the disk masses found in star forming sites in the Galaxy. These measurements show the presence of on-going star formation with the implication that gas clouds can survive near Sgr A* and the relative importance of high- vs low-mass star formation in the strong tidal and radiation fields of the Galactic center. Comment: 13 pages, 3 figures, MNRAS (in press)
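
The abstract does not give the relation used to derive disk masses from the 226 GHz emission. The standard estimate for optically thin, isothermal dust emission is shown here as a sketch. The symbols are assumptions introduced for illustration: $S_\nu$ is the measured flux density, $d$ the distance, $\kappa_\nu$ the opacity per unit total (gas plus dust) mass, and $T_d$ the assumed dust temperature.

```latex
M_{\mathrm{disk}} = \frac{S_\nu \, d^{2}}{\kappa_\nu \, B_\nu(T_d)},
\qquad
B_\nu(T_d) = \frac{2 h \nu^{3}}{c^{2}} \,
             \frac{1}{\exp\!\left( h\nu / k_{\mathrm{B}} T_d \right) - 1}
```

In the Rayleigh-Jeans limit, $B_\nu(T_d) \approx 2 \nu^{2} k_{\mathrm{B}} T_d / c^{2}$, so the inferred mass scales roughly as $1/T_d$. This is why a mass derived this way has to be quoted as a function of the assumed dust temperature.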

    Link-time smart card code hardening

    This paper presents a feasibility study to protect smart card software against fault-injection attacks by means of link-time code rewriting. This approach avoids the drawbacks of source code hardening, avoids the need for manual assembly writing, and is applicable in conjunction with closed third-party compilers. We implemented a range of cookbook code hardening recipes in a prototype link-time rewriter and evaluate their coverage and associated overhead to conclude that this approach is promising. We demonstrate that the overhead of using an automated link-time approach is not significantly higher than what can be obtained with compile-time hardening or with manual hardening of compiler-generated assembly code.

    Urban Authorities and Economic Sectors

    Taxation on businesses, which makes up two thirds of local authorities' revenues, urges them to widen the range of their economic interventions. The communauté d'agglomération Plaine Commune is one of the most dynamic in the Parisian metropolis and has had considerable expertise at its disposal for 10 years. In order to ensure its economic influence, it now endeavours to stimulate and even structure some economic sectors. Unfortunately, a local tax reform in progress may well jeopardise the pattern of development which is presented in this article.

    Forgetting Exceptions is Harmful in Language Learning

    We show that in language learning, contrary to received wisdom, keeping exceptional training instances in memory can be beneficial for generalization accuracy. We investigate this phenomenon empirically on a selection of benchmark natural language processing tasks: grapheme-to-phoneme conversion, part-of-speech tagging, prepositional-phrase attachment, and base noun phrase chunking. In a first series of experiments we combine memory-based learning with training set editing techniques, in which instances are edited based on their typicality and class prediction strength. Results show that editing exceptional instances (with low typicality or low class prediction strength) tends to harm generalization accuracy. In a second series of experiments we compare memory-based learning and decision-tree learning methods on the same selection of tasks, and find that decision-tree learning often performs worse than memory-based learning. Moreover, the decrease in performance can be linked to the degree of abstraction from exceptions (i.e., pruning or eagerness). We provide explanations for both results in terms of the properties of the natural language processing tasks and the learning algorithms. Comment: 31 pages, 7 figures, 10 tables. Uses 11pt, fullname, a4wide tex styles. Pre-print version of article to appear in Machine Learning 11:1-3, Special Issue on Natural Language Learning. Figures on page 22 slightly compressed to avoid page overload.
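
The editing experiment can be illustrated with a toy memory-based (nearest-neighbour) learner. This is a minimal sketch, not the paper's setup: the overlap metric, the leave-one-out disagreement rule (a simple stand-in for the typicality and class-prediction-strength criteria), the function names, and the data are all assumptions.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count feature positions where two symbolic instances differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(memory, query, k=1):
    """Majority label among the k stored instances nearest to `query`."""
    ranked = sorted(memory, key=lambda inst: overlap_distance(inst[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def edit_exceptions(memory):
    """Drop instances whose label disagrees with their nearest neighbour.

    Stand-in editing rule: an instance counts as 'exceptional' when the
    rest of the memory would have misclassified it.
    """
    kept = []
    for i, (features, label) in enumerate(memory):
        rest = memory[:i] + memory[i + 1:]
        if knn_predict(rest, features) == label:
            kept.append((features, label))
    return kept


# Toy memory: three regular instances and one exception.
memory = [
    (("a", "b", "c"), "X"),
    (("a", "b", "d"), "X"),
    (("a", "c", "c"), "X"),
    (("z", "b", "c"), "Y"),  # exceptional instance
]
edited = edit_exceptions(memory)
query = ("z", "b", "c")
full_prediction = knn_predict(memory, query)    # exception still in memory
edited_prediction = knn_predict(edited, query)  # exception edited out
```

In this toy case the full memory answers the exceptional query correctly, while the edited memory falls back on the majority pattern. That is the kind of loss the abstract attributes to forgetting exceptions.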