
    Bacterial genomic G + C composition-eliciting environmental adaptation

    Bacterial genomes reflect their adaptation strategies through nucleotide usage trends in their chromosome composition. Bacteria, unlike eukaryotes, span a wide range of genomic G + C content. This variability may be viewed as a response to environmental adaptation. Two overarching trends are observed across bacterial genomes: the first correlates genomic G + C with environmental niche and lifestyle, while the second uses intra-genomic G + C incongruence to delineate horizontally transferred material. In this review, we focus on the influence of several properties, including biochemical properties, genetic flows, selection biases, and biochemical-energetic properties, in shaping genome composition. Outcomes indicate a trend toward high G + C and larger genomes in free-living organisms, as a result of more complex and varied environments (higher chance for horizontal gene transfer). Conversely, nutrient-limiting and nutrient-poor environments dictate smaller genomes of low G + C in an attempt to conserve replication expense. Varied processes, including translesion repair mechanisms, phage insertion, and cytosine degradation, have been shown to introduce higher AT content in genomic sequences. We conclude the review with an analysis of current bioinformatics tools that seek to elicit compositional variances and highlight the practical implications of using such techniques.

    Insights into bacterial genome composition through variable target GC content profiling

    This study presents a new computational method for guanine (G) and cytosine (C), or GC, content profiling based on the idea of multiple resolution sampling (MRS). The benefit of our new approach over existing techniques follows from its ability to locate significant regions without prior knowledge of either the sequence or the features being sought. The use of MRS has provided novel insights into bacterial genome composition. Key findings relate to the core composition of bacterial genomes, to the identification of large genomic islands (in Enterobacterial genomes), and to the identification of surface protein determinants in human pathogenic organisms (e.g., Staphylococcus genomes). We observed that bacterial surface binding proteins maintain abnormal GC content, potentially pointing to a viral origin. This study has demonstrated that GC content carries high informational value and hints at many underlying evolutionary processes. For online Supplementary Material, see www.liebertonline.com
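The idea of profiling GC content at more than one resolution can be illustrated with a minimal sketch. This is not the authors' MRS method, whose details the abstract does not give; it is a plain sliding-window profile repeated at several window sizes, and the function names, window sizes, step rule, and deviation threshold are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def gc_profile(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """GC fraction of each full-length window, as (start, gc) pairs."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def atypical_windows(
    seq: str,
    windows: Sequence[int] = (8, 16),   # hypothetical resolutions
    threshold: float = 0.25,            # hypothetical deviation cutoff
) -> List[Tuple[int, int, float]]:
    """Report (window_size, start, gc) where GC deviates from the
    sequence-wide GC fraction by more than `threshold`."""
    baseline = gc_fraction(seq)
    hits = []
    for w in windows:
        # Half-overlapping windows; an arbitrary choice for this sketch.
        for start, gc in gc_profile(seq, w, max(1, w // 2)):
            if abs(gc - baseline) > threshold:
                hits.append((w, start, gc))
    return hits


if __name__ == "__main__":
    # Toy sequence: a GC-rich stretch flanked by balanced sequence.
    toy = "ATGC" * 4 + "GGCC" * 4 + "ATGC" * 4
    for w, start, gc in atypical_windows(toy):
        print(f"window={w:2d} start={start:2d} GC={gc:.2f}")
```

Regions flagged at several window sizes are the kind of compositional outliers that such profiling is meant to surface; on real genomes the window sizes would be in the kilobase range and the cutoff would need a statistical justification.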

    Summary Statistics for Partitionings and Feature Allocations

    Infinite mixture models are commonly used for clustering. One can sample from the posterior of mixture assignments by Monte Carlo methods or find its maximum a posteriori solution by optimization. However, in some problems the posterior is diffuse and it is hard to interpret the sampled partitionings. In this paper, we introduce novel statistics based on block sizes for representing sample sets of partitionings and feature allocations. We develop an element-based definition of entropy to quantify segmentation among their elements. Then we propose a simple algorithm called entropy agglomeration (EA) to summarize and visualize this information. Experiments on various infinite mixture posteriors as well as a feature allocation dataset demonstrate that the proposed statistics are useful in practice. Comment: Accepted to NIPS 2013: https://nips.cc/Conferences/2013/Program/event.php?ID=376

    Cluster Data Analysis with a Fuzzy Equivalence Relation to Substantiate a Medical Diagnosis

    This study aims to develop a methodology for the justification of medical diagnostic decisions based on the clustering of large volumes of statistical information stored in decision support systems. This aim is relevant since the analyzed medical data are often incomplete and inaccurate, negatively affecting the correctness of medical diagnosis and the subsequent choice of the most effective treatment actions. Clustering is an effective mathematical tool for selecting useful information under conditions of initial data uncertainty. The analysis showed that the most appropriate algorithm to solve the problem is based on fuzzy clustering and a fuzzy equivalence relation. The methods of the present study use this algorithm as the basis of a technique for analyzing large volumes of medical data in order to prepare a rationale for making medical diagnostic decisions. The proposed methodology involves the sequential implementation of the following procedures: preliminary data preparation, selecting the purpose of cluster data analysis, determining the form of results presentation, data normalization, selection of criteria for assessing the quality of the solution, application of fuzzy data clustering, evaluation of the sample, results and their use in further work. Fuzzy clustering quality evaluation criteria include partition coefficient, entropy separation criterion, separation efficiency ratio, and cluster power criterion. The novelty of the results of this article is that the proposed methodology makes it possible to work with clusters of arbitrary shape and missing centers, which is impossible when using universal algorithms. DOI: 10.28991/esj-2021-01305
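Two of the quality criteria named above can be sketched in a few lines. The abstract does not give formulas, so this sketch assumes the widely used Bezdek definitions of the partition coefficient and partition entropy; whether the article's "entropy separation criterion" matches the latter exactly is an assumption. The membership matrix layout (one row per data point, one column per cluster, rows summing to 1) and the function names are also illustrative choices.

```python
import math
from typing import Sequence

Membership = Sequence[Sequence[float]]  # rows: data points, columns: clusters


def partition_coefficient(u: Membership) -> float:
    """Mean of squared memberships: 1.0 for a crisp partition,
    1/c for a maximally fuzzy partition into c clusters."""
    n = len(u)
    return sum(m * m for row in u for m in row) / n


def partition_entropy(u: Membership) -> float:
    """Mean entropy of the membership rows (natural log): 0.0 for a
    crisp partition, log(c) for a maximally fuzzy one."""
    n = len(u)
    # Terms with zero membership contribute nothing (limit of m*log(m)).
    return -sum(m * math.log(m) for row in u for m in row if m > 0) / n


if __name__ == "__main__":
    crisp = [[1.0, 0.0], [0.0, 1.0]]
    fuzzy = [[0.5, 0.5], [0.5, 0.5]]
    print(partition_coefficient(crisp), partition_entropy(crisp))
    print(partition_coefficient(fuzzy), partition_entropy(fuzzy))
```

Under these definitions a higher partition coefficient and a lower partition entropy both indicate a less ambiguous clustering, which is why the two are often reported together.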