142 research outputs found

    Inference of Markovian Properties of Molecular Sequences from NGS Data and Applications to Comparative Genomics

    Next Generation Sequencing (NGS) technologies generate large amounts of short read data for many different organisms. The fact that NGS reads are generally short makes it challenging to assemble the reads and reconstruct the original genome sequence. For clustering genomes using such NGS data, word-count based alignment-free sequence comparison is a promising approach, but for this approach, the underlying expected word counts are essential. A plausible model for this underlying distribution of word counts is given through modelling the DNA sequence as a Markov chain (MC). For single long sequences, efficient statistics are available to estimate the order of MCs and the transition probability matrix for the sequences. As NGS data do not provide a single long sequence, inference methods on Markovian properties of sequences based on single long sequences cannot be directly used for NGS short read data. Here we derive a normal approximation for such word counts. We also show that the traditional Chi-square statistic has an approximate gamma distribution, using the Lander-Waterman model for physical mapping. We propose several methods to estimate the order of the MC based on NGS reads and evaluate them using simulations. We illustrate the applications of our results by clustering genomic sequences of several vertebrate and tree species based on NGS reads using alignment-free sequence dissimilarity measures. We find that the estimated order of the MC has a considerable effect on the clustering results, and that the clustering results that use a MC of the estimated order give a plausible clustering of the species. Comment: accepted by RECOMB-SEQ 201
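
The word counts and transition probabilities mentioned in this abstract can be illustrated with a minimal sketch. It pools overlapping k-mer counts across reads and normalizes them into a transition matrix. This is a plain count-based estimator chosen for illustration, not the inference procedure derived in the paper; the function names and the toy reads are hypothetical.

```python
from collections import Counter
from itertools import product


def count_words(reads, k):
    """Count every overlapping k-mer (word) across a collection of short reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def transition_matrix(reads, order=1, alphabet="ACGT"):
    """Estimate Markov chain transition probabilities from pooled word counts.

    P(b | w) is estimated as count(w + b) / sum over c of count(w + c),
    where w is a context of length `order`.
    """
    counts = count_words(reads, order + 1)
    matrix = {}
    for context in map("".join, product(alphabet, repeat=order)):
        total = sum(counts[context + b] for b in alphabet)
        if total == 0:
            continue  # context never observed in the reads
        matrix[context] = {b: counts[context + b] / total for b in alphabet}
    return matrix


# Hypothetical toy reads, for illustration only.
reads = ["ACGTAC", "CGTACG", "GTACGT"]
word_counts = count_words(reads, 2)
probs = transition_matrix(reads, order=1)
```

Because words are counted within each read separately, no k-mer spans a read boundary, which is the main practical difference from counting on one long sequence.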

    Variations in mineralogy, temperature, and oxygen fugacity in a suite of strongly peralkaline lavas and tuffs, Pantelleria, Italy.

    Eight samples of pantelleritic lava and tuff and a lithic inclusion of trachyte from Pantelleria, Italy, have been thoroughly analyzed with an electron microprobe. These samples reveal five different mineral assemblages if classified by the presence of fayalite, aenigmatite, ilmenite, and magnetite: (1) augite + fayalite + ilmenite + magnetite, (2) augite + fayalite + ilmenite, (3) hedenbergite or sodian hedenbergite + fayalite + ilmenite + aenigmatite + quartz, (4) sodian hedenbergite or aegirine-augite + ilmenite + aenigmatite + quartz ± ferrorichterite, and (5) aegirine-augite + aenigmatite + quartz. Alkali feldspar (Or35–37) is present as the dominant phyric phase in each assemblage. Whole-rock silica and peralkalinity correlate strongly with the mineral assemblage: assemblage 1 is found in the sample with the lowest agpaitic index [A.I. = molar (Na + K)/Al] and silica concentration (A.I. < 1.31, SiO2 < 64.8 wt%) and equilibrated at 991–888°C at an oxygen fugacity between 0.7 and 1.1 log units below the FMQ buffer (FMQ – 0.7 to FMQ – 1.1). Assemblage 2 is associated with a higher agpaitic index and silica concentration (A.I. = 1.42, SiO2 = 67.1%) and equilibrated at ~794°C at FMQ – 0.5. Assemblage 3 is associated with a still higher agpaitic index and silica concentration (A.I. in the range 1.55–1.63, 66.8 < SiO2 < 67.8%) and equilibrated at 764–756°C at FMQ – 0.5 to FMQ – 0.2. Assemblage 4 is associated with a slightly higher agpaitic index and yet higher silica concentration (1.61 < A.I. < 1.75, 67.6 < SiO2 < 72.0%) and equilibrated between 740 and 700°C at oxygen fugacities at or just below the FMQ buffer. Assemblage 5 is associated with the highest agpaitic index and highest concentration of silica (A.I. = 1.97, SiO2 = 69.7%) and equilibrated at <700°C at an oxygen fugacity just above the FMQ buffer in a “no-oxide” field.
Despite the paucity of two-oxide, two-pyroxene, or two-feldspar pairs, it may be possible to accurately constrain temperature and oxygen fugacity in peralkaline rocks with QUIlF equilibria given an equilibrium assemblage of fayalite, ilmenite, and clinopyroxene
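
The agpaitic index defined in this abstract, A.I. = molar (Na + K)/Al, can be computed from whole-rock oxide analyses. The sketch below assumes the inputs are Na2O, K2O, and Al2O3 in weight percent and uses standard molar masses; the function name and the sample composition are hypothetical and not taken from the study.

```python
# Molar masses in g/mol (standard values).
MOLAR_MASS = {"Na2O": 61.979, "K2O": 94.196, "Al2O3": 101.961}


def agpaitic_index(na2o_wt, k2o_wt, al2o3_wt):
    """Return A.I. = molar (Na + K) / Al from oxide weight percentages.

    Each oxide formula unit carries two cations, so the factor of 2
    cancels and the oxide mole ratio equals the cation mole ratio.
    """
    na = na2o_wt / MOLAR_MASS["Na2O"]
    k = k2o_wt / MOLAR_MASS["K2O"]
    al = al2o3_wt / MOLAR_MASS["Al2O3"]
    return (na + k) / al


# Hypothetical whole-rock composition (wt%), not taken from the study.
ai = agpaitic_index(na2o_wt=6.5, k2o_wt=4.5, al2o3_wt=10.0)
is_peralkaline = ai > 1.0  # A.I. above 1 means alkalis exceed alumina
```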

    Comparison of metagenomic samples using sequence signatures

    BACKGROUND: Sequence signatures, as defined by the frequencies of k-tuples (or k-mers, k-grams), have been used extensively to compare genomic sequences of individual organisms, to identify cis-regulatory modules, and to study the evolution of regulatory sequences. Recently many next-generation sequencing (NGS) read data sets of metagenomic samples from a variety of different environments have been generated. The assembly of these reads can be difficult and analysis methods based on mapping reads to genes or pathways are also restricted by the availability and completeness of existing databases. Sequence-signature-based methods, however, do not need the complete genomes or existing databases and thus can potentially be very useful for the comparison of metagenomic samples using NGS read data. Still, the applications of sequence signature methods for the comparison of metagenomic samples have not been well studied. RESULTS: We studied several dissimilarity measures, including $d_2$, $d_2^*$ and $d_2^S$ recently developed by our group, a measure (hereinafter noted as Hao) used in CVTree developed by Hao’s group (Qi et al., 2004), measures based on relative di-, tri-, and tetra-nucleotide frequencies as in Willner et al. (2009), as well as standard $l_p$ measures between the frequency vectors, for the comparison of metagenomic samples using sequence signatures. We compared their performance using a series of extensive simulations and three real NGS metagenomic datasets: 39 fecal samples from 33 mammalian host species, 56 marine samples across the world, and 13 fecal samples from human individuals. Results showed that the dissimilarity measure $d_2^S$ can achieve superior performance when comparing metagenomic samples by clustering them into different groups as well as recovering environmental gradients affecting microbial samples.
New insights into the environmental factors affecting microbial compositions in metagenomic samples are obtained through the analyses. Our results show that sequence signatures of the mammalian gut are closely associated with diet and gut physiology of the mammals, and that sequence signatures of marine communities are closely related to location and temperature. CONCLUSIONS: Sequence signatures can successfully reveal major group and gradient relationships among metagenomic samples from NGS reads without alignment to reference databases. The $d_2^S$ dissimilarity measure is a good choice in all application scenarios. The optimal choice of tuple size depends on sequencing depth, but it is quite robust within a range of choices for moderate sequencing depths
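
To make the idea of a sequence-signature dissimilarity concrete, here is a minimal sketch of the simplest measure in this family. It assumes the commonly used normalized form, 0.5 * (1 - D2 / (|X| |Y|)), where D2 is the inner product of the two k-tuple count vectors. The background-corrected variants named in the abstract additionally need a model of expected word counts and are not shown. Function names and the toy reads are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_counts(reads, k):
    """Pool overlapping k-tuple counts over all reads of one sample."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def d2_dissimilarity(x, y):
    """Return 0.5 * (1 - cosine similarity) of two k-tuple count vectors.

    Ranges from 0 (identical direction) to 0.5 (no shared k-tuples),
    since counts are non-negative.
    """
    words = set(x) | set(y)
    dot = sum(x[w] * y[w] for w in words)  # Counter returns 0 if missing
    norm_x = sqrt(sum(v * v for v in x.values()))
    norm_y = sqrt(sum(v * v for v in y.values()))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("each sample needs at least one k-tuple")
    return 0.5 * (1 - dot / (norm_x * norm_y))


# Hypothetical toy samples, for illustration only.
sample_a = kmer_counts(["ACGTACGT"], 2)
sample_b = kmer_counts(["TTTTTTTT"], 2)
same = d2_dissimilarity(sample_a, sample_a)
disjoint = d2_dissimilarity(sample_a, sample_b)
```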

    Developing a class of dual atom materials for multifunctional catalytic reactions

    Dual atom catalysts, bridging single atom and metal/alloy nanoparticle catalysts, offer more opportunities to enhance the kinetics and multifunctional performance of oxygen reduction/evolution and hydrogen evolution reactions. However, the rational design of efficient multifunctional dual atom catalysts remains a blind area and is challenging. In this study, we achieved controllable regulation from Co nanoparticles to CoN4 single atoms to Co2N5 dual atoms using an atomization and sintering strategy via an N-stripping and thermal-migrating process. More importantly, this strategy could be extended to the fabrication of 22 distinct dual atom catalysts. In particular, the Co2N5 dual atom with tailored spin states could achieve ideally balanced adsorption/desorption of intermediates, thus realizing superior multifunctional activity. In addition, it endows Zn-air batteries with long-term stability for 800 h, allows water splitting to continuously operate for 1000 h, and can enable solar-powered water splitting systems with uninterrupted large-scale hydrogen production throughout day and night. This universal and scalable strategy provides opportunities for the controlled design of efficient multifunctional dual atom catalysts in energy conversion technologies

    Gut microbial composition changes in bladder cancer patients: A case-control study in Harbin, China

    BACKGROUND AND OBJECTIVES: This study aimed to explore the changes of gut bacteria in bladder cancer patients. METHODS AND STUDY DESIGN: Newly diagnosed bladder cancer patients were recruited. All participants completed a questionnaire about personal behavior and diet. Pyrosequencing of the total genomic DNA extracted from human feces was carried out by Illumina HiSeq 2000. The copy number of target DNA for bacteria was determined by real-time quantitative PCR assay. Fecal short chain fatty acid contents were measured by gas chromatography (GC) analysis. The concentrations of lipopolysaccharide and D-lactic acid in serum were determined by enzyme-linked immunosorbent assay kits. RESULTS: Fruit intake in bladder cancer patients was significantly lower than in healthy controls. The numbers of Clostridium cluster XI and Prevotella in bladder cancer patients decreased. The numbers of domain bacteria and Prevotella were significantly and positively associated with fruit intake (r=0.002, p<0.05 for domain bacteria; r=0.004, p<0.05 for Prevotella). The concentration of butyric acid decreased significantly in bladder cancer patients, and the quantities of fecal butyric acid were significantly and positively associated with fruit intake (r=0.610, p<0.01). The concentrations of lipopolysaccharide and D-lactic acid, two sensitive markers of gut permeability, were greater in bladder cancer patients. CONCLUSIONS: Dysbiosis of gut microbiota, decreased butyric acid concentrations and impaired intestinal structural integrity were found in bladder cancer patients, which might be associated with inadequate fruit intake

    Sarcopenia Was a Poor Prognostic Predictor for Patients With Advanced Lung Cancer Treated With Immune Checkpoint Inhibitors

    Background: It is not well known whether skeletal muscle mass (SMM) loss has any impact on the effectiveness of immune checkpoint inhibitors (ICIs) in patients with advanced lung cancer. We aimed to evaluate the association between SMM and clinical outcome of patients with advanced lung cancer receiving ICIs as first line or second line. Materials and Methods: From March 1st, 2019 to March 31st, 2021 at our hospital, 34 patients with advanced lung cancer treated with first-line or second-line ICIs were enrolled retrospectively. The skeletal muscle index (SMI) used to define sarcopenia was assessed at the level of the third lumbar vertebra (L3) on computed tomography (CT) images obtained within 4 weeks before initiation of ICI treatment. The impact of sarcopenia (low SMI) on progression free survival (PFS) was analyzed using the Kaplan-Meier method and log-rank tests. The effect of various variables on PFS was evaluated using the Cox proportional hazards regression model with univariate and multivariate analysis. Treatment response, including objective response rate (ORR) and disease control rate (DCR), and immunotherapy related adverse events (irAEs) were compared between patients with and without sarcopenia by the chi-squared test. SMI values of patients with objective response (OR) or disease control (DC) were compared with those of patients without OR and DC using Student's t-test or the Mann-Whitney U test. Results: In both univariate and multivariate analysis, sarcopenia and treatment line were predictive factors for PFS (p < 0.05). Patients with sarcopenia had significantly shorter PFS than non-sarcopenic ones [6.57 vs. 16.2 months, hazard ratios (HR) = 2.947 and 3.542, and 95% confidence interval (CI): 1.123–13.183 and 1.11–11.308, p = 0.022 and 0.033]. No significant difference in ORR and irAEs was found. Patients with sarcopenia had lower DCR than those without sarcopenia.
The mean SMI value of DCR group and non-DCR group was 32.94 ± 5.49 and 44.77 ± 9.06 cm²/m², respectively (p = 0.008). Conclusion: Sarcopenia before immunotherapy might be a significant predictor for poor prognosis including shorter PFS and lower DCR in patients with advanced lung cancer treated with ICIs as first line or second line
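
The Kaplan-Meier method named in this abstract can be sketched in a few lines. The version below is a bare product-limit estimator written for illustration, not the software or data used in the study; the function name and the follow-up times are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : follow-up time for each patient
    events : 1 if progression was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            removed += 1
            i += 1
        if observed > 0:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored cases both leave the risk set
    return curve


# Hypothetical follow-up data in months, for illustration only.
curve = kaplan_meier(times=[2, 4, 4, 6, 8], events=[1, 1, 0, 1, 0])
```

Censored patients never lower the curve directly; they only shrink the risk set used at later event times.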

    Atud Gabbro-Diorite Complex: Glimpse of the Cryogenian Mixing, Assimilation, Storage, and Homogenization Zone beneath the Eastern Desert of Egypt

    We analysed gabbroic and dioritic rocks from the Atud igneous complex in the Eastern Desert of Egypt to understand better the formation of juvenile continental crust of the Arabian–Nubian Shield. Our results show that the rocks are the same age (U–Pb zircon ages of 694.5 ± 2.1 Ma for two diorites and 695.3 ± 3.4 Ma for one gabbronorite). These are partial melts of the mantle and related fractionates (εNd₆₉₀ = +4.2 to +7.3, ⁸⁷Sr/⁸⁶Sr_i = 0.70246–0.70268, zircon δ¹⁸O ∼ +5‰). Trace element patterns indicate that Atud magmas formed above a subduction zone as part of a large and long-lived (c. 60 myr) convergent margin. Atud complex igneous rocks belong to a larger metagabbro–epidiorite–diorite complex that formed as a deep crustal mush into which new pulses of mafic magma were periodically emplaced, incorporated and evolved. The petrological evolution can be explained by fractional crystallization of mafic magma plus variable plagioclase accumulation in a mid- to lower crustal MASH zone. The Atud igneous complex shows that mantle partial melting and fractional crystallization and plagioclase accumulation were important for Cryogenian crust formation in this part of the Arabian–Nubian Shield

    Geodynamic Evolution of a Forearc Rift in the Southernmost Mariana Arc

    The southernmost Mariana forearc stretched to accommodate opening of the Mariana Trough backarc basin in late Neogene time, erupting basalts now exposed in the SE Mariana Forearc Rift (SEMFR) 3.7–2.7 Ma ago. Today, SEMFR is a broad zone of extension that formed on hydrated forearc lithosphere and overlies the shallow subducting slab (slab depth ≤ 30–50 km). It comprises NW-SE trending subparallel deeps, 3–16 km wide, that can be traced ≥ ~30 km from the trench almost to the backarc spreading center, the Malaguana-Gadao Ridge (MGR). While forearcs are usually underlain by serpentinized harzburgites too cold to melt, SEMFR crust is mostly composed of Pliocene, low-K basaltic to basaltic andesite lavas that are compositionally similar to arc lavas and backarc basin (BAB) lavas, and thus defines a forearc region that recently witnessed abundant igneous activity in the form of seafloor spreading. SEMFR igneous rocks have low Na8, Ti8, and Fe8, consistent with extensive melting, at ~23 ± 6.6 km depth and 1239 ± 40°C, by adiabatic decompression of depleted asthenospheric mantle metasomatized by slab-derived fluids. Stretching of pre-existing forearc lithosphere allowed BAB-like mantle to flow along SEMFR and melt, forming new oceanic crust. Melts interacted with pre-existing forearc lithosphere during ascent. SEMFR is no longer magmatically active and post-magmatic tectonic activity dominates the rift