247 research outputs found

    Marine protist diversity in European coastal waters and sediments as revealed by high-throughput sequencing

    Although protists are critical components of marine ecosystems, they are still poorly characterized. Here we analysed the taxonomic diversity of planktonic and benthic protist communities collected at six distant European coastal sites. Environmental deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) from three size fractions (pico-, nano- and micro/mesoplankton), as well as from dissolved DNA and surface sediments, were used as templates for tag pyrosequencing of the V4 region of the 18S ribosomal DNA. Beta-diversity analyses split the protist community structure into three main clusters: picoplankton-nanoplankton-dissolved DNA, micro/mesoplankton, and sediments. Within each cluster, protist communities from the same site and time clustered together, while communities from the same site but different seasons were unrelated. DNA- and RNA-based surveys provided similar relative abundances for most class-level taxonomic groups. Yet particular groups were overrepresented in one of the two templates; for example, marine alveolates (MALV)-I and MALV-II were much more abundant in DNA surveys. Overall, the groups with the highest relative contribution were Dinophyceae, Diatomea, Ciliophora and Acantharia. Also well represented were Mamiellophyceae, Cryptomonadales, marine alveolates and marine stramenopiles in the picoplankton, and Monadofilosa and basal Fungi in sediments. Our extensive and systematic sequencing of geographically separated sites provides the most comprehensive molecular description of coastal marine protist diversity to date.
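    The abstract does not name the dissimilarity metric behind its beta-diversity clusters. As an illustration only, the sketch below assumes Bray-Curtis dissimilarity on relative abundances, a common choice for tag-sequencing surveys. The taxon counts are made up for the example and are not data from the study; `relative_abundance` and `bray_curtis` are hypothetical helper names.

```python
from itertools import combinations

def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Turn raw tag counts per taxon into relative abundances that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}

def bray_curtis(a: dict[str, float], b: dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 when no taxon is shared."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in taxa)
    return 1.0 - 2.0 * shared / (sum(a.values()) + sum(b.values()))

# Hypothetical tag counts (NOT data from the study), one sample per fraction.
samples = {
    "pico": {"Mamiellophyceae": 50, "MALV-II": 30, "Dinophyceae": 20},
    "nano": {"Mamiellophyceae": 40, "MALV-II": 35, "Dinophyceae": 25},
    "sediment": {"Monadofilosa": 60, "Ciliophora": 30, "Dinophyceae": 10},
}
rel = {name: relative_abundance(c) for name, c in samples.items()}

# Pairwise dissimilarities; a clustering step would consume this matrix.
for x, y in combinations(rel, 2):
    print(f"{x} vs {y}: {bray_curtis(rel[x], rel[y]):.2f}")
```

    With these toy numbers, the two plankton fractions come out close (0.10), and both are far from the sediment sample (0.90). This mirrors the kind of split the abstract reports, but it demonstrates nothing about the real data.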

    Exact distribution of a pattern in a set of random sequences generated by a Markov source: applications to biological data

    Background: In bioinformatics it is common to search for a pattern of interest in a potentially large set of rather short sequences (upstream gene regions, proteins, exons, etc.). Although many methodological approaches allow practitioners to compute the distribution of a pattern count in a random sequence generated by a Markov source, no specific developments have taken into account the counting of occurrences in a set of independent sequences. We aim to address this problem by deriving efficient approaches and algorithms to perform these computations for both low- and high-complexity patterns, within the framework of homogeneous or heterogeneous Markov models.
    Results: The latest advances in the field allowed us to use a technique of optimal Markov chain embedding based on deterministic finite automata to introduce three innovative algorithms. Algorithm 1 is the only one able to deal with heterogeneous models. It also avoids any convolution product of the pattern distributions in individual sequences. When working with homogeneous models, Algorithm 2 yields a dramatic reduction in complexity by reusing previous computations to obtain moment generating functions efficiently. In the particular case of low- or moderate-complexity patterns, Algorithm 3 exploits power computation and binary decomposition to further reduce the time complexity to a logarithmic scale. All these algorithms, and their relative merits compared with existing ones, were then tested and discussed on a toy example and three biological data sets: structural patterns in protein loop structures, PROSITE signatures in a bacterial proteome, and transcription factors in upstream gene regions. On these data sets, we also compared our exact approaches with the tempting approximation that consists of concatenating the sequences in the data set into a single sequence.
    Conclusions: Our algorithms prove to be effective and able to handle real data sets with multiple sequences, as well as biological patterns of interest, even when the latter display high complexity (PROSITE signatures, for example). In addition, these exact algorithms avoid the edge effect observed under the single-sequence approximation, which leads to erroneous results, especially when the marginal distribution of the model converges slowly toward the stationary distribution. We conclude with a discussion of our method and its potential improvements.
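    The core idea of automaton-based Markov chain embedding can be illustrated with a small sketch. The pattern is tracked by a deterministic finite automaton, and the process is propagated forward over (automaton state, last letter, running count). This is NOT any of the paper's three algorithms: there is no optimal embedding, no moment generating functions and no binary decomposition. It is a plain forward recursion, linear in total sequence length, and it assumes a single word pattern and one homogeneous order-1 Markov model shared by all sequences. The function names and the toy model are hypothetical. Rather than computing each sequence's count distribution and convolving the results, the running count is simply carried from one sequence into the next while the automaton restarts. The same function applied to one concatenated sequence gives the approximation the abstract warns about.

```python
from collections import defaultdict

def build_dfa(word: str, alphabet: str) -> dict[tuple[int, str], int]:
    """KMP-style automaton: state q is the length of the longest prefix of `word`
    that is a suffix of the text read so far, so overlapping matches are counted."""
    m = len(word)
    delta = {}
    for q in range(m + 1):
        for c in alphabet:
            s = word[:q] + c
            k = min(len(s), m)
            while k > 0 and s[-k:] != word[:k]:
                k -= 1
            delta[(q, c)] = k
    return delta

def count_distribution(word, alphabet, init, trans, lengths):
    """Exact P(N = n), N = total occurrences of `word` over independent sequences
    of the given lengths, each drawn from the same order-1 homogeneous Markov
    model (init[c] = P(first letter = c), trans[a][b] = P(next = b | current = a))."""
    delta = build_dfa(word, alphabet)
    m = len(word)
    total = {0: 1.0}  # distribution of the running count before the first sequence
    for length in lengths:
        # Joint law of (automaton state, last letter, running count); the
        # automaton restarts in state 0 at the start of every sequence.
        dist = defaultdict(float)
        for n, pn in total.items():
            for c in alphabet:
                q = delta[(0, c)]
                dist[(q, c, n + int(q == m))] += pn * init[c]
        for _ in range(length - 1):
            nxt = defaultdict(float)
            for (q, a, n), p in dist.items():
                for b in alphabet:
                    q2 = delta[(q, b)]
                    nxt[(q2, b, n + int(q2 == m))] += p * trans[a][b]
            dist = nxt
        # Only the count carries over to the next, independent, sequence.
        total = defaultdict(float)
        for (_, _, n), p in dist.items():
            total[n] += p
    return dict(sorted(total.items()))

def mean_count(dist):
    return sum(n * p for n, p in dist.items())

# Toy DNA model (hypothetical): uniform start, P(repeat previous letter) = 0.4.
alphabet = "ACGT"
init = {c: 0.25 for c in alphabet}
trans = {a: {b: 0.4 if a == b else 0.2 for b in alphabet} for a in alphabet}
lengths = [20, 20, 20]

exact = count_distribution("ACA", alphabet, init, trans, lengths)
concat = count_distribution("ACA", alphabet, init, trans, [sum(lengths)])
print(f"exact mean: {mean_count(exact):.4f}")          # 3 x 18 windows x 0.01
print(f"concatenated mean: {mean_count(concat):.4f}")  # 58 windows x 0.01
```

    Here the uniform start is already the stationary distribution, because the toy transition matrix is doubly stochastic. Each 3-letter window therefore matches ACA with probability 0.25 × 0.2 × 0.2 = 0.01. The exact mean is 3 × 18 × 0.01 = 0.54, while concatenation gives 58 × 0.01 = 0.58. The four extra windows that straddle sequence boundaries are one source of the edge effect. With a non-stationary start, the slow convergence mentioned in the abstract would add to the discrepancy.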

    Probabilistic Inference of Transcription Factor Binding from Multiple Data Sources

    An important problem in molecular biology is to build a complete understanding of transcriptional regulatory processes in the cell. We have developed a flexible, probabilistic framework to predict transcription factor (TF) binding from multiple data sources that differs from the standard hypothesis-testing (scanning) methods in several ways. Our probabilistic modeling framework estimates the probability of binding and thus naturally reflects our degree of belief in binding. Probabilistic modeling also allows easy and systematic integration of our binding predictions into other probabilistic modeling methods, such as expression-based gene network inference. The method answers the question of whether the whole analyzed promoter has a binding site, but can also be extended to estimate the binding probability at each nucleotide position. Further, we introduce an extension to model combinatorial regulation by several TFs. Most importantly, the proposed methods can make principled probabilistic inference from multiple evidence sources, such as multiple statistical models (motifs) of the TFs, evolutionary conservation, regulatory potential, CpG islands, nucleosome positioning, DNase hypersensitive sites, ChIP-chip binding segments and other (prior) sequence-based biological knowledge. We developed both a likelihood and a Bayesian method, where the latter is implemented with a Markov chain Monte Carlo algorithm. Results on a carefully constructed test set from the mouse genome demonstrate that principled data fusion can significantly improve the performance of TF binding prediction methods. We also applied the probabilistic modeling framework to all promoters in the mouse genome, and the results indicate sparse connectivity between transcriptional regulators and their target promoters.
    To facilitate analysis of other sequences and additional data, we have developed an online web tool, ProbTF, which implements our probabilistic TF binding prediction method using multiple data sources. The test data set, web tool, source code and supplementary data are available at: http://www.probtf.org
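    A probabilistic output makes evidence fusion straightforward. The sketch below illustrates this with the simplest possible combination rule: a naive-Bayes update in log-odds space, assuming the evidence sources are conditionally independent given the binding state. This is NOT ProbTF's likelihood or MCMC-based Bayesian method. It also does not use ProbTF's API. The function name, source labels, prior and likelihood ratios are all hypothetical.

```python
import math

def posterior_binding(prior: float, likelihood_ratios: dict[str, float]) -> float:
    """P(bound | evidence) assuming the evidence sources are conditionally
    independent given the binding state (a naive-Bayes simplification).
    Each ratio is P(evidence | bound) / P(evidence | not bound)."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    log_odds = math.log(prior / (1.0 - prior))
    for source, lr in likelihood_ratios.items():
        if lr <= 0.0:
            raise ValueError(f"likelihood ratio for {source!r} must be positive")
        log_odds += math.log(lr)  # independent sources add in log-odds space
    # Numerically stable logistic transform back to a probability.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)

# Hypothetical numbers for one promoter (not ProbTF output).
evidence = {"motif": 4.0, "conservation": 2.0, "dnase": 1.5}
print(posterior_binding(0.1, {"motif": 4.0}))  # motif alone
print(posterior_binding(0.1, evidence))        # all three sources
```

    With a prior of 0.1, the motif alone lifts the binding probability to 4/13 (about 0.31). Adding the two supporting sources raises it to 4/7 (about 0.57). The independence assumption is what makes the update a simple sum. Correlated sources, such as conservation and regulatory potential, would be double-counted, which is one reason a full joint model like the one described in the abstract is preferable.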

    Varieties of living things: Life at the intersection of lineage and metabolism

    Publication status: Published; type: Article

    Development of methods for the preparation of radiopure ⁸²Se sources for the SuperNEMO neutrinoless double-beta decay experiment

    A radiochemical method for producing ⁸²Se sources with an ultra-low level of contamination by natural radionuclides (⁴⁰K and the decay products of ²³²Th and ²³⁸U) has been developed, based on cation-exchange chromatographic purification with reverse removal of impurities. It includes chromatographic separation (purification), reduction, conditioning (decantation, centrifugation, washing, grinding and drying) and ⁸²Se foil production. The conditioning stage, during which highly dispersed elemental selenium is obtained by reducing purified selenious acid (H₂SeO₃) with sulfur dioxide (SO₂), is the crucial step in the preparation of radiopure ⁸²Se samples. To refine the method, 600 g of natural selenium was first produced with this procedure. The technique was then used to produce 2.5 kg of radiopure enriched selenium (⁸²Se). The produced ⁸²Se samples were wrapped in 12 μm-thick polyethylene, and the radionuclides present in them were analyzed with the BiPo-3 detector. The radiopurity of the plastic materials used at all stages (chromatographic column material and polypropylene chemical vessels) was determined by instrumental neutron activation analysis. Measurements of the ⁸²Se foils with the BiPo-3 spectrometer confirmed the high purity of the final product: the measured contamination level for ²⁰⁸Tl was 8–54 μBq/kg, and for ²¹⁴Bi the detection limit of 600 μBq/kg was reached.
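    For a rough sense of scale, the measured ²⁰⁸Tl range can be converted into expected decays per year. This is our own arithmetic, not a figure from the abstract. It rests on the hypothetical assumption that the whole 2.5 kg batch were uniformly contaminated at each end of the range. It uses 1 Bq = 1 decay per second and takes a year as 3.156 × 10⁷ s.

```latex
\begin{aligned}
A_{\max} &= 54\,\mu\text{Bq/kg} \times 2.5\,\text{kg} = 1.35\times10^{-4}\,\text{Bq},
  & N_{\max} &\approx 1.35\times10^{-4}\,\text{s}^{-1} \times 3.156\times10^{7}\,\text{s} \approx 4.3\times10^{3}\ \text{decays/yr},\\
A_{\min} &= 8\,\mu\text{Bq/kg} \times 2.5\,\text{kg} = 2.0\times10^{-5}\,\text{Bq},
  & N_{\min} &\approx 2.0\times10^{-5}\,\text{s}^{-1} \times 3.156\times10^{7}\,\text{s} \approx 6.3\times10^{2}\ \text{decays/yr}.
\end{aligned}
```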