242 research outputs found

    Centre selection for clinical trials and the generalisability of results: a mixed methods study.

    BACKGROUND: The rationale for centre selection in randomised controlled trials (RCTs) is often unclear but may have important implications for the generalisability of trial results. The aims of this study were to evaluate the factors that currently influence centre selection in RCTs and to consider how generalisability considerations inform current and optimal practice. METHODS AND FINDINGS: A mixed methods approach was used, consisting of a systematic review and meta-summary of centre selection criteria reported in RCT protocols funded by the UK National Institute of Health Research (NIHR) and initiated between January 2005 and January 2012, and an online survey on current and optimal centre selection, distributed to professionals in the 48 UK Clinical Trials Units and 10 NIHR Research Design Services. The survey design was informed by the systematic review and by two focus groups conducted with trialists at the Birmingham Centre for Clinical Trials. 129 trial protocols were included in the systematic review, with a total target sample size in excess of 317,000 participants. The meta-summary identified 53 unique centre selection criteria. 78 protocols (60%) provided at least one criterion for centre selection, but only 31 (24%) protocols explicitly acknowledged generalisability. This is consistent with the survey findings (n = 70), where fewer than a third of participants reported generalisability as a key driver of centre selection in current practice. This contrasts with trialists' views on optimal practice, where generalisability in terms of clinical practice, population characteristics and economic results were prime considerations for 60% (n = 42), 57% (n = 40) and 46% (n = 32) of respondents, respectively. CONCLUSIONS: Centres are rarely enrolled in RCTs with an explicit view to external validity, although trialists acknowledge that incorporating generalisability in centre selection should ideally be more prominent. There is a need to operationalize 'generalisability' and incorporate it at the design stage of RCTs so that results are readily transferable to 'real world' practice.

    Analysis of a viral metagenomic library from 200 m depth in Monterey Bay, California constructed by direct shotgun cloning

    BACKGROUND: Viruses have a profound influence on both the ecology and evolution of marine plankton, but the genetic diversity of viral assemblages, particularly those in deeper ocean waters, remains poorly described. Here we report on the construction and analysis of a viral metagenome prepared from below the euphotic zone in a temperate, eutrophic bay of coastal California. METHODS: We purified viruses from approximately one cubic meter of seawater collected from 200 m depth in Monterey Bay, CA. DNA was extracted from the virus fraction, sheared, cloned with no prior amplification into a plasmid vector, and propagated in E. coli to produce the MBv200m library. Random clones were sequenced by the Sanger method. Sequences were assembled, then compared to sequences in GenBank and to other viral metagenomic libraries using BLAST analyses. RESULTS: Only 26% of the 881 sequences remaining after assembly had significant (E ≤ 0.001) BLAST hits to sequences in the GenBank nr database, with most being matches to bacteria (15%) and viruses (8%). When BLAST analysis included environmental sequences, 74% of sequences in the MBv200m library had a significant match. Most of these hits (70%) were to microbial metagenome sequences and only 0.7% were to sequences from viral metagenomes. Of the 121 sequences with a significant hit to a known virus, 94% matched bacteriophages (Families Podo-, Sipho-, and Myoviridae) and 6% matched viruses of eukaryotes in the Family Phycodnaviridae (5 sequences) or the Mimivirus (2 sequences). The largest percentages of hits to viral genes of known function were to those involved in DNA modification (25%) or structural genes (17%). Based on reciprocal BLAST analyses, the MBv200m library appeared to be most similar to viral metagenomes from two other bays and least similar to a viral metagenome from the Arctic Ocean. CONCLUSIONS: Direct cloning of DNA from diverse marine viruses was feasible and resulted in a distribution of virus types and functional genes at depth that differed in detail from, but was broadly similar to, that found in surface marine waters. Targeted viral analyses are useful for identifying those components of the greater marine metagenome that circulate in the subcellular size fraction.

    Analysis and comparison of very large metagenomes with fast clustering and functional annotation

    BACKGROUND: The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand. RESULTS: The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP) was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "Metagenomic Profiling of Nine Biomes". CONCLUSION: RAMMCAP is a very fast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. It is available from http://tools.camera.calit2.net/camera/rammcap/.

    Predicting Quantitative Genetic Interactions by Means of Sequential Matrix Approximation

    Despite the emerging experimental techniques for perturbing multiple genes and measuring their quantitative phenotypic effects, genetic interactions have remained extremely difficult to predict on a large scale. Using a recent high-resolution screen of genetic interactions in yeast as a case study, we investigated whether the extraction of pertinent information encoded in the quantitative phenotypic measurements could be improved by computational means. By taking advantage of the observation that most gene pairs in the genetic interaction screens have no significant interactions with each other, we developed a sequential approximation procedure which ranks the mutation pairs in order of evidence for a genetic interaction. The sequential approximations can efficiently remove background variation in the double-mutation screens and give increasingly accurate estimates of the single-mutant fitness measurements. Interestingly, these estimates not only provide predictions for genetic interactions which are consistent with those obtained using the measured fitness, but they can even significantly improve the accuracy with which one can distinguish functionally related gene pairs from the non-interacting pairs. The computational approach, in general, enables an efficient exploration and classification of genetic interactions in other studies and systems as well.

    Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering

    BACKGROUND: The identification and study of proteins from metagenomic datasets can shed light on the roles and interactions of the source organisms in their communities. However, metagenomic datasets are characterized by the presence of organisms with varying GC composition, codon usage biases, and other features, and consequently gene identification is challenging. The vast amount of sequence data also requires faster protein family classification tools. RESULTS: We present a computational improvement to a sequence clustering approach that we developed previously to identify and classify protein coding genes in large microbial metagenomic datasets. The clustering approach can be used to identify protein coding genes in prokaryotes, viruses, and intron-less eukaryotes. The computational improvement is based on an incremental clustering method that does not require the expensive all-against-all computation that was required by the original approach, while still preserving the remote homology detection capabilities. We present evaluations of the clustering approach in protein-coding gene identification and classification, and also present the results of updating the protein clusters from our previous work with recent genomic and metagenomic sequences. The clustering results are available via CAMERA (http://camera.calit2.net). CONCLUSION: The clustering paradigm is shown to be a very useful tool in the analysis of microbial metagenomic data. The incremental clustering method is shown to be much faster than the original approach in identifying genes, grouping sequences into existing protein families, and also identifying novel families that have multiple members in a metagenomic dataset. These clusters provide a basis for further studies of protein families.

    Probing Metagenomics by Rapid Cluster Analysis of Very Large Datasets

    BACKGROUND: The scale and diversity of metagenomic sequencing projects challenge both our technical and conceptual approaches in gene and genome annotations. The recent Sorcerer II Global Ocean Sampling (GOS) expedition yielded millions of predicted protein sequences, which significantly altered the landscape of known protein space by more than doubling its size and adding thousands of new families (Yooseph et al., 2007 PLoS Biol 5, e16). Such datasets, not only by their sheer size, but also by many other features, defy conventional analysis and annotation methods. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we describe an approach for rapid analysis of the sequence diversity and the internal structure of such very large datasets by advanced clustering strategies using the newly modified CD-HIT algorithm. We performed a hierarchical clustering analysis on the 17.4 million Open Reading Frames (ORFs) identified from the GOS study and found over 33 thousand large predicted protein clusters comprising nearly 6 million sequences. Twenty percent of these clusters did not match known protein families by sequence similarity search and might represent novel protein families. Distributions of the large clusters were illustrated on organism composition, functional class, and sample locations. CONCLUSION/SIGNIFICANCE: Our clustering took about two orders of magnitude less computational effort than the similar protein family analysis of the original GOS study. This approach will help to analyze other large metagenomic datasets in the future. A Web server with our clustering results and annotations of predicted protein clusters is available online at http://tools.camera.calit2.net/gos under the CAMERA project.

    A Nitrile Hydratase in the Eukaryote Monosiga brevicollis

    Bacterial nitrile hydratases (NHases) are important industrial catalysts and waste water remediation tools. In a global computational screening of conventional and metagenomic sequence data for NHases, we detected the two usually separated NHase subunits fused in one protein of the choanoflagellate Monosiga brevicollis, a recently sequenced unicellular model organism from the closest sister group of Metazoa. This is the first time that an NHase has been found in eukaryotes and the first time it has been observed as a fusion protein. The presence of an intron, subunit fusion and expressed sequence tags covering parts of the gene exclude contamination and suggest a functional gene. Phylogenetic analyses and genomic context imply a probable ancient horizontal gene transfer (HGT) from proteobacteria. The newly discovered NHase might open biotechnological routes due to its unconventional structure, its new type of host and its apparent integration into eukaryotic protein networks.

    A statistical toolbox for metagenomics: assessing functional diversity in microbial communities

    BACKGROUND: The 99% of bacteria in the environment that are recalcitrant to culturing have spurred the development of metagenomics, a culture-independent approach to sample and characterize microbial genomes. Massive datasets of metagenomic sequences have been accumulated, but analysis of these sequences has focused primarily on the descriptive comparison of the relative abundance of proteins that belong to specific functional categories. More robust statistical methods are needed to make inferences from metagenomic data. In this study, we developed and applied a suite of tools to describe and compare the richness, membership, and structure of microbial communities using peptide fragment sequences extracted from metagenomic sequence data. RESULTS: Application of these tools to acid mine drainage, soil, and whale fall metagenomic sequence collections revealed groups of peptide fragments with a relatively high abundance and no known function. When combined with analysis of 16S rRNA gene fragments from the same communities these tools enabled us to demonstrate that although there was no overlap in the types of 16S rRNA gene sequence observed, there was a core collection of operational protein families that was shared among the three environments. CONCLUSION: The results of comparisons between the three habitats were surprising considering the relatively low overlap of membership and the distinctively different characteristics of the three habitats. These tools will facilitate the use of metagenomics to pursue statistically sound genome-based ecological analyses.