The What, Why and How of openness in science
We give a short introduction to open science, followed by an overview of Creative Commons and open-source licenses.
A computational screen for type I polyketide synthases in metagenomics shotgun data
BACKGROUND: Polyketides are a diverse group of biotechnologically important secondary metabolites that are produced by multi-domain enzymes called polyketide synthases (PKS). METHODOLOGY/PRINCIPAL FINDINGS: We estimated the frequencies of type I PKS (PKS I), a PKS subgroup, in natural environments by using Hidden Markov Models of eight domains to screen predicted proteins from six metagenomic shotgun data sets. Because the complex PKS I resemble other multi-domain enzymes (such as those for fatty acid biosynthesis), we increased the reliability and resolution of the data set with maximum-likelihood trees. The combined information from these trees was then used to discriminate true PKS I domains from evolutionarily related but functionally different ones. We identified numerous novel PKS I proteins; the highest density was found in Minnesota farm soil, with 136 proteins out of 183,536 predicted genes. We also applied the protocol to the UniRef database to improve the annotation of proteins of so far unknown function and identified some new instances of horizontal gene transfer. CONCLUSIONS/SIGNIFICANCE: The screening approach proved powerful in identifying PKS I sequences in large sequence data sets and is applicable to many other protein families.
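As a rough illustration of the final counting step, the sketch below assumes each predicted protein has already been scanned with domain HMMs and reduced to the set of domain names that passed a score cutoff. The rule that a candidate must carry both a ketosynthase (KS) and an acyltransferase (AT) domain, and the function name `pks_density`, are hypothetical simplifications; the study itself additionally used maximum-likelihood trees to separate true PKS I domains from related ones such as those of fatty acid synthases.

```python
from typing import FrozenSet, Mapping, Set, Tuple

# Hypothetical domain-architecture rule: a protein counts as a PKS I candidate
# only if both a ketosynthase (KS) and an acyltransferase (AT) domain were hit.
REQUIRED_DOMAINS: FrozenSet[str] = frozenset({"KS", "AT"})


def pks_density(
    domain_hits: Mapping[str, Set[str]],
    n_predicted_genes: int,
    required: FrozenSet[str] = REQUIRED_DOMAINS,
) -> Tuple[int, float]:
    """Return (candidate count, candidates per 1,000 predicted genes).

    domain_hits maps a protein ID to the set of HMM domain names that
    passed the score cutoff for that protein.
    """
    n_candidates = sum(1 for domains in domain_hits.values() if required <= domains)
    return n_candidates, 1000 * n_candidates / n_predicted_genes


if __name__ == "__main__":
    # Toy data: p2 lacks an AT hit, so only p1 and p3 qualify.
    toy_hits = {"p1": {"KS", "AT", "ACP"}, "p2": {"KS"}, "p3": {"AT", "KS", "KR"}}
    print(pks_density(toy_hits, 1000))  # (2, 2.0)
```

With the figures reported for the Minnesota farm soil sample, 136 candidates among 183,536 predicted genes correspond to roughly 0.74 PKS I proteins per 1,000 genes.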
Grad-seq guides the discovery of ProQ as a major small RNA-binding protein
The functional annotation of transcriptomes and the identification of noncoding RNA (ncRNA) classes have been greatly facilitated by the advent of next-generation RNA sequencing, which, by reading the nucleotide order of transcripts, theoretically allows the rapid profiling of all transcripts in a cell. However, primary sequence per se is a poor predictor of function, as ncRNAs vary dramatically in length and structure and often lack identifiable motifs. Therefore, visualizing an informative RNA landscape for the organisms with potentially new RNA biology that are emerging from microbiome and environmental studies requires more functionally relevant criteria. One such criterion is the association of RNAs with functionally important cognate RNA-binding proteins. Here we analyze the full ensemble of cellular RNAs in the bacterial pathogen Salmonella enterica using gradient profiling by sequencing (Grad-seq), partitioning its coding and noncoding transcripts based on their network of RNA–protein interactions. In addition to capturing established RNA classes based on their biochemical profiles, Grad-seq enabled the discovery of a previously overlooked large collective of structured small RNAs that form stable complexes with the conserved protein ProQ. We show that ProQ is an abundant RNA-binding protein with a wide range of ligands and a global influence on Salmonella gene expression. Given its generic ability to chart a functional RNA landscape irrespective of transcript length and sequence diversity, Grad-seq promises to define functional RNA classes and major RNA-binding proteins in both model species and genetically intractable organisms.
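The core idea of Grad-seq, grouping transcripts by how they co-sediment with proteins across gradient fractions, can be sketched as a profile comparison. This is a hypothetical illustration, not the published pipeline: it assumes read counts per gradient fraction for each transcript and an abundance profile for one protein measured across the same fractions, and it keeps transcripts whose profile correlates with the protein's above an arbitrary threshold. All names and numbers are toy values.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (insensitive to overall scale)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def co_migrating(
    transcripts: Dict[str, Sequence[float]],
    protein_profile: Sequence[float],
    threshold: float = 0.8,
) -> List[str]:
    """Return names of transcripts whose gradient profile tracks the protein's profile."""
    return sorted(
        name
        for name, profile in transcripts.items()
        if pearson(profile, protein_profile) >= threshold
    )


if __name__ == "__main__":
    # Toy profiles over seven gradient fractions, from light to heavy.
    protein = [0, 1, 5, 10, 5, 1, 0]
    reads = {
        "sRNA_a": [0, 2, 9, 21, 11, 1, 0],  # peaks in the same fractions as the protein
        "mRNA_b": [10, 8, 3, 1, 0, 0, 0],   # stays in the light fractions
    }
    print(co_migrating(reads, protein))  # ['sRNA_a']
```

Because the Pearson correlation ignores overall scale, raw read counts and protein signal intensities can be compared without first normalizing them to a common unit.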
A Mission to Explore the Pioneer Anomaly
The Pioneer 10 and 11 spacecraft yielded the most precise navigation in deep
space to date. These spacecraft had exceptional acceleration sensitivity.
However, analysis of their radio-metric tracking data has consistently
indicated that, at heliocentric distances of $\sim 20-70$ astronomical units,
the orbit determinations reveal a small, anomalous Doppler
frequency drift. The drift is a blue-shift, uniformly changing at a rate of
$\sim (5.99 \pm 0.01) \times 10^{-9}$ Hz/s, which can be interpreted as a
constant sunward acceleration of each spacecraft of
$a_P = (8.74 \pm 1.33) \times 10^{-10}$ m/s$^2$. This signal has become known as the Pioneer
anomaly. The inability to explain the anomalous behavior of the Pioneers with
conventional physics has contributed to growing discussion about its origin.
There is now an increasing number of proposals that attempt to explain the
anomaly outside conventional physics. This progress emphasizes the need for a
new experiment to explore the detected signal. Furthermore, recent extensive
efforts have led to the conclusion that only a dedicated experiment can
ultimately determine the nature of this signal. We discuss the Pioneer
anomaly and present the next steps towards an understanding of its origin. We
specifically focus on the development of a mission to explore the Pioneer
Anomaly in a dedicated experiment conducted in deep space.
Comment: 8 pages, 9 figures; invited talk given at the 2005 ESLAB Symposium
"Trends in Space Science and Cosmic Vision 2020", 19-21 April 2005, ESTEC,
Noordwijk, The Netherlands
Fundamental Physics with the Laser Astrometric Test Of Relativity
The Laser Astrometric Test Of Relativity (LATOR) is a joint European-U.S.
Michelson-Morley-type experiment designed to test the pure tensor metric nature
of gravitation, a fundamental postulate of Einstein's theory of general
relativity. By using a combination of independent time-series of highly
accurate gravitational deflection of light in the immediate proximity to the
Sun, along with measurements of the Shapiro time delay on interplanetary scales
(to a precision respectively better than 0.1 picoradians and 1 cm), LATOR will
significantly improve our knowledge of relativistic gravity. The primary
mission objective is to i) measure the key post-Newtonian Eddington parameter
$\gamma$ to an accuracy of one part in $10^9$. The quantity $(1-\gamma)$ is a
direct measure of the presence of a new interaction in gravitational theory,
and in this search LATOR goes a factor of 30,000 beyond the present best
result, Cassini's 2003 test. The mission will also provide: ii) the first
measurement of gravity's non-linear effects on light to ~0.01% accuracy,
including both the Eddington $\beta$ parameter and the spatial metric's
second-order potential contribution (never measured before); iii) a direct
measurement of the solar quadrupole moment $J_2$ (currently unavailable) to an
accuracy of one part in 200 of its expected size; and iv) a direct measurement
of the "frame-dragging" effect on light by the Sun's gravitomagnetic field, to
1% accuracy. LATOR's primary measurement pushes to
unprecedented accuracy the search for cosmologically relevant scalar-tensor
theories of gravity by looking for a remnant scalar field in today's solar
system. We discuss the mission design of this proposed experiment.
Comment: 8 pages, 9 figures; invited talk given at the 2005 ESLAB Symposium
"Trends in Space Science and Cosmic Vision 2020," 19-21 April 2005, ESTEC,
Noordwijk, The Netherlands
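To see why light deflection near the Sun constrains $\gamma$, recall the standard parametrized post-Newtonian expression for the deflection of a ray with impact parameter $b$: $\theta = \frac{1+\gamma}{2}\,\frac{4GM_\odot}{c^2 b}$, so a measured angle gives $\gamma$ directly. The sketch evaluates it at the solar limb; the constants are standard reference values chosen here, not LATOR mission parameters, and `deflection_arcsec` is a name introduced for illustration.

```python
import math

GM_SUN = 1.32712440018e20  # solar gravitational parameter, m^3 s^-2
R_SUN = 6.957e8            # nominal solar radius, m
C = 299_792_458.0          # speed of light, m s^-1
RAD_TO_ARCSEC = 180.0 / math.pi * 3600.0


def deflection_arcsec(gamma: float, impact_parameter: float = R_SUN) -> float:
    """PPN light-deflection angle in arcseconds for a ray passing the Sun at the given impact parameter (m)."""
    theta_rad = 0.5 * (1.0 + gamma) * 4.0 * GM_SUN / (C ** 2 * impact_parameter)
    return theta_rad * RAD_TO_ARCSEC


if __name__ == "__main__":
    # General relativity has gamma = 1; this prints roughly 1.751 arcsec at the limb.
    print(f"{deflection_arcsec(1.0):.3f} arcsec")
```

Because $\theta$ scales with $(1+\gamma)/2$, any departure of $\gamma$ from 1 appears as a fractional change of $(\gamma-1)/2$ in the measured deflection.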
Phylogenetic Analysis of the Teneurins: Conserved Features and Premetazoan Ancestry
Teneurins are type II transmembrane proteins expressed during pattern formation and neurogenesis, with an intracellular domain that can be transported to the nucleus and an extracellular domain that can be shed into the extracellular milieu. In Drosophila melanogaster, Caenorhabditis elegans, and mouse, knockdown or knockout of teneurin expression can lead to abnormal patterning, defasciculation and abnormal pathfinding of neurites, and disruption of basement membranes. Here, we identified and analyzed teneurins from a broad range of metazoan genomes for nuclear localization sequences, protein interaction domains, and furin cleavage sites, and we cloned and sequenced the intracellular domains of human and avian teneurins to analyze alternative splicing. The basic organization of teneurins is highly conserved in Bilateria: all teneurins have epidermal growth factor (EGF) repeats, a cysteine-rich domain, and a large region identical in organization to the carboxy-terminal half of prokaryotic YD-repeat proteins. Teneurins were not found in the genomes of sponges, cnidarians, or placozoa, but the choanoflagellate Monosiga brevicollis has a gene encoding a predicted teneurin with a transmembrane domain, EGF repeats, a cysteine-rich domain, and a region homologous to YD-repeat proteins. Further examination revealed that most of the extracellular domain of the M. brevicollis teneurin is encoded on a single huge 6,829-bp exon and that the cysteine-rich domain is similar to sequences found in an enzyme expressed by the diatom Phaeodactylum tricornutum. This leads us to suggest that teneurins are complex hybrid fusion proteins that evolved in a choanoflagellate via horizontal gene transfer from both a prokaryotic gene and a diatom or algal gene, perhaps to improve the capacity of the choanoflagellate to bind to its prokaryotic prey.
As choanoflagellates are considered to be the closest living relatives of animals, the expression of a primitive teneurin by an ancestral choanoflagellate may have facilitated the evolution of multicellularity and complex histogenesis in metazoa.
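Sequence-motif screens of the kind described above can be approximated with simple pattern matching. As one hypothetical example, the sketch scans a protein sequence for the canonical furin consensus R-X-[K/R]-R, after whose final arginine furin cleaves; the published analysis may have used different or additional criteria, and the function name and toy sequence are invented for illustration.

```python
import re
from typing import List

# Canonical furin consensus R-X-[KR]-R; the zero-width lookahead lets overlapping sites match.
FURIN_MOTIF = re.compile(r"(?=R.[KR]R)")


def furin_sites(protein: str) -> List[int]:
    """Return 1-based positions of the arginine after which furin would cleave."""
    return [m.start() + 4 for m in FURIN_MOTIF.finditer(protein.upper())]


if __name__ == "__main__":
    print(furin_sites("MAARSKRGLL"))  # [7]: the site R-S-K-R ends at residue 7
```

A regular expression only flags candidate sites; whether a site is actually cleaved also depends on its structural context, which is why such hits are best treated as predictions.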
Amino Acid Usage Is Asymmetrically Biased in AT- and GC-Rich Microbial Genomes.
INTRODUCTION: Genomic base composition ranges from less than 25% AT to more than 85% AT in prokaryotes. Since only a small fraction of prokaryotic genomes is non-coding, even a minor change in genomic base composition will induce profound changes in the encoded proteins. We examined how amino acid and codon frequencies were distributed in over 2000 microbial genomes and how these distributions were affected by changes in base composition. In addition, we wanted to know how genome-wide amino acid usage was biased in the different genomes and how changes to base composition and mutations affected this bias. To this end, we used a Generalized Additive Mixed-effects Model (GAMM) to explore non-linear associations and strong data dependencies in closely related microbes; principal component analysis (PCA) was used to examine genomic amino acid and codon frequencies, while the concept of relative entropy was used to analyze genomic mutation rates. RESULTS: We found that genomic amino acid frequencies carried a stronger phylogenetic signal than codon frequencies, but that this signal was weak compared to that of genomic %AT. Further, in contrast to codon usage bias (CUB), amino acid usage bias (AAUB) was distributed differently in AT- and GC-rich genomes, in the sense that AT-rich genomes did not prefer specific amino acids over others to the same extent as GC-rich genomes. AAUB was also associated with relative entropy; genomes with low AAUB contained more random mutations, as a consequence of relaxed purifying selection, than genomes with higher AAUB. CONCLUSION: Genomic base composition has a substantial effect on both amino acid and codon frequencies in bacterial genomes. While phylogeny influenced amino acid usage more in GC-rich genomes, AT content drove amino acid usage in AT-rich genomes. We found the GAMM to be an excellent tool for analyzing the genomic data used in this study.
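Relative entropy (Kullback-Leibler divergence) measures how far one frequency distribution departs from a reference: $D(p\|q) = \sum_i p_i \log_2 (p_i/q_i)$. The sketch computes it for amino acid frequencies. The reference distribution used in the study is not given here, so comparing against a uniform background over the 20 standard amino acids is an assumption, and the helper names and toy sequences are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def aa_frequencies(proteome: str) -> Dict[str, float]:
    """Relative frequency of each standard amino acid; other symbols are ignored."""
    counts = Counter(aa for aa in proteome.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def relative_entropy(p: Dict[str, float], q: Dict[str, float]) -> float:
    """D(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * log2(pi / q[k]) for k, pi in p.items() if pi > 0)


if __name__ == "__main__":
    uniform = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    print(round(relative_entropy(aa_frequencies("ACDEFGHIKLMNPQRSTVWY"), uniform), 6))  # 0.0
    print(round(relative_entropy(aa_frequencies("AAAA"), uniform), 4))  # 4.3219, i.e. log2(20)
```

Under this uniform-background assumption, a proteome that used all 20 amino acids equally scores 0 bits, and larger values indicate more strongly skewed usage.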
Distributional theory for the DIA method
The DIA method for the detection, identification and adaptation of model misspecifications combines estimation with testing. The aim of the present contribution is to introduce a unifying framework for the rigorous capture of this combination. By using a canonical model formulation and a partitioning of misclosure space, we show that the whole estimation–testing scheme can be captured in one single DIA estimator. We study the characteristics of this estimator and discuss some of its distributional properties. With the distribution of the DIA estimator provided, one can then study all the characteristics of the combined estimation and testing scheme, as well as analyse how they propagate into final outcomes. Examples are given, as well as a discussion of how the distributional properties compare with their usage in practice.
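For intuition, the DIA logic can be sketched for the simplest case: $m$ repeated measurements $y_i = x + e_i$ with known standard deviation $\sigma$ and at most one outlier. Detection applies the overall model test to the least-squares residuals, identification picks the observation with the largest normalized residual (the w-test), and adaptation re-estimates $x$ without it. The single returned value is the DIA estimate, whose form depends on which test outcome occurs. The function name, the restriction to single-outlier alternatives, and the caller-supplied critical value are illustrative assumptions, not the paper's general formulation.

```python
import statistics
from math import sqrt
from typing import List, Optional, Tuple


def dia_estimate(y: List[float], sigma: float, k_alpha: float) -> Tuple[float, Optional[int]]:
    """Return (estimate of x, index of the rejected observation or None)."""
    m = len(y)
    x0 = statistics.fmean(y)                    # estimate under the null hypothesis
    e = [yi - x0 for yi in y]                   # least-squares residuals
    t = sum(ei * ei for ei in e) / sigma ** 2   # overall model test, chi-square(m - 1) under H0
    if t <= k_alpha:                            # detection: no misspecification found
        return x0, None
    # Identification: w-test; every residual has standard deviation sigma * sqrt(1 - 1/m).
    w = [abs(ei) / (sigma * sqrt(1 - 1 / m)) for ei in e]
    j = max(range(m), key=lambda i: w[i])
    # Adaptation: re-estimate x with observation j removed.
    return statistics.fmean(y[:j] + y[j + 1:]), j


if __name__ == "__main__":
    k = 9.49  # approx. 95% quantile of chi-square with 4 degrees of freedom
    print(dia_estimate([10.0, 10.1, 9.9, 10.0, 13.0], sigma=0.1, k_alpha=k))  # (10.0, 4)
```

Because the returned value switches between $\hat{x}_0$ and an adapted estimate depending on the test outcome, its distribution is not that of either estimator alone, which is the point the abstract makes about studying the combined scheme.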
The PhyloPythiaS Web Server for Taxonomic Assignment of Metagenome Sequences
Metagenome sequencing is becoming common, and there is an increasing need for easily accessible data analysis tools. An essential step is the taxonomic classification of sequence fragments. We describe a web server for the taxonomic assignment of metagenome sequences with PhyloPythiaS. PhyloPythiaS is a fast and accurate sequence composition-based classifier that utilizes the hierarchical relationships between clades. Taxonomic assignments with the web server can be made with a generic model or with sample-specific models that users can specify and create. Several interactive visualization modes and multiple download formats allow quick and convenient analysis and downstream processing of taxonomic assignments. Here, we demonstrate the use of our web server through the taxonomic assignment of metagenome samples from an acidophilic biofilm community of an acid mine and from a microbial community of cow rumen.
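Composition-based classifiers summarize each fragment by the frequencies of short oligonucleotides (k-mers) rather than by alignment to reference genomes. The sketch computes a generic k-mer frequency profile of the kind such a classifier could take as input; the choice of k = 4, the skipping of ambiguous bases, and the function name `kmer_profile` are assumptions for illustration, not PhyloPythiaS's actual feature set or interface.

```python
from collections import Counter
from itertools import product
from typing import List


def kmer_profile(seq: str, k: int = 4) -> List[float]:
    """Relative frequencies of all 4**k DNA k-mers, in lexicographic (ACGT) order.

    Windows containing symbols other than A, C, G or T (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


if __name__ == "__main__":
    profile = kmer_profile("ACGTACGT", k=2)
    # 16 dinucleotides; "AC" (index 1) occurs in 2 of the 7 windows.
    print(len(profile), round(profile[1], 3))  # 16 0.286
```

A fixed-length vector like this lets fragments of different lengths be compared directly, which is what makes composition-based classification fast.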
TACOA – Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach
Diaz NN, Krause L, Goesmann A, Niehaus K, Nattkemper TW. TACOA - Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach. BMC Bioinformatics. 2009;10(1):56.
Background:
Metagenomics, the sequencing and analysis of the collective genomes (metagenomes) of microorganisms isolated from an environment, promises direct access to the "unculturable majority". This emerging field has the potential to lay a solid foundation for our understanding of the entire living world. However, taxonomic classification, an essential task in the analysis of metagenomic data sets, is still far from being solved. We present a novel strategy to predict the taxonomic origin of environmental genomic fragments. The proposed classifier combines the k-nearest neighbor approach with strategies from kernel-based learning.
Results:
Our novel strategy was extensively evaluated using leave-one-out cross-validation on fragments of variable length (800 bp – 50 kbp) from 373 completely sequenced genomes. TACOA classifies genomic fragments of 800 bp and 1 kbp with high accuracy up to the rank of class. For longer fragments (≥ 3 kbp), accurate predictions are made at even deeper taxonomic ranks (order and genus). Remarkably, TACOA also produces reliable results when the taxonomic origin of a fragment is not represented in the reference set, assigning such fragments to a known broader taxonomic class or simply labeling them "unknown". We compared the classification accuracy of TACOA with that of the latest intrinsic classifier, PhyloPythia, using 63 recently published complete genomes. For fragments of 800 bp and 1 kbp, the overall accuracy of TACOA is higher than that of PhyloPythia at all taxonomic ranks. For all fragment lengths, both methods achieved comparably high specificity up to the rank of class, with low false-negative rates.
Conclusion:
An accurate multi-class taxonomic classifier was developed for environmental genomic fragments. TACOA can predict with high reliability the taxonomic origin of genomic fragments as short as 800 bp. The proposed method is transparent, fast and accurate, and the reference set can be easily updated as newly sequenced genomes become available. Moreover, the method proved competitive with the most recent classifier, PhyloPythia, and has the advantage that it can be installed locally and its reference set kept up to date.
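The combination of k-nearest neighbors with kernel-based learning can be sketched generically: each of the k nearest reference fragments votes for its taxon with a weight given by a Gaussian kernel of its distance, and a fragment whose best score is too low is reported as "unknown". This is a kernel-weighted k-NN under stated assumptions (Euclidean distance on precomputed composition vectors, a Gaussian kernel, an absolute score threshold, toy two-dimensional data), not TACOA's exact scoring function; `kernel_knn` is a hypothetical name.

```python
from collections import defaultdict
from math import dist, exp
from typing import List, Sequence, Tuple

Reference = Tuple[Sequence[float], str]  # (composition vector, taxon label)


def kernel_knn(
    query: Sequence[float],
    references: List[Reference],
    k: int = 3,
    bandwidth: float = 1.0,
    min_score: float = 0.5,
) -> str:
    """Kernel-weighted k-NN vote; returns a taxon label or "unknown"."""
    neighbours = sorted(references, key=lambda ref: dist(query, ref[0]))[:k]
    scores = defaultdict(float)
    for vector, label in neighbours:
        d = dist(query, vector)
        scores[label] += exp(-(d ** 2) / (2 * bandwidth ** 2))  # Gaussian kernel weight
    best = max(scores, key=scores.get)
    # Reject fragments far from every reference, e.g. when their taxon is not in the reference set.
    return best if scores[best] >= min_score else "unknown"


if __name__ == "__main__":
    refs = [((0.0, 0.0), "A"), ((0.1, 0.0), "A"), ((5.0, 5.0), "B"), ((5.1, 5.0), "B")]
    print(kernel_knn((0.05, 0.0), refs))   # A
    print(kernel_knn((20.0, 20.0), refs))  # unknown
```

Weighting votes by a kernel rather than counting them lets close neighbors dominate, and the absolute threshold gives a simple way to abstain instead of forcing a label.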
Background
- …