Dealing with the Data Deluge – New Strategies in Prokaryotic Genome Analysis
Recent technological innovations have ignited an explosion in microbial genome sequencing that has fundamentally changed our understanding of the biology of microbes and profoundly impacted public health policy. This huge increase in DNA sequence data presents new challenges for annotation, analysis, and visualization bioinformatics tools. New strategies have been designed to bring order to this genome sequence shockwave and to improve the usability of the associated data. Genomes are organized in a hierarchical distance tree, with distances calculated from single-copy ribosomal protein markers. A protein distance measures the dissimilarity between markers of the same type, and the genomic distance is then the average over the majority of these marker distances, ignoring outliers. More than 30,000 genomes from public archives have been organized in a marker distance tree, resulting in 6,438 species-level clades representing 7,597 taxonomic species. This computational infrastructure provides a foundation for prokaryotic gene and genome analysis, allowing easy access to pre-calculated genome groups at various distance levels. One of the most challenging problems in the current data deluge is presenting the relevant data at an appropriate resolution for each application, eliminating data redundancy while keeping biologically interesting variations.
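The marker-based genome distance described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each genome is given as a dictionary mapping a marker name to a pre-aligned protein sequence, uses a gap-skipping p-distance as the per-marker dissimilarity, and approximates "averaging over the majority of marker distances, ignoring the outliers" with a symmetric trimmed mean. The names `p_distance`, `genome_distance`, and `trim_fraction` are hypothetical.

```python
from statistics import mean


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing residues between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def genome_distance(markers_a: dict[str, str], markers_b: dict[str, str],
                    trim_fraction: float = 0.2) -> float:
    """Trimmed mean of per-marker distances over markers shared by both genomes.

    trim_fraction (hypothetical parameter) is the share of distances dropped
    from EACH end of the sorted list before averaging.
    """
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    shared = sorted(markers_a.keys() & markers_b.keys())
    if not shared:
        raise ValueError("genomes share no markers")
    # Only markers of the same type are compared with each other.
    dists = sorted(p_distance(markers_a[m], markers_b[m]) for m in shared)
    trim = int(len(dists) * trim_fraction)
    kept = dists[trim:len(dists) - trim]  # never empty because trim < len/2
    return mean(kept)
```

With four markers at distance 0.1 and one outlier at 0.9, the plain mean is 0.26, while the trimmed mean with `trim_fraction=0.2` returns 0.1, so a single aberrant marker (for example a horizontally transferred copy) does not distort the genome distance.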
Phage annotation guide: Guidelines for assembly and high-quality annotation
All sequencing projects of bacteriophages (phages) should seek to report an accurate and comprehensive annotation of their genomes. This article defines 14 questions for those new to phage genomics that should be addressed before submitting a genome sequence to the International Nucleotide Sequence Database Collaboration or writing a publication.
The ACS LCID Project: RR Lyrae stars as tracers of old population gradients in the isolated dwarf spheroidal galaxy Tucana
We present a study of the radial distribution of RR Lyrae variables, which
present a range of photometric and pulsational properties, in the dwarf
spheroidal galaxy Tucana. We find that the fainter RR Lyrae stars, having a
shorter period, are more centrally concentrated than the more luminous, longer
period RR Lyrae variables. Through comparison with the predictions of
theoretical models of stellar evolution and stellar pulsation, we interpret the
fainter RR Lyrae stars as a more metal-rich subsample. In addition, we show
that they must be older than about 10 Gyr. Therefore, the metallicity gradient
must have appeared very early on in the history of this galaxy. Comment: 5 pages, 5 figures in emulateapj style. Submitted to ApJ Letters.
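The abstract does not say which statistical test supports the claim that the fainter, shorter-period RR Lyrae stars are more centrally concentrated. One common way to compare two radial distributions is the two-sample Kolmogorov-Smirnov statistic D, the largest gap between the two cumulative distributions. The sketch below computes D in pure Python; the radii in the usage section are hypothetical placeholders, not data from the paper.

```python
from statistics import median


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: the largest gap between two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both CDFs past every value equal to x so ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    # Once one sample is exhausted its CDF is 1 and the gap can only shrink.
    return d


if __name__ == "__main__":
    # Hypothetical radii (arcmin), NOT measurements from the Tucana study.
    faint_short_period = [0.4, 0.7, 0.9, 1.1, 1.5]
    bright_long_period = [0.8, 1.3, 1.9, 2.4, 3.0]
    print("D =", ks_statistic(faint_short_period, bright_long_period))
    print("median radii:", median(faint_short_period), median(bright_long_period))
```

D alone only measures how different the distributions are; the smaller median radius indicates which subsample is more concentrated, and a significance level would additionally need the sample sizes, which this sketch omits.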
The ACS LCID Project. I. Short-Period Variables in the Isolated Dwarf Spheroidal Galaxies Cetus & Tucana
(abridged) We present the first study of the variable star populations in the
isolated dwarf spheroidal galaxies (dSph) Cetus and Tucana. Based on Hubble
Space Telescope images obtained with the Advanced Camera for Surveys in the
F475W and F814W bands, we identified 180 and 371 variables in Cetus and Tucana,
respectively. The vast majority are RR Lyrae stars. In Cetus we also found
three anomalous Cepheids, four candidate binaries and one candidate long-period
variable (LPV), while six anomalous Cepheids and seven LPV candidates were
found in Tucana. Of the RR Lyrae stars, 147 were identified as fundamental mode
(RRab) and only eight as first-overtone mode (RRc) in Cetus, with mean periods
of 0.614 and 0.363 day, respectively. In Tucana we found 216 RRab and 82 RRc
giving mean periods of 0.604 and 0.353 day. These values place both galaxies in
the so-called Oosterhoff Gap, as is generally the case for dSph. We calculated
the distance modulus to both galaxies using different approaches based on the
properties of RRab and RRc, namely the luminosity-metallicity and
period-luminosity-metallicity relations, and found values in excellent
agreement with previous estimates using independent methods:
(m-M)_{0,Cet}=24.46±0.12 and (m-M)_{0,Tuc}=24.74±0.12, corresponding to
780±40 kpc and 890±50 kpc. We also found numerous RR Lyrae variables
pulsating in both modes simultaneously (RRd): 17 in Cetus and 60 in Tucana.
Tucana is, after Fornax, the second dSph in which such a large fraction of RRd
(~17%) has been observed. We provide the photometry and pulsation parameters
for all the variables, and compare the latter with values from the literature
for well-studied dSph of the Local Group and Galactic globular clusters. Comment: 26 pages, 24 figures, in emulateapj format. To be published in ApJ.
Some figures heavily degraded; see
http://www.iac.es/project/LCID/?p=publications for a version with full-resolution
figures.
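The conversion from true distance modulus to distance uses the standard relation (m-M)_0 = 5 log10(d / 10 pc), so d = 10^(((m-M)_0 + 5)/5) pc, with first-order uncertainty sigma_d = (ln 10 / 5) d sigma_mu. The sketch below checks the quoted distances and also computes the RRd fraction; the function names are hypothetical, and the choice of denominator (RRab + RRc + RRd) is an assumption, though for Tucana it gives 60/358 and reproduces the quoted ~17%.

```python
import math


def modulus_to_kpc(mu: float, sigma_mu: float) -> tuple[float, float]:
    """Convert a true distance modulus mu = 5 log10(d / 10 pc) to (d, sigma_d) in kpc."""
    d_pc = 10 ** ((mu + 5) / 5)
    # First-order error propagation: sigma_d = (ln 10 / 5) * d * sigma_mu
    sigma_pc = math.log(10) / 5 * d_pc * sigma_mu
    return d_pc / 1e3, sigma_pc / 1e3


def rrd_fraction(n_rrab: int, n_rrc: int, n_rrd: int) -> float:
    """Share of double-mode (RRd) pulsators among all RR Lyrae stars (assumed denominator)."""
    return n_rrd / (n_rrab + n_rrc + n_rrd)


if __name__ == "__main__":
    for name, mu in [("Cetus", 24.46), ("Tucana", 24.74)]:
        d, s = modulus_to_kpc(mu, 0.12)
        print(f"{name}: {d:.0f} +- {s:.0f} kpc")
    print(f"Tucana RRd fraction: {rrd_fraction(216, 82, 60):.1%}")
```

Rounded to the nearest 10 kpc, the two moduli give 780±40 kpc and 890±50 kpc, matching the values in the abstract.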
The National Center for Biotechnology Information's Protein Clusters Database
Rapid increases in DNA sequencing capabilities have led to a vast increase in the data generated from prokaryotic genomic studies, which has been a boon to scientists studying micro-organism evolution and to those who wish to understand the biological underpinnings of microbial systems. The NCBI Protein Clusters Database (ProtClustDB) has been created to maintain this deluge of data efficiently and keep it up to date. ProtClustDB contains both curated and uncurated clusters of proteins grouped by sequence similarity. The May 2008 release contains a total of 285 386 clusters derived from over 1.7 million proteins encoded by 3806 nucleotide sequences from the RefSeq collection of complete chromosomes and plasmids from four major groups: prokaryotes, bacteriophages, and the mitochondrial and chloroplast organelles. There are 7180 clusters containing 376 513 proteins with curated gene and protein functional annotation. PubMed identifiers and external cross-references are collected for all clusters and provide additional information resources. A suite of web tools is available to explore more detailed information, such as multiple alignments, phylogenetic trees, and genomic neighborhoods. ProtClustDB provides an efficient method to aggregate gene and protein annotation for researchers and is available at http://www.ncbi.nlm.nih.gov/sites/entrez?db=proteinclusters
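The abstract says only that proteins are "grouped by sequence similarity"; it does not describe the clustering procedure. As a generic illustration (explicitly not ProtClustDB's pipeline), the sketch below groups proteins by single-linkage clustering: any pair whose precomputed similarity score meets a threshold is merged, using a union-find structure. The scores, the threshold value, and the function name are all hypothetical.

```python
def cluster_by_similarity(proteins, hits, threshold=0.5):
    """Single-linkage clusters from pairwise similarity hits.

    proteins: iterable of protein IDs.
    hits: iterable of (id_a, id_b, score) with scores assumed precomputed,
          e.g. by a sequence aligner; threshold is a hypothetical cutoff.
    """
    parent = {p: p for p in proteins}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in hits:
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for p in parent:
        clusters.setdefault(find(p), []).append(p)
    return sorted(sorted(members) for members in clusters.values())
```

Single linkage is the simplest choice and can chain distant sequences together through intermediates; real resources typically add further criteria (alignment coverage, reciprocal best hits, manual curation) to avoid that.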
The ACS LCID Project:VIII. The short-period Cepheids of Leo A
We present the results of a new search for variable stars in the Local Group
dwarf galaxy Leo A, based on deep photometry from the Advanced Camera for
Surveys onboard the Hubble Space Telescope. We detected 166 bona fide variables
in our field, of which about 60 percent are new discoveries, and 33 candidate
variables. Of the confirmed variables, we found 156 Cepheids, but only 10 RR
Lyrae stars despite nearly 100 percent completeness at the magnitude of the
horizontal branch. The RR Lyrae stars include 7 fundamental and 3
first-overtone pulsators, with mean periods of 0.636 and 0.366 day,
respectively. From their position on the period-luminosity (PL) diagram and
light-curve morphology, we classify 91, 58, and 4 Cepheids as fundamental,
first-overtone, and second-overtone mode Classical Cepheids (CC), respectively,
and two as population II Cepheids. However, due to the low metallicity of Leo
A, about 90 percent of the detected Cepheids have periods shorter than 1.5
days. Comparison with theoretical models indicates that some of the fainter
stars classified as CC could be Anomalous Cepheids. We estimate the distance to
Leo A using the tip of the RGB (TRGB) and various methods based on the
photometric and pulsational properties of the Cepheids and RR Lyrae stars. The
distances obtained with the TRGB and RR Lyrae stars agree well with each other
while that from the Cepheid PL relations is somewhat larger, which may indicate
a mild metallicity effect on the luminosity of the short-period Cepheids. Due
to its very low metallicity, Leo A thus serves as a valuable calibrator of the
metallicity dependencies of the variable star luminosities. Comment: 16 pages, 13 figures. MNRAS, in press.
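A period-luminosity (PL) distance works by predicting each Cepheid's absolute magnitude from its period, M = slope × log10(P) + zero_point, and averaging the per-star moduli m − M. The sketch below assumes extinction-corrected apparent magnitudes; `slope` and `zero_point` are placeholders, not a published calibration, and the mild metallicity effect mentioned in the abstract would appear as a systematic offset in the zero point, which this sketch does not model.

```python
import math
from statistics import mean, stdev


def pl_distance_modulus(periods_days, apparent_mags, slope, zero_point):
    """Mean distance modulus and its standard error from a linear PL relation.

    slope and zero_point are hypothetical calibration inputs; magnitudes are
    assumed to be already corrected for extinction.
    """
    if len(periods_days) != len(apparent_mags) or len(periods_days) < 2:
        raise ValueError("need at least two stars with matching periods and magnitudes")
    # Per-star modulus: observed magnitude minus PL-predicted absolute magnitude.
    moduli = [m - (slope * math.log10(p) + zero_point)
              for p, m in zip(periods_days, apparent_mags)]
    return mean(moduli), stdev(moduli) / math.sqrt(len(moduli))
```

Because about 90 percent of the Leo A Cepheids have periods below 1.5 days, log10(P) stays close to zero for most stars, so the result is especially sensitive to the adopted zero point.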
Analysis of spounaviruses as a case study for the overdue reclassification of tailed phages
Tailed bacteriophages are the most abundant and diverse viruses in the world, with genome sizes ranging from 10 kbp to over 500 kbp. Yet, for historical reasons, all this diversity is confined to a single virus order, Caudovirales, composed of just four families: Myoviridae, Siphoviridae, Podoviridae, and the newly created Ackermannviridae family. In recent years, this morphology-based classification scheme has started to crumble under the constant flood of phage sequences, revealing that tailed phages are even more genetically diverse than once thought. This prompted us, the Bacterial and Archaeal Viruses Subcommittee of the International Committee on Taxonomy of Viruses (ICTV), to consider an overall reorganization of phage taxonomy. In this study, we used a wide range of complementary methods (including comparative genomics, core genome analysis, and marker gene phylogenetics) to show that the group of Bacillus phage SPO1-related viruses previously classified into the Spounavirinae subfamily is clearly distinct from other members of the family Myoviridae and that its diversity deserves the rank of an autonomous family. Thus, we removed this group from the Myoviridae family and created the family Herelleviridae, a new taxon of the same rank. In the process of the taxon evaluation, we explored the feasibility of different demarcation criteria and critically evaluated the usefulness of our methods for phage classification. The convergence of results, which draw a consistent and comprehensive picture of a new family with associated subfamilies regardless of method, demonstrates that the tools applied here are particularly useful in phage taxonomy. We are convinced that the creation of this novel family is a crucial milestone toward the much-needed reclassification of the Caudovirales order. Peer reviewed.
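Core genome analysis, one of the methods listed above, starts from the set of gene families shared by all members of a group. The sketch below assumes each genome has already been reduced to a list of gene-family labels (for example by protein clustering); the genome names and labels used in the usage note are hypothetical. The `min_fraction` parameter, also hypothetical, relaxes the strict core (present in every genome) to a "soft core".

```python
from collections import Counter


def core_genome(genomes, min_fraction=1.0):
    """Gene families present in at least min_fraction of the genomes.

    genomes: dict mapping a genome name to an iterable of gene-family labels.
    min_fraction=1.0 gives the strict core (families found in every genome).
    """
    if not genomes:
        raise ValueError("no genomes given")
    # Count each family at most once per genome.
    counts = Counter(fam for fams in genomes.values() for fam in set(fams))
    needed = min_fraction * len(genomes)
    return sorted(fam for fam, n in counts.items() if n >= needed)
```

For three hypothetical genomes `{"phage_A": ["terL", "portal", "tail_sheath", "dnaPol"], "phage_B": ["terL", "portal", "tail_sheath"], "phage_C": ["terL", "portal", "lysin"]}`, the strict core is `["portal", "terL"]`; with `min_fraction=0.6` the soft core also includes `"tail_sheath"`. A core shared by one candidate family but absent from its neighbors is the kind of signal used to support a taxon boundary.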
HAYDN: High-precision AsteroseismologY of DeNse stellar fields
In the last decade, the Kepler and CoRoT space-photometry missions have demonstrated the potential of asteroseismology as a novel, versatile and powerful tool to perform exquisite tests of stellar physics, and to enable precise and accurate characterisations of stellar properties, with impact on both exoplanetary and Galactic astrophysics. Based on our improved understanding of the strengths and limitations of such a tool, we argue for a new small/medium space mission dedicated to gathering high-precision, high-cadence, long photometric series in dense stellar fields. Such a mission will lead to breakthroughs in stellar astrophysics, especially in the metal-poor regime, will elucidate the evolution and formation of open and globular clusters, and will aid our understanding of the assembly history and chemodynamics of the Milky Way’s bulge and a few nearby dwarf galaxies.