
    Rapid evolution of virulence and drug resistance in the emerging zoonotic pathogen Streptococcus suis

    Background: Streptococcus suis is a zoonotic pathogen that infects pigs and can occasionally cause serious infections in humans. S. suis infections occur sporadically in humans in Europe and North America, but a recent major outbreak with high mortality has been described in China. The mechanisms of S. suis pathogenesis in humans and pigs are poorly understood. Methodology/Principal Findings: The sequencing of whole genomes of S. suis isolates provides opportunities to investigate the genetic basis of infection. Here we describe whole genome sequences of three S. suis strains from the same lineage: one from European pigs, and two from human cases from China and Vietnam. Comparative genomic analysis was used to investigate the variability of these strains. S. suis is phylogenetically distinct from other Streptococcus species for which genome sequences are currently available. Accordingly, ~40% of the ~2 Mb genome is unique in comparison to other Streptococcus species. Finer genomic comparisons within the species showed a high level of sequence conservation; virtually all of the genome is common to the S. suis strains. The only exceptions are three ~90 kb regions, present in the two isolates from humans, composed of integrative conjugative elements and transposons. These regions carry coding sequences associated with drug resistance. In addition, small-scale sequence variation has generated pseudogenes in putative virulence and colonization factors. Conclusions/Significance: The genomic inventories of genetically related S. suis strains, isolated from distinct hosts and diseases, exhibit high levels of conservation. However, the genomes provide evidence that horizontal gene transfer has contributed to the evolution of drug resistance.

    Meeting the Cool Neighbors VII: Spectroscopy of faint, red NLTT dwarfs

    We present low-resolution optical spectroscopy and BVRI photometry of 453 candidate nearby stars drawn from the NLTT proper motion catalogue. The stars were selected based on optical/near-infrared colours, derived by combining the NLTT photographic data with photometry from the 2MASS Second Incremental Data Release. Based on the derived photometric and spectroscopic parallaxes, we identify 111 stars as lying within 20 parsecs of the Sun, including 9 stars with formal distance estimates of less than 10 parsecs. A further 53 stars have distance estimates within 1-sigma of our 20-parsec limit. Almost all of those stars are additions to the nearby star census. In total, our NLTT-based survey has so far identified 496 stars likely to be within 20 parsecs, of which 195 are additions to nearby-star catalogues. Most of the newly identified nearby stars have spectral types between M4 and M8. Comment: 41 pages, 7 figures
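
    The abstract does not spell out how a photometric parallax becomes a distance. As a minimal sketch, assuming the standard distance modulus m - M = 5 log10(d) - 5 with no extinction correction, and assuming an absolute magnitude has already been obtained from some colour or spectral-type calibration (not shown, and not necessarily the paper's), the code below converts magnitudes to parsecs and sorts a star into the abstract's three categories. The function names, the 0.5 mag scatter and the toy magnitudes are hypothetical.

```python
import math


def distance_pc(apparent_mag: float, absolute_mag: float) -> float:
    """Distance in parsecs from the distance modulus m - M = 5*log10(d) - 5.

    Assumption: interstellar extinction is ignored.
    """
    return 10 ** ((apparent_mag - absolute_mag + 5) / 5)


def distance_sigma_pc(distance: float, abs_mag_sigma: float) -> float:
    """First-order error propagation: sigma_d = d * ln(10) / 5 * sigma_M."""
    return distance * math.log(10) / 5 * abs_mag_sigma


def classify(distance: float, sigma: float, limit_pc: float = 20.0) -> str:
    """Hypothetical three-way split mirroring the categories in the abstract."""
    if distance <= limit_pc:
        return "within limit"
    if distance - sigma <= limit_pc:
        return "within 1-sigma of limit"
    return "beyond limit"


# Toy values, not from the paper: apparent mag 14.0, calibrated absolute mag 12.0.
d = distance_pc(14.0, 12.0)       # 10**1.4, about 25.1 pc
s = distance_sigma_pc(d, 0.5)     # about 5.8 pc for a 0.5 mag calibration scatter
print(f"{d:.1f} pc +/- {s:.1f} pc -> {classify(d, s)}")
```

    Because the distance depends exponentially on magnitude, even a modest calibration scatter yields a sizeable distance uncertainty, which is why a separate "within 1-sigma" group near the 20-parsec boundary is useful.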

    The Bioperl toolkit: Perl modules for the life sciences

    The Bioperl project is an international open-source collaboration of biologists, bioinformaticians, and computer scientists that has evolved over the past 7 years into the most comprehensive library of Perl modules available for managing and manipulating life-science information. Bioperl provides an easy-to-use, stable, and consistent programming interface for bioinformatics application programmers. The Bioperl modules have been successfully and repeatedly used to reduce otherwise complex tasks to only a few lines of code. The Bioperl object model has proven flexible enough to support enterprise-level applications such as EnsEMBL, while maintaining an easy learning curve for novice Perl programmers. Bioperl is capable of executing analyses and processing results from programs such as BLAST, ClustalW, or the EMBOSS suite. Interoperation with modules written in Python and Java is supported through the evolving BioCORBA bridge. Bioperl provides access to data stores such as GenBank and SwissProt via a flexible series of sequence input/output modules, and to the emerging common sequence data storage format of the Open Bioinformatics Database Access project. This study describes the overall architecture of the toolkit and the problem domains it addresses, and gives specific examples of how the toolkit can be used to solve common life-sciences problems. We conclude with a discussion of how the open-source nature of the project has contributed to the development effort.

    Defining functional distances over Gene Ontology

    Background: A fundamental problem when trying to define the functional relationships between proteins is the difficulty in quantifying functional similarities, even when well-structured ontologies exist regarding the activity of proteins (i.e. the Gene Ontology, GO). However, functional metrics can overcome the problems in comparing and evaluating functional assignments and predictions. As a measure of proximity, previous approaches to comparing GO terms considered linkage in terms of the ontology, weighted by a probability distribution that balances the non-uniform 'richness' of different parts of the Directed Acyclic Graph. Here, we have followed a different approach to quantify functional similarities between GO terms. Results: We propose a new method to derive 'functional distances' between GO terms that is based on the simultaneous occurrence of terms in the same set of InterPro entries, instead of relying on the structure of the GO. The coincidence of GO terms reveals natural biological links between the GO functions and defines a distance model, D_f, which fulfils the properties of a metric space. The distances obtained in this way can be represented as a hierarchical 'Functional Tree'. Conclusion: The method proposed provides a new definition of distance that enables the similarity between GO terms to be quantified. Additionally, the 'Functional Tree' defines groups with biological meaning, enhancing its utility for protein function comparison and prediction. Finally, this approach could be used for function-based protein searches in databases, and for analysing the gene clusters produced by DNA array experiments.
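
    The abstract does not give the formula for D_f. To illustrate the general idea of a co-occurrence distance that is also a true metric, the sketch below substitutes the Jaccard distance (a well-known metric on finite sets) between the sets of InterPro entries annotated with each GO term. This is a swapped-in technique, not the paper's D_f, and the term and entry labels are hypothetical toy data.

```python
from itertools import combinations


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """Jaccard distance 1 - |A & B| / |A | B|; defined as 0.0 when both sets are empty.

    It is a true metric on finite sets (it satisfies the triangle inequality).
    """
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(term_entries: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Distance for every unordered pair of terms, keyed in sorted order."""
    return {
        (s, t): jaccard_distance(term_entries[s], term_entries[t])
        for s, t in combinations(sorted(term_entries), 2)
    }


# Toy mapping from hypothetical GO term labels to the InterPro entries they annotate.
toy = {
    "term_a": {"ipr_1", "ipr_2", "ipr_3"},
    "term_b": {"ipr_2", "ipr_3", "ipr_4"},
    "term_c": {"ipr_5"},
}
for pair, dist in pairwise_distances(toy).items():
    print(pair, round(dist, 3))
```

    Terms that annotate many of the same entries end up close together, while terms that never co-occur sit at the maximum distance of 1.0. A condensed form of such a distance matrix could be passed to a hierarchical clustering routine such as SciPy's `scipy.cluster.hierarchy.linkage` to build a tree loosely analogous to the 'Functional Tree'.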

    Meeting report: a workshop on Best Practices in Genome Annotation

    Efforts to annotate the genomes of a wide variety of model organisms are currently carried out by sequencing centers, model organism databases and academic/institutional laboratories around the world. Different annotation methods and tools have been developed over time to meet the needs of biologists faced with the task of annotating biological data. While standardized methods are essential for consistent curation within each annotation group, methods and tools can differ between groups, especially when the groups are curating different organisms. Biocurators from several institutes met at the Third International Biocuration Conference in Berlin, Germany, in April 2009, and hosted the ‘Best Practices in Genome Annotation: Inference from Evidence’ workshop to share their strategies, pipelines, standards and tools. This article documents the material presented in the workshop.

    Telomeric expression sites are highly conserved in Trypanosoma brucei

    Subtelomeric regions are often under-represented in genome sequences of eukaryotes. One of the best-known examples of the use of telomere proximity for adaptive purposes is provided by the bloodstream expression sites (BESs) of the African trypanosome Trypanosoma brucei. To enhance our understanding of BES structure and function in host adaptation and immune evasion, the BES repertoire from the Lister 427 strain of T. brucei was independently tagged and sequenced. BESs are polymorphic in size and structure but reveal a surprisingly conserved architecture in the context of extensive recombination. Very small BESs do exist, and many functioning BESs do not contain the full complement of expression site associated genes (ESAGs). The consequences of duplicated or missing ESAGs, including ESAG9, a newly named ESAG12, and additional variant surface glycoprotein genes (VSGs), were evaluated by functional assays after BESs were tagged with a drug-resistance gene. Phylogenetic analysis of constituent ESAG families suggests that BESs are sequence mosaics and that extensive recombination has shaped the evolution of the BES repertoire. This work opens important perspectives in understanding the molecular mechanisms of antigenic variation, a widely used strategy for immune evasion in pathogens, and of telomere biology.

    Structuring and extracting knowledge for the support of hypothesis generation in molecular biology

    Background: Hypothesis generation in molecular and cellular biology is an empirical process in which knowledge derived from prior experiments is distilled into a comprehensible model. The need for automated support is exemplified by the difficulty of considering all relevant facts contained in the millions of documents available from PubMed. The Semantic Web provides tools for sharing prior knowledge, while information retrieval and information extraction techniques enable its extraction from literature. Their combination makes prior knowledge available for computational analysis and inference. While some tools provide complete solutions that limit control over the modeling and extraction processes, we seek a methodology that gives the experimenter control over these critical processes. Results: We describe progress towards automated support for the generation of biomolecular hypotheses. Semantic Web technologies are used to structure and store knowledge, while a workflow extracts knowledge from text. We designed minimal proto-ontologies in OWL for capturing different aspects of a text mining experiment: the biological hypothesis, text and documents, text mining, and workflow provenance. The models fit a methodology that allows focus on the requirements of a single experiment while supporting reuse and later analysis of knowledge extracted from multiple experiments. Our workflow is composed of services from the 'Adaptive Information Disclosure Application' (AIDA) toolkit, as well as a few others. The output is a semantic model of putative biological relations, with each relation linked to the corresponding evidence. Conclusion: We demonstrated a 'do-it-yourself' approach for structuring and extracting knowledge in the context of experimental research on biomolecular mechanisms. The methodology can be used to bootstrap the construction of semantically rich biological models using the results of knowledge extraction processes. Models specific to particular experiments can be constructed that, in turn, link with other semantic models, creating a web of knowledge that spans experiments. Mapping mechanisms can link to other knowledge resources such as OBO ontologies or SKOS vocabularies. AIDA Web Services can be used to design personalized knowledge extraction procedures. In our example experiment, we found three proteins (NF-Kappa B, p21, and Bax) potentially playing a role in the interplay between nutrients and epigenetic gene regulation.

    An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics

    For a decade, The Cancer Genome Atlas (TCGA) program collected clinicopathologic annotation data along with multi-platform molecular profiles of more than 11,000 human tumors across 33 different cancer types. TCGA clinical data contain key features representing the democratized nature of the data collection process. To ensure proper use of this large clinical dataset associated with genomic features, we developed a standardized dataset named the TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR), which includes four major clinical outcome endpoints. In addition to detailing major challenges and statistical limitations encountered while integrating the acquired clinical data, we present a summary that includes endpoint usage recommendations for each cancer type. These TCGA-CDR findings appear to be consistent with cancer genomics studies independent of the TCGA effort and provide opportunities for investigating cancer biology using clinical correlates at an unprecedented scale. Analysis of clinicopathologic annotations for over 11,000 cancer patients in the TCGA program leads to the generation of the TCGA Clinical Data Resource, which provides recommendations of clinical outcome endpoint usage for 33 cancer types.

    Genomic analyses identify recurrent MEF2D fusions in acute lymphoblastic leukemia

    Chromosomal rearrangements are initiating events in acute lymphoblastic leukaemia (ALL). Here, using RNA sequencing of 560 ALL cases, we identify rearrangements between MEF2D (myocyte enhancer factor 2D) and five genes (BCL9, CSF1R, DAZAP1, HNRNPUL1 and SS18) in 22 B progenitor ALL (B-ALL) cases with a distinct gene expression profile, the most common of which is MEF2D-BCL9. Examination of an extended cohort of 1,164 B-ALL cases identified 30 cases with MEF2D rearrangements, including an additional fusion partner, FOXJ2; MEF2D-rearranged cases thus comprise 5.3% of cases lacking recurring alterations. MEF2D-rearranged ALL is characterized by a distinct immunophenotype, DNA copy number alterations at the rearrangement sites, older age at diagnosis and poor outcome. The rearrangements result in enhanced MEF2D transcriptional activity, lymphoid transformation, activation of HDAC9 expression and sensitivity to histone deacetylase inhibitor treatment. Thus, MEF2D-rearranged ALL represents a distinct form of high-risk leukaemia, for which new therapeutic approaches should be considered.
    This work was supported in part by the American Lebanese Syrian Associated Charities of St. Jude Children’s Research Hospital; by a Stand Up to Cancer Innovative Research Grant and St. Baldrick’s Foundation Scholar Award (to C.G.M.); by a St. Baldrick’s Consortium Award (S.P.H.); by a Leukemia and Lymphoma Society Specialized Center of Research grant (S.P.H. and C.G.M.); by a Lady Tata Memorial Trust Award (I.I.); by a Leukemia and Lymphoma Society Special Fellow Award and Alex’s Lemonade Stand Foundation Young Investigator Awards (K.R.); by an Alex’s Lemonade Stand Foundation Award (M.L.); and by National Cancer Institute Grants CA21765 (St Jude Cancer Center Support Grant), U01 CA157937 (C.L.W. and S.P.H.), U24 CA114737 (to Dr Gastier-Foster), NCI Contract HHSN261200800001E (to Dr Gastier-Foster), U10 CA180820 (ECOG-ACRIN Operations) and CA180827 (E.P.); U10 CA180861 (C.D.B. and G.M.); U24 CA196171 (The Alliance NCTN Biorepository and Biospecimen Resource); CA145707 (C.L.W. and C.G.M.); and grants to the COG: U10 CA98543 (Chair’s grant and supplement to support the COG ALL TARGET project), U10 CA98413 (Statistical Center) and U24 CA114766 (Specimen Banking). This project has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Contract Number HHSN261200800001E.