Clades of huge phages from across Earth's ecosystems.

Bacteriophages typically have small genomes and depend on their bacterial hosts for replication. Here we sequenced DNA from diverse ecosystems and found hundreds of phage genomes with lengths of more than 200 kilobases (kb), including a genome of 735 kb, which is, to our knowledge, the largest phage genome to be described to date. Thirty-five genomes were manually curated to completion (circular and no gaps). Expanded genetic repertoires include diverse and previously undescribed CRISPR-Cas systems, transfer RNAs (tRNAs), tRNA synthetases, tRNA-modification enzymes, translation-initiation and elongation factors, and ribosomal proteins. The CRISPR-Cas systems of phages have the capacity to silence host transcription factors and translational genes, potentially as part of a larger interaction network that intercepts translation to redirect biosynthesis to phage-encoded functions. In addition, some phages may repurpose bacterial CRISPR-Cas systems to eliminate competing phages. We phylogenetically define the major clades of huge phages from human and other animal microbiomes, as well as from oceans, lakes, sediments, soils and the built environment. We conclude that the large gene inventories of huge phages reflect a conserved biological strategy, and that the phages are distributed across a broad bacterial host range and across Earth's ecosystems.

eScholarship - University of California

Online Research Database In Technology

Provenance, propagation and quality of biological annotation

Author: Bell Michael James
Publication venue: Newcastle University
Publication date: 01/01/2014

PhD Thesis. Biological databases have become an integral part of the life sciences, being used to store, organise and share ever-increasing quantities and types of data. Biological databases are typically centred around raw data, with individual entries being assigned to a single piece of biological data, such as a DNA sequence. Although essential, a reader can obtain little information from the raw data alone. Therefore, many databases aim to supplement their entries with annotation, allowing the current knowledge about the underlying data to be conveyed to a reader. Although annotations come in many different forms, most databases provide some form of free text annotation. Given that annotations can form the foundations of future work, it is important that a user is able to evaluate the quality and correctness of an annotation. However, this is rarely straightforward. The amount of annotation, and the way in which it is curated, varies between databases. For example, the production of an annotation in some databases is entirely automated, without any manual intervention. Further, sections of annotations may be reused, being propagated between entries and, potentially, external databases. This provenance and curation information is not always apparent to a user. The work described within this thesis explores issues relating to biological annotation quality. While the most valuable annotation is often contained within free text, its lack of structure makes it hard to assess. Initially, this work describes a generic approach that allows textual annotations to be quantitatively measured. This approach is based upon the application of Zipf's Law to words within textual annotation, resulting in a single value, α. The relationship between the α value and Zipf's principle of least effort provides an indication as to the annotation's quality, whilst also allowing annotations to be quantitatively compared.
Secondly, the thesis focuses on determining annotation provenance and tracking any subsequent propagation. This is achieved through the development of a visualisation framework, which exploits the reuse of sentences within annotations. Utilising this framework, a number of propagation patterns were identified, which on analysis appear to indicate low quality and erroneous annotation. Together, these approaches increase our understanding of the textual characteristics of biological annotation, and suggest that this understanding can be used to increase the overall quality of these resources.
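As a rough illustration of the Zipf's Law idea in the abstract above, the sketch below estimates a single exponent from the word frequencies of a block of annotation text. It is a minimal sketch under stated assumptions, not the thesis's method: the function name `zipf_alpha`, the simple tokenisation, and the choice of an ordinary least-squares fit on log rank against log frequency are all assumptions made here, and the thesis may use a different estimator.

```python
import math
import re
from collections import Counter


def zipf_alpha(text):
    """Estimate a Zipf-style exponent for the words in `text`.

    Assumption: the exponent is taken as the negated slope of a
    least-squares line through (log rank, log frequency). This is a
    common quick estimate, not necessarily the thesis's estimator.
    """
    # Hypothetical tokenisation: lower-case runs of letters, digits, apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    freqs = sorted(Counter(words).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")

    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Ordinary least-squares slope of log frequency on log rank.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


if __name__ == "__main__":
    # Made-up annotation text, for illustration only.
    annotation = (
        "Catalyzes the hydrolysis of ATP. Belongs to the ATPase family. "
        "Binds ATP and the substrate. The protein is found in the membrane."
    )
    print(f"estimated exponent: {zipf_alpha(annotation):.3f}")
```

Computing the value separately for two sets of annotations would give two numbers that can be compared directly, which is the kind of quantitative comparison the abstract describes. How such a value should be interpreted as a quality signal is left to the thesis itself.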

Newcastle University eTheses

Community structure and function of high-temperature chlorophototrophic microbial mats inhabiting diverse geothermal environments

Author: Boomer Sarah M.
Bryant Donald A.
Herrgard Markus
Inskeep William P.
Jay Zackary J.
Klatt Christian G.
Miller Scott R.
Parenteau M. Niki
Rusch Douglas B.
Tringe Susannah G.
Ward David M.
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2013

Six phototrophic microbial mat communities from different geothermal springs in Yellowstone National Park (YNP) were studied using metagenome sequencing and geochemical analyses. The primary goals of this work were to determine differences in community composition of high-temperature phototrophic mats distributed across the Yellowstone geothermal ecosystem, and to identify metabolic attributes of predominant organisms present in these communities that may correlate with environmental attributes important in niche differentiation. Random shotgun metagenome sequences from six phototrophic communities (average ~53 Mbp/site) were subjected to multiple taxonomic, phylogenetic and functional analyses. All methods, including G+C content distribution, MEGAN analyses and oligonucleotide frequency-based clustering, provided strong support for the dominant community members present in each site. Cyanobacteria were only observed in non-sulfidic sites; de novo assemblies were obtained for Synechococcus-like populations at Chocolate Pots (CP_7) and Fischerella-like populations at White Creek (WC_6). Chloroflexi-like sequences (esp. Roseiflexus and/or Chloroflexus spp.) were observed in all six samples and contained genes involved in bacteriochlorophyll biosynthesis and the 3-hydroxypropionate carbon fixation pathway. Other major sequence assemblies were obtained for a Chlorobiales population from CP_7 (proposed family Thermochlorobacteriaceae), and an anoxygenic, sulfur-oxidizing Thermochromatium-like (Gamma-proteobacteria) population from Bath Lake Vista Annex (BLVA_20). Additional sequence coverage is necessary to establish more complete assemblies of other novel bacteria in these sites (e.g., Bacteroidetes and Firmicutes); however, current assemblies suggested that several of these organisms play important roles in heterotrophic and fermentative metabolisms. Definitive linkages were established between several of the dominant phylotypes present in these habitats and important functional processes such a

Crossref

Directory of Open Access Journals

Frontiers - Publisher Connector

PubMed Central

Online Research Database In Technology

Clades of huge phages from across Earth's ecosystems

Author: Al-Shayeb B
Amano Y
Amundson R
Anantharaman K
Banfield JF
Borton MA
Bouma-Gregson K
Brooks B
Castelle CJ
Cate JHD
Chen L-X
Devoto A
Doudna JA
Farag IF
Finstad K
Goltsman DSA
Harrison S
He C
Jaffe AL
Kantor R
Keren R
Lane KR
Lavy A
Lehours A-C
Lei S
Li W-J
Matheus-Carnevali P
Morowitz M
Munk P
Méheust R
Nelson TC
Olm MR
Power ME
Probst AJ
Relman DA
Sachdeva R
Santini JM
Sharrar A
Sun C
Thomas A
Tringe SG
Ward F
Warren L
Wrighton K
Zhou J
Publication date: 20/02/2020


UCL Discovery

InterPro in 2017-beyond protein family and domain annotations

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences

PubMed Central

eScholarship - University of California

The University of Manchester - Institutional Repository

Explore Bristol Research

Archivio istituzionale della ricerca - Università di Padova

InterPro in 2017-beyond protein family and domain annotations

Author: Attwood TK
Babbitt PC
Bateman A
Bork P
Bridge AJ
Chang HY
Dosztányi Z
El-Gebali S
Finn RD
Fraser M
Gough J
Haft D
Holliday GL
Huang H
Huang X
Letunic I
Lopez R
Lu S
Marchler-Bauer A
Mi H
Mistry J
Mitchell AL
Natale DA
Necci M
Nuka G
Orengo CA
Park Y
Pesseat S
Piovesan D
Potter SC
Rawlings ND
Redaschi N
Richardson L
Rivoire C
Sangrador-Vegas A
Sigrist C
Sillitoe I
Smithers B
Squizzato S
Sutton G
Thanki N
Thomas PD
Tosatto SC
Wu CH
Xenarios I
Yeh LS
Young SY
Publication date: 29/11/2016

UCL Discovery

CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes

Publication venue: 'PeerJ'

Crossref

HAMAP in 2013, new developments in the protein family classification and annotation system

Author: Auchincloss Andrea H.
Baratin Delphine
Bougueleret Lydie
Bridge Alan
Coudert Elisabeth
Cuche Béatrice A.
de Castro Edouard
Keller Guillaume
Pedruzzi Ivo
Poux Sylvain
Redaschi Nicole
Rivoire Catherine
Xenarios Ioannis
Publication date: 02/08/2017

HAMAP (High-quality Automated and Manual Annotation of Proteins—available at http://hamap.expasy.org/) is a system for the classification and annotation of protein sequences. It consists of a collection of manually curated family profiles for protein classification, and associated annotation rules that specify annotations that apply to family members. HAMAP was originally developed to support the manual curation of UniProtKB/Swiss-Prot records describing microbial proteins. Here we describe new developments in HAMAP, including the extension of HAMAP to eukaryotic proteins, the use of HAMAP in the automated annotation of UniProtKB/TrEMBL, providing high-quality annotation for millions of protein sequences, and the future integration of HAMAP into a unified system for UniProtKB annotation, UniRule. HAMAP is continuously updated by expert curators with new family profiles and annotation rules as new protein families are characterized. The collection of HAMAP family classification profiles and annotation rules can be browsed and viewed on the HAMAP website, which also provides an interface to scan user sequences against HAMAP profile

RERO DOC Digital Library