Programmed DNA destruction by miniature CRISPR-Cas14 enzymes.

Author: Banfield JF
Burstein D
Chen JS
Cofsky JC
Doudna JA
Harrington LB
Kyrpides NC
Ma E
Paez-Espino D
Witte IP
Publication venue: eScholarship, University of California
Publication date: 01/11/2018

CRISPR-Cas systems provide microbes with adaptive immunity to infectious nucleic acids and are widely employed as genome editing tools. These tools use RNA-guided Cas proteins whose large size (950 to 1400 amino acids) has been considered essential to their specific DNA- or RNA-targeting activities. Here we present a set of CRISPR-Cas systems from uncultivated archaea that contain Cas14, a family of exceptionally compact RNA-guided nucleases (400 to 700 amino acids). Despite their small size, Cas14 proteins are capable of targeted single-stranded DNA (ssDNA) cleavage without restrictive sequence requirements. Moreover, target recognition by Cas14 triggers nonspecific cutting of ssDNA molecules, an activity that enables high-fidelity single-nucleotide polymorphism genotyping (Cas14-DETECTR). Metagenomic data show that multiple CRISPR-Cas14 systems evolved independently and suggest a potential evolutionary origin of single-effector CRISPR-based adaptive immunity

eScholarship - University of California

GenePRIMP: a gene prediction improvement pipeline for prokaryotic genomes

Author: A Nagy
AL Delcher
Amrita Pati
Athanasios Lykidis
DA Benson
Galina Ovchinnikova
GX Yu
HQ Zhu
J Besemer
KL Smollett
M Tech
Natalia Mikhailova
Natalia N Ivanova
NC Kyrpides
NE Castellana
Nikos C Kyrpides
RK Aziz
S Bocs
Sean D Hooper
VM Markowitz
Y Ishino
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2010

We present 'gene prediction improvement pipeline' (GenePRIMP; http://geneprimp.jgi-psf.org/), a computational process that performs evidence-based evaluation of gene models in prokaryotic genomes and reports anomalies including inconsistent start sites, missed genes and split genes. We found that manual curation of gene models using the anomaly reports generated by GenePRIMP improved their quality, and demonstrate the applicability of GenePRIMP in improving finishing quality and comparing different genome-sequencing and annotation technologies

Crossref

UNT Digital Library

Abundant Human DNA Contamination Identified in Non-Primate Genome Databases

Author: AM Waterhouse
GE Liu
GN Rutty
H Malmstrom
HN Poinar
J Jurka
MA Larkin
Mark S. Longo
Michael J. O'Neill
Najib El-Sayed
NC Kyrpides
PL Deininger
Rachel J. O'Neill
RM Durbin
SF Altschul
TJ Katz
WJ Kent
Publication venue: Public Library of Science
Publication date: 01/01/2010

During routine screens of the NCBI databases using human repetitive elements we discovered an unlikely level of nucleotide identity across a broad range of phyla. To ascertain whether databases containing DNA sequences, genome assemblies and trace archive reads were contaminated with human sequences, we performed an in-depth search for sequences of human origin in non-human species. Using a primate-specific SINE, AluY, we screened 2,749 non-primate public databases from NCBI, Ensembl, JGI, and UCSC and found 492 to be contaminated with human sequence. These represent species ranging from bacteria (B. cereus) to plants (Z. mays) to fish (D. rerio), with examples found from most phyla. The identification of such extensive contamination by human sequence across databases and sequence types warrants caution among the sequencing community in future sequencing efforts, such as human re-sequencing. We discuss issues this may raise and present data that give insight into how this may be occurring.

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Systematic search for putative new domain families in Mycoplasma gallisepticum genome

Author: A Lupas
A Marchler-Bauer
AG Murzin
B Rost
Bernard Offmann
CA Orengo
CC Reddy
Chilamakuri CS Reddy
CS Reddy
EL Sonnhammer
GE Tusnady
J Park
JD Thompson
K Tamura
L Papazisi
LJ McGuffin
N Saitou
NC Kyrpides
R Sowdhamini
S Dietmann
Sane Sudha Rani
SF Altschul
SR Eddy
T Nakatsu
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Protein domains are the fundamental units of protein structure, function and evolution. The delineation of different domains in proteins is important for classification and for understanding of structure, function and evolution. The delineation of protein domains within a polypeptide chain, particularly at the genome scale, can be achieved in several ways but may remain problematic in many instances. Difficulties in identifying the domain content of a given sequence arise when the query sequence has no homologues with experimentally determined structure and searching against sequence domain databases also results in insignificant matches. Identification of domains under conditions of low sequence identity and in the absence of structural homologues acquires crucial importance, especially at the genomic scale. Findings: We have developed a new method for the identification of domains in unassigned regions through indirect connections and scaled up its application to the analysis of 434 unassigned regions in 726 protein sequences of the Mycoplasma gallisepticum genome. We could establish 71 new domain relationships and 63 probable new domain families through intermediate sequences in the unassigned regions, which importantly represent an overall 10% increase in PfamA domain annotation over the direct assignment in this genome. Conclusions: The systematic analysis of the unassigned regions in the Mycoplasma gallisepticum genome has provided some insight into the possible new domain relationships and putative new domain families. Further investigation of these predicted new domains may prove beneficial in improving the existing domain prediction algorithms.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Hal-Diderot

A synthesis of bacterial and archaeal phenotypic trait data.

Author: Brbic M
Corkrey R
Danko D
Edwards K
Engqvist MKM
Fierer N
Geoghegan JL
Gillings M
Kyrpides NC
Litchman E
Madin JS
Mason CE
Moore L
Nielsen DA
Nielsen SL
Paulsen IT
Price ND
Reddy TBK
Richards MA
Rocha EPC
Schmidt TM
Shaaban H
Shukla M
Supek F
Tetu SG
Vieira-Silva S
Wattam AR
Westfall DA
Westoby M
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/10/2020

A synthesis of phenotypic and quantitative genomic traits is provided for bacteria and archaea, in the form of a scripted, reproducible workflow that standardizes and merges 26 sources. The resulting unified dataset covers 14 phenotypic traits, 5 quantitative genomic traits, and 4 environmental characteristics for approximately 170,000 strain-level and 15,000 species-aggregated records. It spans all habitats including soils, marine and fresh waters and sediments, host-associated and thermal. Trait data can find use in clarifying major dimensions of ecological strategy variation across species. They can also be used in conjunction with species and abundance sampling to characterize trait mixtures in communities and responses of traits along environmental gradients

OPUS - University of Technology Sydney

A New Role for Translation Initiation Factor 2 in Maintaining Genome Integrity

Escherichia coli translation initiation factor 2 (IF2) performs the unexpected function of promoting transition from recombination to replication during bacteriophage Mu transposition in vitro, leading to initiation by replication restart proteins. This function has suggested a role of IF2 in engaging cellular restart mechanisms and regulating the maintenance of genome integrity. To examine the potential effect of IF2 on restart mechanisms, we characterized its influence on cellular recovery following DNA damage by methyl methanesulfonate (MMS) and UV damage. Mutations that prevent expression of full-length IF2-1 or truncated IF2-2 and IF2-3 isoforms affected cellular growth or recovery following DNA damage differently, influencing different restart mechanisms. A deletion mutant (del1) expressing only IF2-2/3 was severely sensitive to growth in the presence of DNA-damaging agent MMS. Proficient as wild type in repairing DNA lesions and promoting replication restart upon removal of MMS, this mutant was nevertheless unable to sustain cell growth in the presence of MMS; however, growth in MMS could be partly restored by disruption of sulA, which encodes a cell division inhibitor induced during replication fork arrest. Moreover, such characteristics of del1 MMS sensitivity were shared by restart mutant priA300, which encodes a helicase-deficient restart protein. Epistasis analysis indicated that del1 in combination with priA300 had no further effects on cellular recovery from MMS and UV treatment; however, the del2/3 mutation, which allows expression of only IF2-1, synergistically increased UV sensitivity in combination with priA300. The results indicate that full-length IF2, in a function distinct from truncated forms, influences the engagement or activity of restart functions dependent on PriA helicase, allowing cellular growth when a DNA-damaging agent is present.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

CORRIE: enzyme sequence annotation with confidence estimates

Author: A Bairoch
AM Leontovich
Benjamin Audit
CA Ouzounis
CA Wilson
CH Wu
Christos A Ouzounis
D Devos
EA Bayer
ED Levy
Emmanuel D Levy
F Abascal
FD Schubot
G Casari
H Weiss
JA Gerlt
JL Ong
Leon Goldovsky
M des Jardins
MA Andrade
NC Kyrpides
O Lichtarge
P Bork
PD Karp
SF Altschul
VJ Promponas
Wally R Gilks
WG Krebs
WR Gilks
Y Zhang
Publication venue: BioMed Central
Publication date: 01/01/2007

Using a previously developed automated method for enzyme annotation, we report the re-annotation of the ENZYME database and the analysis of local error rates per class. In control experiments, we demonstrate that the method is able to correctly re-annotate 91% of all Enzyme Classification (EC) classes with high coverage (755 out of 827). Only 44 enzyme classes are found to contain false positives, while the remaining 28 enzyme classes are not represented. We also show cases where the re-annotation procedure results in partial overlaps for those few enzyme classes where a certain inconsistency might appear between homologous proteins, mostly due to function specificity. Our results allow the interactive exploration of the EC hierarchy for known enzyme families as well as putative enzyme sequences that may need to be classified within the EC hierarchy. These aspects of our framework have been incorporated into a web-server, called CORRIE, which stands for Correspondence Indicator Estimation and allows the interactive prediction of a functional class for putative enzymes from sequence alone, supported by probabilistic measures in the context of the pre-calculated Correspondence Indicators of known enzymes with the functional classes of the EC hierarchy. The CORRIE server is available at:

HAL-ENS-LYON

Crossref

Springer - Publisher Connector

PubMed Central

IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses.

Author: Caffrey SM
Campbell BJ
Cavicchioli R
Chen F
Chen I-MA
Chu K
Denef V
Hallam SJ
Handley KM
Huang J
Huntemann M
Ivanova NN
K Reddy TB
Kyrpides NC
Liu W-T
Markowitz VM
McMahon K
Nielsen T
Paez-Espino D
Palaniappan K
Pavlopoulos GA
Pillay M
Pope PB
Ratner A
Rivers AR
Salekdeh GH
Setubal JC
Streit WR
Sullivan MB
Szeto E
Tsesmetzis N
Webster J
Publication venue: 'Oxford University Press (OUP)'
Publication date: 13/09/2022

Viruses represent the most abundant life forms on the planet. Recent experimental and computational improvements have led to a dramatic increase in the number of viral genome sequences identified primarily from metagenomic samples. As a result of the expanding catalog of metagenomic viral sequences, there exists a need for a comprehensive computational platform integrating all these sequences with associated metadata and analytical tools. Here we present IMG/VR (https://img.jgi.doe.gov/vr/), the largest publicly available database of 3908 isolate reference DNA viruses with 264 413 computationally identified viral contigs from >6000 ecologically diverse metagenomic samples. Approximately half of the viral contigs are grouped into genetically distinct quasi-species clusters. Microbial hosts are predicted for 20 000 viral sequences, revealing nine microbial phyla previously unreported to be infected by viruses. Viral sequences can be queried using a variety of associated metadata, including habitat type and geographic location of the samples, or taxonomic classification according to hallmark viral genes. IMG/VR has a user-friendly interface that allows users to interrogate all integrated data and interact by comparing with external sequences, thus serving as an essential resource in the viral genomics community

OPUS - University of Technology Sydney

Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies

Due to the rapid release of new data from genome sequencing projects, the majority of protein sequences in public databases have not been experimentally characterized; rather, sequences are annotated using computational analysis. The level of misannotation and the types of misannotation in large public databases are currently unknown and have not been analyzed in depth. We have investigated the misannotation levels for molecular function in four public protein sequence databases (UniProtKB/Swiss-Prot, GenBank NR, UniProtKB/TrEMBL, and KEGG) for a model set of 37 enzyme families for which extensive experimental information is available. The manually curated database Swiss-Prot shows the lowest annotation error levels (close to 0% for most families); the two other protein sequence databases (GenBank NR and TrEMBL) and the protein sequences in the KEGG pathways database exhibit similar and surprisingly high levels of misannotation that average 5%–63% across the six superfamilies studied. For 10 of the 37 families examined, the level of misannotation in one or more of these databases is >80%. Examination of the NR database over time shows that misannotation has increased from 1993 to 2005. The types of misannotation that were found fall into several categories, most associated with “overprediction” of molecular function. These results suggest that misannotation in enzyme superfamilies containing multiple families that catalyze different reactions is a larger problem than has been recognized. Strategies are suggested for addressing some of the systematic problems contributing to these high levels of misannotation

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Genomic Characterization of Methanomicrobiales Reveals Three Classes of Methanogens

BACKGROUND: Methanomicrobiales is the least studied order of methanogens. While these organisms appear to be more closely related to the Methanosarcinales in ribosomal-based phylogenetic analyses, they are metabolically more similar to Class I methanogens. METHODOLOGY/PRINCIPAL FINDINGS: In order to improve our understanding of this lineage, we have completely sequenced the genomes of two members of this order, Methanocorpusculum labreanum Z and Methanoculleus marisnigri JR1, and compared them with the genome of a third, Methanospirillum hungatei JF-1. Similar to Class I methanogens, Methanomicrobiales use a partial reductive citric acid cycle for 2-oxoglutarate biosynthesis, and they have the Eha energy-converting hydrogenase. In common with Methanosarcinales, Methanomicrobiales possess the Ech hydrogenase and at least some of them may couple formylmethanofuran formation and heterodisulfide reduction to transmembrane ion gradients. Uniquely, M. labreanum and M. hungatei contain hydrogenases similar to the Pyrococcus furiosus Mbh hydrogenase, and all three Methanomicrobiales have anti-sigma factor and anti-anti-sigma factor regulatory proteins not found in other methanogens. Phylogenetic analysis based on seven core proteins of methanogenesis and cofactor biosynthesis places the Methanomicrobiales equidistant from Class I methanogens and Methanosarcinales. CONCLUSIONS/SIGNIFICANCE: Our results indicate that Methanomicrobiales, rather than being similar to Class I methanogens or Methanosarcinales, share some features of both and have some unique properties. We find that there are three distinct classes of methanogens: the Class I methanogens, the Methanomicrobiales (Class II), and the Methanosarcinales (Class III).

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

UNT Digital Library