1,029 research outputs found

Draft genome sequences of 12 Shiga toxin-producing Escherichia coli strains isolated from dairy cattle in Portugal

Author: Arndt
Gould
Joensen
Joensen
Kearse
Khalil
Kruger
Tatusova
Publication venue: American Society for Microbiology
Publication date: 17/09/2020

Shiga toxin-producing Escherichia coli (STEC) is a foodborne pathogen transmitted from animals to humans through contaminated food. Cattle are the main reservoir of STEC, but their genetic diversity is still poorly characterized, especially regarding strains isolated in Portugal. We therefore present the draft genomic sequences of 12 STEC strains isolated from cattle in the north of Portugal. This study was supported by project PhageSTEC PTDC/CVT-CVT/29628/2017 (POCI-01-0145-FEDER-029628) funded by FEDER through COMPETE2020 (Programa Operacional Competitividade e Internacionalização) and by National Funds through FCT (Fundação para a Ciência e a Tecnologia). This study was also supported by FCT grant POCI-01-0247-FEDER-033679 and strategic funding of unit UIDB/04469/2020 and the BioTecNorte operation (NORTE-01-0145-FEDER-000004), funded by the European Regional Development Fund under the scope of Norte2020–Programa Operacional Regional do Norte.

Universidade do Minho: RepositoriUM

Crossref

Splign: algorithms for computing spliced alignments with identification of paralogs

Author: Kapustin Yuri
Lipman David
Souvorov Alexander
Tatusova Tatiana
Publication venue: BioMed Central
Publication date: 01/01/2008

Crossref

Springer - Publisher Connector

PubMed Central

NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins

Author: Maglott Donna R.
Pruitt Kim D.
Tatusova Tatiana
Publication venue: Oxford University Press
Publication date: 27/11/2006

NCBI's reference sequence (RefSeq) database is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. The database includes 3774 organisms spanning prokaryotes, eukaryotes and viruses, and has records for 2 879 860 proteins (RefSeq release 19). RefSeq records integrate information from multiple sources, when additional data are available from those sources, and therefore represent a current description of the sequence and its features. Annotations include coding regions, conserved domains, tRNAs, sequence tagged sites (STS), variation, references, gene and protein product names, and database cross-references. Sequence is reviewed and features are added using a combined approach of collaboration and other input from the scientific community, prediction, propagation from GenBank and curation by NCBI staff. The format of all RefSeq records is validated, and an increasing number of tests are being applied to evaluate the quality of sequence and annotation, especially in the context of complete genomic sequence.

Crossref

PubMed Central

Entrez Gene: gene-centered information at NCBI

Author: Maglott Donna
Ostell Jim
Pruitt Kim D.
Tatusova Tatiana
Publication venue: Oxford University Press
Publication date: 05/12/2006

Entrez Gene is NCBI's database for gene-specific information. Entrez Gene includes records from genomes that have been completely sequenced, that have an active research community to contribute gene-specific information or that are scheduled for intense sequence analysis. The content of Entrez Gene represents the result of both curation and automated integration of data from NCBI's Reference Sequence project (RefSeq), from collaborating model organism databases and from other databases within NCBI. Records in Entrez Gene are assigned unique, stable and tracked integers as identifiers. The content (nomenclature, map location, gene products and their attributes, markers, phenotypes and links to citations, sequences, variation details, maps, expression, homologs, protein domains and external databases) is provided via interactive browsing through NCBI's Entrez system, via NCBI's Entrez programming utilities (E-Utilities), and for bulk transfer by FTP.

CiteSeerX

Crossref

PubMed Central

Functional Annotations of Paralogs: A Blessing and a Curse

Author: Anderson
Cotton
Giribet
Inoue
Koonin
Lan
Lawrence
Murali
Overbeek
Tatusova
Publication venue: MDPI AG
Publication date: 01/01/2016

Gene duplication followed by mutation is a classic mechanism of neofunctionalization, producing gene families with functional diversity. In some cases, a single point mutation is sufficient to change the substrate specificity and/or the chemistry performed by an enzyme, making it difficult to accurately separate enzymes with identical functions from homologs with different functions. Because sequence similarity is often used as a basis for assigning functional annotations to genes, non-isofunctional gene families pose a great challenge for genome annotation pipelines. Here we describe how integrating evolutionary and functional information such as genome context, phylogeny, metabolic reconstruction and signature motifs may be required to correctly annotate multifunctional families. These integrative analyses can also lead to the discovery of novel gene functions, as hints from specific subgroups can guide the functional characterization of other members of the family. We demonstrate how careful manual curation processes using comparative genomics can disambiguate subgroups within large multifunctional families and discover their functions. We present the COG0720 protein family as a case study. We also discuss strategies to automate this process to improve the accuracy of genome functional annotation pipelines

Crossref

Directory of Open Access Journals

Cronfa at Swansea University

Entrez Gene: gene-centered information at NCBI

Author: Maglott Donna
Ostell Jim
Pruitt Kim D.
Tatusova Tatiana
Publication venue: Oxford University Press
Publication date

Entrez Gene (http://www.ncbi.nlm.nih.gov/gene) is National Center for Biotechnology Information (NCBI)’s database for gene-specific information. Entrez Gene maintains records from genomes which have been completely sequenced, which have an active research community to submit gene-specific information, or which are scheduled for intense sequence analysis. The content represents the integration of curation and automated processing from NCBI’s Reference Sequence project (RefSeq), collaborating model organism databases, consortia such as Gene Ontology and other databases within NCBI. Records in Entrez Gene are assigned unique, stable and tracked integers as identifiers. The content (nomenclature, genomic location, gene products and their attributes, markers, phenotypes and links to citations, sequences, variation details, maps, expression, homologs, protein domains and external databases) is available via interactive browsing through NCBI’s Entrez system, via NCBI’s Entrez programming utilities (E-Utilities) and for bulk transfer by FTP
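
The programmatic access mentioned in this abstract can be illustrated with a minimal sketch that only builds E-Utilities request URLs for the `gene` database and does not contact NCBI. The helper names `gene_esearch_url` and `gene_esummary_url` are hypothetical, the query term is purely illustrative, and the choice of JSON output is an assumption of this sketch rather than something the abstract specifies.

```python
from urllib.parse import urlencode

# Public base address of NCBI's E-Utilities endpoints.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def gene_esearch_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that would return Gene IDs matching `term`."""
    params = {"db": "gene", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def gene_esummary_url(gene_ids) -> str:
    """Build an ESummary URL for a list of integer Gene IDs."""
    id_list = ",".join(str(i) for i in gene_ids)
    params = {"db": "gene", "id": id_list, "retmode": "json"}
    return f"{EUTILS_BASE}/esummary.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Illustrative query; the resulting URLs can be fetched with any HTTP client.
    print(gene_esearch_url("BRCA1 AND human[orgn]"))
    print(gene_esummary_url([672, 7157]))
```

Because Entrez Gene identifiers are stable integers, a typical workflow would be to search once, store the returned IDs, and request summaries for those IDs later.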

Crossref

PubMed Central

FLAN: a web server for influenza virus genome annotation

Author: Bao Yiming
Bolotov Pavel
Dernovoy Dmitry
Kiryutin Boris
Tatusova Tatiana
Publication venue: Oxford University Press
Publication date

FLAN (short for FLu ANnotation), the NCBI web server for genome annotation of influenza virus (http://www.ncbi.nlm.nih.gov/genomes/FLU/Database/annotation.cgi) is a tool for user-provided influenza A virus or influenza B virus sequences. It can validate and predict protein sequences encoded by an input flu sequence. The input sequence is BLASTed against a database containing influenza sequences to determine the virus type (A or B), segment (1 through 8) and subtype for the hemagglutinin and neuraminidase segments of influenza A virus. For each segment/subtype of the viruses, a set of sample protein sequences is maintained. The input sequence is then aligned against the corresponding protein set with a ‘Protein to nucleotide alignment tool’ (ProSplign). The translated product from the best alignment to the sample protein sequence is used as the predicted protein encoded by the input sequence. The output can be a feature table that can be used for sequence submission to GenBank (by Sequin or tbl2asn), a GenBank flat file, or the predicted protein sequences in FASTA format. A message showing the length of the input sequence, the predicted virus type, segment and subtype for the hemagglutinin and neuraminidase segments of Influenza A virus will also be displayed

Crossref

PubMed Central

Dealing with the Data Deluge – New Strategies in Prokaryotic Genome Analysis

Author: Ciufo Stacy
Fedorov Boris
Kiryutin Boris
Tatusova Tatiana
Tolstoy Igor
Zaslavsky Leonid
Publication venue: IntechOpen
Publication date: 14/01/2016

Recent technological innovations have ignited an explosion in microbial genome sequencing that has fundamentally changed our understanding of the biology of microbes and profoundly impacted public health policy. This huge increase in DNA sequence data presents new challenges for bioinformatics tools for annotation, analysis, and visualization. New strategies have been designed to bring order to this genome sequence shockwave and improve the usability of associated data. Genomes are organized in a hierarchical distance tree, with distances calculated from single-copy ribosomal protein markers. The protein distance measures dissimilarity between markers of the same type, and the genomic distance then averages over the majority of marker distances, ignoring outliers. More than 30,000 genomes from public archives have been organized in a marker distance tree resulting in 6,438 species-level clades representing 7,597 taxonomic species. This computational infrastructure provides a foundation for prokaryotic gene and genome analysis, allowing easy access to pre-calculated genome groups at various distance levels. One of the most challenging problems in the current data deluge is the presentation of the relevant data at an appropriate resolution for each application, eliminating data redundancy but keeping biologically interesting variations.
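
The idea of averaging over the majority of marker distances while ignoring outliers can be sketched roughly as follows. The abstract does not state the per-marker distance measure or the outlier rule, so everything here is an assumption: `p_distance` uses a simple mismatch fraction over pre-aligned sequences, and `genomic_distance` keeps the `keep_fraction` of values closest to the median. Both function names and the 0.75 default are hypothetical.

```python
import math
from statistics import mean, median


def p_distance(aligned_a: str, aligned_b: str) -> float:
    """Fraction of mismatches between two pre-aligned marker sequences.

    Columns containing a gap ('-') in either sequence are skipped.
    This is an assumed stand-in for the unspecified protein distance.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(a != b for a, b in pairs) / len(pairs)


def genomic_distance(marker_distances, keep_fraction: float = 0.75) -> float:
    """Average the majority of marker distances, dropping the outliers.

    Assumed rule: rank values by absolute deviation from the median and
    average only the closest `keep_fraction` of them.
    """
    if not marker_distances:
        raise ValueError("at least one marker distance is required")
    if not 0.5 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must keep a majority of the markers")
    center = median(marker_distances)
    ranked = sorted(marker_distances, key=lambda d: abs(d - center))
    n_keep = max(1, math.ceil(keep_fraction * len(ranked)))
    return mean(ranked[:n_keep])
```

With per-marker distances of 0.10, 0.12, 0.11 and 0.90, this rule would discard the 0.90 value, so a single aberrant marker (for example one affected by horizontal transfer) does not dominate the genome-level distance.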

IntechOpen

Virus variation resources at the National Center for Biotechnology Information: dengue virus

Author: Bao Yiming
Kiryutin Boris
Resch Wolfgang
Rozanov Michael
Tatusova Tatiana A
Zaslavsky Leonid
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: There is an increasing number of complete and incomplete virus genome sequences available in public databases. This large body of sequence data harbors information about epidemiology, phylogeny, and virulence. Several specialized databases, such as the NCBI Influenza Virus Resource or the Los Alamos HIV database, offer sophisticated query interfaces along with integrated exploratory data analysis tools for individual virus species to facilitate extracting this information. Thus far, there has not been a comprehensive database for dengue virus, a significant public health threat. Results: We have created an integrated web resource for dengue virus. The technology developed for the NCBI Influenza Virus Resource has been extended to process non-segmented dengue virus genomes. In order to allow efficient processing of the dengue genome, which is large in comparison with individual influenza segments, we developed an offline pre-alignment procedure which generates a multiple sequence alignment of all dengue sequences. The pre-calculated alignment is then used to rapidly create alignments of sequence subsets in response to user queries. This improvement in technology will also facilitate the incorporation of additional virus species in the future. The set of virus-specific databases at NCBI, which will be referred to as Virus Variation Resources (VVR), allows users to build complex queries against virus-specific databases and then apply exploratory data analysis tools to the results. The metadata is automatically collected where possible, and extended with data extracted from the literature. Conclusion: The NCBI Dengue Virus Resource integrates dengue sequence information with relevant metadata (sample collection time and location, disease severity, serotype, sequenced genome region) and facilitates retrieval and preliminary analysis of dengue sequences using integrated web analysis and visualization tools.
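
One plausible way to derive a subset alignment from a pre-calculated master alignment is sketched below. The abstract does not describe the actual procedure, so this is an assumption: the rows for the requested sequences are copied from the master alignment, and columns that contain only gaps within the subset are dropped. The function name `subset_alignment` and the dictionary representation are hypothetical.

```python
def subset_alignment(master: dict, wanted_ids: list) -> dict:
    """Extract rows for `wanted_ids` from a pre-computed alignment.

    `master` maps sequence IDs to gapped strings of equal length.
    Columns that are all-gap within the subset are removed, so no
    realignment is needed at query time.
    """
    rows = {seq_id: master[seq_id] for seq_id in wanted_ids}
    if not rows:
        return {}
    length = len(next(iter(rows.values())))
    if any(len(seq) != length for seq in rows.values()):
        raise ValueError("master alignment rows must have equal length")
    # Keep a column if at least one selected sequence has a residue there.
    keep = [i for i in range(length) if any(seq[i] != "-" for seq in rows.values())]
    return {seq_id: "".join(seq[i] for i in keep) for seq_id, seq in rows.items()}


if __name__ == "__main__":
    # Toy master alignment with made-up sequence IDs.
    master = {"a": "AC-GT-", "b": "ACTG--", "c": "A--GTA"}
    print(subset_alignment(master, ["a", "b"]))
```

The cost of this step grows with the size of the subset and the alignment length, not with the cost of aligning the sequences, which is consistent with the rapid response to user queries described above.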

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central