

Biodiversity informatics: the challenge of linking data and the role of shared identifiers

Author: Altschul
Dellavalle
Martin
Moreau
Ouellette
Page
Patterson
R. D. M. Page
Saux
Smith
Stein
Zamors'ky
Publication date: 01/01/2008

A major challenge facing biodiversity informatics is integrating data stored in widely distributed databases. Initial efforts have relied on taxonomic names as the shared identifier linking records in different databases. However, taxonomic names have limitations as identifiers, being neither stable nor globally unique, and the pace of molecular taxonomic and phylogenetic research means that much of the information in public sequence databases is not linked to formal taxonomic names. This review explores the use of other identifiers, such as specimen codes and GenBank accession numbers, to link otherwise disconnected facts in different databases. The structure of these links can also be exploited using the PageRank algorithm to rank the results of searches on biodiversity databases. The key to rich integration is a commitment to deploy and reuse globally unique, shared identifiers (such as DOIs and LSIDs), and the implementation of services that link those identifiers.
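As a rough illustration of the ranking idea mentioned in this abstract, the following is a minimal sketch of PageRank by power iteration over a small link graph. It is not the review's implementation: the record labels (`publication:P1`, `sequence:S1`, `sequence:S2`, `specimen:A`), the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping node -> list of linked nodes."""
    # Collect every node that appears as a source or as a link target.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)

    # Start from a uniform distribution.
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each node passes an equal share of its rank along each outgoing link.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical records: a publication cites a sequence, and two sequences
# both point to the same voucher specimen via a shared identifier.
links = {
    "publication:P1": ["sequence:S1"],
    "sequence:S1": ["specimen:A"],
    "sequence:S2": ["specimen:A"],
    "specimen:A": [],
}
ranks = pagerank(links)
top_record = max(ranks, key=ranks.get)
```

In this toy graph the specimen record receives links from both sequences, so it ends up with the highest score; a search interface could sort matching records by such scores.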

Crossref

Enlighten

Nature Precedings

A large scale prediction of bacteriocin gene blocks suggests a wide functional spectrum for bacteriocins

Author: Freed Stefan D
Friedberg Iddo
Lee Shaun W
Morton James T
Publication date: 20/10/2015

Bacteriocins are peptide-derived molecules produced by bacteria, whose recently discovered functions include virulence factors and signalling molecules as well as their better known roles as antibiotics. To date, close to five hundred bacteriocins have been identified and classified. Recent discoveries have shown that bacteriocins are highly diverse and widely distributed among bacterial species. Given the heterogeneity of bacteriocin compounds, many tools struggle with identifying novel bacteriocins due to their vast sequence and structural diversity. Many bacteriocins undergo post-translational processing or modifications necessary for the biosynthesis of the final mature form. Enzymatic modification of bacteriocins as well as their export is achieved by proteins whose genes are often located in a discrete gene cluster proximal to the bacteriocin precursor gene, referred to as "context genes" in this study. Although bacteriocins themselves are structurally diverse, context genes have been shown to be largely conserved across unrelated species. Using this knowledge, we set out to identify new candidates for context genes which may clarify how bacteriocins are synthesized, and identify new candidates for bacteriocins that bear no sequence similarity to known toxins. To achieve these goals, we have developed a software tool, Bacteriocin Operon and gene block Associator (BOA), that can identify homologous bacteriocin-associated gene clusters and predict novel ones. We discover that several phyla have a strong preference for bacteriocin genes, suggesting distinct functions for this group of molecules. Availability: https://github.com/idoerg/BOA. Comment: Accepted for publication in BMC Bioinformatics

arXiv.org e-Print Archive

Springer - Publisher Connector

Extraction of Transcript Diversity from Scientific Literature

Author: Lars J Jensen
Parantu K Shah
Peer Bork
Philip Bourne
Stéphanie Boué
Publication venue: Public Library of Science
Publication date: 01/01/2005

Transcript diversity generated by alternative splicing and associated mechanisms contributes heavily to the functional complexity of biological systems. The numerous examples of the mechanisms and functional implications of these events are scattered throughout the scientific literature. Thus, it is crucial to have a tool that can automatically extract the relevant facts and collect them in a knowledge base that can aid the interpretation of data from high-throughput methods. We have developed and applied a composite text-mining method for extracting information on transcript diversity from the entire MEDLINE database in order to create a database of genes with alternative transcripts. It contains information on tissue specificity, number of isoforms, causative mechanisms, functional implications, and experimental methods used for detection. We have mined this resource to identify 959 instances of tissue-specific splicing. Our results in combination with those from EST-based methods suggest that alternative splicing is the preferred mechanism for generating transcript diversity in the nervous system. We provide new annotations for 1,860 genes with the potential for generating transcript diversity. We assign the MeSH term “alternative splicing” to 1,536 additional abstracts in the MEDLINE database and suggest new MeSH terms for other events. We have successfully extracted information about transcript diversity and semiautomatically generated a database, LSAT, that can provide a quantitative understanding of the mechanisms behind tissue-specific gene expression. LSAT (Literature Support for Alternative Transcripts) is publicly available at http://www.bork.embl.de/LSAT/

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

MDC Repository

FigShare

Aggregating, tagging and integrating biodiversity research

Author: BL Fisher
Brian L. Fisher
C Thomas
David P. Mindell
DP Faith
E Boakes
Georgina M. Mace
H Miller
J Walston
JA Johnson
Jonathan Eisen
Peter Roopnarine
RDM Page
Richard L. Pyle
Roderic D. M. Page
Sean A. Rands
SHM Butchart
T Clark
VS Chavan
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2011

Crossref

Directory of Open Access Journals

PubMed Central

UCL Discovery

eScholarship - University of California

Enlighten

Transcriptome analysis of Taenia solium cysticerci using Open reading Frame ESTS (ORESTES)

Author: Almeida Carolina R.
Bayer-Santos Ethel
Davila Alberto M. R.
Dias-Neto Emmanuel
Ferreira Henrique B.
Grisard Edmundo C.
Maia Antônio A.
Ojopi Elida P. B.
Rodrigues Juliana B.
Rotava Gianinna
Sincero Thaís C. M.
Sperandio Maísa M.
Stoco Patricia H.
Tyler Kevin M.
Wagner Glauber
Zaha Arnaldo
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Background: Human infection by the pork tapeworm Taenia solium affects more than 50 million people worldwide, particularly in underdeveloped and developing countries. Cysticercosis, which arises from larval encystation, can be life threatening and difficult to treat. Here, we investigate for the first time the transcriptome of the clinically relevant cysticerci larval form. Results: Using Expressed Sequence Tags (ESTs) produced by the ORESTES method, a total of 1,520 high quality ESTs were generated from 20 ORESTES cDNA mini-libraries. Their analysis revealed fragments of genes with promising applications, including 51 ESTs matching antigens previously described in other species, as well as 113 sequences representing proteins with potential extracellular localization, with obvious applications for immune-diagnosis or vaccine development. Conclusion: The set of sequences described here will contribute to deciphering the expression profile of this important parasite and will be informative for the genome assembly and annotation, as well as for studies of intra- and inter-specific sequence variability. Genes of interest for developing new diagnostic and therapeutic tools are described and discussed.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Directory of Open Access Journals

PubMed Central

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Universidade de São Paulo

University of East Anglia digital repository

Extracting, Transforming and Archiving Scientific Data

Author: Lemire Daniel
Vellino Andre
Publication date: 01/03/2011

It is becoming common to archive research datasets that are not only large but also numerous. In addition, their corresponding metadata and the software required to analyse or display them need to be archived. Yet the manual curation of research data can be difficult and expensive, particularly in very large digital repositories, hence the importance of models and tools for automating digital curation tasks. The automation of these tasks faces three major challenges: (1) research data and data sources are highly heterogeneous, (2) future research needs are difficult to anticipate, (3) data is hard to index. To address these problems, we propose the Extract, Transform and Archive (ETA) model for managing and mechanizing the curation of research data. Specifically, we propose a scalable strategy for addressing the research-data problem, ranging from the extraction of legacy data to its long-term storage. We review some existing solutions and propose novel avenues of research. Comment: 8 pages, Fourth Workshop on Very Large Digital Libraries, 201

arXiv.org e-Print Archive

R-libre

The computer revolution in science: steps towards the realization of computer-supported discovery environments

Author: Jong Hidde de
Rip Arie
Publication venue: Elsevier
Publication date: 01/01/1997

The tools that scientists use in their search processes together form so-called discovery environments. The promise of artificial intelligence and other branches of computer science is to radically transform conventional discovery environments by equipping scientists with a range of powerful computer tools, including large-scale, shared knowledge bases and discovery programs. We will describe the future computer-supported discovery environments that may result, and illustrate by means of a realistic scenario how scientists come to new discoveries in these environments. In order to make the step from the current generation of discovery tools to computer-supported discovery environments like the one presented in the scenario, developers should realize that such environments are large-scale sociotechnical systems. They should not just focus on isolated computer programs, but also pay attention to the question of how these programs will be used and maintained by scientists in research practices. To help developers of discovery programs integrate their tools into discovery environments, we will formulate a set of guidelines that developers could follow.

Elsevier - Publisher Connector

Crossref

University of Twente Research Information

A Molecular Biology Database Digest

Author: Bry François
Kröger Peer
Publication date: 01/01/2000

Computational Biology or Bioinformatics has been defined as the application of mathematical and Computer Science methods to solving problems in Molecular Biology that require large scale data, computation, and analysis [18]. As expected, Molecular Biology databases play an essential role in Computational Biology research and development. This paper introduces current Molecular Biology databases, stressing data modeling, data acquisition, data retrieval, and the integration of Molecular Biology data from different sources. This paper is primarily intended for an audience of computer scientists with a limited background in Biology.

CiteSeerX

Open Access LMU