
11 research outputs found

The Text-mining based PubChem Bioassay neighboring analysis

Author: Bryant Steve H
Han Lianyi
Suzek Tugba O
Wang Yanli
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: In recent years, the number of High Throughput Screening (HTS) assays deposited in PubChem has grown quickly. As a result, the volume of both the structured information (i.e. molecular structure, bioactivities) and the unstructured information (such as descriptions of bioassay experiments) has been increasing exponentially. It has therefore become more demanding and challenging to assemble the bioactivity data efficiently by mining this large body of information to identify and interpret the relationships among the diverse bioassay experiments. In this work, we propose a text-mining based approach for bioassay neighboring analysis from the unstructured text descriptions contained in the PubChem BioAssay database. Results: The neighboring analysis is achieved by evaluating the cosine scores of each bioassay pair and the fraction of overlaps among the human-curated neighbors. Our results from the cosine score distribution analysis and assay neighbor clustering analysis on all PubChem bioassays suggest that strong correlations among the bioassays can be identified from their conceptual relevance. A comparison with other existing assay neighboring methods suggests that the text-mining based bioassay neighboring approach provides meaningful linkages among the PubChem bioassays, and complements the existing methods by identifying additional relationships among the bioassay entries. Conclusions: The text-mining based bioassay neighboring analysis is efficient for correlating bioassays and studying different aspects of a biological process, which are otherwise difficult to achieve by existing neighboring procedures due to the lack of specific annotations and structured information. It is suggested that the text-mining based bioassay neighboring analysis can be used as a standalone or as a complementary tool for the PubChem bioassay neighboring process to enable efficient integration of assay results and generate hypotheses for the discovery of bioactivities of the tested reagents.
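The abstract above scores each pair of bioassays by the cosine of their text descriptions. A minimal sketch of that idea follows, assuming plain term-frequency vectors over lowercase alphabetic tokens. The function names (term_vector, cosine_score, neighbors), the toy descriptions, and the 0.5 threshold are illustrative assumptions and are not taken from the paper, whose actual term weighting and cutoffs are not given here.

```python
import math
import re
from collections import Counter
from itertools import combinations


def term_vector(text):
    """Count lowercase alphabetic tokens (an assumed, simplistic tokenizer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_score(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description has no neighbors
    return dot / (norm_a * norm_b)


def neighbors(descriptions, threshold=0.5):
    """Return (id_x, id_y, score) for every pair scoring at or above threshold.

    `threshold` is a hypothetical cutoff chosen only for this illustration.
    """
    vectors = {aid: term_vector(text) for aid, text in descriptions.items()}
    pairs = []
    for x, y in combinations(sorted(vectors), 2):
        score = cosine_score(vectors[x], vectors[y])
        if score >= threshold:
            pairs.append((x, y, score))
    return pairs


# Toy descriptions keyed by made-up assay identifiers.
toy = {
    "A": "kinase inhibitor assay",
    "B": "kinase inhibitor screen",
    "C": "luciferase reporter",
}
result = neighbors(toy)
```

With these toy inputs only the pair A and B shares terms, so it is the only pair returned. A real corpus would likely need stop-word removal and a weighting scheme such as TF-IDF, which this sketch omits.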

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

NCBI GEO: mining millions of expression profiles—database and tools

Author: Barrett Tanya
Edgar Ron
Fujibuchi Wataru
Lash Alex E.
Ledoux Pierre
Ngau Wing-Chi
Rudnev Dmitry
Suzek Tugba O.
Troup Dennis B.
Wilhite Stephen E.
Publication venue: Oxford University Press
Publication date: 17/12/2004

The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) is the largest fully public repository for high-throughput molecular abundance data, primarily gene expression data. The database has a flexible and open design that allows the submission, storage and retrieval of many data types. These data include microarray-based experiments measuring the abundance of mRNA, genomic DNA and protein molecules, as well as non-array-based technologies such as serial analysis of gene expression (SAGE) and mass spectrometry proteomic technology. GEO currently holds over 30 000 submissions representing approximately half a billion individual molecular abundance measurements, for over 100 organisms. Here, we describe recent database developments that facilitate effective mining and visualization of these data. Features are provided to examine data from both experiment- and gene-centric perspectives using user-friendly Web-based interfaces accessible to those without computational or microarray-related analytical expertise. The GEO database is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo

CiteSeerX

Crossref

PubMed Central

An overview of the PubChem BioAssay resource

Author: Benjamin A. Shoemaker
Evan Bolton
Jewen Xiao
Jian Zhang
Jiyao Wang
Karen Karapetyan
Stephen H. Bryant
Svetlana Dracheva
Tugba O. Suzek
Yanli Wang
Publication venue: Oxford University Press
Publication date: 01/01/2010

The PubChem BioAssay database (http://pubchem.ncbi.nlm.nih.gov) is a public repository for biological activities of small molecules and small interfering RNAs (siRNAs) hosted by the US National Institutes of Health (NIH). It archives experimental descriptions of assays and biological test results and makes the information freely accessible to the public. A PubChem BioAssay data entry includes an assay description, a summary and detailed test results. Each assay record is linked to the molecular target, whenever possible, and is cross-referenced to other National Center for Biotechnology Information (NCBI) database records. ‘Related BioAssays’ are identified by examining the assay target relationship and activity profile of commonly tested compounds. A key goal of PubChem BioAssay is to make the biological activity information easily accessible through the NCBI information retrieval system, Entrez, and various web-based PubChem services. An integrated suite of data analysis tools is available to optimize the utility of the chemical structure and biological activity information within PubChem, enabling researchers to aggregate, compare and analyze biological test results contributed by multiple organizations. In this work, we describe the PubChem BioAssay database, including data model, bioassay deposition and utilities that PubChem provides for searching, downloading and analyzing the biological activity information contained therein.

CiteSeerX

Crossref

PubMed Central

Database resources of the National Center for Biotechnology Information

Author: Barrett Tanya
Benson Dennis A.
Bryant Stephen H.
Canese Kathi
Church Deanna M.
DiCuccio Michael
Edgar Ron
Federhen Scott
Helmberg Wolfgang
Kenton David L.
Khovayko Oleg
Lipman David J.
Madden Thomas L.
Maglott Donna R.
Ostell James
Pontius Joan U.
Pruitt Kim D.
Schriml Lynn M.
Schuler Gregory D.
Sequeira Edwin
Sherry Steven T.
Sirotkin Karl
Starchenko Grigory
Suzek Tugba O.
Tatusov Roman
Tatusova Tatiana A.
Wagner Lukas
Wheeler David L.
Yaschenko Eugene
Publication venue: Oxford University Press
Publication date: 17/12/2004

In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data retrieval systems and computational resources for the analysis of data in GenBank and other biological data made available through NCBI's website. NCBI resources include Entrez, Entrez Programming Utilities, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD) and the Conserved Domain Architecture Retrieval Tool (CDART). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of the resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov

Crossref

PubMed Central

Database resources of the National Center for Biotechnology Information: update

Author: Church Deanna M.
Edgar Ron
Federhen Scott
Helmberg Wolfgang
Madden Thomas L.
Pontius Joan U.
Schriml Lynn M.
Schuler Gregory D.
Sequeira Edwin
Suzek Tugba O.
Tatusova Tatiana A.
Wagner Lukas
Wheeler David L.
Publication venue: Oxford University Press
Publication date: 01/01/2004

In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI’s website. NCBI resources include Entrez, PubMed, PubMed Central, LocusLink, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SARS Coronavirus Resource, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD) and the Conserved Domain Architecture Retrieval Tool (CDART). Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov

Crossref

PubMed Central

Orthologous Groups (COGs) database, Retroviral

Author: David L. Wheeler
Deanna M. Church
Edwin Sequeira
Gregory D. Schuler
Joan U. Pontius
Lukas Wagner
Lynn M. Schriml
Ron Edgar
Scott Federhen
Tatiana A. Tatusova
Thomas L. Madden
Tugba O. Suzek
Wolfgang Helmberg

CiteSeerX