
    Data mining and predictive modeling of biomolecular network from biomedical literature databases

    IEEE/ACM Transactions on Computational Biology and Bioinformatics, 4(2): pp. 251-263. In this paper, we present a novel approach, Bio-IEDM (Biomedical Information Extraction and Data Mining), that integrates text mining and predictive modeling to analyze biomolecular networks from biomedical literature databases. Our method consists of two phases. In phase 1, we describe an efficient semi-supervised learning approach that automatically extracts biological relationships, such as protein-protein and protein-gene interactions, from biomedical literature databases to construct the biomolecular network. The method automatically learns patterns from a few user-supplied seed tuples and then extracts new tuples from the biomedical literature using the discovered patterns. The derived biomolecular network forms a large scale-free graph. In phase 2, we present a novel clustering algorithm that analyzes the biomolecular network graph to identify biologically meaningful subnetworks (communities). The clustering algorithm takes the characteristics of scale-free graphs into account and is based on the local density of each vertex and its neighborhood functions, which allows it to find more meaningful clusters at different density levels. The experimental results indicate that our approach is very effective in extracting biological knowledge from a huge collection of biomedical literature. The integration of data mining and information extraction provides a promising direction for analyzing biomolecular networks.
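
    Phase 1 follows the general shape of bootstrapped, seed-based relation extraction. The Python sketch below illustrates only that general idea, under simplifying assumptions: a "pattern" is the literal text between two entity mentions, entities are capitalized alphanumeric tokens (the hypothetical `ENTITY` regex), and every learned pattern is trusted. The function names and toy sentences are illustrative; Bio-IEDM's actual pattern representation and scoring are not described in this abstract.

```python
import re
from collections import Counter

# Illustrative sketch only, not the Bio-IEDM implementation.
# Assumption: entities are capitalized alphanumeric tokens such as RAD51 or TP53.
ENTITY = r"[A-Z][A-Za-z0-9]+"


def learn_patterns(sentences, tuples, min_support=1):
    """Use the text between the two entities of each known tuple as a candidate pattern."""
    counts = Counter()
    for sentence in sentences:
        for left, right in tuples:
            i, j = sentence.find(left), sentence.find(right)
            if i != -1 and j != -1 and i < j:
                middle = sentence[i + len(left):j].strip()
                if middle:
                    counts[middle] += 1
    return {p for p, c in counts.items() if c >= min_support}


def extract_tuples(sentences, patterns):
    """Apply each pattern as 'ENTITY <pattern> ENTITY' to find new tuples."""
    found = set()
    for sentence in sentences:
        for pattern in patterns:
            regex = rf"({ENTITY})\s+{re.escape(pattern)}\s+({ENTITY})"
            for m in re.finditer(regex, sentence):
                found.add((m.group(1), m.group(2)))
    return found


def bootstrap(sentences, seeds, iterations=2):
    """Alternate pattern learning and tuple extraction, starting from seed tuples."""
    tuples = set(seeds)
    for _ in range(iterations):
        patterns = learn_patterns(sentences, tuples)
        tuples |= extract_tuples(sentences, patterns)
    return tuples


corpus = [
    "RAD51 interacts with BRCA2 in vivo.",
    "TP53 interacts with MDM2.",
    "CDK2 binds CCNE1.",
]
print(sorted(bootstrap(corpus, {("RAD51", "BRCA2")})))
# → [('RAD51', 'BRCA2'), ('TP53', 'MDM2')]
```

    The third sentence uses a different verb, so no learned pattern covers it. A realistic system would also score patterns and tuples by confidence, since trusting every pattern lets noisy ones pull in unrelated pairs over successive iterations.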

    The Text-mining based PubChem Bioassay neighboring analysis

    Background: In recent years, the number of High Throughput Screening (HTS) assays deposited in PubChem has grown quickly. As a result, the volume of both structured information (that is, molecular structures and bioactivities) and unstructured information (such as descriptions of bioassay experiments) has been increasing exponentially. It has therefore become increasingly demanding and challenging to assemble bioactivity data efficiently by mining this huge amount of information to identify and interpret the relationships among diverse bioassay experiments. In this work, we propose a text-mining-based approach for bioassay neighboring analysis that uses the unstructured text descriptions contained in the PubChem BioAssay database.
    Results: The neighboring analysis is performed by evaluating the cosine score of each bioassay pair and the fraction of overlap among the human-curated neighbors. Our results from the cosine score distribution analysis and the assay neighbor clustering analysis on all PubChem bioassays suggest that strong correlations among bioassays can be identified from their conceptual relevance. A comparison with other existing assay neighboring methods suggests that the text-mining-based approach provides meaningful linkages among PubChem bioassays and complements the existing methods by identifying additional relationships among bioassay entries.
    Conclusions: The text-mining-based bioassay neighboring analysis is efficient for correlating bioassays and for studying different aspects of a biological process, which would otherwise be difficult with existing neighboring procedures owing to the lack of specific annotations and structured information. We suggest that the text-mining-based bioassay neighboring analysis can be used as a standalone tool or as a complement to the PubChem bioassay neighboring process, enabling efficient integration of assay results and generating hypotheses for the discovery of bioactivities of the tested reagents.
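
    The abstract names cosine scores between bioassay descriptions as the basis for neighboring. A minimal sketch, assuming TF-IDF term weighting with smoothed IDF, a simple lowercase tokenizer, and an arbitrary neighbor threshold (none of which the abstract specifies), could look like this. `curated_overlap` is one hypothetical reading of "fraction of overlap": the share of curated neighbor pairs that the text-based method recovers. The descriptions are invented.

```python
import math
import re
from collections import Counter

# Illustrative sketch; the weighting scheme and threshold are assumptions.


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors with smoothed IDF: log((1 + n) / (1 + df)) + 1."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def neighbor_pairs(docs, threshold):
    """Return every index pair (i, j) whose cosine score meets the threshold."""
    vecs = tfidf_vectors(docs)
    pairs = set()
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                pairs.add((i, j))
    return pairs


def curated_overlap(predicted, curated):
    """Hypothetical metric: fraction of curated neighbor pairs also predicted."""
    return len(predicted & curated) / len(curated) if curated else 0.0


descriptions = [
    "Kinase inhibitor screen for EGFR",
    "EGFR kinase inhibitor assay",
    "Cytotoxicity in HEK293 cells",
]
pairs = neighbor_pairs(descriptions, threshold=0.3)
print(pairs)                                      # {(0, 1)}
print(curated_overlap(pairs, {(0, 1), (1, 2)}))  # 0.5
```

    Smoothed IDF keeps terms that occur in every description from receiving zero weight, so no description collapses to an empty vector.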

    The structure and function of biological networks

    Biology has been revolutionized in recent years by an explosion in the availability of data. Transforming this new wealth of data into meaningful biological insights and clinical breakthroughs requires a complete overhaul both in the questions being asked and in the methodologies used to answer them. A major challenge in organizing and understanding the data is defining the structure of biological systems, especially their higher-level structures. Networks are a powerful and versatile tool for bridging the data and complex biological systems. To address the importance of higher-level modular and hierarchical structure in biological networks, in this thesis we investigated the topological structure of protein-protein interaction networks through a comprehensive network analysis using statistical and computational techniques and publicly available protein-protein interaction data sets. Furthermore, we designed and implemented a novel and efficient computational approach to identify modules from a seed protein. The experimental results demonstrate the efficiency and effectiveness of this approach in finding a module whose members exhibit high functional coherence. In addition, toward quantitative studies of protein translation regulatory networks (PTRN), we developed a novel approach to reconstruct the PTRN by integrating protein-protein interaction data and Gene Ontology annotations. We applied computational techniques based on the hierarchical random graph model to these reconstructed PTRN to explore their modular and hierarchical structure and to detect missing and false-positive links. The identification of high-order structures in these networks provides insights into their functional organization.
    Ph.D., Information Science and Technology -- Drexel University, 201
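
    The thesis abstract does not spell out its seed-based module-finding algorithm. As an illustration of the general idea only, the following sketch grows a module greedily from a seed protein, adding whichever neighbor most increases the fraction of the module's edges that stay internal. This scoring rule, the function names, and the toy network (two triangles joined by one edge) are assumptions, not the method from the thesis.

```python
# Illustrative greedy local-module heuristic; not the thesis's algorithm.


def module_score(graph, members):
    """Share of the members' edges that stay inside the module: internal / (internal + boundary)."""
    internal = boundary = 0
    for u in members:
        for v in graph[u]:
            if v in members:
                internal += 1  # each internal edge is seen from both endpoints
            else:
                boundary += 1
    internal //= 2
    total = internal + boundary
    return internal / total if total else 0.0


def grow_module(graph, seed):
    """Greedily add the neighbor that most improves the score; stop when none does."""
    members = {seed}
    score = module_score(graph, members)
    while True:
        frontier = {v for u in members for v in graph[u]} - members
        best, best_score = None, score
        for v in sorted(frontier):  # sorted for deterministic tie-breaking
            s = module_score(graph, members | {v})
            if s > best_score:
                best, best_score = v, s
        if best is None:
            return members
        members.add(best)
        score = best_score


# Hypothetical toy interaction network: triangles A-B-C and D-E-F joined by C-D.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("D", "F")]
ppi = {}
for a, b in edges:
    ppi.setdefault(a, set()).add(b)
    ppi.setdefault(b, set()).add(a)

print(sorted(grow_module(ppi, "A")))  # ['A', 'B', 'C']
```

    Starting from "D" instead returns the other triangle. Greedy growth of this kind is cheap because it only inspects the seed's neighborhood, but it can stop at a local optimum.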

    Systems Toxicology: Mining chemical-toxicity signaling paths to enable network medicine

    Systems toxicology, a branch of toxicology that studies chemical effects on biological systems, presents exciting knowledge discovery challenges for information researchers. The exponential increase in the availability of genomic and proteomic data in this domain needs to be matched by increasingly sophisticated network analysis approaches. An improved ability to mine complex gene and protein interaction networks may eventually lead to the discovery of drugs that target biological subnetworks (‘network medicine’) instead of individual proteins. In this thesis, we proposed and investigated the use of a maximal edge centrality criterion to discover drug-toxicity signaling paths (DTSP) inside a human protein interaction network. The signaling path detection approach uses drug and toxicity information along with two novel edge-weighting measures: one based on edge centrality for detected paths, and another based on differential gene expression between tissues treated with toxicity-inducing drugs and a control set. Drugs known to induce non-immune neutropenia were analyzed as a test case, and proteins common to the discovered signaling paths were evaluated for toxicological significance. In addition to investigating the value of topological connectivity for identifying toxicity biomarkers, the gene-expression-based measure led to a proposed biomarker panel for screening new drug candidates. Comparative evaluation of the DTSP approach against a standard microarray analysis method showed clear improvements in several performance measures, including true positive rate, positive predictive value, negative predictive value, and overall accuracy. Comparison of the non-immune neutropenia signaling paths with those discovered for a control set showed increased transcript-level activation of the discovered signaling paths for toxicity-inducing drugs.
    We have demonstrated the scientific value of a systems-based approach for identifying toxicity-related proteins inside complex biological networks. The algorithm should be useful for biomarker identification for any toxicity, provided that relevant drug and drug-induced toxicity information is available.
    Ph.D., Information Studies -- Drexel University, 201
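
    The abstract does not define the "maximal edge centrality criterion" precisely. One plausible reading, assumed here purely for illustration, is to weight each edge by its betweenness centrality and, between a drug target and a toxicity-associated protein, pick the path whose weakest edge has the highest centrality (a widest, or bottleneck, path). The sketch computes edge betweenness with Brandes' accumulation scheme and finds the widest path with a Dijkstra-style search. The node names and graph are hypothetical, and the gene-expression-based edge weights are not modeled.

```python
import heapq
from collections import deque

# Illustrative sketch; the widest-path interpretation is an assumption.


def edge_betweenness(graph):
    """Brandes-style edge betweenness for an unweighted, undirected graph (dict of sets)."""
    eb = {frozenset((u, v)): 0.0 for u in graph for v in graph[u]}
    for s in graph:
        order, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        dist = dict.fromkeys(graph, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS counts shortest paths from s
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):  # accumulate dependencies back toward s
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += c
                delta[v] += c
    return {e: val / 2 for e, val in eb.items()}  # each pair was counted from both ends


def widest_path(graph, weight, src, dst):
    """Path from src to dst that maximizes the smallest edge weight along it."""
    best = {src: float("inf")}
    prev, done = {}, set()
    heap = [(-float("inf"), src)]
    while heap:
        neg_cap, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v in graph[u]:
            cap = min(-neg_cap, weight[frozenset((u, v))])
            if cap > best.get(v, float("-inf")):
                best[v], prev[v] = cap, u
                heapq.heappush(heap, (-cap, v))
    if dst not in done:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical network: triangles A-B-C and D-E-F joined by the bridge C-D.
network = {}
for a, b in [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("D", "F")]:
    network.setdefault(a, set()).add(b)
    network.setdefault(b, set()).add(a)

eb = edge_betweenness(network)
print(eb[frozenset(("C", "D"))])            # 9.0
print(widest_path(network, eb, "A", "F"))  # ['A', 'C', 'D', 'F']
```

    Other readings, such as maximizing the sum of centralities along a path, lead to different and generally harder optimization problems.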

    Design, implementation, and validation of an approach for integrating and retrieving distributed knowledge in heterogeneous databases

    Master's dissertation - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia e Gestão do Conhecimento, Florianópolis, 2010
    With the growing demand for and composition of knowledge bases for a wide range of purposes, and their availability through the internet, the need has emerged to organize this knowledge and to integrate it, providing greater accessibility and easier maintenance and use, given the dispersed storage and heterogeneous formats of such databases. This dissertation proposes a system that integrates knowledge from databases in a generic context, using emergency care at the CIT (Centro de Informações Toxicológicas de Santa Catarina, the Toxicological Information Center of Santa Catarina) as a case study. The system also supports the maintenance and manipulation of this artifact by combining information retrieval, semantic enhancement, query expansion, and phonetic techniques in a single mechanism. A systematic literature review was used to evaluate the best options reported in previous studies in these areas and to find the best combination to use in the mechanism; the final product was also analyzed in a comparison with mechanisms previously used by professionals in emergency care.
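
    The dissertation combines information retrieval, query expansion, and phonetic matching in one mechanism, but the abstract does not say which phonetic algorithm or thesaurus it uses. The sketch below assumes classic Soundex (an English-oriented code; Portuguese-specific phonetic algorithms treat sounds such as "ç" or "lh" differently) and a small hand-written synonym table. The documents, the `synonyms` entry, and the function names are all hypothetical.

```python
import unicodedata

# Illustrative sketch: classic English Soundex plus synonym-based query expansion.
# The phonetic code and synonym table are assumptions, not the dissertation's choices.
_CODES = {ch: d for d, letters in
          {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}.items()
          for ch in letters}


def soundex(word):
    """Four-character Soundex key; accents are stripped first (e.g. ç -> c, ã -> a)."""
    decomposed = unicodedata.normalize("NFKD", word.lower())
    letters = [ch for ch in decomposed if "a" <= ch <= "z"]
    if not letters:
        return ""
    key, prev = [letters[0].upper()], _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            key.append(code)
        if ch not in "hw":  # h and w do not separate letters with the same code
            prev = code
    return ("".join(key) + "000")[:4]


def expand_query(terms, synonyms):
    expanded = set(terms)
    for term in terms:
        expanded.update(synonyms.get(term, ()))
    return expanded


def phonetic_search(query, documents, synonyms):
    """Rank documents by how many phonetic keys they share with the expanded query."""
    query_keys = {soundex(t) for t in expand_query(query.lower().split(), synonyms)} - {""}
    hits = []
    for doc_id, text in documents.items():
        doc_keys = {soundex(t) for t in text.lower().split()}
        shared = len(query_keys & doc_keys)
        if shared:
            hits.append((-shared, doc_id))
    return [doc_id for _, doc_id in sorted(hits)]


docs = {"d1": "Intoxicação por paracetamol", "d2": "Picada de escorpião"}
synonyms = {"acetaminofeno": ["paracetamol"]}  # hypothetical thesaurus entry

print(phonetic_search("parasetamol", docs, synonyms))    # ['d1']
print(phonetic_search("acetaminofeno", docs, synonyms))  # ['d1']
```

    The first query is misspelled but shares the Soundex key P623 with "paracetamol"; the second matches only because query expansion adds "paracetamol".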