    TCMGeneDIT: a database for associated traditional Chinese medicine, gene and disease information using text mining

    Background: Traditional Chinese Medicine (TCM), a complementary and alternative medical system in Western countries, has been used to treat various diseases over thousands of years in East Asian countries. In recent years, many herbal medicines were found to exhibit a variety of effects through regulating a wide range of gene expressions or protein activities. As available TCM data continue to accumulate rapidly, there is an urgent need to explore these resources systematically so as to make effective use of the large volume of literature. Methods: TCM, gene, disease, biological pathway and protein-protein interaction information were collected from public databases. For association discovery, the TCM names, gene names, disease names, TCM ingredients and effects were used to annotate the literature corpus obtained from PubMed. Entity associations were mined using hypothesis testing and collocation analysis. The annotated corpus was processed with natural language processing tools, and rule-based approaches were applied to the sentences to extract the relations between TCM effecters and effects. Results: We developed a database, TCMGeneDIT, to provide association information about TCMs, genes, diseases, TCM effects and TCM ingredients mined from a vast amount of biomedical literature. Integrated protein-protein interaction and biological pathway information is also available for exploring the regulation of genes associated with TCM curative effects. In addition, the transitive relationships among genes, TCMs and diseases can be inferred through shared intermediates. Furthermore, TCMGeneDIT is useful in understanding the possible therapeutic mechanisms of TCMs via gene regulation and in deducing synergistic or antagonistic contributions of the prescription components to the overall therapeutic effects. The database is now available at http://tcm.lifescience.ntu.edu.tw/. Conclusion: TCMGeneDIT is a unique database that offers diverse association information on TCMs. This database integrates TCMs with biomedical studies, which would facilitate clinical research and elucidate the possible therapeutic mechanisms of TCMs and gene regulations.
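
The association-mining step described in the Methods can be illustrated with a small sketch. The abstract does not say which test statistic was used, so the Pearson chi-square statistic on a 2x2 co-occurrence table is an assumption here, as are the input format (one pair of name sets per annotated abstract) and the function names `chi_square_2x2` and `score_associations`.

```python
from collections import Counter
from itertools import product


def chi_square_2x2(n11, n10, n01, n00):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    n = n11 + n10 + n01 + n00
    row1, row0 = n11 + n10, n01 + n00
    col1, col0 = n11 + n01, n10 + n00
    denom = row1 * row0 * col1 * col0
    if denom == 0:
        # A degenerate table carries no evidence of association.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def score_associations(docs):
    """Score every co-occurring (TCM, gene) pair.

    docs: list of (set_of_tcm_names, set_of_gene_names), one entry per
    annotated abstract (a hypothetical annotation format).
    """
    n = len(docs)
    tcm_count, gene_count, pair_count = Counter(), Counter(), Counter()
    for tcms, genes in docs:
        tcm_count.update(tcms)
        gene_count.update(genes)
        pair_count.update(product(tcms, genes))

    scores = {}
    for (tcm, gene), n11 in pair_count.items():
        n10 = tcm_count[tcm] - n11    # TCM mentioned without the gene
        n01 = gene_count[gene] - n11  # gene mentioned without the TCM
        n00 = n - n11 - n10 - n01     # neither mentioned
        scores[(tcm, gene)] = chi_square_2x2(n11, n10, n01, n00)
    return scores
```

A pair that co-occurs far more often than its individual frequencies predict receives a high score. A real system would also need a significance cutoff and a correction for multiple testing, neither of which is specified in the abstract.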

    The use of data-mining for the automatic formation of tactics

    This paper discusses the use of data-mining for the automatic formation of tactics. It was presented at the Workshop on Computer-Supported Mathematical Theory Development held at IJCAR in 2004. The aim of this project is to evaluate the applicability of data-mining techniques to the automatic formation of tactics from large corpora of proofs. We data-mine information from large proof corpora to find commonly occurring patterns. These patterns are then evolved into tactics using genetic programming techniques.

    Text mining of full-text journal articles combined with gene expression analysis reveals a relationship between sphingosine-1-phosphate and invasiveness of a glioblastoma cell line

    BACKGROUND: Sphingosine 1-phosphate (S1P), a lysophospholipid, is involved in various cellular processes such as migration, proliferation, and survival. To date, the impact of S1P on human glioblastoma is not fully understood. In particular, the concerted role played by matrix metalloproteinases (MMP) and S1P in aggressive tumor behavior and angiogenesis remains to be elucidated. RESULTS: To gain new insights into the effect of S1P on angiogenesis and invasion of this type of malignant tumor, we used microarrays to investigate the gene expression in glioblastoma as a response to S1P administration in vitro. We compared the expression profiles for the same cell lines under the influence of epidermal growth factor (EGF), an important growth factor. We found a set of 72 genes that are significantly differentially expressed as a unique response to S1P. Based on the result of mining full-text articles from 20 scientific journals in the field of cancer research published over a period of five years, we inferred gene-gene interaction networks for these 72 differentially expressed genes. Among the generated networks, we identified a particularly interesting one. It describes a cascading event, triggered by S1P, leading to the transactivation of MMP-9 via neuregulin-1 (NRG-1), vascular endothelial growth factor (VEGF), and the urokinase-type plasminogen activator (uPA). This interaction network has the potential to shed new light on our understanding of the role played by MMP-9 in invasive glioblastomas. CONCLUSION: Automated extraction of information from biological literature promises to play an increasingly important role in biological knowledge discovery. This is particularly true for high-throughput approaches, such as microarrays, and for combining and integrating data from different sources. 
Text mining may hold the key to unraveling previously unknown relationships between biological entities and could develop into an indispensable instrument in the process of formulating novel and potentially promising hypotheses.

    Nominalization and Alternations in Biomedical Language

    Background: This paper presents data on alternations in the argument structure of common domain-specific verbs and their associated verbal nominalizations in the PennBioIE corpus. Alternation is the term in theoretical linguistics for variations in the surface syntactic form of verbs, e.g. the different forms of "stimulate" in "FSH stimulates follicular development" and "follicular development is stimulated by FSH". The data is used to assess the implications of alternations for biomedical text mining systems and to test the fit of the sublanguage model to biomedical texts. Methodology/Principal Findings: We examined 1,872 tokens of the ten most common domain-specific verbs or their zero-related nouns in the PennBioIE corpus and labelled them for the presence or absence of three alternations. We then annotated the arguments of 746 tokens of the nominalizations related to these verbs and counted alternations related to the presence or absence of arguments and to the syntactic position of non-absent arguments. We found that alternations are quite common both for verbs and for nominalizations. We also found a previously undescribed alternation involving an adjectival present participle. Conclusions/Significance: We found that even in this semantically restricted domain, alternations are quite common, and alternations involving nominalizations are exceptionally diverse. Nonetheless, the sublanguage model applies to biomedica

    HyDRA Hybrid workflow Design Recommender Architecture

    Workflows are a way to describe a series of computations on raw e-Science data. These data may be MRI brain scans, data from a high energy physics detector or metric data from an earth observation project. In order to derive meaningful knowledge from the data, it must be processed and analysed. Workflows have emerged as the principal mechanism for describing and enacting complex e-Science analyses on distributed infrastructures such as grids. Scientific users face a number of challenges when designing workflows. These challenges include selecting appropriate components for their tasks, specifying dependencies between them and selecting appropriate parameter values. These tasks become especially challenging as workflows become increasingly large. For example, the CIVET workflow consists of up to 108 components. Building the workflow by hand and specifying all the links can become quite cumbersome for scientific users. Traditionally, recommender systems have been employed to assist users in such time-consuming and tedious tasks. One approach taken by recommender systems is to predict what the user is attempting to do. Techniques for this include using workflow semantics on the one hand and historical usage patterns on the other. Semantics-based systems attempt to infer a user’s intentions based on the available semantics. Pattern-based systems attempt to extract usage patterns from previously constructed workflows and match those patterns to the workflow under construction. The use of historical patterns adds dynamism to the suggestions as the system can learn and adapt with “experience”. However, in cases where there are no previous patterns to draw upon, pattern-based systems fail to perform. Semantics-based systems, on the other hand, infer from static information, so they always have something to draw upon. 
However, that information first has to be encoded into the semantic repository for the system to draw upon it, which is a time-consuming and tedious task in itself. Moreover, semantics-based systems do not learn and adapt with experience. Both approaches have distinct but complementary features and drawbacks. By combining the two approaches, the drawbacks of each approach can be addressed. This thesis presents HyDRA, a novel hybrid framework that combines frequent usage patterns and workflow semantics to generate suggestions. The functions performed by the framework include: a) extracting frequent functional usage patterns; b) identifying the semantics of unknown components; and c) generating accurate and meaningful suggestions. Challenges to mining frequent patterns include ensuring that meaningful and useful patterns are extracted. For this purpose, only patterns that occur above a minimum frequency threshold are mined. Moreover, instead of just groups of specific components, the pattern mining algorithm takes into account workflow component semantics. This allows the system to identify different types of components that perform a single composite function. One of the challenges in maintaining a semantic repository is to keep the repository up-to-date. This involves identifying new items and inferring their semantics. In this regard, a minor contribution of this research is a semantic inference engine that is responsible for function b). This engine also uses pre-defined workflow component semantics to infer new semantic properties and generate more accurate suggestions. The overall suggestion generation algorithm is also presented. HyDRA has been evaluated using workflows from the Laboratory of Neuro Imaging (LONI) repository. These workflows have been chosen for their structural and functional characteristics that help to evaluate the framework in different scenarios. 
The system is also compared with another existing pattern-based system to show a clear improvement in the accuracy of the suggestions generated.
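
The minimum-frequency threshold mentioned in the abstract can be illustrated with a simplified sketch. This is not HyDRA's algorithm: it ignores component semantics, and it assumes each workflow has been flattened into a linear list of component names, whereas real workflows are graphs. The names `frequent_patterns`, `suggest_next` and the component labels in the usage note are hypothetical.

```python
from collections import Counter


def frequent_patterns(workflows, length=2, min_support=2):
    """Return contiguous component sequences of the given length (>= 2)
    that occur in at least `min_support` workflows."""
    support = Counter()
    for wf in workflows:
        # A set ensures each pattern is counted once per workflow.
        seen = {tuple(wf[i:i + length]) for i in range(len(wf) - length + 1)}
        support.update(seen)
    return {p: c for p, c in support.items() if c >= min_support}


def suggest_next(prefix, patterns):
    """Rank candidate next components for a partially built workflow."""
    candidates = Counter()
    for pattern, count in patterns.items():
        context = pattern[:-1]
        # Match the pattern's leading components against the end of the prefix.
        if tuple(prefix[-len(context):]) == context:
            candidates[pattern[-1]] += count
    return [component for component, _ in candidates.most_common()]
```

For example, given the workflows `["align", "strip", "segment"]`, `["align", "strip", "register"]` and `["strip", "segment"]`, the pair `("strip", "register")` occurs only once and is filtered out, so a user who has just added `strip` is offered `segment`. When no pattern clears the threshold the function returns an empty list, which is the cold-start failure the abstract attributes to purely pattern-based systems.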

    Inferring Strategies for Sentence Ordering in Multidocument News Summarization

    The problem of organizing information for multidocument summarization so that the generated summary is coherent has received relatively little attention. While sentence ordering for single-document summarization can be determined from the ordering of sentences in the input article, this is not the case for multidocument summarization, where summary sentences may be drawn from different input articles. In this paper, we propose a methodology for studying the properties of ordering information in the news genre and describe experiments done on a corpus of multiple acceptable orderings we developed for the task. Based on these experiments, we implemented a strategy for ordering information that combines constraints from chronological order of events and topical relatedness. Evaluation of our augmented algorithm shows a significant improvement of the ordering over two baseline strategies.
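
One way to combine the two constraints named in the abstract is sketched below. It is an illustration under stated assumptions, not the authors' algorithm: each sentence is assumed to arrive with a publication date and a precomputed topic label, topically related sentences are kept together, and blocks are ordered by their earliest date. The dictionary keys and the function name `order_sentences` are hypothetical.

```python
from datetime import date


def order_sentences(sentences):
    """Order summary sentences by topic block, then chronologically.

    sentences: list of dicts with keys "text", "date" (datetime.date)
    and "topic" (a precomputed topic label; an assumption of this sketch).
    """
    # Earliest date seen for each topic determines where its block goes.
    earliest = {}
    for s in sentences:
        topic = s["topic"]
        earliest[topic] = min(earliest.get(topic, s["date"]), s["date"])

    # Sort by block start, then topic label (tie-break), then date in block.
    return sorted(
        sentences,
        key=lambda s: (earliest[s["topic"]], s["topic"], s["date"]),
    )
```

Compared with a purely chronological baseline, this keeps a later sentence next to earlier sentences on the same topic instead of interleaving topics by date.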

    A framework for structuring prerequisite relations between concepts in educational textbooks

    Digital educational resources and self-regulated learning are increasingly available. In this scenario, the development of automatic strategies for organizing the knowledge embodied in educational resources has tremendous potential for building personalized learning paths and applications such as intelligent textbooks and recommender systems of learning materials. To this aim, a straightforward approach consists in enriching the educational materials with a concept graph, i.e. a knowledge structure where key concepts of the subject matter are represented as nodes and prerequisite dependencies among such concepts are also explicitly represented. This thesis therefore focuses on prerequisite relations in textbooks and has two main research goals. The first goal is to define a methodology for systematically annotating prerequisite relations in textbooks, which is functional for analysing the prerequisite phenomenon and for evaluating and training automatic methods of extraction. The second goal concerns the automatic extraction of prerequisite relations from textbooks. These two research goals guide the design of PRET, a comprehensive framework for supporting researchers working on this problem. The framework described in the present thesis allows researchers to conduct the following tasks: 1) manual annotation of educational texts, in order to create datasets to be used for machine learning algorithms or for evaluation as gold standards; 2) annotation analysis, for investigating inter-annotator agreement, graph metrics and in-context linguistic features; 3) data visualization, for visually exploring datasets and gaining insights into the problem that may lead to improved algorithms; 4) automatic extraction of prerequisite relations. 
As for the automatic extraction, we developed a method that is based on burst analysis of concepts in the textbook, and we used the gold dataset with prerequisite relation (PR) annotation for its evaluation, comparing the method with other metrics for PR extraction.