401,959 research outputs found

    The potential of text mining in data integration and network biology for plant research : a case study on Arabidopsis

    Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden within the biomolecular literature. Text mining provides the necessary means to retrieve these data through automated processing of texts. However, only recently has advanced text mining methodology been implemented with sufficient computational power to process texts at a large scale. In this study, we assess the potential of large-scale text mining for plant biology research in general and for network biology in particular using a state-of-the-art text mining system applied to all PubMed abstracts and PubMed Central full texts. We present an extensive evaluation of the textual data for Arabidopsis thaliana, assessing the overall accuracy of this new resource for use in plant network analyses. Furthermore, we combine text mining information with both protein-protein and regulatory interactions from experimental databases. Clusters of tightly connected genes are delineated from the resulting network, illustrating how such an integrative approach is essential to grasp the current knowledge available for Arabidopsis and to uncover gene information through guilt by association. All large-scale data sets, as well as the manually curated textual data, are made publicly available, thereby stimulating the application of text mining data in future plant biology studies.
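
The integration step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the gene names, annotations, and source labels below are hypothetical placeholders, and "guilt by association" is reduced here to counting the annotations of a gene's direct neighbours in the merged network.

```python
from collections import Counter, defaultdict

# Hypothetical toy inputs: edge lists from two evidence sources.
sources = {
    "text_mining": [("geneA", "geneB"), ("geneB", "geneC")],
    "ppi_database": [("geneA", "geneB"), ("geneC", "geneD")],
}
# Hypothetical functional annotations; geneB is uncharacterized.
annotations = {
    "geneA": {"flowering"},
    "geneC": {"flowering", "stress"},
}

def merge_networks(sources):
    """Merge edge lists into one undirected graph and record the
    evidence sources that support each edge."""
    graph = defaultdict(set)
    support = defaultdict(set)
    for name, edges in sources.items():
        for a, b in edges:
            if a == b:
                continue  # ignore self-loops
            graph[a].add(b)
            graph[b].add(a)
            support[frozenset((a, b))].add(name)
    return graph, support

def guilt_by_association(graph, annotations, gene):
    """Rank candidate annotations for `gene` by how many of its
    direct neighbours carry them."""
    votes = Counter()
    for neighbour in graph.get(gene, ()):
        votes.update(annotations.get(neighbour, ()))
    return votes.most_common()

graph, support = merge_networks(sources)
prediction = guilt_by_association(graph, annotations, "geneB")
print(prediction)
```

Keeping the set of supporting sources per edge makes it possible to tell apart interactions found only in text from those also confirmed by an experimental database.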

    Text Data Mining from the Author's Perspective: Whose Text, Whose Mining, and to Whose Benefit?

    Given the many technical, social, and policy shifts in access to scholarly content since the early days of text data mining, it is time to expand the conversation about text data mining from the concerns of the researcher wishing to mine data to include the concerns of researcher-authors about how their data are mined, by whom, for what purposes, and to whose benefit. Comment: Forum Statement: Data Mining with Limited Access Text: National Forum. April 5-6, 2018. https://publish.illinois.edu/limitedaccess-tdm

    ChemTextMiner: An open source tool kit for mining medical literature abstracts

    Text mining involves recognizing patterns in the wealth of information latent in unstructured text and deducing explicit relationships among data entities using data mining tools. Text mining of the biomedical literature is essential for building biological networks that connect genes, proteins, drugs, therapeutic categories, side effects, etc. related to diseases of interest. We present an approach for text mining the biomedical literature, focused mostly on hidden, non-obvious relationships, and build biological networks for important human diseases such as MTB, malaria, Alzheimer's disease, and diabetes. The methods, tools, and data used for building biological networks in a distributed computing environment previously used for the ChemXtreme[1] and ChemStar[2] applications are also described.
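
One common way to turn abstracts into a network of entities is to count how often two known terms appear in the same abstract. The Python sketch below shows that idea only; it is not ChemTextMiner's implementation, and the lexicon and the three toy abstracts are invented for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical single-word entity lexicon and toy abstracts.
lexicon = {"malaria", "chloroquine", "fever", "insulin", "diabetes"}
abstracts = [
    "Chloroquine was given to malaria patients with fever.",
    "Malaria patients often present with fever.",
    "Insulin therapy in diabetes.",
]

def cooccurrence_edges(abstracts, lexicon):
    """Count, for each pair of lexicon terms, the number of abstracts
    in which both terms occur."""
    edges = Counter()
    for text in abstracts:
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        found = sorted(term for term in lexicon if term in tokens)
        for a, b in combinations(found, 2):
            edges[(a, b)] += 1
    return edges

edges = cooccurrence_edges(abstracts, lexicon)
for pair, count in edges.most_common():
    print(pair, count)
```

Co-occurrence counts only suggest a relationship; a real system would add entity normalization and some filtering of chance co-mentions.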

    Effective pattern discovery for text mining

    Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopt term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based (or phrase-based) approaches should perform better than term-based ones, but many experiments have not supported this hypothesis. This paper presents an innovative technique, effective pattern discovery, which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information. Substantial experiments on the RCV1 data collection and TREC topics demonstrate that the proposed solution achieves encouraging performance.
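
To make the distinction between terms and patterns concrete, the following Python sketch mines frequent termsets from tokenized paragraphs by brute-force enumeration. It does not implement the paper's pattern deploying or pattern evolving; the paragraphs, the `min_support` threshold, and the `max_size` cap are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tokenized paragraphs.
paragraphs = [
    ["text", "mining", "pattern"],
    ["text", "mining"],
    ["pattern", "mining"],
    ["text", "mining", "term"],
]

def frequent_termsets(paragraphs, min_support, max_size=3):
    """Return termsets (sorted tuples of terms) that occur together in
    at least `min_support` paragraphs. Size-1 termsets are plain terms;
    larger ones are patterns."""
    counts = Counter()
    for terms in paragraphs:
        unique = sorted(set(terms))
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

patterns = frequent_termsets(paragraphs, min_support=3)
print(patterns)
```

Exhaustive enumeration is only practical for short paragraphs; larger collections would need a pruning strategy such as Apriori-style candidate generation.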

    Text and spatial data mining

    Parcellation of the human brain by combining text mining and spatial data mining within a neuroinformatics database. Text mining: analysis of scientific abstracts. Spatial data mining: modeling of the distribution of Talairach coordinates. Seek commonality between the text representation and the spatial representation by multivariate analysis.

    Text mining and dimension reduction method application into exploring isomorphic pressures in corporate communication on textual tweet data about sustainability in the energy sector

    The study analyses isomorphic pressures in the context of sustainability by exploring Twitter communication in the energy sector. Recently, there has been an increasing focus on the interactive and communicative construction of institutions as a way to understand how organizations sustain institutional pressures. The rhetorical commitments that create narrative dynamics in organizational communication are central to institutional diffusion and change. Social media, and Twitter in particular, has been shown to offer a new opportunity to explore the linguistic dimension of corporate communications. We propose using social media linguistic data (tweets with their hashtags and keywords) and a triangulated method (text mining, web mining, and linguistic and content analysis) to examine tweet trends for each company. Based on the institutional theory of organizational communication, the paper examines the relation between the idea of sustainability and isomorphism, which leads to the adoption of similar models and attitudes among organizations. It applies text mining and correspondence methods within the R software. English-language tweets from the energy sector (from 2016) were processed with text mining and statistical linguistic analysis in R. Text mining, which involves linguistic, statistical, and machine learning techniques, reveals and visualizes the latent structures of the content of unstructured or weakly structured text data in a given collection of documents. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tec
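
Correspondence analysis starts from a contingency table, here companies by hashtags, and decomposes its matrix of standardized residuals. The Python sketch below builds that table and those residuals for a few invented tweets. The study itself used R; the company names and tweets are hypothetical, and the decomposition step (a singular value decomposition) is left out.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical (company, tweet) pairs.
tweets = [
    ("company_a", "Our new #solar farm is online"),
    ("company_a", "More #solar capacity next year"),
    ("company_b", "Offshore #wind turbines installed"),
    ("company_b", "Record output from #wind today"),
]

def hashtag_table(tweets):
    """Build a company-by-hashtag contingency table of counts."""
    table = defaultdict(Counter)
    for company, text in tweets:
        table[company].update(re.findall(r"#\w+", text.lower()))
    return table

def standardized_residuals(table):
    """Compute (p_ij - r_i * c_j) / sqrt(r_i * c_j), where p_ij is the
    cell proportion and r_i, c_j are the row and column masses."""
    total = sum(sum(row.values()) for row in table.values())
    row_mass = {c: sum(row.values()) / total for c, row in table.items()}
    col_mass = Counter()
    for row in table.values():
        for tag, n in row.items():
            col_mass[tag] += n / total
    residuals = {}
    for company, row in table.items():
        for tag in col_mass:
            expected = row_mass[company] * col_mass[tag]
            if expected == 0:
                continue  # company with no hashtags at all
            observed = row[tag] / total
            residuals[(company, tag)] = (observed - expected) / math.sqrt(expected)
    return residuals

table = hashtag_table(tweets)
residuals = standardized_residuals(table)
print(residuals)
```

Positive residuals mark hashtags a company uses more than expected under independence, which is the kind of association that correspondence analysis plots visualize.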

    Open Data Platform for Knowledge Access in Plant Health Domain : VESPA Mining

    Important data are locked in ancient literature. It would be uneconomical to produce these data again today or to extract them without the help of text mining technologies. Vespa is a text mining project that aims to extract data on pest-crop interactions, to model and predict attacks on crops, and to reduce the use of pesticides. A few attempts have proposed access to agricultural information. Another original aspect of our work is that documents are parsed in a way that depends on the document architecture.