401,959 research outputs found
The potential of text mining in data integration and network biology for plant research : a case study on Arabidopsis
Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden within the biomolecular literature. Text mining provides the necessary means to retrieve these data through automated processing of texts. However, only recently has advanced text mining methodology been implemented with sufficient computational power to process texts at a large scale. In this study, we assess the potential of large-scale text mining for plant biology research in general and for network biology in particular, using a state-of-the-art text mining system applied to all PubMed abstracts and PubMed Central full texts. We present an extensive evaluation of the textual data for Arabidopsis thaliana, assessing the overall accuracy of this new resource for use in plant network analyses. Furthermore, we combine text mining information with both protein-protein and regulatory interactions from experimental databases. Clusters of tightly connected genes are delineated from the resulting network, illustrating how such an integrative approach is essential to grasp the current knowledge available for Arabidopsis and to uncover gene information through guilt by association. All large-scale data sets, as well as the manually curated textual data, are made publicly available, thereby stimulating the application of text mining data in future plant biology studies.
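The integration step can be illustrated with a minimal Python sketch, assuming each source (text-mined co-occurrences, protein-protein interactions, regulatory interactions) has been reduced to an undirected list of gene pairs. The gene identifiers and annotation terms are hypothetical, and ranking annotations by neighbour votes is a simple stand-in for guilt by association, not the clustering procedure used in the study.

```python
from collections import Counter, defaultdict


def build_network(*edge_sets):
    """Merge several undirected edge lists into one adjacency map.

    Each edge set is an iterable of (gene_a, gene_b) pairs, e.g. text-mined
    co-occurrences or experimentally observed interactions."""
    adjacency = defaultdict(set)
    for edges in edge_sets:
        for a, b in edges:
            if a != b:  # ignore self-loops
                adjacency[a].add(b)
                adjacency[b].add(a)
    return adjacency


def guilt_by_association(adjacency, annotations, gene):
    """Rank annotation terms for `gene` by how many direct neighbours carry them."""
    votes = Counter()
    for neighbour in adjacency.get(gene, ()):
        for term in annotations.get(neighbour, ()):
            votes[term] += 1
    return votes.most_common()


# Hypothetical example: G1 has no annotation of its own.
text_mined = [("G1", "G2"), ("G1", "G3")]
protein_protein = [("G1", "G4")]
regulatory = [("G4", "G5")]
annotations = {"G2": {"flowering"}, "G3": {"flowering"}, "G4": {"defense response"}}

network = build_network(text_mined, protein_protein, regulatory)
print(guilt_by_association(network, annotations, "G1"))
# [('flowering', 2), ('defense response', 1)]
```

Because an edge from any source counts, text-mined links can fill gaps left by the experimental data; a fuller analysis might weight edges by source and confidence instead of treating them equally.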
Text Data Mining from the Author's Perspective: Whose Text, Whose Mining, and to Whose Benefit?
Given the many technical, social, and policy shifts in access to scholarly content since the early days of text data mining, it is time to expand the conversation about text data mining from the concerns of the researcher wishing to mine data to include the concerns of researcher-authors about how their data are mined, by whom, for what purposes, and to whose benefit.
Comment: Forum Statement: Data Mining with Limited Access Text: National Forum. April 5-6, 2018. https://publish.illinois.edu/limitedaccess-tdm
ChemTextMiner: An open source tool kit for mining medical literature abstracts
Text mining involves recognizing patterns in the wealth of information hidden in unstructured text and deducing explicit relationships among data entities using data mining tools. Text mining of biomedical literature is essential for building biological networks connecting genes, proteins, drugs, therapeutic categories, side effects, etc. related to diseases of interest. We present an approach for text mining biomedical literature, focused mostly on non-obvious hidden relationships, and apply it to build biological networks for important human diseases such as MTB, malaria, Alzheimer's disease and diabetes. The methods, tools and data used for building biological networks in a distributed computing environment, previously used for the ChemXtreme[1] and ChemStar[2] applications, are also described.
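One common way to turn abstracts into such a network is sentence-level co-occurrence counting over a dictionary of entity names. The Python sketch below assumes a small, hypothetical dictionary, hypothetical example sentences and plain regular-expression matching; it illustrates the general idea and is not the ChemTextMiner pipeline, which the abstract describes as running in a distributed computing environment.

```python
import itertools
import re
from collections import Counter


def cooccurrence_network(abstracts, entities, min_count=1):
    """Count entity pairs mentioned in the same sentence.

    Pairs seen in at least `min_count` sentences become weighted edges."""
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in entities
    }
    counts = Counter()
    for abstract in abstracts:
        # Naive sentence splitting on terminal punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", abstract):
            found = sorted(name for name, pattern in patterns.items() if pattern.search(sentence))
            # Sorted names make each pair key independent of mention order.
            for pair in itertools.combinations(found, 2):
                counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


abstracts = [
    "Isoniazid inhibits InhA in MTB. Rifampicin is also used.",
    "InhA mutations confer isoniazid resistance.",
]
entities = ["isoniazid", "InhA", "rifampicin", "MTB"]
print(cooccurrence_network(abstracts, entities, min_count=2))
# {('InhA', 'isoniazid'): 2}
```

Raising `min_count` trades recall for precision: frequently co-mentioned pairs are more likely to reflect a real relationship, while rare pairs are where the "not so obvious" links would have to be looked for, with more noise.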
Effective pattern discovery for text mining
Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopt term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, it has often been hypothesized that pattern-based (or phrase-based) approaches should perform better than term-based ones, but many experiments have not supported this hypothesis. This paper presents an innovative technique, effective pattern discovery, which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information. Substantial experiments on the RCV1 data collection and TREC topics demonstrate that the proposed solution achieves encouraging performance.
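The idea behind pattern deploying can be sketched in Python under simplifying assumptions. Each relevant training document is represented by its discovered patterns (tuples of terms) with support values, each pattern's support is shared evenly among its terms, and the per-document term weights are normalised and summed. This follows the general spirit of mapping patterns back onto a term space rather than reproducing the paper's exact formulas. Pattern evolving is omitted, and the patterns and terms are hypothetical.

```python
from collections import defaultdict


def deploy_patterns(doc_patterns):
    """Map discovered patterns onto term weights.

    doc_patterns holds one dict per relevant document, mapping a pattern
    (a tuple of terms) to its support in that document."""
    weights = defaultdict(float)
    for patterns in doc_patterns:
        doc_weights = defaultdict(float)
        for pattern, support in patterns.items():
            for term in pattern:
                # Share the pattern's support evenly among its terms.
                doc_weights[term] += support / len(pattern)
        total = sum(doc_weights.values())
        if total == 0:
            continue  # skip documents without usable patterns
        for term, w in doc_weights.items():
            # Normalise per document so documents with many patterns do not dominate.
            weights[term] += w / total
    return dict(weights)


def relevance(document_terms, weights):
    """Score a new document by summing the deployed weights of its distinct terms."""
    return sum(weights.get(term, 0.0) for term in set(document_terms))


training = [
    {("carbon", "emission"): 2, ("carbon",): 1},
    {("oil", "price"): 1},
]
weights = deploy_patterns(training)
print(relevance(["carbon", "tax", "carbon"], weights))
```

Deploying keeps the evidence of multi-term patterns (here, "carbon" appears in two patterns and ends up with twice the weight of "emission") while still scoring new documents in a simple term space.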
Text and spatial data mining
Parcellation of the human brain by combining text mining and spatial data mining within a neuroinformatics database. Text mining: Analysis of scientific abstracts. Spatial data mining: Modeling of the distribution of Talairach coordinates. Seek communality between the text representation and the spatial representation by multivariate analysis.
Text mining and dimension reduction method application into exploring isomorphic pressures in corporate communication on textual tweet data about sustainability in the energy sector
The study analyses isomorphic pressures in the context of sustainability by exploring Twitter communication in the energy sector. Recently, there has been an increasing focus on the interactive and communicative construction of institutions as a way to understand how organizations sustain institutional pressures. The rhetorical commitments that create narrative dynamics in organizational communication are central to institutional diffusion and change. Social media, and Twitter in particular, have been shown to offer a new opportunity to explore the linguistic dimension of corporate communication. We propose the use of social media linguistic data (tweets with their hashtags and keywords) and a triangulated method (text mining, web mining, and linguistic and content analysis) to examine the tweets' trends for each company. Based on the institutional theory of organizational communication, the paper examines the relation between the idea of sustainability and isomorphism, which leads to the adoption of similar models and attitudes among organizations. It applies text mining and correspondence analysis within the R software. English-language tweets from the energy sector (from 2016) were processed with the text mining procedures of statistical linguistic analysis in R. Text mining, involving linguistic, statistical, and machine learning techniques, reveals and visualizes the latent structures of the content in unstructured or weakly structured text data in a given collection of documents.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.
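The abstract does not give the R workflow itself, so the following is only a Python sketch of the two generic steps it names. First, build a document-by-term count matrix from tweets. Then apply correspondence analysis, a dimension reduction method for contingency tables, through a singular value decomposition of the standardized residuals. It assumes NumPy is available, and the tweets, the vocabulary and the tokenization rule are hypothetical.

```python
import re
from collections import Counter

import numpy as np


def term_document_matrix(docs, vocabulary):
    """Rows are documents (e.g. one row per company's tweets), columns are terms."""
    rows = []
    for doc in docs:
        tokens = Counter(re.findall(r"#?\w+", doc.lower()))  # keep hashtags as tokens
        rows.append([tokens[term] for term in vocabulary])
    return np.array(rows, dtype=float)


def correspondence_analysis(counts, k=2):
    """Return row and column principal coordinates and the first k principal inertias.

    Assumes the table has no all-zero rows or columns."""
    P = counts / counts.sum()  # correspondence matrix
    r = P.sum(axis=1)  # row masses
    c = P.sum(axis=0)  # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)  # standardized residuals
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = U[:, :k] * sigma[:k] / np.sqrt(r)[:, None]
    col_coords = Vt.T[:, :k] * sigma[:k] / np.sqrt(c)[:, None]
    return row_coords, col_coords, sigma[:k] ** 2


tweets = [
    "Solar growth supports our #sustainability goals",
    "Oil and gas output rose; #sustainability report due",
    "New solar farm online #renewables",
]
vocabulary = ["solar", "oil", "gas", "#sustainability", "#renewables"]
N = term_document_matrix(tweets, vocabulary)
row_coords, col_coords, inertia = correspondence_analysis(N)
print(row_coords.shape, col_coords.shape, inertia)
```

Plotting the first two columns of `row_coords` and `col_coords` on the same axes gives the usual symmetric CA map. Distances between two documents or between two terms are interpretable, while document-to-term proximity should be read with more care.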
Open Data Platform for Knowledge Access in Plant Health Domain : VESPA Mining
Important data are locked in ancient literature. It would be uneconomical to produce these data again today, or to extract them without the help of text mining technologies. Vespa is a text mining project whose aim is to extract data on pest and crop interactions, to model and predict attacks on crops, and to reduce the use of pesticides. Few previous attempts have addressed agricultural information access. Another original aspect of our work is that documents are parsed in a way that depends on the document architecture.
- …