9 research outputs found
Discovery of protein-protein interactions using a combination of linguistic, statistical and graphical information
BACKGROUND: The rapid publication of important research in the biomedical literature makes it increasingly difficult for researchers to keep current with significant work in their area of interest. RESULTS: This paper reports a scalable method for the discovery of protein-protein interactions in Medline abstracts, using a combination of text analytics, statistical and graphical analysis, and a set of easily implemented rules. Applying these techniques to 12,300 abstracts yielded a precision of 0.61 and a recall of 0.97 (f = 0.74); when two-hop and three-hop relations discovered by graphical analysis were allowed, the precision was 0.74 (f = 0.83). CONCLUSION: This combination of linguistic and statistical approaches appears to provide the highest precision and recall thus far reported in detecting protein-protein relations using text analytic approaches.
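The f value quoted in the abstract above can be checked from the stated precision and recall. The sketch below assumes the standard balanced F-measure (the abstract does not say which variant was used), and the function name `f_measure` is illustrative only. For precision 0.61 and recall 0.97 it gives roughly 0.749, in line with the reported 0.74. The multi-hop figure is not recomputed because the abstract does not state the corresponding recall.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1).

    Assumption: the abstract's "f" is the standard F-measure.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # both inputs are zero, so the score is defined as zero
    return (1 + beta ** 2) * precision * recall / denominator


if __name__ == "__main__":
    # Values taken from the abstract: precision 0.61, recall 0.97.
    print(round(f_measure(0.61, 0.97), 3))
```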
PPLook: an automated data mining tool for protein-protein interaction
BACKGROUND: Extracting and visualizing protein-protein interactions (PPIs) from the text literature is a meaningful topic in protein science, because it assists in identifying interactions among proteins. Tools that extract PPIs and then visualize and classify the results are lacking. RESULTS: We developed a PPI search system, termed PPLook, which automatically extracts and visualizes PPIs from text. Given a query protein name, PPLook searches a dataset for other proteins that interact with it, using a keyword-dictionary pattern-matching algorithm, and displays topological parameters such as the number of nodes, edges, and connected components. The visualization component of PPLook shows the interaction relationships among the proteins in three-dimensional space, based on the OpenGL graphics interface. PPLook can also select a protein semantic class, count the proteins of a semantic class that interact with the query protein, and count the articles in which an interaction involving the query protein appears. Moreover, PPLook provides heterogeneous search and a user-friendly graphical interface. CONCLUSIONS: PPLook is an effective tool for biologists and biosystem developers who need to access PPI information from the literature. PPLook is freely available for non-commercial users at http://meta.usc.edu/softs/PPLook
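The two steps named in the abstract above, dictionary-based pattern matching and reporting of topological parameters, can be illustrated with a small sketch. This is not PPLook's implementation: the keyword list, the sentence-level co-mention rule, and the function names `extract_pairs` and `topology` are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical interaction keywords; PPLook's actual dictionary is not given.
INTERACTION_KEYWORDS = {"binds", "interacts", "activates", "inhibits"}


def extract_pairs(sentences, protein_names, keywords=INTERACTION_KEYWORDS):
    """Return unordered protein pairs co-mentioned with an interaction keyword."""
    pairs = set()
    for sentence in sentences:
        tokens = set(re.findall(r"[A-Za-z0-9-]+", sentence))
        if not {token.lower() for token in tokens} & keywords:
            continue  # no interaction keyword in this sentence
        found = sorted(name for name in protein_names if name in tokens)
        for i, first in enumerate(found):
            for second in found[i + 1:]:
                pairs.add((first, second))
    return pairs


def topology(pairs):
    """Count nodes, edges, and connected components of the interaction graph."""
    adjacency = defaultdict(set)
    for first, second in pairs:
        adjacency[first].add(second)
        adjacency[second].add(first)
    seen, components = set(), 0
    for node in adjacency:
        if node in seen:
            continue
        components += 1
        stack = [node]  # depth-first traversal of one component
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(adjacency[current] - seen)
    return {"nodes": len(adjacency), "edges": len(pairs), "components": components}


if __name__ == "__main__":
    sentences = [
        "BRCA1 interacts with BARD1 in the nucleus.",
        "TP53 binds MDM2.",
        "MDM2 inhibits TP53 activity.",
        "BRCA1 and TP53 were both measured.",  # no keyword, so ignored
    ]
    proteins = {"BRCA1", "BARD1", "TP53", "MDM2"}
    print(topology(extract_pairs(sentences, proteins)))
```

A sentence-level co-mention rule is the simplest possible choice here; it ignores negation and word order, which a real extraction system would need to handle.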
An Integrated Web-based System for MEDLINE Analysis: A Case Study of Chronic Kidney Disease
In the era of big data, medical researchers apply analysis techniques such as machine learning and text mining to large-scale corpora to save labor and time. Consequently, many data analysis platforms, such as PubTator, GeneWays, and BioContext, have been built to support medical professionals. These platforms help with medical entity recognition and relation extraction, but no integrated platform supports researchers’ various needs, and medical projects are isolated from each other, which makes them hard to share and reuse. We therefore present an integrated system containing ‘named entity recognition’, ‘document categorization’ and ‘association extraction’. In addition, we introduce the concept of ‘socialization’, which makes projects reusable for further analyses. A case study of chronic kidney disease demonstrates the effectiveness of the proposed system
RetroMine, or how to provide in-depth retrospective studies from Medline in a glance: the hepcidin use-case
The rapid expansion of the biomedical literature has driven the development of advanced text mining tools that rapidly extract relevant events from the continuously increasing amount of knowledge published in PubMed. However, bioinvestigators are still reluctant to use these tools for two reasons: i) a query often extracts a large volume of events, and this volume is hard to manage, and ii) background events dominate search results and overshadow more pertinent published information, especially for domain experts. In this paper, we propose an approach that incorporates the temporal dimension of published events into the information extraction process to improve data selection and prioritize more pertinent knowledge for scientists. Instead of providing the total knowledge associated with a PubMed query, which is usually a mix of trivial background information and non-background information, the method uses time to select non-background, highly relevant biological entities and events published over time. Before background events are excluded from the total knowledge extracted, their amount is also quantified. This work is illustrated by a case study of Hepcidin gene publications over a decade, a duration long enough to generate alternative views on the overall data extracted
Gene Networks
Research over the past decade shows that the overwhelming majority of phenotypic traits of humans, animals, plants, and microorganisms (molecular, biochemical, cellular, physiological, morphological, behavioral, etc.) are controlled in a very complex manner, and that their formation is based on gene networks, i.e., groups of coordinately functioning genes that interact with one another both through their primary products (RNA and proteins) and through diverse metabolites and other secondary products of gene network function
Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers
Thomas PE, Klinger R, Furlong LI, Hofmann-Apitius M, Friedrich CM. Challenges in the association of human single nucleotide polymorphism mentions with unique database identifiers. BMC Bioinformatics. 2011;12(Suppl 4): S4
Elicitation of Protein-Protein Interactions from Biomedical Literature Using Association Rule Discovery
Extracting information from a large body of data is a tedious task, and proteomics is no exception. Volumes of research papers are published on the study of various proteins in several species, their interactions with other proteins, and the identification of proteins as possible biomarkers of disease. It is challenging for biologists to keep track of these developments manually by reading through the literature. Computational linguists have developed several tools to assist the identification, extraction, and hypothesis generation of proteins and protein-protein interactions from biomedical publications and protein databases. However, these tools face the challenges of term variation, term ambiguity, access only to abstracts, and inconsistencies in the time-consuming manual curation of protein and protein-protein interaction repositories. This work attempts to reduce these challenges by extracting protein-protein interactions in humans and eliciting possible interactions using association rule mining on full text, abstracts, and figure captions from publicly available biomedical literature databases. Two such databases are used in our study: the Directory of Open Access Journals (DOAJ) and PubMed Central (PMC). A corpus is built from articles retrieved with search terms. A dataset of more than 38,000 protein-protein interactions from the Human Protein Reference Database (HPRD) is cross-referenced to validate the discovered interacting pairs. A set of possible binary protein-protein interactions of optimal size is generated and made available for clinical or biological validation. The number of new associations changed significantly when the thresholds for the support and confidence metrics were altered. This study reduces the limitations biologists face in keeping pace with the discovery of protein-protein interactions by manually reading the literature, and their need to validate each and every possible interaction
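The support and confidence thresholds mentioned in the abstract above can be illustrated with a small sketch of pairwise association rules over protein co-mentions. It assumes each document has already been reduced to the set of proteins it mentions. The function name `pair_rules` and the placeholder protein labels are hypothetical, and the study's actual mining procedure is not described in enough detail here to reproduce.

```python
from collections import Counter
from itertools import combinations


def pair_rules(documents, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for protein pairs.

    support    = fraction of documents mentioning both proteins
    confidence = documents with both / documents with the antecedent
    """
    total = len(documents)
    item_counts = Counter()
    pair_counts = Counter()
    for proteins in documents:
        unique = sorted(set(proteins))
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (first, second), count in pair_counts.items():
        support = count / total
        if support < min_support:
            continue  # pair is too rare in the corpus
        for antecedent, consequent in ((first, second), (second, first)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    # Placeholder labels, not real proteins or real data.
    corpus = [["P1", "P2"], ["P1", "P2", "P3"], ["P1", "P3"], ["P2"]]
    for min_confidence in (0.6, 0.9):
        rules = pair_rules(corpus, min_support=0.5, min_confidence=min_confidence)
        print(min_confidence, len(rules))
```

On this toy corpus, raising the confidence threshold from 0.6 to 0.9 reduces the rule set from four rules to one, which mirrors the sensitivity to thresholds that the abstract reports.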
Integration of Text Mining with Systems Biology Provides New Insight into the Pathogenesis of Diabetic Neuropathy.
Diabetic neuropathy (DN) is the most common complication of diabetes affecting approximately 60% of all diabetic patients leading to significant mortality, morbidity, and poor quality of life. Though more than 50% of patients with DN develop substantial nerve damage prior to noticeable symptoms, no biomarkers for predicting the onset or progression of DN are currently available. Here we present a biomarker discovery platform integrating literature mining and a systems biology approach to identify potential DN biomarkers. A web-based target identification and functional analysis tool, SciMiner (http://jdrf.neurology.med.umich.edu/SciMiner), was developed that identifies targets using a context specific analysis of MEDLINE abstracts and full texts. A comprehensive list of 1,026 targets from diabetes and reactive oxygen species (ROS) related literature was compiled by SciMiner. The expression levels of nine genes, selected from the over-represented ROS-diabetes targets, were measured in the dorsal root ganglia (DRG) of diabetic and non-diabetic DBA/2J mice. Eight genes exhibited significant differential expression and the directions of expression change in six of those genes paralleled enhanced oxidative stress in the DRG, suggesting the involvement of ROS related targets in DN. A microarray analysis was also performed on sural nerve biopsies from two DN patient groups with fast or slow DN progression to identify gene expression profiles related to DN progression. In the fast progressing DN, defense response and inflammatory response related genes were up-regulated, while lipid metabolic process and peroxisome proliferator-activated receptor (PPAR) signaling pathway related genes were down-regulated. We also developed mRNA expression signatures that predict DN progression in humans with a high prediction accuracy. Ridge-regression based predictive models with 14 genes achieved a prediction accuracy of 92% (correct prediction of 11 out of 12 patients). 
Our results identifying the unique gene signatures of progressive DN and compiling ROS-diabetes targets can facilitate the development of new mechanism-based therapies and predictive biomarkers of DN. Ph.D., Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/77941/1/juhur_1.pd