Evaluating the state-of-the-art in mapping research spaces: a Brazilian case study
Scientific knowledge cannot be seen as a set of isolated fields, but as a
highly connected network. Understanding how research areas are connected is of
paramount importance for adequately allocating funding and human resources
(e.g., assembling teams to tackle multidisciplinary problems). The relationship
between disciplines can be drawn from data on the trajectory of individual
scientists, as researchers often make contributions in a small set of
interrelated areas. Two recent works propose methods for creating research maps
from scientists' publication records: one uses a frequentist approach to create
a transition probability matrix, and the other learns embeddings (vector
representations). Surprisingly, these models were evaluated on different
datasets and have never been compared in the literature. In this work, we
compare both models in a systematic way, using a large dataset of publication
records from Brazilian researchers. We evaluate these models' ability to
predict whether a given entity (scientist, institution or region) will enter a
new field, measured by the area under the ROC curve. Moreover, we analyze how
sensitive each method is to the number of publications and the number of fields
associated with one entity. Finally, we conduct a case study to showcase how these
models can be used to characterize science dynamics in the context of Brazil.
Comment: 28 pages, 11 figures
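As an illustration of the frequentist idea mentioned in this abstract, the sketch below estimates, from toy publication records, the probability that a scientist active in field A is also active in field B. This is a minimal sketch under stated assumptions: the co-occurrence estimate, the function name `transition_matrix`, and the toy records are hypothetical and are not taken from the paper, which may define its transition probabilities differently.

```python
from collections import Counter
from itertools import permutations


def transition_matrix(records):
    """Estimate P(b | a) as (# scientists in both a and b) / (# scientists in a).

    records: dict mapping a scientist id to the set of fields they published in.
    Returns a dict keyed by ordered field pairs (a, b).
    """
    field_counts = Counter()
    pair_counts = Counter()
    for fields in records.values():
        field_counts.update(fields)
        # Count every ordered pair of distinct fields for this scientist.
        pair_counts.update(permutations(sorted(fields), 2))
    return {(a, b): n / field_counts[a] for (a, b), n in pair_counts.items()}


# Toy data (hypothetical), for illustration only.
records = {
    "s1": {"physics", "math"},
    "s2": {"physics", "math", "cs"},
    "s3": {"physics"},
    "s4": {"cs", "math"},
}
probs = transition_matrix(records)
print(probs[("cs", "math")])     # every cs scientist here also publishes in math
print(probs[("cs", "physics")])  # half of the cs scientists also publish in physics
```

Scores of this kind could then be ranked against held-out "entered a new field" labels to compute an area under the ROC curve, which is the evaluation criterion the abstract names.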
A text-mining system for extracting metabolic reactions from full-text articles
Background: Biological text mining research is increasingly focusing on the extraction of complex relationships
relevant to the construction and curation of biological networks and pathways. However, one important category of
pathway, metabolic pathways, has been largely neglected.
Here we present a relatively simple method for extracting metabolic reaction information from free text that scores
different permutations of assigned entities (enzymes and metabolites) within a given sentence based on the presence
and location of stemmed keywords. This method extends an approach that has proved effective in the context of the
extraction of protein-protein interactions.
Results: When evaluated on a set of manually-curated metabolic pathways using standard performance criteria, our
method performs surprisingly well. Precision and recall rates are comparable to those previously achieved for the
well-known protein-protein interaction extraction task.
Conclusions: We conclude that automated metabolic pathway construction is more tractable than has often been
assumed, and that (as in the case of protein-protein interaction extraction) relatively simple text-mining approaches can prove surprisingly effective. It is hoped that these results will provide an impetus to further research and act as a useful benchmark for judging the performance of more sophisticated methods that are yet to be developed.
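To make the idea of scoring permutations of entities by keyword presence and location concrete, here is a toy sketch. Everything in it is an assumption made for illustration: the keyword stems, the three roles, the scoring rules, and the prefix match standing in for real stemming are hypothetical and do not reproduce the scoring scheme of the paper.

```python
from itertools import permutations

# Hypothetical keyword stems; a prefix match stands in for a real stemmer.
STEMS = ("catalys", "convert", "produc")
ROLES = ("enzyme", "substrate", "product")


def score_assignment(tokens, assignment):
    """Score one role assignment (role -> token index) for a tokenized sentence."""
    enzyme, substrate, product = (assignment[role] for role in ROLES)
    keyword_positions = [
        i for i, tok in enumerate(tokens) if tok.lower().startswith(STEMS)
    ]
    # Presence: one point per keyword found in the sentence.
    score = len(keyword_positions)
    # Location: one point per keyword lying between the enzyme and the substrate.
    score += sum(1 for k in keyword_positions if enzyme < k < substrate)
    # Ordering (assumed rule): one point if the substrate precedes the product.
    if substrate < product:
        score += 1
    return score


def best_assignment(tokens, entity_positions):
    """Try every permutation of entity positions over the roles; keep the best."""
    candidates = (
        dict(zip(ROLES, perm))
        for perm in permutations(entity_positions, len(ROLES))
    )
    return max(candidates, key=lambda a: score_assignment(tokens, a))


tokens = "Hexokinase converts glucose to glucose-6-phosphate".split()
best = best_assignment(tokens, [0, 2, 4])  # token indices of recognized entities
print(best)
```

In this toy sentence the highest-scoring permutation assigns the first entity as the enzyme, the second as the substrate, and the third as the product.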