80,234 research outputs found
A Linear Classifier Based on Entity Recognition Tools and a Statistical Approach to Method Extraction in the Protein-Protein Interaction Literature
We participated, in the Article Classification and the Interaction Method
subtasks (ACT and IMT, respectively) of the Protein-Protein Interaction task of
the BioCreative III Challenge. For the ACT, we pursued an extensive testing of
available Named Entity Recognition and dictionary tools, and used the most
promising ones to extend our Variable Trigonometric Threshold linear
classifier. For the IMT, we experimented with a primarily statistical approach,
as opposed to employing a deeper natural language processing strategy. Finally,
we also studied the benefits of integrating the method extraction approach that
we have used for the IMT into the ACT pipeline. For the ACT, our linear article
classifier leads to a ranking and classification performance significantly
higher than all the reported submissions. For the IMT, our results are
comparable to those of other systems, which took very different approaches. For
the ACT, we show that the use of named entity recognition tools leads to a
substantial improvement in the ranking and classification of articles relevant
to protein-protein interaction. Thus, we show that our substantially expanded
linear classifier is a very competitive classifier in this domain. Moreover,
this classifier produces interpretable surfaces that can be understood as
"rules" for human understanding of the classification. In terms of the IMT
task, in contrast to other participants, our approach focused on identifying
sentences that are likely to bear evidence for the application of a PPI
detection method, rather than on classifying a document as relevant to a
method. As BioCreative III did not perform an evaluation of the evidence
provided by the system, we have conducted a separate assessment; the evaluators
agree that our tool is indeed effective in detecting relevant evidence for PPI
detection methods.Comment: BMC Bioinformatics. In Pres
Building a semantically annotated corpus of clinical texts
In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains
Learning to Rank Academic Experts in the DBLP Dataset
Expert finding is an information retrieval task that is concerned with the
search for the most knowledgeable people with respect to a specific topic, and
the search is based on documents that describe people's activities. The task
involves taking a user query as input and returning a list of people who are
sorted by their level of expertise with respect to the user query. Despite
recent interest in the area, the current state-of-the-art techniques lack in
principled approaches for optimally combining different sources of evidence.
This article proposes two frameworks for combining multiple estimators of
expertise. These estimators are derived from textual contents, from
graph-structure of the citation patterns for the community of experts, and from
profile information about the experts. More specifically, this article explores
the use of supervised learning to rank methods, as well as rank aggregation
approaches, for combing all of the estimators of expertise. Several supervised
learning algorithms, which are representative of the pointwise, pairwise and
listwise approaches, were tested, and various state-of-the-art data fusion
techniques were also explored for the rank aggregation framework. Experiments
that were performed on a dataset of academic publications from the Computer
Science domain attest the adequacy of the proposed approaches.Comment: Expert Systems, 2013. arXiv admin note: text overlap with
arXiv:1302.041
- …