22 research outputs found

    Tagging Scientific Publications using Wikipedia and Natural Language Processing Tools. Comparison on the ArXiv Dataset

    In this work, we compare two simple methods of tagging scientific publications with labels reflecting their content. Wikipedia is employed as the first source of labels; the second label set is constructed from the noun phrases occurring in the analyzed corpus. We examine the statistical properties and the effectiveness of both approaches on a dataset consisting of abstracts from 0.7 million scientific documents deposited in the ArXiv preprint collection. We believe that the obtained tags can later be applied as useful document features in various machine learning tasks (document similarity, clustering, topic modelling, etc.).
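
    Both label sources reduce, at tagging time, to matching a fixed vocabulary of phrases against each abstract. The sketch below assumes exactly that simplification: the label set (a hypothetical stand-in for Wikipedia titles or corpus noun phrases) is given as lowercase phrases, and a label is assigned whenever it occurs as a contiguous word n-gram. The function names and the n-gram length cap are illustrative, not taken from the paper.

```python
import re
from collections import Counter


def word_ngrams(tokens, n):
    """Yield contiguous n-grams of a token list as space-joined strings."""
    for i in range(len(tokens) - n + 1):
        yield " ".join(tokens[i:i + n])


def tag_document(text, label_set, max_len=3):
    """Count occurrences of labels from label_set in text.

    Labels are assumed to be lowercase, space-separated phrases; matching is
    case-insensitive and limited to phrases of at most max_len words.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    found = Counter()
    for n in range(1, max_len + 1):
        for gram in word_ngrams(tokens, n):
            if gram in label_set:
                found[gram] += 1
    return found


# Hypothetical label set standing in for Wikipedia titles or corpus noun phrases.
labels = {"machine learning", "clustering", "topic modelling", "document similarity"}
abstract = "We apply machine learning to clustering and document similarity."
tags = tag_document(abstract, labels)
print(sorted(tags))  # ['clustering', 'document similarity', 'machine learning']
```

    The resulting `Counter` can be used directly as a sparse bag-of-tags feature vector for the downstream tasks the abstract lists.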

    Automatic term identification for bibliometric mapping

    A term map is a map that visualizes the structure of a scientific field by showing the relations between important terms in the field. The terms shown in a term map are usually selected manually with the help of domain experts. Manual term selection has the disadvantages of being subjective and labor-intensive. To overcome these disadvantages, we propose a methodology for automatic term identification and we use this methodology to select the terms to be included in a term map. To evaluate the proposed methodology, we use it to construct a term map of the field of operations research. The quality of the map is assessed by a number of operations research experts. It turns out that, in general, the proposed methodology performs quite well.
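
    The abstract does not spell out its scoring method, so the following is explicitly a swapped-in technique rather than the authors' methodology: a minimal sketch of the well-known C-value termhood measure, which ranks multi-word candidates by frequency while discounting occurrences that are only fragments of longer candidates. Candidate extraction (e.g. noun-phrase chunking) is assumed to have happened already, and the counts are hypothetical.

```python
import math


def c_value(freq):
    """Score multi-word candidate terms with the C-value measure.

    freq maps each candidate term (a tuple of at least two words) to its
    corpus frequency. Frequency "borrowed" from longer candidates that
    contain the term is discounted, so nested fragments rank lower.
    """
    def contains(longer, shorter):
        n = len(shorter)
        return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))

    scores = {}
    for term, f in freq.items():
        containers = [o for o in freq if len(o) > len(term) and contains(o, term)]
        weight = math.log2(len(term))
        if containers:
            # Subtract the mean frequency of the longer terms that nest this one.
            nested = sum(freq[o] for o in containers) / len(containers)
            scores[term] = weight * (f - nested)
        else:
            scores[term] = weight * f
    return scores


# Hypothetical candidate counts from an operations research corpus.
freq = {
    ("linear", "programming"): 10,
    ("integer", "linear", "programming"): 4,
    ("vehicle", "routing"): 6,
}
for term, s in sorted(c_value(freq).items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(s, 2))
```

    Because the weight is log2 of the term length, single-word candidates score zero under this formula; a term map built this way would need a separate rule if single-word terms are wanted.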

    Benchmarking Ontologies: Bigger or Better?

    A scientific ontology is a formal representation of knowledge within a domain, typically including central concepts, their properties, and relations. With the rise of computers and high-throughput data collection, ontologies have become essential to data mining and sharing across communities in the biomedical sciences. Powerful approaches exist for testing the internal consistency of an ontology, but not for assessing the fidelity of its domain representation. We introduce a family of metrics that describe the breadth and depth with which an ontology represents its knowledge domain. We then test these metrics using (1) four of the most common medical ontologies with respect to a corpus of medical documents and (2) seven of the most popular English thesauri with respect to three corpora that sample language from medicine, news, and novels. Here we show that our approach captures the quality of ontological representation and guides efforts to narrow the breach between ontology and collective discourse within a domain. Our results also demonstrate key features of medical ontologies, English thesauri, and discourse from different domains. Medical ontologies have a small intersection, as do English thesauri. Moreover, dialects characteristic of distinct domains vary strikingly, as many of the same words are used quite differently in medicine, news, and novels. As ontologies are intended to mirror the state of knowledge, our methods to tighten the fit between ontology and domain will increase their relevance for new areas of biomedical science and improve the accuracy and power of inferences computed across them.
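
    The paper's own breadth and depth metrics are not given in the abstract. As a rough illustration of the underlying idea only, the sketch below uses two simple, hypothetical measures: how much of a corpus an ontology's lexicon covers (by distinct words and by word occurrences), and the Jaccard overlap between two ontologies' term sets, which corresponds to the "small intersection" observation. It assumes single-word terms and pre-tokenized text.

```python
from collections import Counter


def coverage(corpus_tokens, ontology_terms):
    """Return (type coverage, token coverage) of a corpus by an ontology lexicon.

    Type coverage: share of distinct corpus words found in the ontology.
    Token coverage: share of all word occurrences found in the ontology.
    """
    counts = Counter(corpus_tokens)
    covered = [w for w in counts if w in ontology_terms]
    type_cov = len(covered) / len(counts)
    token_cov = sum(counts[w] for w in covered) / sum(counts.values())
    return type_cov, token_cov


def jaccard(a, b):
    """Overlap between two term sets: size of intersection over size of union."""
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical toy corpus and ontology lexicons.
tokens = ["fever", "fever", "cough", "rash", "fatigue"]
ontology_a = {"fever", "rash", "edema"}
ontology_b = {"cough", "rash"}
print(coverage(tokens, ontology_a))  # (0.5, 0.6)
print(round(jaccard(ontology_a, ontology_b), 2))  # 0.25
```

    Token coverage weights frequent words more heavily, so an ontology can look broad on a corpus dominated by a few common terms while still missing much of the domain's vocabulary; reporting both numbers makes that gap visible.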

    KneeTex: an ontology–driven system for information extraction from MRI reports

    Background. In the realm of knee pathology, magnetic resonance imaging (MRI) has the advantage of visualising all structures within the knee joint, which makes it a valuable tool for increasing diagnostic accuracy and planning surgical treatments. Therefore, clinical narratives found in MRI reports convey valuable diagnostic information. A range of studies have proven the feasibility of natural language processing for information extraction from clinical narratives. However, no study has focused specifically on MRI reports in relation to knee pathology, possibly due to the complexity of knee anatomy and the wide range of conditions that may be associated with different anatomical entities. In this paper we describe KneeTex, an information extraction system that operates in this domain.
    Methods. As an ontology-driven information extraction system, KneeTex makes active use of an ontology to strongly guide and constrain text analysis. We used automatic term recognition to facilitate the development of a domain-specific ontology with sufficient detail and coverage for text mining applications. In combination with the ontology, the high regularity of the sublanguage used in knee MRI reports allowed us to model its processing by a set of sophisticated lexico-semantic rules with minimal syntactic analysis. The main processing steps involve named entity recognition combined with coordination, enumeration, ambiguity and co-reference resolution, followed by text segmentation. Ontology-based semantic typing is then used to drive the template filling process.
    Results. We adopted an existing ontology, TRAK (Taxonomy for RehAbilitation of Knee conditions), for use within KneeTex. The original TRAK ontology was expanded from 1,292 concepts, 1,720 synonyms and 518 relationship instances to 1,621 concepts, 2,550 synonyms and 560 relationship instances. This provided KneeTex with a very fine-grained lexico-semantic knowledge base, which is highly attuned to the given sublanguage. Information extraction results were evaluated on a test set of 100 MRI reports. The gold standard consisted of 1,259 filled template records with the following slots: finding, finding qualifier, negation, certainty, anatomy and anatomy qualifier. KneeTex extracted information with a precision of 98.00%, a recall of 97.63% and an F-measure of 97.81%, values in line with human-like performance.
    Conclusions. KneeTex is an open-source, stand-alone application for information extraction from narrative reports that describe an MRI scan of the knee. Given an MRI report as input, the system outputs the corresponding clinical findings in the form of JavaScript Object Notation (JSON) objects. The extracted information is mapped onto TRAK, an ontology that formally models knowledge relevant for the rehabilitation of knee conditions. As a result, formally structured and coded information allows for complex searches to be conducted efficiently over the original MRI reports, thereby effectively supporting epidemiologic studies of knee conditions.
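
    The three reported figures are mutually consistent if the F-measure is the usual balanced F1, i.e. the harmonic mean of precision and recall. A quick check, assuming that standard definition (the `beta` parameter is included only to show the general weighted form):

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall reported for KneeTex, in percent.
print(round(f_measure(98.00, 97.63), 2))  # 97.81
```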

    A New Framework for MWE Acquisition

    No full text

    A Domain Independent Approach for Extracting Terms from Research Papers

    No full text

    Personalised Opinion-based Recommendation

    24th International Conference, ICCBR 2016, Atlanta, Georgia, USA, 31 October - 02 November 2016
    E-commerce recommender systems seek out matches between customers and items in order to help customers discover more relevant and satisfying products and to increase the conversion rate of browsers to buyers. To do this, a recommender system must learn about the likes and dislikes of customers/users as well as the advantages and disadvantages (pros and cons) of products. Recently, the explosion of user-generated content, especially customer reviews, and other forms of opinionated expression, has provided a new source of user and product insights. The interests of a user can be mined from the reviews that they write, and the pros and cons of products can be mined from the reviews written about them. In this paper, we build on recent work in this area to generate user and product profiles from user-generated reviews. We further describe how this information can be used in various recommendation tasks to suggest high-quality and relevant items to users based on either an explicit query or their profile. We evaluate these ideas using a large dataset of TripAdvisor reviews. The results show the benefits of combining sentiment and similarity in both query-based and user-based recommendation scenarios, and also disclose the effect of the number of reviews written by a user on recommendation performance.
    Science Foundation Ireland
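
    The abstract reports that combining sentiment and similarity helps, without giving the scoring function. One simple way to combine them, assumed here purely for illustration, is a linear blend controlled by a hypothetical weight `w`: cosine similarity between a query (or user profile) and an item's feature profile, plus the item's overall review sentiment scaled to [0, 1]. All profiles below are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse feature -> weight dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def score(query, item, w):
    """Blend query similarity with the item's overall review sentiment.

    w = 0 ranks by similarity only; w = 1 ranks by sentiment only.
    """
    return (1 - w) * cosine(query, item["features"]) + w * item["sentiment"]


def recommend(query, items, w=0.5, k=2):
    """Return the names of the k highest-scoring items."""
    ranked = sorted(items, key=lambda name: score(query, items[name], w), reverse=True)
    return ranked[:k]


# Hypothetical hotel profiles: feature mention weights plus a sentiment in [0, 1].
items = {
    "A": {"features": {"location": 1.0, "breakfast": 1.0}, "sentiment": 0.2},
    "B": {"features": {"location": 1.0, "pool": 1.0}, "sentiment": 0.9},
    "C": {"features": {"pool": 1.0, "spa": 1.0}, "sentiment": 0.8},
}
query = {"location": 1.0, "breakfast": 1.0}
print(recommend(query, items, w=0.0, k=3))  # ['A', 'B', 'C']
print(recommend(query, items, w=0.5, k=3))  # ['B', 'A', 'C']
```

    With pure similarity, hotel A matches the query exactly and wins; once sentiment is weighted in, the better-reviewed hotel B overtakes it. That trade-off is the kind of effect a query-based scenario would measure.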

    Great Explanations: Opinionated Explanations for Recommendation

    Case-based Reasoning Research and Development: 23rd International Conference, ICCBR 2015, Frankfurt am Main, Germany, 28-30 September 2015
    Explaining recommendations helps users to make better, more satisfying decisions. We describe a novel approach to explanation for recommender systems, one that drives the recommendation process while at the same time providing the user with useful insights into why items have been chosen and the trade-offs they may need to consider when making their choice. We describe this approach in the context of a case-based recommender system that harnesses opinions mined from user-generated reviews, and evaluate it on TripAdvisor hotel data.
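
    An explanation of this kind needs to say, per feature, whether the recommended item is better or worse than the alternatives the user might consider. The sketch below is a hypothetical scheme in that spirit, not the authors' method: each item maps features to mined sentiment scores, and a feature counts as a pro (or con) when the target beats (or loses to) more than a `threshold` share of the alternatives that mention it.

```python
def explain(target, alternatives, threshold=0.5):
    """Split the target's features into pros and cons relative to alternatives.

    target and each alternative map feature -> sentiment score. Features that
    no alternative mentions are skipped, since there is nothing to compare.
    """
    pros, cons = [], []
    for feature, s in target.items():
        rivals = [alt[feature] for alt in alternatives if feature in alt]
        if not rivals:
            continue
        better = sum(s > r for r in rivals) / len(rivals)
        worse = sum(s < r for r in rivals) / len(rivals)
        if better > threshold:
            pros.append(feature)
        elif worse > threshold:
            cons.append(feature)
    return pros, cons


# Hypothetical mined sentiment for a recommended hotel and two alternatives.
target = {"location": 0.9, "price": -0.2, "breakfast": 0.5}
alternatives = [
    {"location": 0.4, "price": 0.3},
    {"location": 0.6, "price": 0.1, "breakfast": 0.5},
]
pros, cons = explain(target, alternatives)
print(pros, cons)  # ['location'] ['price']
```

    Here the explanation would read roughly "better location than the alternatives, but worse value for money", which surfaces both the reason for the choice and the trade-off.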