3,392 research outputs found

    Editorial for the First Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics

    The workshop "Mining Scientific Papers: Computational Linguistics and Bibliometrics" (CLBib 2015), co-located with the 15th International Society of Scientometrics and Informetrics Conference (ISSI 2015), brought together researchers in Bibliometrics and Computational Linguistics to study how Bibliometrics can benefit from large-scale text analytics and sense mining of scientific papers, thus exploring the interdisciplinarity of Bibliometrics and Natural Language Processing (NLP). The workshop aimed to answer questions such as: How can we enhance author network analysis and Bibliometrics using data obtained by text analytics? What insights can NLP provide on the structure of scientific writing, on citation networks, and on in-text citation analysis? This workshop is a first step toward fostering reflection on this interdisciplinarity and on the benefits that the two disciplines, Bibliometrics and Natural Language Processing, can derive from it. Comment: 4 pages, Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics at ISSI 2015

    Attitudes expressed in online comments about environmental factors in the tourism sector: an exploratory study

    The objective of this exploratory study is to identify the positive, neutral and negative environmental factors that affect users who visit Spanish hotels, in order to help hotel managers decide how to improve the quality of the services provided. To carry out the research, a sentiment analysis was first performed, grouping the sample of tweets (n = 14459) according to the feelings expressed; a textual analysis was then used to identify the key environmental factors behind these feelings, using the qualitative analysis software Nvivo (QSR International, Melbourne, Australia). The results of the exploratory study present the key environmental factors that affect the users' experience when visiting hotels in Spain, such as actions that support local traditions and products, the maintenance of rural areas respecting the local environment and nature, or respect for air quality in the areas where hotels have facilities and offer services. The conclusions of the research can help hotels improve their services and their impact on the environment, as well as improve the visitors' experience, based on the positive, neutral and negative environmental factors which the visitors themselves identified.

    Extracting adverse drug reactions and their context using sequence labelling ensembles in TAC2017

    Adverse drug reactions (ADRs) are unwanted or harmful effects experienced after the administration of a certain drug or a combination of drugs, presenting a challenge for drug development and drug administration. In this paper, we present a set of taggers for extracting adverse drug reactions and related entities, including factors, severity, negations, drug class and animal. The systems used a mix of rule-based, machine learning (CRF) and deep learning (BLSTM with word2vec embeddings) methodologies to annotate the data. The systems were submitted to the adverse drug reaction shared task organised during the Text Analysis Conference (TAC) in 2017 by the National Institute of Standards and Technology, achieving F1-scores of 76.00 and 75.61 respectively. Comment: Paper describing submission for TAC ADR shared task

    Improving average ranking precision in user searches for biomedical research datasets

    The availability of research datasets is a keystone of reproducibility and scientific progress in health and life science studies. Due to the heterogeneity and complexity of these data, a main challenge for research data management systems is to provide users with the best answers to their search queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we investigate a novel ranking pipeline to improve the search of datasets used in biomedical experiments. Our system comprises a query expansion model based on word embeddings, a similarity measure algorithm that takes into consideration the relevance of the query terms, and a dataset categorisation method that boosts the rank of datasets matching query constraints. The system was evaluated using a corpus with 800k datasets and 21 annotated user queries. Our system provides competitive results when compared to the other challenge participants. In the official run, it achieved the highest infAP among the participants, +22.3% higher than the median infAP of the participants' best submissions. Overall, it ranks in the top 2 if an aggregated metric using the best official measures per participant is considered. The query expansion method had a positive impact on the system's performance, improving on our baseline by up to +5.0% and +3.4% for the infAP and infNDCG metrics, respectively. Our similarity measure algorithm seems to be robust, in particular compared to the Divergence From Randomness framework, showing smaller performance variations under different training conditions. Finally, the result categorisation did not have a significant impact on the system's performance. We believe that our solution could be used to enhance biomedical dataset management systems. In particular, the use of data-driven query expansion methods could be an alternative to the complexity of biomedical terminologies.
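    As a rough illustration of the general idea of embedding-based query expansion mentioned in the abstract above, the following is a minimal sketch and not the authors' pipeline. The toy vectors, the `expand_query` helper, and the `k` and `threshold` parameters are all hypothetical assumptions made for this example; a real system would load trained word embeddings.

```python
import math

# Toy 3-dimensional embeddings; hypothetical values for illustration only.
EMBEDDINGS = {
    "cancer":   [0.90, 0.10, 0.00],
    "tumor":    [0.80, 0.20, 0.10],
    "neoplasm": [0.85, 0.15, 0.05],
    "mouse":    [0.00, 0.90, 0.20],
    "protein":  [0.10, 0.20, 0.90],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(terms, embeddings, k=2, threshold=0.9):
    """Append up to k nearest neighbours per query term (hypothetical helper).

    A neighbour is added only if its cosine similarity to the query term
    is at least `threshold`. Terms without an embedding are kept unchanged.
    """
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue
        # Score every other vocabulary word against the current query term.
        scored = [
            (cosine(embeddings[term], vec), other)
            for other, vec in embeddings.items()
            if other != term
        ]
        scored.sort(reverse=True)
        for sim, other in scored[:k]:
            if sim >= threshold and other not in expanded:
                expanded.append(other)
    return expanded


if __name__ == "__main__":
    print(expand_query(["cancer"], EMBEDDINGS))
```

    With these toy vectors, the query term "cancer" picks up "neoplasm" and "tumor", while a term with no close neighbour, such as "mouse", is left unexpanded.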

    Profiling risk factors for chronic uveitis in juvenile idiopathic arthritis: a new model for EHR-based research.

    Background: Juvenile idiopathic arthritis is the most common rheumatic disease in children. Chronic uveitis is a common and serious comorbid condition of juvenile idiopathic arthritis, with insidious presentation and potential to cause blindness. Knowledge of clinical associations will improve risk stratification. Based on clinical observation, we hypothesized that allergic conditions are associated with chronic uveitis in juvenile idiopathic arthritis patients. Methods: This is a retrospective cohort study using Stanford's clinical data warehouse, containing data from Lucile Packard Children's Hospital from 2000 to 2011, to analyze patient characteristics associated with chronic uveitis in a large juvenile idiopathic arthritis cohort. Clinical notes of patients under 16 years of age were processed via a validated text analytics pipeline. Bivariate-associated variables were used in a multivariate logistic regression adjusted for age, gender, and race. Previously reported associations were evaluated to validate our methods. The main outcome measure was the presence of terms indicating allergy or allergy medication use that were overrepresented in juvenile idiopathic arthritis patients with chronic uveitis. Residual text features were then used in unsupervised hierarchical clustering to compare clinical text similarity between patients with and without uveitis. Results: Previously reported associations with uveitis in juvenile idiopathic arthritis patients (earlier age at arthritis diagnosis, oligoarticular-onset disease, antinuclear antibody status, history of psoriasis) were reproduced in our study. Use of allergy medications and terms describing allergic conditions were independently associated with chronic uveitis. The association with allergy drugs remained significant when adjusted for known associations (OR 2.54, 95% CI 1.22-5.4). Conclusions: This study shows the potential of using a validated text analytics pipeline on clinical data warehouses to examine practice-based evidence for evaluating hypotheses formed during patient care. Our study reproduces four known associations with uveitis development in juvenile idiopathic arthritis patients, and reports a new association between allergic conditions and chronic uveitis in juvenile idiopathic arthritis patients.