1,080 research outputs found

    Recognizing speculative language in biomedical research articles: a linguistically motivated perspective

    We explore a linguistically motivated approach to the problem of recognizing speculative language (“hedging”) in biomedical research articles. We describe a method, which draws on prior linguistic work as well as existing lexical resources and extends them by introducing syntactic patterns and a simple weighting scheme to estimate the speculation level of the sentences. We show that speculative language can be recognized successfully with such an approach, discuss some shortcomings of the method and point out future research possibilities.
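As a rough illustration of the weighting idea in the abstract above, here is a minimal sketch of a lexicon-based speculation scorer. The cue words, their weights, and the threshold are hypothetical values chosen for the example, not the ones used in the paper, and the syntactic patterns the abstract mentions are left out entirely.

```python
import re
from typing import Dict

# Hypothetical hedging cues and weights (illustrative only, not from the paper).
HEDGE_WEIGHTS: Dict[str, int] = {
    "may": 3,
    "might": 3,
    "suggest": 4,
    "suggests": 4,
    "possibly": 3,
    "appears": 2,
}


def speculation_score(sentence: str, weights: Dict[str, int] = HEDGE_WEIGHTS) -> int:
    """Sum the weights of all hedging cues found in the sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return sum(weights.get(token, 0) for token in tokens)


def is_speculative(sentence: str, threshold: int = 3) -> bool:
    """Flag a sentence as speculative when its score reaches the (assumed) threshold."""
    return speculation_score(sentence) >= threshold


hedged = "These results suggest that the protein may bind DNA."
plain = "The protein binds DNA."
print(speculation_score(hedged), is_speculative(hedged))
print(speculation_score(plain), is_speculative(plain))
```

A plain word lookup like this would miss multi-word cues and cannot tell epistemic from non-epistemic uses of a word, which is presumably where the syntactic patterns come in.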

    An NLP Analysis of Health Advice Giving in the Medical Research Literature

    Health advice – clinical and policy recommendations – plays a vital role in guiding medical practices and public health policies. Whether or not authors should give health advice in medical research publications is a controversial issue. The proponents of actionable research advocate for the more efficient and effective transmission of scientific evidence into practice. The opponents are concerned about the quality of health advice in individual research papers, especially that in observational studies. Arguments both for and against giving advice in individual studies indicate a strong need for identifying and accessing health advice, for either practical use or quality evaluation purposes. However, current information services do not support the direct retrieval of health advice. Compared to other natural language processing (NLP) applications, health advice has not been computationally modeled as a language construct either. A new information service for directly accessing health advice should be able to reduce information barriers and to provide external assessment in science communication. This dissertation work built an annotated corpus of scientific claims that distinguishes health advice according to its occurrence and strength. The study developed NLP-based prediction models to identify health advice in the PubMed literature. Using the annotated corpus and prediction models, the study answered research questions regarding the practice of advice giving in medical research literature. To test and demonstrate the potential use of the prediction model, it was used to retrieve health advice regarding the use of hydroxychloroquine (HCQ) as a treatment for COVID-19 from LitCovid, a large COVID-19 research literature database curated by the National Institutes of Health. An evaluation of sentences extracted from both abstracts and discussions showed that BERT-based pre-trained language models performed well at detecting health advice. 
The health advice prediction model may be combined with existing health information service systems to provide more convenient navigation of a large volume of health literature. Findings from the study also show researchers are careful not to give advice solely in abstracts. They also tend to give weaker and non-specific advice in abstracts than in discussions. In addition, the study found that health advice has appeared consistently in the abstracts of observational studies over the past 25 years. In the sample, 41.2% of the studies offered health advice in their conclusions, which is lower than earlier estimations based on analyses of much smaller samples processed manually. In the abstracts of observational studies, journals with a lower impact are more likely to give health advice than those with a higher impact, suggesting the significance of the role of journals as gatekeepers of science communication. For the communities of natural language processing, information science, and public health, this work advances knowledge of the automated recognition of health advice in scientific literature. The corpus and code developed for the study have been made publicly available to facilitate future efforts in health advice retrieval and analysis. Furthermore, this study discusses the ways in which researchers give health advice in medical research articles, knowledge of which could be an essential step towards curbing potential exaggeration in the current global science communication. It also contributes to ongoing discussions of the integrity of scientific output. This study calls for caution in advice-giving in medical research literature, especially in abstracts alone. It also calls for open access to medical research publications, so that health researchers and practitioners can fully review the advice in scientific outputs and its implications. 
More evaluative strategies that can increase the overall quality of health advice in research articles are needed by journal editors and reviewers, given their gatekeeping role in science communication.

    Measuring academic influence: Not all citations are equal

    The importance of a research article is routinely measured by counting how many times it has been cited. However, treating all citations with equal weight ignores the wide variety of functions that citations perform. We want to automatically identify the subset of references in a bibliography that have a central academic influence on the citing paper. For this purpose, we examine the effectiveness of a variety of features for determining the academic influence of a citation. By asking authors to identify the key references in their own work, we created a data set in which citations were labeled according to their academic influence. Using automatic feature selection with supervised machine learning, we found a model for predicting academic influence that achieves good performance on this data set using only four features. The best features, among those we evaluated, were those based on the number of times a reference is mentioned in the body of a citing paper. The performance of these features inspired us to design an influence-primed h-index (the hip-index). Unlike the conventional h-index, it weights citations by how many times a reference is mentioned. According to our experiments, the hip-index is a better indicator of researcher performance than the conventional h-index.
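To make the contrast with the conventional h-index concrete, the sketch below computes both measures on toy data. It assumes that a paper's weighted citation count is simply the sum of in-text mention counts over its citing papers. The abstract only says citations are weighted by how often a reference is mentioned, so the exact weighting here is an assumption and may differ from the paper's definition. The function names and data are hypothetical.

```python
from typing import List


def h_index(scores: List[float]) -> int:
    """Largest h such that at least h papers have a score of at least h."""
    ranked = sorted(scores, reverse=True)
    h = 0
    for rank, score in enumerate(ranked, start=1):
        if score >= rank:
            h = rank
        else:
            break
    return h


def hip_index(mentions_per_paper: List[List[int]]) -> int:
    """Influence-primed variant (assumed weighting: sum of mention counts).

    Each inner list belongs to one paper by the researcher and holds, for every
    citing paper, the number of times that citing paper mentions it in its body.
    """
    weighted_citations = [sum(mentions) for mentions in mentions_per_paper]
    return h_index(weighted_citations)


# Toy data: four papers; each number is the mention count in one citing paper.
papers = [[3, 1, 1], [2, 1], [4], [1]]

conventional = h_index([len(mentions) for mentions in papers])  # every citation counts once
influence_primed = hip_index(papers)                            # citations weighted by mentions
print(conventional, influence_primed)
```

On this toy data the third paper has a single citation, but that citing paper mentions it four times, so it counts toward the weighted index and not toward the conventional one.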

    Metadiscourse analysis of digital interpersonal interactions in academic settings in Turkey

    Rapid technological advances, efficiency and easy access have firmly established emailing as a vital medium of communication in the last decades. Nowadays, all around the world, particularly in educational settings, the medium is one of the most widely used modes of interaction between students and university lecturers. Despite their important role in academic life, very little is known about the metadiscursive characteristics of these e-messages and as far as the author is aware there is no study that has examined metadiscourse in request emails in Turkish. This study aims to contribute to filling in this gap by focusing on the following two research questions: (i) How many and what type of interpersonal metadiscourse markers are used in request emails sent by students to their lecturers? (ii) Where are they placed and how are they combined with other elements in the text? In order to answer these questions a corpus of unsolicited request emails in Turkish was compiled. The data collection started in January 2010 and continued until March 2018. A total of 353 request emails sent from university students to their lecturers were collected. The data were first transcribed in CLAN CHILDES format and analysed using the interpersonal model. The metadiscourse categories that aimed to involve readers in the email were identified and classified. Next, their places in the text were determined and described in detail. Findings of the study show that request emails include a wide array of multifunctional interpersonal metadiscourse markers which are intricately combined and employed by the writers to reach their aims. The results also showed that there is a close relation between the “weight of the request” and the number of interpersonal metadiscourse markers in request emails.

    Studies on Metadiscourse since the 3rd Millennium

    Metadiscourse refers to linguistic resources that are used to refer to the act and the context of writing about some subject matter. Study of metadiscourse provides a gateway for understanding interactional features of texts or speech, looking beyond the ideational dimension of texts at how writers characterize the world and function interpersonally. The ability of writers to use metadiscourse effectively, to control the level of personality in their texts by offering a credible representation of themselves and their ideas, is seen as a defining feature of successful writing (Hyland, 2008). This paper provides a literature review of the theories in modelling metadiscourse and the studies investigating metadiscourse for the past 15 years, and proposes future research directions based on the review. Keywords: Metadiscourse, interactional features, literature review

    Scientific Knowledge Communication in Online Q&A Communities: Linguistic Devices as a Tool to Increase the Popularity and Perceived Professionalism of Knowledge Contribution

    With the popularity of question-and-answer (Q&A) communities, widespread dissemination of scientific knowledge has become more viable than ever before. However, those contributing high-quality professional scientific knowledge are confronted with the challenge of making their contributions popular, since non-expert readers may not recognize the importance of their contributions given the massive amount of information available online. In this study, we show that non-expert readers are capable of evaluating the professionalism of content contributed in such communities as well as experts. However, we discovered that a salient discrepancy exists between the content non-experts favor and the content they perceive as professional. In line with studies that have suggested that writing techniques play an important role in how expert content is received by lay persons, we investigated how the use of linguistic devices affects both the perceived professionalism and the popularity of contributions in Q&A communities. Based on both secondary data and a scenario-based survey, we identified specific linguistic devices that can increase content popularity without reducing perceived professionalism. Additionally, we revealed linguistic devices that increase popularity at the expense of perceived professionalism in this context. Finally, we conducted a laboratory experiment to more firmly establish the causal effects of the linguistic device use. The triangulated findings have important implications for both research and practice on communicating scientific knowledge in Q&A communities.

    Epistemička modalnost u akademskom diskursu u hrvatskom i engleskom jeziku [Epistemic modality in academic discourse in Croatian and English]

    The present thesis is the result of a cross-cultural, genre-based study whose main objective is to examine how writers of research articles in psychology in Croatian and English use epistemic modality devices in hedging their claims or in evaluating other scholars’ work. Based on the corpus of 60 research articles published in Croatian and English journals, the study aims to establish the patterns of similarities and differences in the use of the epistemic devices across the main rhetorical sections of a research article as well as to identify their major hedging functions. The overall results show that English writers use epistemic markers more frequently than their Croatian counterparts. This finding is generally in line with the previous cross-cultural studies, showing a more salient use of hedges and their more entrenched status in the Anglo-American writing as compared to academic writing in some other languages investigated. With respect to the individual categories of epistemic devices, the results show both similarities and differences in their uses across the two sub-corpora. In both the English and Croatian sub-corpus, epistemic modal verbs are employed most frequently, followed by epistemic verbs, while epistemic nouns are the least frequent category of epistemic devices. The major difference in the overall results concerns the distributional patterns in the use of epistemic devices. While epistemic modal verbs show a strikingly high frequency of occurrences as compared to other epistemic devices in the English corpus, the results of the frequency analysis of the Croatian corpus show that writers hedge their claims mostly by means of the modal verbs, epistemic verbs, and epistemic adverbs and particles, as attested by their overall similar frequencies. 
With respect to the distribution of epistemic devices across the research article sections, both English and Croatian writers hedge their claims mostly in the Discussion, followed by the Introduction section, while the use of epistemic devices in the remaining two sections is significantly lower by comparison. Generally, this complies with the major rhetorical functions of the research article sections. Thus, the highest density of hedges in the Discussion reflects its major rhetorical functions primarily concerned with writers’ interpretations and implications of the given research, which often requires a cautious and tentative use of language, shielding writers from the risks of negatability of the claims. By contrast, the use of hedges in the middle research article sections is less salient given their focus on the descriptive accounts of the methodological procedures and obtained findings. Drawing on Hyland’s (1998) polypragmatic model of scientific hedges, epistemic devices in both corpora are mostly concerned with the reliability type of hedges, concerned with indicating uncertainties towards the propositional content, signaling at the same time the extent to which the claims may be considered as accurate given the limited state of knowledge they are based on. In addition, epistemic markers may be used as writer-oriented hedges concerned with diminishing the writers’ presence in the text, allowing them to maintain distance from the proposed claims. Finally, the use of epistemic verbs co-occurring with the 1st person plural pronouns is interpreted in the present study as a writer’s strategic choice in foregrounding the epistemic stance. This use of epistemic devices is more frequent in the English as compared to the Croatian corpus, which is in line with some previous cross-cultural research, indicating that self-mention is a more prominent feature of the Anglo-American writing as compared to that in other languages. 
In sum, the present findings provide an insight into the use of the epistemic language in the cross-cultural disciplinary writing and as such may be of particular use to the Croatian speaking disciplinary scholars, students and all those interested in writing research articles in English. On a more general note, it is expected that the study may incite further research on academic writing conventions in Croatian or their comparison with those in English as a lingua franca of science.
The aim of this work is to investigate how authors of research articles in psychology in Croatian and English use epistemic modality devices to express varying degrees of certainty towards the claims they put forward and to express a stance towards the claims of other authors. The analysis is based on a corpus of 60 research articles published in scholarly journals in Croatian and English. The aim of the analysis is to establish similarities and differences in the use and frequency of epistemic modality devices in the main rhetorical sections of the research article and to investigate their pragmatic functions as hedging devices in scientific text. The results of the frequency analysis show a greater presence of epistemic modality devices in the English corpus than in the Croatian one, which is generally in line with the findings of previous cross-linguistic studies that point to a more frequent use of hedges in the academic style of the Anglo-American sphere compared with academic writing styles in some other languages. The results show that modal verbs are the most frequent grammatical category of epistemic devices in both corpora, while epistemic verbs are the next most frequent category. In both corpora, epistemic nouns are the least represented. Despite these similarities, the results of the analysis point to a prominent use of modal verbs in the English corpus, while the frequency of the other epistemic modality devices does not show drastic deviations. 
The results of the analysis of the Croatian corpus show that the most frequent devices cluster around modal verbs, epistemic lexical verbs, and modal adverbs and particles, while other devices are significantly less represented. The findings indicate that in both corpora hedges are used most in the Discussion, less in the Introduction, while a significantly lower frequency was recorded in the Method and Results sections. The highest presence of hedges in the Discussion points to the author’s need to express caution and distance in interpreting the research findings and in attempting to draw conclusions, which stems from an awareness of the various limitations of research that often do not permit a high degree of certainty in stating positions. The lower presence of hedges in the middle sections of the article reflects their primary orientation towards descriptions of methodological procedures and results, which as a rule does not require a more pronounced use of hedges. With respect to Hyland’s (1998) polypragmatic model of hedging in scientific text, the results show that epistemic modality devices are most often used to express a lower degree of certainty regarding the content of a claim, indicating at the same time that the claims can be considered reliable within the framework of the existing, often limited, knowledge on which they are based. Besides the propositional content, the pragmatic functions of epistemic devices may also be oriented towards the author, diminishing the author’s presence in the text and allowing greater distance from the claims put forward. Finally, the use of the first person with epistemic lexical verbs is regarded in this work as the author’s choice aimed at foregrounding a personal epistemic stance. The results show that this use of epistemic devices is more frequent in the English corpus, which is generally in line with some previous cross-linguistic studies indicating that authorial presence is a more prominent convention of the Anglo-American academic writing style than of those in some other languages investigated. 
In conclusion, it is assumed that the observed specificities in the use of epistemic modality devices in psychology articles in English and Croatian could be useful to subject specialists, students, and all those who are beginning to write or already have experience of writing research articles in both Croatian and English. It is expected that the present study could encourage further research on the conventions of academic writing, both in Croatian and in comparison with English as the global language of science.

    ANNOTATING A CORPUS OF BIOMEDICAL RESEARCH TEXTS: TWO MODELS OF RHETORICAL ANALYSIS

    Recent advances in the biomedical sciences have led to an enormous increase in the amount of research literature being published, most of it in electronic form; researchers are finding it difficult to keep up-to-date on all of the new developments in their fields. As a result there is a need to develop automated Text Mining tools to filter and organize data in a way which is useful to researchers. Human-annotated data are often used as the ‘gold standard’ to train such systems via machine learning methods. This thesis reports on a project where three annotators applied two Models of rhetoric (argument) to a corpus of on-line biomedical research texts. How authors structure their argumentation and which rhetorical strategies they employ are key to how researchers present their experimental results; thus rhetorical analysis of a text could allow for the extraction of information which is pertinent for a particular researcher’s purpose. The first Model stems from previous work in Computational Linguistics; it focuses on differentiating ‘new’ from ‘old’ information, and results from analysis of results. The second Model is based on Toulmin’s argument structure (1958/2003); its main focus is to identify ‘Claims’ being made by the authors, but it also differentiates between internal and external evidence, as well as categories of explanation and implications of the current experiment. In order to properly train automated systems, and as a gauge of the shared understanding of the argument scheme being applied, inter-annotator agreement should be relatively high. The results of this study show complete (three-way) inter-annotator agreement on an average of 60.5% of the 400 sentences in the final corpus under Model 1, and 39.3% under Model 2. 
Analyses of the inter-annotator variation are done in order to examine in detail all of the factors involved; these include particular Model categories, individual annotator preferences, errors, and the corpus data itself. In order to reduce this inter-annotator variation, revisions to both Models are suggested; also it is recommended that in the future biomedical domain experts, possibly in tandem with experts in rhetoric, be used as annotators.
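The complete (three-way) agreement figure reported in this abstract can be read as the share of sentences on which all annotators chose the same category. A minimal sketch of that calculation follows. The category labels and the sample data are hypothetical, and the thesis may have computed or averaged its figures differently.

```python
from typing import Sequence


def complete_agreement_rate(annotations: Sequence[Sequence[str]]) -> float:
    """Fraction of sentences for which every annotator assigned the same label.

    `annotations` holds one entry per sentence; each entry lists the labels
    given to that sentence by the annotators (three in the study described).
    """
    if not annotations:
        raise ValueError("at least one annotated sentence is required")
    unanimous = sum(1 for labels in annotations if len(set(labels)) == 1)
    return unanimous / len(annotations)


# Hypothetical labels for four sentences, three annotators each.
sentences = [
    ("Claim", "Claim", "Claim"),
    ("Claim", "Evidence", "Claim"),
    ("Evidence", "Evidence", "Evidence"),
    ("Implication", "Claim", "Evidence"),
]

rate = complete_agreement_rate(sentences)
print(f"{rate:.1%}")
```

Raw agreement of this kind does not correct for agreement expected by chance, so it tends to look more favourable for schemes with few categories.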
