1,080 research outputs found
Recognizing speculative language in biomedical research articles: a linguistically motivated perspective
We explore a linguistically motivated approach to the problem of recognizing speculative language ("hedging") in biomedical research articles. We describe a method that draws on prior linguistic work and existing lexical resources and extends them with syntactic patterns and a simple weighting scheme to estimate the speculation level of sentences. We show that speculative language can be recognized successfully with such an approach, discuss some shortcomings of the method, and point out future research possibilities.
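To make the idea of a weighted cue scheme concrete, here is a minimal sketch of how a sentence-level speculation score might be computed. The abstract does not list the actual cues, patterns, or weights, so everything named below (`CUE_WEIGHTS`, `PATTERN_BONUSES`, the threshold of 3.0) is a hypothetical stand-in rather than the authors' method.

```python
import re
from typing import Dict, List, Pattern, Tuple

# Hypothetical lexical cues and weights (assumed for illustration only).
CUE_WEIGHTS: Dict[str, float] = {
    "may": 3.0,
    "might": 3.0,
    "possibly": 3.0,
    "suggest": 2.0,
    "suggests": 2.0,
    "appear": 2.0,
    "appears": 2.0,
}

# Hypothetical syntactic pattern: a cue verb followed by a "that" complement
# earns a small bonus on top of the lexical weight.
PATTERN_BONUSES: List[Tuple[Pattern[str], float]] = [
    (re.compile(r"\b(suggests?|indicates?)\s+that\b", re.IGNORECASE), 1.0),
]


def speculation_score(sentence: str) -> float:
    """Sum the weights of matched cue words plus any pattern bonuses."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    score = sum(CUE_WEIGHTS.get(token, 0.0) for token in tokens)
    for pattern, bonus in PATTERN_BONUSES:
        if pattern.search(sentence):
            score += bonus
    return score


def is_speculative(sentence: str, threshold: float = 3.0) -> bool:
    """Label a sentence speculative when its score reaches the threshold."""
    return speculation_score(sentence) >= threshold


hedged = "These results suggest that the protein may bind DNA."
plain = "The protein binds DNA."
print(speculation_score(hedged), is_speculative(hedged))  # 6.0 True
print(speculation_score(plain), is_speculative(plain))    # 0.0 False
```

A real system would presumably tune the weights and threshold against annotated sentences; the values here were chosen only so the example is easy to follow.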
An NLP Analysis of Health Advice Giving in the Medical Research Literature
Health advice (clinical and policy recommendations) plays a vital role in guiding medical practices and public health policies. Whether or not authors should give health advice in medical research publications is a controversial issue. The proponents of actionable research advocate for the more efficient and effective transmission of scientific evidence into practice. The opponents are concerned about the quality of health advice in individual research papers, especially that in observational studies. Arguments both for and against giving advice in individual studies indicate a strong need for identifying and accessing health advice, for either practical use or quality evaluation purposes. However, current information services do not support the direct retrieval of health advice. Unlike the targets of other natural language processing (NLP) applications, health advice has not yet been computationally modeled as a language construct. A new information service for directly accessing health advice should be able to reduce information barriers and to provide external assessment in science communication.
This dissertation work built an annotated corpus of scientific claims that distinguishes health advice according to its occurrence and strength. The study developed NLP-based prediction models to identify health advice in the PubMed literature. Using the annotated corpus and prediction models, the study answered research questions regarding the practice of advice giving in medical research literature. To test and demonstrate the potential use of the prediction model, it was used to retrieve health advice regarding the use of hydroxychloroquine (HCQ) as a treatment for COVID-19 from LitCovid, a large COVID-19 research literature database curated by the National Institutes of Health.
An evaluation of sentences extracted from both abstracts and discussions showed that BERT-based pre-trained language models performed well at detecting health advice. The health advice prediction model may be combined with existing health information service systems to provide more convenient navigation of a large volume of health literature. Findings from the study also show researchers are careful not to give advice solely in abstracts. They also tend to give weaker and non-specific advice in abstracts than in discussions. In addition, the study found that health advice has appeared consistently in the abstracts of observational studies over the past 25 years. In the sample, 41.2% of the studies offered health advice in their conclusions, which is lower than earlier estimations based on analyses of much smaller samples processed manually. In the abstracts of observational studies, journals with a lower impact are more likely to give health advice than those with a higher impact, suggesting the significance of the role of journals as gatekeepers of science communication.
For the communities of natural language processing, information science, and public health, this work advances knowledge of the automated recognition of health advice in scientific literature. The corpus and code developed for the study have been made publicly available to facilitate future efforts in health advice retrieval and analysis. Furthermore, this study discusses the ways in which researchers give health advice in medical research articles, knowledge of which could be an essential step towards curbing potential exaggeration in the current global science communication. It also contributes to ongoing discussions of the integrity of scientific output.
This study calls for caution in advice-giving in medical research literature, especially in abstracts alone. It also calls for open access to medical research publications, so that health researchers and practitioners can fully review the advice in scientific outputs and its implications. Journal editors and reviewers, given their gatekeeping role in science communication, need more evaluative strategies that can increase the overall quality of health advice in research articles.
Measuring academic influence: Not all citations are equal
The importance of a research article is routinely measured by counting how
many times it has been cited. However, treating all citations with equal weight
ignores the wide variety of functions that citations perform. We want to
automatically identify the subset of references in a bibliography that have a
central academic influence on the citing paper. For this purpose, we examine
the effectiveness of a variety of features for determining the academic
influence of a citation. By asking authors to identify the key references in
their own work, we created a data set in which citations were labeled according
to their academic influence. Using automatic feature selection with supervised
machine learning, we found a model for predicting academic influence that
achieves good performance on this data set using only four features. The best
features, among those we evaluated, were those based on the number of times a
reference is mentioned in the body of a citing paper. The performance of these
features inspired us to design an influence-primed h-index (the hip-index).
Unlike the conventional h-index, it weights citations by how many times a
reference is mentioned. According to our experiments, the hip-index is a better
indicator of researcher performance than the conventional h-index.
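As a rough illustration of the contrast drawn in this abstract, the sketch below computes a conventional h-index alongside a mention-weighted variant. The abstract does not give the exact formula for the hip-index, so the weighting used here (each citation counts as the number of times the citing paper mentions the reference) is an assumption, as are the function names and the sample data.

```python
from typing import Iterable, List


def h_index(scores: Iterable[float]) -> int:
    """Largest h such that at least h papers have a score of h or more."""
    ranked = sorted(scores, reverse=True)
    h = 0
    for rank, score in enumerate(ranked, start=1):
        if score >= rank:
            h = rank
        else:
            break
    return h


def mention_weighted_scores(mentions_per_paper: List[List[int]]) -> List[int]:
    """For each paper, sum the in-text mention counts over all its citations."""
    return [sum(mentions) for mentions in mentions_per_paper]


# Hypothetical researcher with three papers. Each inner list has one entry per
# citing paper: the number of times that citing paper mentions the reference.
papers = [[1, 1, 3], [3], [2, 1]]

conventional = h_index(len(mentions) for mentions in papers)
weighted = h_index(mention_weighted_scores(papers))
print(conventional, weighted)  # 2 3
```

In this toy data the second paper has a single citation, but that citation mentions it three times, which is enough to lift the weighted index above the conventional one.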
Metadiscourse analysis of digital interpersonal interactions in academic settings in Turkey
Rapid technological advances, efficiency and easy access have firmly established emailing as a vital medium of communication in the last decades. Nowadays, all around the world, particularly in educational settings, the medium is one of the most widely used modes of interaction between students and university lecturers. Despite their important role in academic life, very little is known about the metadiscursive characteristics of these e-messages and, as far as the author is aware, no study has examined metadiscourse in request emails in Turkish. This study aims to help fill this gap by focusing on the following two research questions: (i) How many and what type of interpersonal metadiscourse markers are used in request emails sent by students to their lecturers? (ii) Where are they placed and how are they combined with other elements in the text? In order to answer these questions a corpus of unsolicited request emails in Turkish was compiled. The data collection started in January 2010 and continued until March 2018. A total of 353 request emails sent from university students to their lecturers were collected. The data were first transcribed in CLAN CHILDES format and analysed using the interpersonal model. The metadiscourse categories that aimed to involve readers in the email were identified and classified. Next, their places in the text were determined and described in detail. Findings of the study show that request emails include a wide array of multifunctional interpersonal metadiscourse markers which are intricately combined and employed by the writers to reach their aims. The results also showed that there is a close relation between the "weight of the request" and the number of interpersonal metadiscourse markers in request emails.
Studies on Metadiscourse since the 3rd Millennium
Metadiscourse refers to linguistic resources that are used to refer to the act and the context of writing about some subject matter. The study of metadiscourse provides a gateway for understanding interactional features of texts or speech, looking beyond the ideational dimension of texts at how writers characterize the world and function interpersonally. The ability of writers to use metadiscourse effectively, to control the level of personality in their texts by offering a credible representation of themselves and their ideas, is seen as a defining feature of successful writing (Hyland, 2008). This paper provides a literature review of the theories in modelling metadiscourse and the studies investigating metadiscourse for the past 15 years, and proposes future research directions based on the review. Keywords: Metadiscourse, interactional features, literature review
Scientific Knowledge Communication in Online Q&A Communities: Linguistic Devices as a Tool to Increase the Popularity and Perceived Professionalism of Knowledge Contribution
With the popularity of question-and-answer (Q&A) communities, widespread dissemination of scientific knowledge has become more viable than ever before. However, those contributing high-quality professional scientific knowledge are confronted with the challenge of making their contributions popular, since non-expert readers may not recognize the importance of their contributions given the massive amount of information available online. In this study, we show that non-expert readers are capable of evaluating the professionalism of content contributed in such communities as well as experts can. However, we discovered that a salient discrepancy exists between the content non-experts favor and the content they perceive as professional. In line with studies that have suggested that writing techniques play an important role in how expert content is received by lay persons, we investigated how the use of linguistic devices affects both the perceived professionalism and the popularity of contributions in Q&A communities. Based on both secondary data and a scenario-based survey, we identified specific linguistic devices that can increase content popularity without reducing perceived professionalism. Additionally, we revealed linguistic devices that increase popularity at the expense of perceived professionalism in this context. Finally, we conducted a laboratory experiment to more firmly establish the causal effects of the linguistic device use. The triangulated findings have important implications for both research and practice on communicating scientific knowledge in Q&A communities.
Epistemička modalnost u akademskom diskursu u hrvatskom i engleskom jeziku (Epistemic modality in academic discourse in Croatian and English)
The present thesis is the result of a cross-cultural, genre-based study whose main objective is
to examine how writers of research articles in psychology in Croatian and English use
epistemic modality devices in hedging their claims or in evaluating other scholars' work.
Based on the corpus of 60 research articles published in Croatian and English journals, the
study aims to establish the patterns of similarities and differences in the use of the epistemic
devices across the main rhetorical sections of a research article as well as to identify their
major hedging functions.
The overall results show that English writers use epistemic markers more frequently than their
Croatian counterparts. This finding is generally in line with the previous cross-cultural
studies, showing a more salient use of hedges and their more entrenched status in the Anglo-
American writing as compared to academic writing in some other languages investigated.
With respect to the individual categories of epistemic devices, the results show both
similarities and differences in their uses across the two sub-corpora. In both the English and
Croatian sub-corpus, epistemic modal verbs are employed most frequently, followed by
epistemic verbs, while epistemic nouns are the least frequent category of epistemic devices.
The major difference in the overall results concerns the distributional patterns in the use of
epistemic devices. While epistemic modal verbs show a strikingly high frequency of
occurrences as compared to other epistemic devices in the English corpus, the results of the
frequency analysis of the Croatian corpus show that writers hedge their claims mostly by
means of the modal verbs, epistemic verbs, and epistemic adverbs and particles, as attested by
their overall similar frequencies.
With respect to the distribution of epistemic devices across the research article sections, both
English and Croatian writers hedge their claims mostly in the Discussion, followed by the
Introduction section, while the use of epistemic devices in the remaining two sections is
significantly lower by comparison. Generally, this complies with the major rhetorical
functions of the research article sections. Thus, the highest density of hedges in the
Discussion reflects its major rhetorical functions primarily concerned with writers'
interpretations and implications of the given research, which often requires a cautious and
tentative use of language, shielding writers from the risks of negatability of the claims. By
contrast, the use of hedges in the middle research article sections is less salient given their
focus on the descriptive accounts of the methodological procedures and obtained findings.
Drawing on Hyland's (1998) polypragmatic model of scientific hedges, epistemic devices in
both corpora are mostly concerned with the reliability type of hedges, concerned with
indicating uncertainties towards the propositional content, signaling at the same time the
extent to which the claims may be considered as accurate given the limited state of knowledge
they are based on. In addition, epistemic markers may be used as writer-oriented hedges
concerned with diminishing the writers' presence in the text, allowing them to maintain
distance from the proposed claims. Finally, the use of epistemic verbs co-occurring with the
1st person plural pronouns is interpreted in the present study as a writer's strategic choice in
foregrounding the epistemic stance. This use of epistemic devices is more frequent in the
English than in the Croatian corpus, which is in line with some previous cross-cultural
research, indicating that self-mention is a more prominent feature of the Anglo-
American writing as compared to that in other languages.
In sum, the present findings provide an insight into the use of the epistemic language in the
cross-cultural disciplinary writing and as such may be of particular use to the Croatian
speaking disciplinary scholars, students and all those interested in writing research articles in
English. On a more general note, it is expected that the study may incite further research on
academic writing conventions in Croatian or their comparison with those in English as a
lingua franca of science.
ANNOTATING A CORPUS OF BIOMEDICAL RESEARCH TEXTS: TWO MODELS OF RHETORICAL ANALYSIS
Recent advances in the biomedical sciences have led to an enormous increase in the amount of research literature being published, most of it in electronic form; researchers are finding it difficult to keep up-to-date on all of the new developments in their fields. As a result there is a need to develop automated Text Mining tools to filter and organize data in a way which is useful to researchers. Human-annotated data are often used as the "gold standard" to train such systems via machine learning methods.
This thesis reports on a project where three annotators applied two Models of rhetoric (argument) to a corpus of on-line biomedical research texts. How authors structure their argumentation and which rhetorical strategies they employ are key to how researchers present their experimental results; thus rhetorical analysis of a text could allow for the extraction of information which is pertinent for a particular researcher's purpose. The first Model stems from previous work in Computational Linguistics; it focuses on differentiating "new" from "old" information, and results from analysis of results. The second Model is based on Toulmin's argument structure (1958/2003); its main focus is to identify "Claims" being made by the authors, but it also differentiates between internal and external evidence, as well as categories of explanation and implications of the current experiment.
In order to properly train automated systems, and as a gauge of the shared understanding of the argument scheme being applied, inter-annotator agreement should be relatively high. The results of this study show complete (three-way) inter-annotator agreement on an average of 60.5% of the 400 sentences in the final corpus under Model 1, and 39.3% under Model 2. Analyses of the inter-annotator variation are done in order to examine in detail all of the factors involved; these include particular Model categories, individual annotator preferences, errors, and the corpus data itself. In order to reduce this inter-annotator variation, revisions to both Models are suggested; also it is recommended that in the future biomedical domain experts, possibly in tandem with experts in rhetoric, be used as annotators.
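The complete (three-way) agreement figure reported here is the share of sentences on which every annotator chose the same label. A minimal sketch of that calculation follows; the function name, label set, and sample annotations are hypothetical and are not taken from the thesis.

```python
from typing import Sequence


def complete_agreement_rate(*annotations: Sequence[str]) -> float:
    """Fraction of items on which all annotators assigned the same label."""
    if not annotations:
        raise ValueError("at least one annotator is required")
    n_items = len(annotations[0])
    if n_items == 0 or any(len(a) != n_items for a in annotations):
        raise ValueError("annotators must label the same non-empty item list")
    # zip(*annotations) yields, per sentence, the tuple of all annotators' labels.
    agreed = sum(1 for labels in zip(*annotations) if len(set(labels)) == 1)
    return agreed / n_items


# Hypothetical labels from three annotators for four sentences.
annotator_a = ["claim", "method", "result", "claim"]
annotator_b = ["claim", "method", "claim", "claim"]
annotator_c = ["claim", "result", "claim", "claim"]

rate = complete_agreement_rate(annotator_a, annotator_b, annotator_c)
print(rate)  # 0.5
```

Raw complete agreement does not correct for agreement expected by chance, so chance-corrected coefficients are often reported alongside it.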