Doctor of Philosophy dissertation
The use of the various complementary and alternative medicine (CAM) modalities for the management of chronic illnesses is widespread, and still on the rise. Unfortunately, tools to support consumers in seeking information on the efficacy of these treatments are sparse and incomplete. The goals of this work were to understand consumer information needs in acquiring CAM information, assess currently available information resources, and investigate informatics methods to provide a foundation for the development of CAM information resources. This dissertation consists of four studies. The first was a quantitative study that aimed to assess the feasibility of delivering CAM-drug interaction information through a web-based application. This study resulted in an 85% participation rate, and 33% of participating patients reported the use of CAMs with potential interactions with their conventional treatments. The next study aimed to assess online CAM information resources that provide information on drug-herb interactions to consumers. None of the sites scored high on the combination of completeness and accuracy, and all sites exceeded the reading level recommended by the US Department of Health and Human Services. The third study investigated information-seeking behaviors for CAM information using an existing cohort of cancer survivors. The study showed that patients in the cohort continued to use CAM well into survivorship. Patients felt very much on their own in dealing with issues outside of direct treatment, which often resulted in a search for options and CAM use. Finally, a study was conducted to investigate two methods to semi-automatically extract CAM treatment relations from the biomedical literature. The methods rely on a database (SemMedDB) of semantic relations extracted from PubMed abstracts. This study demonstrated that SemMedDB can be used to reduce manual efforts, but review of the extracted sentences is still necessary due to low mean precisions of 23.7% and 26.4%. In summary, this dissertation provided greater insight into consumer information needs for CAM. Our findings provide an opportunity to leverage existing resources to improve the information-seeking experience for consumers through high-quality online tools, potentially moving them beyond the reliance on anecdotal evidence in the decision-making process for CAM.
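The SemMedDB-based extraction step can be illustrated with a short sketch. The query below is a hypothetical example, assuming a local copy of SemMedDB whose PREDICATION table follows the publicly documented schema (names may differ between releases) and an illustrative list of CAM concepts; it does not reproduce the dissertation's actual filtering or review workflow.

```python
import sqlite3

# Hypothetical local copy of SemMedDB; table/column names follow the public
# PREDICATION schema but may differ between releases.
conn = sqlite3.connect("semmeddb.sqlite")

# Illustrative CAM concepts of interest (not the dissertation's actual list).
cam_concepts = ("Ginkgo biloba", "St. John's Wort", "Echinacea")

query = """
    SELECT SUBJECT_NAME, PREDICATE, OBJECT_NAME, SENTENCE_ID
    FROM PREDICATION
    WHERE PREDICATE = 'TREATS'
      AND SUBJECT_NAME IN ({placeholders})
""".format(placeholders=",".join("?" * len(cam_concepts)))

# Each row is a candidate CAM treatment relation; as the study notes, the
# source sentences still need manual review because precision is low.
for subject, predicate, obj, sentence_id in conn.execute(query, cam_concepts):
    print(f"{subject} --{predicate}--> {obj} (sentence {sentence_id})")
```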
Detecting Misinformation with LLM-Predicted Credibility Signals and Weak Supervision
Credibility signals represent a wide range of heuristics that are typically used by journalists and fact-checkers to assess the veracity of online content. Automating the task of credibility signal extraction, however, is very challenging, as it requires high-accuracy signal-specific extractors to be trained, while there are currently no sufficiently large datasets annotated with all credibility signals. This paper investigates whether large language models (LLMs) can be prompted effectively with a set of 18 credibility signals to produce weak labels for each signal. We then aggregate these potentially noisy labels using weak supervision in order to predict content veracity. We demonstrate that our approach, which combines zero-shot LLM credibility signal labeling and weak supervision, outperforms state-of-the-art classifiers on two misinformation datasets without using any ground-truth labels for training. We also analyse the contribution of the individual credibility signals towards predicting content veracity, which provides new valuable insights into their role in misinformation detection.
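A minimal sketch of the aggregation step, assuming the 18 LLM-predicted signals have already been collected as a weak-label matrix and using Snorkel's LabelModel as one possible weak-supervision aggregator; the paper's exact prompting and aggregation setup is not reproduced here.

```python
import numpy as np
from snorkel.labeling.model import LabelModel

# Hypothetical label matrix: one row per article, one column per credibility
# signal; entries are weak veracity votes (1 = credible, 0 = not credible,
# -1 = abstain), derived from zero-shot LLM prompts for each signal.
n_articles, n_signals = 1000, 18
rng = np.random.default_rng(0)
L = rng.choice([-1, 0, 1], size=(n_articles, n_signals))

# Fit a generative label model that estimates signal accuracies without any
# ground-truth labels, then aggregate into per-article veracity predictions.
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train=L, n_epochs=500, seed=0)

veracity_preds = label_model.predict(L)        # hard labels
veracity_probs = label_model.predict_proba(L)  # calibrated probabilities
print(veracity_preds[:10], veracity_probs[:2])
```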
Mediterranean developed coasts: what future for the foredune restoration?
The feasibility and efficacy of soft engineering foredune restoration approaches still lack insight from research and monitoring activities, especially in areas under persistent human disturbance. We evaluated the efficacy of Mediterranean foredune restoration in dune areas freely accessible to tourists. Foredunes were reconstructed using only sand already available at nearby places and consolidated by planting seedlings of native ecosystem engineer species and foredune focal species. We monitored transplanted and spontaneous seedlings for one year to assess their mortality and growth in relation to the distance from the closest beach access, either formal or informal, as a proxy of human disturbance. We also tested whether species differing in their ecology (i.e., affinity to a given habitat) and growth form responded differently to human disturbance. The relationship between seedling mortality and growth and the distance from the closest beach access was tested through Generalized Linear Mixed Models. We found a clear spatial pattern of seedling survival and growth, both of which decreased with proximity to the closest beach access. Only invasive alien plants and erect leafy species performed better at shorter distances from beach accesses. In dune areas with a strong tourist vocation, foredune restoration should be coupled with integrated management plans aimed at optimising the balance between protection and use. Management plans should not rely only on passive conservation measures; rather, they should include educational activities to encourage pro-environmental behaviour, increase the acceptance of behaviour rules and no-entry zones, and actively engage stakeholders in long-term conservation.
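As one illustrative sketch of the kind of model used, the snippet below fits a binomial GLM of seedling mortality against distance from the nearest beach access with statsmodels. It is a simplified stand-in: the study used Generalized Linear Mixed Models with random effects, which would require a mixed-model tool, and the file and column names here are hypothetical.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical monitoring table: one row per seedling, with survival status
# and distance (m) to the closest beach access. Column names are illustrative.
df = pd.read_csv("seedling_monitoring.csv")  # columns: died (0/1), distance_m, species

# Simplified fixed-effects-only version of the mortality model; the study's
# GLMMs additionally include random effects (e.g., transect or species).
model = smf.glm("died ~ distance_m", data=df,
                family=sm.families.Binomial()).fit()
print(model.summary())
```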
Testing the performance of an innovative markerless technique for quantitative and qualitative gait analysis
Gait abnormalities such as high stride and step frequency/cadence (SF, strides/second; CAD, steps/second), stride variability (SV) and low harmony may increase the risk of injuries and be a sentinel of medical conditions. This research aims to present a new markerless video-based technology for quantitative and qualitative gait analysis. 86 healthy individuals (mean age 32 years) performed a 90 s test on a treadmill at self-selected walking speed. We measured SF and CAD with a photoelectric sensor system; then, we calculated the average ± standard deviation (SD) and within-subject coefficient of variation (CV) of SF as an index of SV. We also recorded a 60 fps video of the patient. With custom-designed web-based video analysis software, we performed a spectral analysis of the brightness over time for each pixel of the image, which returned the frequency content of the video. The two main frequency contents (F1 and F2) from this analysis should reflect the forcing/dominant variables, i.e., SF and CAD. Then, a harmony index (HI) was calculated, which should reflect the proportion of pixels of the image that move consistently with F1 or its supraharmonics. The higher the HI value, the less variable the gait. The correspondence between SF and F1 and between CAD and F2 was evaluated with both a paired t-test and correlation, and the relationship between SV and HI with correlation. SF and CAD were not significantly different from, and highly correlated with, F1 (0.893 ± 0.080 Hz vs. 0.895 ± 0.084 Hz, p < 0.001, r2 = 0.99) and F2 (1.787 ± 0.163 Hz vs. 1.791 ± 0.165 Hz, p < 0.001, r2 = 0.97), respectively. The SV was 1.84% ± 0.66% and was significantly and moderately correlated with HI (0.082 ± 0.028, p < 0.001, r2 = 0.13). The innovative video-based technique of global, markerless gait analysis proposed in our study accurately identifies the main frequency contents and the variability of gait in healthy individuals, thus providing a time-efficient, low-cost means to quantitatively and qualitatively study human locomotion.
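A minimal sketch of the per-pixel spectral analysis described above, assuming the video has already been loaded as a grayscale array of shape (frames, height, width) at 60 fps; the peak picking and the harmony index definition here are simplified illustrations, not the authors' exact implementation.

```python
import numpy as np

FPS = 60  # frames per second of the recorded video

def dominant_frequencies(video, fps=FPS, tolerance=0.1):
    """video: grayscale array (frames, height, width) of pixel brightness."""
    n_frames = video.shape[0]
    # Remove each pixel's mean brightness, then FFT along the time axis.
    detrended = video - video.mean(axis=0, keepdims=True)
    spectrum = np.abs(np.fft.rfft(detrended, axis=0))
    freqs = np.fft.rfftfreq(n_frames, d=1.0 / fps)

    # Per-pixel dominant frequency (skip the DC bin).
    peak_bins = spectrum[1:].argmax(axis=0) + 1
    pixel_freqs = freqs[peak_bins].ravel()

    # Two strongest frequency contents over the whole image (simplified F1, F2).
    global_spectrum = spectrum.reshape(spectrum.shape[0], -1).sum(axis=1)
    order = np.argsort(global_spectrum[1:])[::-1] + 1
    f1, f2 = freqs[order[0]], freqs[order[1]]

    # Simplified harmony index: share of pixels whose dominant frequency lies
    # close to F1 or one of its first supraharmonics.
    harmonics = f1 * np.arange(1, 5)
    closest = np.min(np.abs(pixel_freqs[:, None] - harmonics[None, :]), axis=1)
    hi = float(np.mean(closest < tolerance))
    return f1, f2, hi
```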
ASSET: a dataset for tuning and evaluation of sentence simplification models with multiple rewriting transformations
In order to simplify a sentence, human editors perform multiple rewriting transformations: they split it into several shorter sentences, paraphrase words (i.e. replace complex words or phrases with simpler synonyms), reorder components, and/or delete information deemed unnecessary. Despite this varied range of possible text alterations, current models for automatic sentence simplification are evaluated using datasets that focus on a single transformation, such as lexical paraphrasing or splitting. This makes it impossible to understand the ability of simplification models in more realistic settings. To alleviate this limitation, this paper introduces ASSET, a new dataset for assessing sentence simplification in English. ASSET is a crowdsourced multi-reference corpus in which each simplification was produced by executing several rewriting transformations. Through quantitative and qualitative experiments, we show that simplifications in ASSET are better at capturing characteristics of simplicity when compared to other standard evaluation datasets for the task. Furthermore, we motivate the need for developing better methods for automatic evaluation using ASSET, since we show that current popular metrics may not be suitable when multiple simplification transformations are performed.
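As a small hedged illustration of multi-reference evaluation on ASSET-style data, the snippet below computes SARI with the Hugging Face evaluate package; the sentences are made up, and SARI is only one of the popular metrics whose suitability the paper questions.

```python
import evaluate

# Load the SARI metric, a standard automatic measure for simplification.
sari = evaluate.load("sari")

# Made-up example with several references per source, mirroring ASSET's
# multi-reference, multi-transformation design.
sources = ["The cat perched upon the mat and refused to move."]
predictions = ["The cat sat on the mat. It would not move."]
references = [[
    "The cat sat on the mat and would not move.",
    "The cat was on the mat. It refused to move.",
]]

score = sari.compute(sources=sources, predictions=predictions,
                     references=references)
print(score)  # {'sari': ...}
```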
Comparison between parameter-efficient techniques and full fine-tuning: A case study on multilingual news article classification
Adapters and Low-Rank Adaptation (LoRA) are parameter-efficient fine-tuning techniques designed to make the training of language models more efficient. Previous results demonstrated that these methods can even improve performance on some classification tasks. This paper complements the existing research by investigating how these techniques influence the classification performance and computation costs compared to full fine-tuning when applied to multilingual text classification tasks (genre, framing, and persuasion techniques detection; with different input lengths, number of predicted classes and classification difficulty), some of which have limited training data. In addition, we conduct in-depth analyses of their efficacy across different training scenarios (training on the original multilingual data; on the translations into English; and on a subset of English-only data) and different languages. Our findings provide valuable insights into the applicability of the parameter-efficient fine-tuning techniques, particularly to complex multilingual and multilabel classification tasks.
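A minimal sketch of one of the compared parameter-efficient setups, assuming the Hugging Face transformers and peft libraries and an XLM-R classifier; the backbone, label count and LoRA hyperparameters are illustrative, not those reported in the paper.

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

# Illustrative multilingual backbone and label count (e.g. genre classes);
# neither is necessarily what the paper used.
base = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=9)

# LoRA: freeze the backbone and train small low-rank update matrices only.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8, lora_alpha=16, lora_dropout=0.1,
)
model = get_peft_model(base, lora_config)

# Only a small fraction of parameters remains trainable, which is the point
# of comparison against full fine-tuning.
model.print_trainable_parameters()
```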
Solution of the End Problem of a Liquid-Filled Cylindrical Acoustic Waveguide Using a Biorthogonality Principle
This paper treats the forced motion of an isothermal, Newtonian liquid in a sem
Exploring gap filling as a cheaper alternative to reading comprehension questionnaires when evaluating machine translation for gisting
A popular application of machine translation (MT) is gisting: MT output is consumed as is to make sense of text in a foreign language. Evaluation of the usefulness of MT for gisting is surprisingly uncommon. The classical method uses reading comprehension questionnaires (RCQ), in which informants are asked to answer professionally written questions in their language about a foreign text that has been machine-translated into their language. Recently, gap filling (GF), a form of cloze testing, has been proposed as a cheaper alternative to RCQ. In GF, certain words are removed from reference translations and readers are asked to fill the gaps left using the machine-translated text as a hint. This paper reports, for the first time, a comparative evaluation, using both RCQ and GF, of translations from multiple MT systems for the same foreign texts, and a systematic study of the effect of variables such as gap density, gap-selection strategies, and document context in GF. The main findings of the study are: (a) both RCQ and GF clearly identify MT as useful, (b) global RCQ and GF rankings for the MT systems are mostly in agreement, (c) GF scores vary very widely across informants, making comparisons among MT systems hard, and (d) unlike RCQ, which is framed around documents, GF evaluation can be framed at the sentence level. These findings support the use of GF as a cheaper alternative to RCQ.
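A small sketch of how gap-filling items could be generated from a reference translation, removing every n-th word as one simple gap-selection strategy; the gap densities and selection strategies actually studied in the paper are more varied than this.

```python
import re

def make_gap_filling_item(reference, gap_every=5, blank="____"):
    """Blank out every n-th word of a reference translation (simple strategy).

    Returns the cloze text shown to informants and the removed words used as
    the answer key. Real GF setups vary gap density and selection.
    """
    tokens = re.findall(r"\w+|\S", reference)
    cloze, answers, word_index = [], [], 0
    for tok in tokens:
        if tok.isalnum():
            word_index += 1
            if word_index % gap_every == 0:
                cloze.append(blank)
                answers.append(tok)
                continue
        cloze.append(tok)
    return " ".join(cloze), answers

text, key = make_gap_filling_item(
    "The committee approved the proposal after a long debate on funding.")
print(text)
print(key)
```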
Probing for idiomaticity in vector space models
Contextualised word representation models have been successfully used for capturing different word usages, and they may be an attractive alternative for representing idiomaticity in language. In this paper, we propose probing measures to assess whether some of the expected linguistic properties of noun compounds, especially those related to idiomatic meanings, and their dependence on context and sensitivity to lexical choice, are readily available in some standard and widely used representations. For that, we constructed the Noun Compound Senses Dataset, which contains noun compounds and their paraphrases, in context-neutral and context-informative naturalistic sentences, in two languages: English and Portuguese. Results obtained using four types of probing measures with models such as ELMo, BERT and some of its variants indicate that idiomaticity is not yet accurately represented by contextualised models.
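One of the probing ideas can be sketched as comparing a noun compound's contextualised representation across a context-informative and a context-neutral sentence; the snippet below does this with BERT mean-pooled subword vectors and cosine similarity. The sentences, the compound and the pooling choice are illustrative, not the dataset's or the paper's exact probing measures.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def compound_vector(sentence, compound):
    """Mean-pool the subword vectors of `compound` inside `sentence`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (tokens, dim)
    comp_ids = tokenizer(compound, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    # Locate the compound's subword span in the sentence (simple span search).
    for i in range(len(ids) - len(comp_ids) + 1):
        if ids[i:i + len(comp_ids)] == comp_ids:
            return hidden[i:i + len(comp_ids)].mean(dim=0)
    raise ValueError("compound not found in sentence")

# Illustrative sentences: one informative context, one neutral context.
v_informative = compound_vector("He missed the bus because he is a night owl.",
                                "night owl")
v_neutral = compound_vector("They mentioned the night owl again.", "night owl")
similarity = torch.cosine_similarity(v_informative, v_neutral, dim=0)
print(float(similarity))
```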
Categorising fine-to-coarse grained misinformation: an empirical study of COVID-19 infodemic
The spread of COVID-19 misinformation over social media has already drawn the attention of many researchers. According to Google Scholar, about 26000 COVID-19 related misinformation studies have been published to date. Most of these studies focus on 1) detecting and/or 2) analysing the characteristics of COVID-19 related misinformation. However, the study of the social behaviours related to misinformation is often neglected. In this paper, we introduce a fine-grained annotated dataset of misinformation tweets that includes social behaviour annotations (e.g. commenting on or questioning the misinformation). The dataset not only allows social behaviour analysis but is also suitable for both evidence-based and non-evidence-based misinformation classification tasks. In addition, we introduce leave-claim-out validation in our experiments and demonstrate that misinformation classification performance can differ significantly when applied to real-world unseen misinformation.
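The leave-claim-out setup can be sketched with scikit-learn's grouped cross-validation, so that all tweets about a given claim fall entirely in either the training or the test fold; the file, column names and classifier here are illustrative, not the paper's actual experimental pipeline.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical annotated tweets: text, misinformation label, and the claim
# each tweet refers to (column names are illustrative, not the dataset's).
df = pd.read_csv("covid_misinfo_tweets.csv")  # columns: text, label, claim_id

pipeline = make_pipeline(TfidfVectorizer(min_df=2),
                         LogisticRegression(max_iter=1000))

# Leave-claim-out: tweets sharing a claim never appear in both train and test,
# which approximates performance on unseen real-world misinformation.
cv = GroupKFold(n_splits=5)
scores = cross_val_score(pipeline, df["text"], df["label"],
                         groups=df["claim_id"], cv=cv, scoring="f1_macro")
print(scores.mean())
```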