53 research outputs found
A methodology for ontology reuse : the case of the abdominal ultrasound ontology
There is an abundance of existing biomedical ontologies, such as the National Cancer Institute
Thesaurus (NCIT), the Foundational Model of Anatomy (FMA) and the Systematized
Nomenclature of Medicine-Clinical Terms (SNOMED CT). Implementing these ontologies in
a particular system, however, may cause unnecessarily high memory usage and slow down
the system's performance. On the other hand, building a new ontology from scratch
requires additional time and effort. Therefore, this research explores the ontology reuse
approach in order to develop an Abdominal Ultrasound Ontology (AUO) by extracting
concepts from existing biomedical ontologies. This paper presents the reader with a step-by-step
method for reusing ontologies, together with suggestions of off-the-shelf tools that can
be used to ease the process. The results show that ontology reuse is beneficial, especially in
the biomedical field, as it allows developers from non-technical backgrounds to build
and use domain-specific ontologies with ease. It also allows developers with technical
backgrounds to develop ontologies with minimal involvement from domain experts. The
proposed methodology is also adaptable, as it allows the ontology to be updated fairly easily
Are Deep Learning Approaches Suitable for Natural Language Processing?
In recent years, Deep Learning (DL) techniques have gained much attention from Artificial Intelligence (AI) and Natural Language Processing (NLP) research communities because these approaches can often learn features from data without the need for human design or engineering interventions. In addition, DL approaches have achieved some remarkable results. In this paper, we survey major recent contributions that use DL techniques for NLP tasks. The reviewed topics are limited to contributions to text understanding, such as sentence modelling, sentiment classification, semantic role labelling, question answering, etc. We provide an overview of deep learning architectures based on Artificial Neural Networks (ANNs), Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM), and Recursive Neural Networks (RNNs)
Arabic Quranic Search Tool Based on Ontology
This paper reviews and classifies most of the common types of search techniques that have been applied to the Holy Quran. Then, it addresses the limitations of these methods. Additionally, this paper surveys most existing Quranic ontologies and their deficiencies. Finally, it explains a new search tool: a semantic search tool for Al-Quran based on Qur'anic ontologies. This tool will overcome all limitations in the existing Quranic search applications
Query expansion using medical information extraction for improving information retrieval in French medical domain
Many users' queries contain references to named entities, and this is particularly true in the medical field. Doctors express their information needs using medical entities, as these are information-rich elements that help to better target the relevant documents. At the same time, many resources have been recognized as large containers of medical entities and the relationships between them, such as clinical reports, which are medical texts written by doctors. In this paper, we present a query expansion method that uses medical entities and their semantic relations in the query context, based on an external resource in OWL. The goal of this method is to evaluate the effectiveness of an information retrieval system in supporting doctors in easily accessing relevant information. Experiments on a collection of real clinical reports show that our approach yields interesting improvements in precision, recall and MAP in medical information retrieval
Value of expressions behind the letter capitalization in product reviews
Product reviews from consumers are a source of opinions and expressions about purchased items or services. Thus, it is essential to understand the true meaning behind text reviews. One way is to analyze the sentiments, expressions and emotions behind the text. However, different styles of writing are used in the text. One widely used style is letter capitalization. It is commonly used to strengthen an expression or to give a louder tone within the text. This paper explores the value of expression behind letter capitalization in product reviews. We compared fully capitalized text, text with one capitalized word and text without capitalization from the readers' perspective by asking them to rate the text on a Likert scale. Furthermore, we tested two samples of text with and without capitalization on 27 available online sentiment tools. Testing was done in order to check how current sentiment tools treat letter capitalization in their sentiment score. Results show that letter capitalization is able to enforce different levels of expression. If the nature of the review is positive, the capitalization makes it more positive. Similarly, for negative reviews, the capitalization tends to increase negativity
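The three text conditions compared in this abstract can be told apart mechanically. The following is only a minimal sketch: the function name `capitalization_level`, the labels it returns, and the rule that a "capitalized word" is an all-uppercase word of at least two letters are assumptions made for illustration, not the authors' procedure.

```python
import string


def capitalization_level(text: str) -> str:
    """Label a review as 'full', 'partial' or 'none' (hypothetical labels).

    Assumption: a "capitalized word" is an all-uppercase word of two or
    more characters, so the pronoun "I" does not count on its own.
    """
    # str.isupper() is True only if every cased character is uppercase
    # and there is at least one cased character.
    if text.isupper():
        return "full"
    words = (w.strip(string.punctuation) for w in text.split())
    if any(len(w) > 1 and w.isupper() for w in words):
        return "partial"
    return "none"


if __name__ == "__main__":
    for sample in ("THIS IS GREAT!", "This is GREAT!", "This is great!"):
        print(sample, "->", capitalization_level(sample))
```

A labelling step of this kind could be used to group reviews before comparing reader ratings or sentiment-tool scores across the groups.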
AuthentiGPT: Detecting Machine-Generated Text via Black-Box Language Models Denoising
Large language models (LLMs) have opened up enormous opportunities while
simultaneously posing ethical dilemmas. One of the major concerns is their
ability to create text that closely mimics human writing, which can lead to
potential misuse, such as academic misconduct, disinformation, and fraud. To
address this problem, we present AuthentiGPT, an efficient classifier that
distinguishes between machine-generated and human-written texts. Under the
assumption that human-written text resides outside the distribution of
machine-generated text, AuthentiGPT leverages a black-box LLM to denoise input
text with artificially added noise, and then semantically compares the denoised
text with the original to determine if the content is machine-generated. With
only one trainable parameter, AuthentiGPT eliminates the need for a large
training dataset, watermarking the LLM's output, or computing the
log-likelihood. Importantly, the detection capability of AuthentiGPT can be
easily adapted to any generative language model. With a 0.918 AUROC score on a
domain-specific dataset, AuthentiGPT demonstrates its effectiveness over other
commercial algorithms, highlighting its potential for detecting
machine-generated text in academic settings
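The noise, denoise and compare loop described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the AuthentiGPT implementation: `denoise` is a hypothetical callable standing in for the black-box LLM, word masking stands in for the unspecified noising scheme, `difflib.SequenceMatcher` stands in for the semantic comparison, and treating the decision threshold as the single tunable parameter is likewise an assumption.

```python
import random
from difflib import SequenceMatcher
from typing import Callable


def add_noise(text: str, rate: float = 0.2, seed: int = 0) -> str:
    """Replace roughly `rate` of the words with a '_' placeholder (assumed noise)."""
    rng = random.Random(seed)
    return " ".join("_" if rng.random() < rate else w for w in text.split())


def similarity(a: str, b: str) -> float:
    """Word-level overlap in [0, 1]; a stand-in for a semantic similarity score."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def looks_machine_generated(
    text: str,
    denoise: Callable[[str], str],  # hypothetical wrapper around a black-box LLM
    threshold: float = 0.9,         # assumed to be the one tunable parameter
    rate: float = 0.2,
    seed: int = 0,
) -> bool:
    """Flag text whose denoised version stays close to the original.

    Assumption: an LLM restores machine-generated text more faithfully than
    human-written text, so a high similarity suggests machine generation.
    """
    restored = denoise(add_noise(text, rate, seed))
    return similarity(text, restored) >= threshold
```

In practice `denoise` would send the noisy text to an LLM with a restoration prompt, and the threshold would be fitted on a small labelled set.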
Consistency of online consumers' perceptions of posted comments: An analysis of TripAdvisor reviews
Ratings and comments play a dominant role in online reviews. The question thus arises as to whether or not there is any consistency in consumer perception of the reviews, and how future choices might be influenced. We analysed 2000 comments on 20 different hotels posted on TripAdvisor to determine whether the comments posted by previous guests of a hotel influence the decisions of potential guests. Two hundred human raters were asked to consider 20 reviews and to rate a hotel based on the reviews. Cohen's kappa coefficient was used to evaluate the degree of agreement between the hotel quality as determined by the human raters and the star rating given by the original reviewer. The results showed a high consistency between the human raters' evaluation and the reviewers' star rating. This research reveals the importance of feedback on websites such as TripAdvisor in influencing consumer choice
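For reference, Cohen's kappa compares the observed agreement between two sets of ratings with the agreement expected by chance, kappa = (p_o - p_e) / (1 - p_e). The sketch below is the standard unweighted form; the function name and the sample ratings are made up for illustration, and the abstract does not say whether the study used a weighted variant.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two equal-length rating sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Illustrative data only: star ratings from a human rater and a reviewer.
    human = [5, 4, 4, 3, 5, 2]
    reviewer = [5, 4, 3, 3, 5, 2]
    print(round(cohen_kappa(human, reviewer), 3))
```

Values near 1 indicate agreement well above chance, while values near 0 indicate agreement no better than chance.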
A Review of Research-Based Automatic Text Simplification Tools
In the age of knowledge, the democratisation of information facilitated through the Internet may not be as pervasive if written language poses challenges to particular sectors of the population. The objective of this paper is to present an overview of research-based automatic text simplification tools. Consequently, we describe aspects such as the language, language phenomena, language levels simplified, approaches, specific target populations these tools are created for (e.g. individuals with cognitive impairment, attention deficit, elderly people, children, language learners), and accessibility and availability considerations. The review of existing studies covering automatic text simplification tools was carried out by searching two databases: Web of Science and Scopus. The eligibility criteria involve text simplification tools with a scientific background in order to ascertain how they operate. This methodology yielded 27 text simplification tools that are further analysed. Some of the main conclusions reached with this review are the lack of resources accessible to the public, the need for customisation to foster the individual's independence by allowing the user to select what s/he finds challenging to understand while not limiting the user's capabilities, and the need for more simplification tools in languages other than English, to mention a few. This research was conducted as part of the Clear-Text project (TED2021-130707B-I00), funded by MCIN/AEI/10.13039/501100011033 and European Union NextGenerationEU/PRTR
Large Language Models can be Guided to Evade AI-Generated Text Detection
Large Language Models (LLMs) have demonstrated exceptional performance in a
variety of tasks, including essay writing and question answering. However, it
is crucial to address the potential misuse of these models, which can lead to
detrimental outcomes such as plagiarism and spamming. Recently, several
detectors have been proposed, including fine-tuned classifiers and various
statistical methods. In this study, we reveal that with the aid of carefully
crafted prompts, LLMs can effectively evade these detection systems. We propose
a novel Substitution-based In-Context example Optimization method (SICO) to
automatically generate such prompts. On three real-world tasks where LLMs can
be misused, SICO successfully enables ChatGPT to evade six existing detectors,
causing a significant 0.54 AUC drop on average. Surprisingly, in most cases
these detectors perform even worse than random classifiers. These results
firmly reveal the vulnerability of existing detectors. Finally, the strong
performance of SICO suggests itself as a reliable evaluation protocol for any
new detector in this field