Search CORE

1,365 research outputs found

Towards the Automatic Classification of Documents in User-generated Classifications

Author: Morshed Ahsan-Ul
Publication venue
Publication date: 01/01/2006
Field of study

There is a huge amount of information scattered on the World Wide Web. As the information flow occurs at a high speed in the WWW, there is a need to organize it in the right manner so that a user can access it very easily. Previously the organization of information was generally done manually, by matching the document contents to some pre-defined categories. There are two approaches for this text-based categorization: manual and automatic. In the manual approach, a human expert performs the classification task, and in the second case supervised classifiers are used to automatically classify resources. In a supervised classification, manual interaction is required to create some training data before the automatic classification task takes place. In our new approach, we intend to propose automatic classification of documents through semantic keywords and building the formulas generation by these keywords. Thus we can reduce this human participation by combining the knowledge of a given classification and the knowledge extracted from the data. The main focus of this PhD thesis, supervised by Prof. Fausto Giunchiglia, is the automatic classification of documents into user-generated classifications. The key benefits foreseen from this automatic document classification is not only related to search engines, but also to many other fields like, document organization, text filtering, semantic index managing

Unitn-eprints Research

An Entailment Relation for Reasoning on the Web

Author: B. Heumesser
F. Bry
I. Niemelä
J. Lloyd
K.L. Clark
S. Abiteboul
V. Lifschitz
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2003
Field of study

Reasoning on the Web is receiving an increasing attention because of emerging fields such as Web adaption and Semantic Web. Indeed, the advanced functionalities striven for in these fields call for reasoning capabilities. Reasoning on the Web, however, is usually done using existing techniques rarely fitting the Web. As a consequence, additional data processing like data conversion from Web formats (e.g. XML or HTML) into some other formats (e.g. classical logic terms and formulas) is often needed and aspects of the Web (e.g. its inherent inconsistency) are neglected. This article first gives requirements for an entailment tuned to reasoning on the Web. Then, it describes how classical logic’s entailment can be modified so as to enforce these requirements. Finally, it discusses how the proposed entailment can be used in applying logic programming to reasoning on the Web

Crossref

Open Access LMU

Not wacky vs. definitely wacky: A study of scalar adverbs in pretrained language models

Author: Lorge Isabelle
Pierrehumbert Janet
Publication venue
Publication date: 22/10/2023
Field of study

Vector space models of word meaning all share the assumption that words occurring in similar contexts have similar meanings. In such models, words that are similar in their topical associations but differ in their logical force tend to emerge as semantically close, creating well-known challenges for NLP applications that involve logical reasoning. Modern pretrained language models, such as BERT, RoBERTa and GPT-3 hold the promise of performing better on logical tasks than classic static word embeddings. However, reports are mixed about their success. In the current paper, we advance this discussion through a systematic study of scalar adverbs, an under-explored class of words with strong logical force. Using three different tasks, involving both naturalistic social media data and constructed examples, we investigate the extent to which BERT, RoBERTa, GPT-2 and GPT-3 exhibit general, human-like, knowledge of these common words. We ask: 1) Do the models distinguish amongst the three semantic categories of MODALITY, FREQUENCY and DEGREE? 2) Do they have implicit representations of full scales from maximally negative to maximally positive? 3) How do word frequency and contextual factors impact model performance? We find that despite capturing some aspects of logical meaning, the models fall far short of human performance.Comment: Published in BlackBoxNLP workshop, EMNLP 202

arXiv.org e-Print Archive

Generating multimedia presentations: from plain text to screenplay

Author: A. Jameson
A. Rubinstein
B. Krenn
C. Hartshorne
E. Reiter
E. .H. Hovy
G. Nunberg
H. Kamp
H. Levesque
J. Bateman
J. Kuppevelt Van
K. Deemter Van
N. Bouayad-Agha
P. Grice
Paraboni
R. Dale
W. Wahlster
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005
Field of study

In many Natural Language Generation (NLG) applications, the output is limited to plain text – i.e., a string of words with punctuation and paragraph breaks, but no indications for layout, or pictures, or dialogue. In several projects, we have begun to explore NLG applications in which these extra media are brought into play. This paper gives an informal account of what we have learned. For coherence, we focus on the domain of patient information leaflets, and follow an example in which the same content is expressed first in plain text, then in formatted text, then in text with pictures, and finally in a dialogue script that can be performed by two animated agents. We show how the same meaning can be mapped to realisation patterns in different media, and how the expanded options for expressing meaning are related to the perceived style and tone of the presentation. Throughout, we stress that the extra media are not simple added to plain text, but integrated with it: thus the use of formatting, or pictures, or dialogue, may require radical rewording of the text itself

Crossref

Open Research Online (The Open University)

A Survey of Paraphrasing and Textual Entailment Methods

Author: Androutsopoulos Ion
Malakasiotis Prodromos
Publication venue: 'AI Access Foundation'
Publication date: 30/05/2010
Field of study

Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment and methods from the two areas are often similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. We summarize key ideas from the two areas by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources.Comment: Technical Report, Natural Language Processing Group, Department of Informatics, Athens University of Economics and Business, Greece, 201

arXiv.org e-Print Archive

Crossref

A negation detection assessment of GPTs: analysis with the xNot360 dataset

Author: Goebel Randy
Nguyen Ha Thanh
Satoh Ken
Stathis Kostas
Toni Francesca
Publication venue
Publication date: 28/06/2023
Field of study

Negation is a fundamental aspect of natural language, playing a critical role in communication and comprehension. Our study assesses the negation detection performance of Generative Pre-trained Transformer (GPT) models, specifically GPT-2, GPT-3, GPT-3.5, and GPT-4. We focus on the identification of negation in natural language using a zero-shot prediction approach applied to our custom xNot360 dataset. Our approach examines sentence pairs labeled to indicate whether the second sentence negates the first. Our findings expose a considerable performance disparity among the GPT models, with GPT-4 surpassing its counterparts and GPT-3.5 displaying a marked performance reduction. The overall proficiency of the GPT models in negation detection remains relatively modest, indicating that this task pushes the boundaries of their natural language understanding capabilities. We not only highlight the constraints of GPT models in handling negation but also emphasize the importance of logical reliability in high-stakes domains such as healthcare, science, and law

arXiv.org e-Print Archive