Detecting and Preventing Hallucinations in Large Vision Language Models
Instruction tuned Large Vision Language Models (LVLMs) have made significant
advancements in generalizing across a diverse set of multimodal tasks,
especially for Visual Question Answering (VQA). However, generating detailed
responses that are visually grounded is still a challenging task for these
models. We find that even the current state-of-the-art LVLMs (InstructBLIP)
still contain a staggering 30 percent of hallucinatory text in the form of
non-existent objects, unfaithful descriptions, and inaccurate relationships. To
address this, we introduce M-HalDetect, a Multimodal Hallucination
Detection Dataset that can be used to train and benchmark models for
hallucination detection and prevention. M-HalDetect consists of 16k
fine-grained labels on VQA examples, making it the first comprehensive
multi-modal hallucination detection dataset for detailed image descriptions.
Unlike previous work that considers only object hallucination, we additionally
annotate both entity descriptions and relationships that are unfaithful. To
demonstrate the potential of this dataset for preference alignment, we propose
fine-grained Direct Preference Optimization, as well as train fine-grained
multi-modal reward models and evaluate their effectiveness with best-of-n
rejection sampling. We perform human evaluation on both DPO and rejection
sampling, and find that they reduce hallucination rates by 41% and 55%
respectively, a significant improvement over the baseline.
Comment: preprint
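Best-of-n rejection sampling against a reward model can be sketched in a few lines. The generator and reward functions below are toy stand-ins, and all names are illustrative assumptions, not the paper's implementation:

```python
import random

def best_of_n(prompt, generate, reward, n=8):
    """Draw n candidate responses and keep the one the reward
    model scores highest (best-of-n rejection sampling)."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=reward)

# Toy stand-ins: a "model" that samples canned captions, and a reward
# that penalises a known-hallucinated object ("hat").
def toy_generate(prompt):
    return random.choice(["a cat on a mat", "a dog on a mat", "a cat on a hat"])

def toy_reward(response):
    return -response.count("hat")

random.seed(0)
print(best_of_n("describe the image", toy_generate, toy_reward))
```

With a fine-grained reward model in place of `toy_reward`, the same loop filters hallucinatory candidates at inference time without retraining the generator.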
Textual Economy through Close Coupling of Syntax and Semantics
We focus on the production of efficient descriptions of objects, actions and
events. We define a type of efficiency, textual economy, that exploits the
hearer's recognition of inferential links to material elsewhere within a
sentence. Textual economy leads to efficient descriptions because the material
that supports such inferences has been included to satisfy independent
communicative goals, and is therefore overloaded in Pollack's sense. We argue
that achieving textual economy imposes strong requirements on the
representation and reasoning used in generating sentences. The representation
must support the generator's simultaneous consideration of syntax and
semantics. Reasoning must enable the generator to assess quickly and reliably
at any stage how the hearer will interpret the current sentence, with its
(incomplete) syntax and semantics. We show that these representational and
reasoning requirements are met in the SPUD system for sentence planning and
realization.
Comment: 10 pages, uses QobiTree.te
Automating class definitions from OWL to English
Text definitions for entities within bio-ontologies are a cornerstone of the effort to gain a consensus in understanding and usage of those ontologies. Writing these definitions is, however, a considerable effort, and there is often a lag between specification of the entities in the ontology and the development of the text-based definitions. As well as these text definitions, there can also be logical descriptions and definitions of an ontology's entities. The goal of natural language generation (NLG) from ontologies is to take the logical description of entities and generate fluent natural language. We should be able to use NLG to automatically provide text-based definitions from an ontology that has logical descriptions of its entities and thus avoid the bottleneck of authoring these definitions by hand. In this paper we present some early work in using NLG to provide such text definitions for the Experimental Factor Ontology (EFO). We present our results, discuss issues in generating text definitions, and highlight some future work.
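The core transformation can be illustrated with a naive verbaliser for one axiom shape, "C equivalentTo (parent and (prop some filler))". The tuple-style encoding and the wording template are illustrative assumptions, not EFO's actual axioms or the paper's system:

```python
def _article(word):
    # Crude indefinite-article choice; real verbalisers need more care.
    return "an" if word[:1].lower() in "aeiou" else "a"

def verbalise(cls, parent, prop, filler):
    """Render a class definition as an English text definition.
    Property verbalisation is deliberately naive: "part_of" -> "part of"."""
    prop_text = prop.replace("_", " ")
    return (f"{_article(cls).capitalize()} {cls} is {_article(parent)} "
            f"{parent} that {prop_text} some {filler}.")

print(verbalise("Axon", "NeuronPart", "part_of", "Neuron"))
# -> An Axon is a NeuronPart that part of some Neuron.
```

The awkward relative clause shows exactly the kind of fluency issue the abstract flags: logical structure maps easily to sentence skeletons, but property names need linguistic knowledge to read well.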
Metamodel-based model conformance and multiview consistency checking
Model-driven development, using languages such as UML and BON, often makes use of multiple diagrams (e.g., class and sequence diagrams) when modeling systems. These diagrams, presenting different views of a system of interest, may be inconsistent. A metamodel provides a unifying framework in which to ensure and check consistency, while at the same time providing the means to distinguish between valid and invalid models, that is, conformance. Two formal specifications of the metamodel for an object-oriented modeling language are presented, and it is shown how to use these specifications for model conformance and multiview consistency checking. Comparisons are made in terms of completeness and the level of automation each provides for checking multiview consistency and model conformance. The lessons learned from applying formal techniques to the problems of metamodeling, model conformance, and multiview consistency checking are summarized.
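One simple multiview consistency check of the kind a metamodel licenses: every object a sequence diagram sends messages to must be an instance of a class declared in the class diagram. The dictionary representations of the two views below are toy assumptions, not the paper's formal specifications:

```python
def consistent(class_diagram, sequence_diagram):
    """Check one cross-view rule: each lifeline's class in the
    sequence diagram is declared in the class diagram."""
    declared = set(class_diagram["classes"])
    used = {obj_cls for _, obj_cls in sequence_diagram["lifelines"]}
    return used <= declared  # set inclusion

cd = {"classes": ["Account", "Bank"]}
sd = {"lifelines": [("a", "Account"), ("b", "Bank")]}
print(consistent(cd, sd))  # True: every lifeline's class is declared
```

A metamodel-based approach generalises this: each such rule becomes a well-formedness constraint stated once over the metamodel, rather than an ad hoc check per diagram pair.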
Syn-QG: Syntactic and Shallow Semantic Rules for Question Generation
Question Generation (QG) is fundamentally a simple syntactic transformation;
however, many aspects of semantics influence what questions are good to form.
We implement this observation by developing Syn-QG, a set of transparent
syntactic rules leveraging universal dependencies, shallow semantic parsing,
lexical resources, and custom rules which transform declarative sentences into
question-answer pairs. We utilize PropBank argument descriptions and VerbNet
state predicates to incorporate shallow semantic content, which helps generate
questions of a descriptive nature and produce inferential and semantically
richer questions than existing systems. In order to improve syntactic fluency
and eliminate grammatically incorrect questions, we employ back-translation
over the output of these syntactic rules. A set of crowd-sourced evaluations
shows that our system can generate a larger number of highly grammatical and
relevant questions than previous QG systems and that back-translation
drastically improves grammaticality at a slight cost of generating irrelevant
questions.
Comment: Some of the results in the paper were incorrect
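One such transparent rule can be illustrated at the string level. Real Syn-QG rules operate over universal-dependency parses and semantic roles, so the regex below is only a toy approximation of a single copula rule, with made-up names:

```python
import re

def copula_rule(sentence):
    """Toy syntactic QG rule: for a copular sentence "X is Y.",
    produce the question-answer pair ("What is X?", "Y").
    Returns None when the rule does not apply."""
    m = re.match(r"^(?P<subj>.+?) is (?P<pred>.+?)\.$", sentence)
    if m is None:
        return None
    return (f"What is {m.group('subj')}?", m.group("pred"))

print(copula_rule("Paris is the capital of France."))
# -> ('What is Paris?', 'the capital of France')
```

A system like the one described would hold many such rules, keyed on parse structure rather than surface strings, and then pass the generated questions through back-translation to smooth out disfluencies.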
Generating natural language specifications from UML class diagrams
Early phases of software development are known to be problematic and difficult to manage, and errors occurring during these phases are expensive to correct. Many systems have been developed to aid the transition from informal Natural Language requirements to semi-structured or formal specifications. Furthermore, consistency checking is seen by many software engineers as the solution to reduce the number of errors occurring during the software development life cycle and allow early verification and validation of software systems. However, this is confined to the models developed during analysis and design and fails to include the early Natural Language requirements. This excludes proper user involvement and creates a gap between the original requirements and the updated and modified models and implementations of the system. To improve this process, we propose a system that generates Natural Language specifications from UML class diagrams. We first investigate the variation of the input language used in naming the components of a class diagram based on the study of a large number of examples from the literature, and then develop rules for removing ambiguities in the subset of Natural Language used within UML. We use WordNet, a linguistic ontology, to disambiguate the lexical structures of the UML string names and generate semantically sound sentences. Our system is developed in Java and is tested on an independent, though academic, case study.
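The described system is in Java, but the rule-based core can be sketched in Python with a toy in-memory class-diagram representation; the names `UMLClass` and `describe` are illustrative assumptions, not the paper's API:

```python
from dataclasses import dataclass, field

@dataclass
class UMLClass:
    name: str
    attributes: list = field(default_factory=list)
    parent: str = None  # superclass name, if any

def article(word):
    # Crude indefinite-article rule; a real system would use WordNet
    # to disambiguate and verbalise the string names.
    return "an" if word[:1].lower() in "aeiou" else "a"

def describe(cls):
    """Generate one sentence per class-diagram fact."""
    sentences = []
    if cls.parent:
        sentences.append(
            f"{article(cls.name).capitalize()} {cls.name} is a kind of {cls.parent}.")
    for attr in cls.attributes:
        sentences.append(f"Every {cls.name} has {article(attr)} {attr}.")
    return " ".join(sentences)

print(describe(UMLClass("Invoice", ["date", "total"], parent="Document")))
# -> An Invoice is a kind of Document. Every Invoice has a date. Every Invoice has a total.
```

Generalisation and attribute facts map cleanly onto sentence templates; the hard part the abstract points to is normalising the naming conventions found in real diagrams before templates like these apply.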
Clinical data wrangling using Ontological Realism and Referent Tracking
Ontological realism aims at the development of high-quality ontologies that faithfully represent what is general in reality, and at the use of these ontologies to render heterogeneous data collections comparable. To achieve this second goal for clinical research datasets presupposes not merely (1) that the requisite ontologies already exist, but also (2) that the datasets in question are faithful to reality in the dual sense that (a) they denote only particulars and relationships between particulars that do in fact exist and (b) they do this in terms of the types and type-level relationships described in these ontologies. While much attention has been devoted to (1), work on (2), which is the topic of this paper, is comparatively rare. Using Referent Tracking as a basis, we describe a technical data wrangling strategy which consists in creating for each dataset a template that, when applied to each particular record in the dataset, leads to the generation of a collection of Referent Tracking Tuples (RTTs) built out of unique identifiers for the entities described by means of the data items in the record. The proposed strategy is based on (i) the distinction between data and what data are about, and (ii) the explicit descriptions of portions of reality which RTTs provide and which range not only over the particulars described by data items in a dataset, but also over these data items themselves. This last feature allows us to describe particulars that are only implicitly referred to by the dataset; to provide information about correspondences between data items in a dataset; and to assert which data items are unjustifiably or redundantly present in or absent from the dataset. The approach has been tested on a dataset collected from patients seeking treatment for orofacial pain at two German universities and made available for the NIDCR-funded OPMQoL project.
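The template idea can be sketched with a toy record. The identifier scheme, relation names, and tuple layout below are illustrative assumptions, not the paper's actual RTT schema:

```python
import itertools

_counter = itertools.count(1)

def new_iui():
    """Mint a fresh unique identifier for a particular (toy scheme)."""
    return f"IUI-{next(_counter)}"

def wrangle(record):
    """Apply a template to one clinical record, generating Referent
    Tracking Tuples about the particulars the record refers to."""
    patient, pain = new_iui(), new_iui()
    return [
        (patient, "instance_of", "Patient"),         # a particular person
        (pain, "instance_of", record["diagnosis"]),  # their pain instance
        (pain, "inheres_in", patient),               # relates the particulars
    ]

for t in wrangle({"diagnosis": "OrofacialPain"}):
    print(t)
```

Note that the patient is never named by a data item in the record; the template makes this implicitly referenced particular explicit, which is exactly the feature the abstract highlights.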