Experiments to investigate the utility of nearest neighbour metrics based on linguistically informed features for detecting textual plagiarism
Plagiarism detection is a challenge for linguistic models: most currently implemented models use simple occurrence statistics for linguistic items. In this paper we report two experiments related to plagiarism detection in which we use a model of distributional semantics and of sentence stylistics to compare, sentence by sentence, the likelihood of a text being partly plagiarised. The results of the comparison are displayed for visual inspection by a plagiarism assessor.
On Explaining Multimodal Hateful Meme Detection Models
Hateful meme detection is a new multimodal task that has gained significant
traction in academic and industry research communities. Recently, researchers
have applied pre-trained visual-linguistic models to perform the multimodal
classification task, and some of these solutions have yielded promising
results. However, what these visual-linguistic models learn for the hateful
meme classification task remains unclear. For instance, it is unclear if these
models are able to capture derogatory references or slurs across the modalities
(i.e., image and text) of the hateful memes. To fill this research gap, this
paper proposes three research questions to improve our understanding of these
visual-linguistic models performing the hateful meme classification task. We
found that the image modality contributes more to the hateful meme
classification task, and the visual-linguistic models are able to perform
visual-text slur grounding to a certain extent. Our error analysis also shows
that the visual-linguistic models have acquired biases, which resulted in
false-positive predictions.
Linguistically Grounded Models of Language Change
Questions related to the evolution of language have recently attracted a
marked increase in interest (Briscoe, 2002). This short paper aims to
question the scientific status of these models and their relations to
attested data. We show that one cannot directly model non-linguistic factors
(exogenous factors) even if they play a crucial role in language evolution. We
then examine the relation between linguistic models and attested language data,
as well as their contribution to cognitive linguistics.
Automatic extraction of linguistic models for image description
This paper describes a methodology to extract fuzzy models that describe linguistically the low-level features of an image (such as color, texture, etc.). The methodology combines grid-based algorithms with clustering and tabular simplification methods to compress image information into a small number of fuzzy rules with high linguistic meaning. All the steps of the methodology are carried out with the tools of the Xfuzzy 3 environment, so the fuzzy models can be defined, simplified, tuned and verified automatically. Several examples are included to illustrate the advantages of the methodology.
Large Linguistic Models: Analyzing theoretical linguistic abilities of LLMs
The performance of large language models (LLMs) has recently improved to the
point where the models can perform well on many language tasks. We show here
that for the first time, the models can also generate coherent and valid formal
analyses of linguistic data and illustrate the vast potential of large language
models for analyses of their metalinguistic abilities. LLMs are primarily
trained on language data in the form of text; analyzing and evaluating their
metalinguistic abilities improves our understanding of their general
capabilities and sheds new light on theoretical models in linguistics. In this
paper, we probe into GPT-4's metalinguistic capabilities by focusing on three
subfields of formal linguistics: syntax, phonology, and semantics. We outline a
research program for metalinguistic analyses of large language models, propose
experimental designs, provide general guidelines, discuss limitations, and
offer future directions for this line of research. This line of inquiry also
exemplifies behavioral interpretability of deep learning, where models'
representations are accessed by explicit prompting rather than internal
representations.
Digital Stylometry: Linking Profiles Across Social Networks
There is an ever-growing number of users with accounts on multiple social
media and networking sites. Consequently, there is increasing interest in
matching user accounts and profiles across different social networks in order
to create aggregate profiles of users. In this paper, we present models for
Digital Stylometry, which is a method for matching users through
stylometry-inspired techniques. We experimented with linguistic, temporal, and combined
temporal-linguistic models for matching user accounts, using standard and novel
techniques. Using publicly available data, our best model, a combined
temporal-linguistic one, was able to correctly match the accounts of 31% of
5,612 distinct users across Twitter and Facebook.
Comment: SocInfo'15, Beijing, China. In Proceedings of the 7th International
Conference on Social Informatics (SocInfo 2015), Beijing, China.
Analysis criteria of logic and linguistic models of natural language sentences
To carry out content analysis of electronic text documents, the use of formal logic-linguistic models is proposed. The aim of the article is to describe criteria for analyzing formal models that can reflect the content of natural language sentences and are formed using the mathematical apparatus of predicate logic. The described analysis criteria for logic-linguistic models are needed to build formal models of electronic text documents.
The article describes the main text models used today as tools for processing the content of electronic text documents. For content analysis, the author proposes formal logic and linguistic models, which are based on functional relationships between the principal and subordinate parts of natural language sentences. The article aims to describe the criteria for analyzing formal models that can reflect the content of natural language sentences and that are formed using the mathematical tools of predicate logic. To this end, the study examines the principles for constructing logic and linguistic models of natural language sentences and formulates four analysis criteria. The first criterion analyzes the number of simple predicates in a logic and linguistic model, which helps to identify the type and composition of the natural language sentence. The second criterion analyzes the cardinality of the set of predicate variables and constants of the model, which affects the number of simple predicates and identifies the type of the individual forms of the model. The third criterion focuses on the logical operations used in the model, which makes it possible to analyze the sequence of reasoning expressed in the natural language sentence. The fourth criterion examines the presence of identical components in logic and linguistic models of natural language sentences built from different sets of predicate variables and constants. The described analysis criteria are required to build formal models of electronic text documents using the mathematical apparatus of predicate logic.
Intelligent fuzzy controller for event-driven real time systems
Most of the known linguistic models are essentially static, that is, time is not a parameter in describing the behavior of the object's model. In this paper we show a model for synchronous finite state machines based on fuzzy logic. Such finite state machines can be used to build both event-driven, time-varying, rule-based systems and the control unit section of a fuzzy logic computer. The architecture of a pipelined intelligent fuzzy controller is presented, and the linguistic model is represented by an overall fuzzy relation stored in a single rule memory. A VLSI integrated circuit implementation of the fuzzy controller is suggested. At a clock rate of 30 MHz, the controller can perform 3 MFLIPS on multi-dimensional fuzzy data
Recognizing People by Body Shape Using Deep Networks of Images and Words
Common and important applications of person identification occur at distances
and viewpoints in which the face is not visible or is not sufficiently resolved
to be useful. We examine body shape as a biometric across distance and
viewpoint variation. We propose an approach that combines standard object
classification networks with representations based on linguistic (word-based)
descriptions of bodies. Algorithms with and without linguistic training were
compared on their ability to identify people from body shape in images captured
across a large range of distances/views (close-range, 100m, 200m, 270m, 300m,
370m, 400m, 490m, 500m, 600m, and at elevated pitch in images taken by an
unmanned aerial vehicle [UAV]). Accuracy, as measured by identity-match ranking
and false accept errors in an open-set test, was surprisingly good. For
identity-ranking, linguistic models were more accurate for close-range images,
whereas non-linguistic models fared better at intermediary distances. Fusion of
the linguistic and non-linguistic embeddings improved performance at all but
the farthest distance. Although the non-linguistic model yielded fewer false
accepts at all distances, fusion of the linguistic and non-linguistic models
decreased false accepts for all but the UAV images. We conclude that
linguistic and non-linguistic representations of body shape can offer
complementary identity information for bodies that can improve identification
in applications of interest.
Comment: 9 pages, 5 figures, 4 tables