93 research outputs found
Probing Quantifier Comprehension in Large Language Models
With their increasing size, Large language models (LLMs) are becoming
increasingly good at language understanding tasks. But even with high
performance on specific downstream tasks, LLMs fail at simple linguistic tests
for negation or quantifier understanding. Previous work testing the capability
of LLMs to understand quantifiers suggests that as the size of the models
increases, they get better at understanding most-type quantifiers but get
increasingly worse at understanding few-type quantifiers, thus presenting a
case of an inverse-scaling law. In this paper, we question the claims of
inverse scaling of few-type quantifier understanding in LLMs and show that it
is a result of inappropriate testing methodology. We also present alternate
methods to measure quantifier comprehension in LLMs and show that as the size
of the models increases, these behaviours are different from what is shown in
previous research. LLMs are consistently able to understand the difference
between the meaning of few-type and most-type quantifiers, but when a
quantifier is added to a phrase, LLMs do not always take into account the meaning
of the quantifier. We in fact see an inverse scaling law for most-type
quantifiers, which is contrary to human psycho-linguistic experiments and
previous work, where the model's understanding of most-type quantifiers gets
worse as the model size increases. We run this evaluation on models ranging from
125M to 175B parameters; the results suggest that LLMs do not do as well as expected
with quantifiers and that statistical co-occurrence of words still takes precedence
over word meaning.
Decoding Emotions: A comprehensive Multilingual Study of Speech Models for Speech Emotion Recognition
Recent advancements in transformer-based speech representation models have
greatly transformed speech processing. However, there has been limited research
conducted on evaluating these models for speech emotion recognition (SER)
across multiple languages and examining their internal representations. This
article addresses these gaps by presenting a comprehensive benchmark for SER
with eight speech representation models and six different languages. We
conducted probing experiments to gain insights into the inner workings of these
models for SER. We find that using features from a single optimal layer of a
speech model reduces the error rate by 32% on average across seven datasets
when compared to systems where features from all layers of speech models are
used. We also achieve state-of-the-art results for German and Persian
languages. Our probing results indicate that the middle layers of speech models
capture the most important emotional information for speech emotion
recognition.
Unsupervised Domain Adaptation using Lexical Transformations and Label Injection for Twitter Data
Domain adaptation is an important and widely studied problem in natural
language processing. A large body of literature tries to solve this problem by
adapting models trained on the source domain to the target domain. In this
paper, we instead solve this problem from a dataset perspective. We modify the
source domain dataset with simple lexical transformations to reduce the domain
shift between the source dataset distribution and the target dataset
distribution. We find that models trained on the transformed source domain
dataset perform significantly better than zero-shot models. Using our proposed
transformations to convert standard English to tweets, we reach an unsupervised
part-of-speech (POS) tagging accuracy of 92.14% (from 81.54% zero-shot
accuracy), which is only slightly below the supervised performance of 94.45%.
We also use our proposed transformations to synthetically generate tweets and
augment the Twitter dataset to achieve state-of-the-art performance for POS
tagging.
Comment: Accepted at WASSA at ACL 202
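The dataset-side idea in this abstract, rewriting source-domain tokens so they look more like tweets while keeping their POS labels, can be sketched as follows. This is a minimal illustration only: the substitution table and the function names `tweetify_token` and `tweetify_sentence` are hypothetical, and the abstract does not specify which lexical transformations the authors actually use.

```python
# Hypothetical rewrite rules, chosen for illustration only; they are not
# the transformations used in the paper.
WORD_SUBSTITUTIONS = {"you": "u", "are": "r", "to": "2", "for": "4"}


def tweetify_token(token: str) -> str:
    """Apply simple token-level rewrites that mimic informal tweet spelling."""
    lowered = token.lower()
    if lowered in WORD_SUBSTITUTIONS:
        return WORD_SUBSTITUTIONS[lowered]
    # Drop apostrophes ("don't" -> "dont"), a common informal spelling.
    return lowered.replace("'", "")


def tweetify_sentence(tagged):
    """Transform each token but keep its POS tag, so labels stay aligned."""
    return [(tweetify_token(token), tag) for token, tag in tagged]


# A standard-English sentence with (assumed) universal POS tags.
source = [("You", "PRON"), ("are", "AUX"), ("welcome", "ADJ"),
          ("to", "PART"), ("join", "VERB"), ("!", "PUNCT")]
transformed = tweetify_sentence(source)
print(transformed)
```

Because every rule maps one token to one token, the tag sequence is untouched, so a tagger could in principle be trained on the transformed sentences without any re-annotation.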
Pathfinder - an online shopping assistant driven by data mining
Online Shopping is a household phrase that has been extremely successful in easing the lives of many people across the globe. Online shoppers spend ample amounts of time and money in buying products that they receive at their doorstep in a matter of a few days or, in some cases, a few hours. However, it is not as easy as it looks. People either have plenty of specifics in their mind before buying a product or are just looking to explore a range of products for a particular goal. This adds another layer on top of time and money spent: effort.
PathFinder is a guide that helps shoppers make more informed choices and reach their final product decision faster. It is a hand-in-hand assistant for shoppers that helps them at critical stages of the buying process to ensure that they either reach their specifics without too much research or that they get to explore granular details which they would miss otherwise. Being an online shopping assistant, PathFinder aims to reduce the effort spent by online shoppers and ease the online product purchasing process even further.
Route Planning Using Nature-Inspired Algorithms
There are many different heuristic algorithms for solving combinatorial
optimization problems that are commonly described as Nature-Inspired Algorithms
(NIAs). Generally, they are inspired by some natural phenomenon, and due to
their inherent converging and stochastic nature, they are known to give optimal
results when compared to classical approaches. There are a large number of
applications of NIAs, perhaps the most popular being route planning problems in
robotics - problems that require a sequence of translation and rotation steps
from the start to the goal in an optimized manner while avoiding obstacles in
the environment. In this chapter, we will first give an overview of
Nature-Inspired Algorithms, followed by their classification and common
examples. We will then discuss how NIAs have been applied to solve the route
planning problem.
Comment: This work is part of 'High-Performance Vision Intelligence'; Part of
the Studies in Computational Intelligence book series (SCI, volume 913) and
can be accessed at:
https://link.springer.com/chapter/10.1007/978-981-15-6844-2_1
Reinforcement learning for zone based multiagent pathfinding under uncertainty
Ministry of Education, Singapore under its Academic Research Funding Tier
Successor features based multi-agent RL for event-based decentralized MDPs
Decentralized MDPs (Dec-MDPs) provide a rigorous framework for collaborative multi-agent sequential decision-making under uncertainty. However, their computational complexity limits their practical impact. To address this, we focus on a class of Dec-MDPs consisting of independent collaborating agents that are tied together through a global reward function that depends upon their entire histories of states and actions to accomplish joint tasks. To overcome the scalability barrier, our main contributions are: (a) We propose a new actor-critic based Reinforcement Learning (RL) approach for event-based Dec-MDPs using successor features (SF), a value function representation that decouples the dynamics of the environment from the rewards; (b) We then present Dec-ESR (Decentralized Event based Successor Representation), which generalizes learning for event-based Dec-MDPs using SF within an end-to-end deep RL framework; (c) We also show that Dec-ESR allows useful transfer of information on related but different tasks, and hence bootstraps the learning for faster convergence on new tasks; (d) For validation purposes, we test our approach on a large multi-agent coverage problem which models schedule coordination of agents in a real urban subway network and achieve better quality solutions than previous best approaches.
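The sense in which successor features decouple dynamics from rewards can be illustrated with a small tabular sketch. It is not Dec-ESR or the authors' actor-critic method: it assumes a single agent, a fixed policy, a three-state chain, one-hot state features, and rewards of the form r(s) = phi(s) . w, all of which are illustrative choices. The function names are hypothetical.

```python
def successor_features(P, phi, gamma=0.9, sweeps=500):
    """Iterate psi = phi + gamma * P @ psi for a fixed policy.

    P[s][t] is the probability of moving from state s to state t under the
    policy, and phi[s] is the feature vector of state s.
    """
    n, d = len(phi), len(phi[0])
    psi = [[0.0] * d for _ in range(n)]
    for _ in range(sweeps):
        psi = [
            [phi[s][k] + gamma * sum(P[s][t] * psi[t][k] for t in range(n))
             for k in range(d)]
            for s in range(n)
        ]
    return psi


def values(psi, w):
    """Value of each state for reward weights w: V(s) = psi(s) . w."""
    return [sum(p * wk for p, wk in zip(row, w)) for row in psi]


# Toy chain 0 -> 1 -> 2, where state 2 is absorbing (assumed dynamics).
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]]
phi = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0]]  # one-hot features

psi = successor_features(P, phi)  # depends on dynamics only, not on rewards

task_a = [0.0, 0.0, 1.0]  # reward for being in state 2
task_b = [1.0, 0.0, 0.0]  # reward for being in state 0
print(values(psi, task_a))
print(values(psi, task_b))
```

The same `psi` is reused for both reward vectors; only the final dot product changes. That reuse is the property that makes transfer across related tasks cheap in this simplified setting.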
GlyphNet: Homoglyph domains dataset and detection using attention-based Convolutional Neural Networks
Cyber attacks deceive machines into believing something that does not exist
in the first place. However, there are some to which even humans fall prey. One
such famous attack that attackers have used over the years to exploit the
vulnerability of vision is known as a homoglyph attack. It employs a simple
yet effective mechanism to create illegitimate domains that are hard to
differentiate from legitimate ones. Moreover, as the difference is nearly
impossible for a user to notice, users often end up
clicking on these homoglyph domain names. In many cases, that results in either
information theft or malware attack on their systems. Existing approaches use
simple, string-based comparison techniques applied in primary language-based
tasks. Although they are impactful to some extent, they usually fail because
they are not robust to different types of homoglyphs and are computationally
not feasible because of their time requirement proportional to the string
length. Similarly, neural network-based approaches are employed to determine
real domain strings from fake ones. Nevertheless, the problem with both methods
is that they require paired sequences of real and fake domain strings to work
with, which is often not the case in the real world, as the attacker only sends
the illegitimate or homoglyph domain to the vulnerable user. Therefore,
existing approaches are not suitable for practical scenarios in the real world.
In our work, we created GlyphNet, an image dataset that contains 4M domains,
both real and homoglyphs. Additionally, we introduce a baseline method for a
homoglyph attack detection system using an attention-based convolutional neural
network. We show that our model can reach state-of-the-art accuracy in
detecting homoglyph attacks with a 0.93 AUC on our dataset.
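To make concrete what a homoglyph domain is, the short sketch below builds one by swapping a Latin letter for a visually similar Cyrillic one and then lists the non-ASCII characters in it. This only illustrates the attack; it is not the image-based, attention-based CNN detector described in the abstract. The example domain and the helper `non_ascii_characters` are assumptions chosen for illustration.

```python
import unicodedata

real = "apple.com"
# Same visible text, but the first letter is U+0430 (Cyrillic) instead of
# U+0061 (Latin). Most fonts render the two strings identically.
fake = "\u0430pple.com"


def non_ascii_characters(domain):
    """Return (index, character, Unicode name) for every non-ASCII character."""
    return [
        (index, char, unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(domain)
        if ord(char) > 127
    ]


print(real == fake)                # the strings differ at the code-point level
print(non_ascii_characters(real))  # nothing suspicious in the real domain
print(non_ascii_characters(fake))  # the Cyrillic look-alike is exposed
```

A check like this would also flag legitimate internationalized domain names, which is one reason simple character-level rules fall short as a detector.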