1,822 research outputs found
In Search of the Long-Tail: Systematic Generation of Long-Tail Knowledge via Logical Rule Guided Search
Since large language models have approached human-level performance on many
tasks, it has become increasingly hard for researchers to find tasks that are
still challenging to the models. Failure cases usually come from the long-tail
distribution - data to which an oracle language model would assign a probability
at the lower end of its distribution. Current methodologies such as prompt
engineering or crowdsourcing are insufficient for creating long-tail examples
because humans are constrained by cognitive bias. We propose a
Logic-Induced-Knowledge-Search (LINK) framework for systematically generating
long-tail knowledge statements. Grounded by a symbolic rule, we search for
long-tail values for each variable of the rule by first prompting an LLM, then
verifying the correctness of the values with a critic, and lastly pushing for
the long-tail distribution with a reranker. With this framework we construct a
dataset, Logic-Induced-Long-Tail (LINT), consisting of 200 symbolic rules and
50K knowledge statements spanning four domains. Human annotations find
that 84% of the statements in LINT are factually correct. In contrast, ChatGPT
and GPT4 struggle with directly generating long-tail statements under the
guidance of logic rules, getting only 56% and 78% of their statements
correct, respectively. Moreover, their "long-tail" generations in fact fall into the higher
likelihood range, and thus are not really long-tail. Our findings suggest that
LINK is effective for generating data in the long-tail distribution while
enforcing quality. LINT can be useful for systematically evaluating LLMs'
capabilities in the long-tail distribution. We challenge the models with a
simple entailment classification task using samples from LINT. We find that
ChatGPT and GPT4's capability in identifying incorrect knowledge drops by ~3% in
the long-tail distribution compared to the head distribution.
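The generate, verify, rerank loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `search_long_tail_values`, the toy candidate list, the set-based critic, and the likelihood table are all hypothetical stand-ins for the LLM prompt, the critic model, and the reranker.

```python
from typing import Callable, Iterable, List


def search_long_tail_values(
    candidates: Iterable[str],
    critic: Callable[[str], bool],
    likelihood: Callable[[str], float],
    keep: int,
) -> List[str]:
    """Filter candidates with a critic, then keep the least likely ones."""
    # Verify: drop candidates the critic rejects as incorrect.
    verified = [c for c in candidates if critic(c)]
    # Rerank: lowest likelihood first, i.e. push toward the long tail.
    verified.sort(key=likelihood)
    return verified[:keep]


# Toy stand-ins (hypothetical): a fixed list replaces the LLM prompt,
# a set lookup replaces the critic, and a table replaces the likelihood model.
toy_candidates = ["apple", "durian", "brick", "salak"]
toy_edible = {"apple", "durian", "salak"}
toy_likelihood = {"apple": 0.9, "durian": 0.2, "brick": 0.5, "salak": 0.05}

result = search_long_tail_values(
    toy_candidates,
    critic=lambda c: c in toy_edible,
    likelihood=lambda c: toy_likelihood[c],
    keep=2,
)
print(result)
```

Here the critic removes the incorrect value ("brick") before reranking, so correctness is enforced first and rarity is only optimized among verified values.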
Fractional embeddings and stochastic time
As a model problem for the study of chaotic Hamiltonian systems, we look for
the effects of a long-tail distribution of recurrence times on a fixed
Hamiltonian dynamics. We follow Stanislavsky's approach of Hamiltonian
formalism for fractional systems. We prove that his formalism can be retrieved
from the fractional embedding theory. We deduce that the fractional Hamiltonian
systems of Stanislavsky stem from a particular least action principle, called
causal. In this case, the fractional embedding becomes coherent. Comment: 11 page
Personalized Federated Learning on Long-Tailed Data via Adversarial Feature Augmentation
Personalized Federated Learning (PFL) aims to learn personalized models for
each client based on the knowledge across all clients in a privacy-preserving
manner. Existing PFL methods generally assume that the underlying global data
across all clients are uniformly distributed without considering the long-tail
distribution. The joint problem of data heterogeneity and long-tail
distribution in the FL environment is more challenging and severely affects the
performance of personalized models. In this paper, we propose a PFL method
called Federated Learning with Adversarial Feature Augmentation (FedAFA) to
address this joint problem in PFL. FedAFA optimizes the personalized model for
each client by producing a balanced feature set to enhance the local minority
classes. The local minority class features are generated by transferring the
knowledge from the local majority class features extracted by the global model
in an adversarial example learning manner. The experimental results on
benchmarks under different settings of data heterogeneity and long-tail
distribution demonstrate that FedAFA significantly improves the personalized
performance of each client compared with the state-of-the-art PFL algorithm.
The code is available at https://github.com/pxqian/FedAFA. Comment: Accepted by ICASSP 202
Shelf space strategy in long-tail markets
The Internet is known to have had a powerful impact on on-line retailer
strategies in markets characterised by long-tail distribution of sales. Such
retailers can exploit the long tail of the market, since they are effectively
without physical limit on the number of choices on offer. Here we examine two
extensions of this phenomenon. First, we introduce turnover into the long-tail
distribution of sales. Although over any given period such as a week or a
month, the distribution is right-skewed and often power-law distributed, over
time there is considerable turnover in the rankings of sales of individual
products. Second, we establish some initial results on the implications for
shelf-space strategy of physical retailers in such markets. Comment: 10 pages, 3 figure
A Comment on Nonextensive Statistical Mechanics
Abstract. There is a conception that Boltzmann-Gibbs statistics cannot yield the long-tail distribution. This is the justification for the intensive research on nonextensive entropies (i.e. Tsallis entropy and others). Here the error that caused this misconception is explained, and it is shown that a long-tail distribution has existed in equilibrium thermodynamics for more than a century. Keywords. Long-tail distribution, Power Law, Zipf Law, Tsallis Entropy. JEL. C62
M|G|∞ queue busy period length with PME distribution analysis through Laplace transform
In this article it is shown that if the busy period of an M|G|∞ queue system is PME distributed, the respective service time is a random variable with a long-tail distribution. The result is obtained through Laplace transform analysis.