
    In Search of the Long-Tail: Systematic Generation of Long-Tail Knowledge via Logical Rule Guided Search

    Since large language models have approached human-level performance on many tasks, it has become increasingly hard for researchers to find tasks that are still challenging for the models. Failure cases usually come from the long-tail distribution - data to which an oracle language model would assign a probability on the lower end of its distribution. Current methodologies such as prompt engineering or crowdsourcing are insufficient for creating long-tail examples because humans are constrained by cognitive bias. We propose a Logic-Induced-Knowledge-Search (LINK) framework for systematically generating long-tail knowledge statements. Grounded by a symbolic rule, we search for long-tail values for each variable of the rule by first prompting an LLM, then verifying the correctness of the values with a critic, and lastly pushing for the long-tail distribution with a reranker. With this framework we construct a dataset, Logic-Induced-Long-Tail (LINT), consisting of 200 symbolic rules and 50K knowledge statements spanning four domains. Human annotation finds that 84% of the statements in LINT are factually correct. In contrast, ChatGPT and GPT4 struggle to directly generate long-tail statements under the guidance of logic rules, getting only 56% and 78% of their statements correct, respectively. Moreover, their "long-tail" generations in fact fall into the higher-likelihood range, and thus are not really long-tail. Our findings suggest that LINK is effective for generating data in the long-tail distribution while enforcing quality. LINT can be useful for systematically evaluating LLMs' capabilities in the long-tail distribution. We challenge the models with a simple entailment classification task using samples from LINT, and find that ChatGPT's and GPT4's capability to identify incorrect knowledge drops by ~3% in the long-tail distribution compared to the head distribution.
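    The propose-verify-rerank loop the abstract describes can be sketched in a few lines. Everything below is a toy stand-in: `propose`, `verify`, and the likelihood table are hypothetical placeholders for the paper's LLM prompter, critic model, and likelihood-based reranker, not the actual LINK implementation.

```python
import random

def propose(variable, n=8):
    # Stand-in for prompting an LLM for candidate values of a rule variable.
    pool = {"animal": ["dog", "axolotl", "quokka", "cat", "tuatara"]}
    return random.sample(pool[variable], min(n, len(pool[variable])))

def verify(value):
    # Stand-in for the critic that checks factual correctness of a value.
    return value != ""

def rerank(values, likelihood):
    # Sort ascending by likelihood so the longest-tail candidates come first.
    return sorted(values, key=likelihood)

# Hypothetical likelihoods an oracle LM might assign to each value.
toy_likelihood = {"dog": 0.9, "cat": 0.8, "axolotl": 0.05,
                  "quokka": 0.03, "tuatara": 0.02}

candidates = [v for v in propose("animal") if verify(v)]
long_tail = rerank(candidates, toy_likelihood.get)
print(long_tail[:2])  # -> ['tuatara', 'quokka']
```

    The point of the sketch is the ordering of stages: correctness filtering happens before the likelihood-based push into the tail, mirroring the critic-then-reranker pipeline described above.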

    Fractional embeddings and stochastic time

    As a model problem for the study of chaotic Hamiltonian systems, we look for the effects of a long-tail distribution of recurrence times on a fixed Hamiltonian dynamics. We follow Stanislavsky's approach to the Hamiltonian formalism for fractional systems. We prove that his formalism can be retrieved from the fractional embedding theory. We deduce that the fractional Hamiltonian systems of Stanislavsky stem from a particular least action principle, said to be causal. In this case, the fractional embedding becomes coherent. Comment: 11 pages

    Personalized Federated Learning on Long-Tailed Data via Adversarial Feature Augmentation

    Personalized Federated Learning (PFL) aims to learn personalized models for each client based on the knowledge across all clients in a privacy-preserving manner. Existing PFL methods generally assume that the underlying global data across all clients are uniformly distributed, without considering the long-tail distribution. The joint problem of data heterogeneity and long-tail distribution in the FL environment is more challenging and severely affects the performance of personalized models. In this paper, we propose a PFL method called Federated Learning with Adversarial Feature Augmentation (FedAFA) to address this joint problem in PFL. FedAFA optimizes the personalized model for each client by producing a balanced feature set to enhance the local minority classes. The local minority class features are generated by transferring knowledge from the local majority class features extracted by the global model, via adversarial example learning. Experimental results on benchmarks under different settings of data heterogeneity and long-tail distribution demonstrate that FedAFA significantly improves the personalized performance of each client compared with state-of-the-art PFL algorithms. The code is available at https://github.com/pxqian/FedAFA. Comment: Accepted by ICASSP 202
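    The core idea - pushing a majority-class feature across a decision boundary with an adversarial step to synthesize a minority-class feature - can be illustrated on a toy linear head. This is not the FedAFA implementation (which operates on deep features of the global model); the weights, feature vector, and step size below are all made up for illustration.

```python
# Toy linear head: score(x) > 0 means the majority class.
w = [1.0, -2.0, 0.5]
majority_feat = [0.8, -0.3, 1.2]

def score(x):
    return sum(wi * xi for wi, xi in zip(w, x))

def sign(v):
    return (v > 0) - (v < 0)

# FGSM-style signed-gradient step: d(score)/dx_i = w_i, so stepping each
# coordinate against sign(w_i) lowers the score, moving the feature toward
# the minority side of the boundary.
eps = 0.9
synthetic = [xi - eps * sign(wi) for wi, xi in zip(w, majority_feat)]

print(score(majority_feat), score(synthetic))  # 2.0 vs. -1.15
```

    In the paper's setting this kind of synthesized feature augments the minority classes when training the personalized model, rebalancing the local feature set without sharing raw data.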

    Shelf space strategy in long-tail markets

    The Internet is known to have had a powerful impact on online retailer strategies in markets characterised by a long-tail distribution of sales. Such retailers can exploit the long tail of the market, since they are effectively without physical limit on the number of choices on offer. Here we examine two extensions of this phenomenon. First, we introduce turnover into the long-tail distribution of sales. Although over any given period, such as a week or a month, the distribution is right-skewed and often power-law distributed, over time there is considerable turnover in the rankings of sales of individual products. Second, we establish some initial results on the implications for the shelf-space strategy of physical retailers in such markets. Comment: 10 pages, 3 figures
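    The combination of a right-skewed weekly distribution with heavy rank turnover is easy to reproduce in simulation. The sketch below (illustrative only, not the paper's model) draws weekly sales for 1,000 products from a Pareto distribution and checks how little the top-10 ranking persists from one week to the next when draws are independent.

```python
import random

random.seed(0)
n_products = 1000

def weekly_sales():
    # Pareto(alpha=1.5) draws: right-skewed with a power-law tail.
    return [random.paretovariate(1.5) for _ in range(n_products)]

def top10(sales):
    # Indices of the ten best-selling products this week.
    return set(sorted(range(n_products), key=lambda i: -sales[i])[:10])

week1, week2 = top10(weekly_sales()), top10(weekly_sales())
overlap = len(week1 & week2)
print(f"products in both weeks' top 10: {overlap}")
```

    With fully independent weekly draws the expected overlap is about 0.1 products, so the weekly distribution can be strongly right-skewed while the identity of the bestsellers churns almost completely - the turnover effect the abstract introduces.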

    A Comment on Nonextensive Statistical Mechanics

    Abstract. There is a conception that Boltzmann-Gibbs statistics cannot yield long-tail distributions. This is the justification for the intensive research into nonextensive entropies (i.e. Tsallis entropy and others). Here the error that caused this misconception is explained, and it is shown that a long-tail distribution has existed in equilibrium thermodynamics for more than a century.
    Keywords: Long-tail distribution, Power Law, Zipf Law, Tsallis Entropy.
    JEL: C62
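    One standard route from Boltzmann-Gibbs statistics to a power law - a textbook observation, not necessarily the specific argument of this comment - is a state energy that grows logarithmically in the observable, so that the ordinary Boltzmann factor becomes a power law:

```latex
p(E) \propto e^{-E/k_B T}, \qquad E(x) = \epsilon \ln(x/x_0)
\;\Longrightarrow\;
p(x) \propto e^{-\epsilon \ln(x/x_0)/k_B T} = (x/x_0)^{-\epsilon/(k_B T)} .
```

    The Gibbs weight itself is exponential in the energy, yet the induced distribution over $x$ is Zipf/Pareto-type, i.e. long-tailed, entirely within equilibrium statistical mechanics.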

    M|G|∞ queue busy period length with PME distribution analysis through Laplace transform

    In this article it is shown that if the busy period of an M|G|∞ queue system is PME distributed, the respective service time is a random variable with a long-tail distribution. The result is obtained through Laplace transform analysis.
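    The connection between long-tail service times and busy-period behaviour in an M|G|∞ queue can be seen in simulation, complementing the article's Laplace-transform route. The sketch below (parameters chosen for illustration, not taken from the article) generates Poisson arrivals with Pareto service times and extracts busy periods as maximal intervals during which at least one of the infinitely many servers is occupied.

```python
import random

random.seed(1)
lam, alpha = 1.0, 1.5          # arrival rate; Pareto tail index (finite mean)
horizon = 10_000.0

# Each customer occupies its own server from arrival to departure.
t, intervals = 0.0, []
while t < horizon:
    t += random.expovariate(lam)
    service = random.paretovariate(alpha)      # heavy-tailed service time
    intervals.append((t, t + service))

# Merge overlapping (arrival, departure) intervals into busy periods.
intervals.sort()
busy = []
cur_start, cur_end = intervals[0]
for start, end in intervals[1:]:
    if start <= cur_end:
        cur_end = max(cur_end, end)            # still inside a busy period
    else:
        busy.append(cur_end - cur_start)       # busy period closed
        cur_start, cur_end = start, end
busy.append(cur_end - cur_start)

print(f"{len(busy)} busy periods, longest = {max(busy):.1f}")
```

    With heavy-tailed service, a single long service time can bridge many arrivals, so the busy-period lengths inherit a heavy tail - the qualitative counterpart of the distributional relationship the article establishes analytically.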