
    COOL, a Context Outlooker, and its Application to Question Answering and other Natural Language Processing Tasks

    Vision outlookers improve the performance of vision transformers, which implement a self-attention mechanism, by adding outlook attention, a form of local attention. In natural language processing, as in computer vision and other domains, transformer-based models constitute the state of the art for most tasks, and here too many authors have argued and demonstrated the importance of local context. We present and evaluate COOL, an outlook attention mechanism for natural language processing. COOL adds, on top of the self-attention layers of a transformer-based model, outlook attention layers that encode local syntactic context, taking word proximity into account and considering more pair-wise constraints than the dynamic convolution operations used by existing approaches. A comparative empirical evaluation of an implementation of COOL with different transformer-based models confirms improvements over baselines that use the neural language models alone on various natural language processing tasks, including question answering. The proposed approach is competitive with state-of-the-art methods.
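    To make the mechanism concrete, below is a minimal sketch of a one-dimensional outlook-attention layer in the spirit of vision outlookers, adapted to token sequences. The class name Outlooker1D, the window size, and the simplified mean-fold over overlapping windows are illustrative assumptions; the abstract does not specify COOL's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Outlooker1D(nn.Module):
    """Sketch of a 1-D outlook-attention layer (VOLO-style outlook attention
    adapted to sequences; COOL's exact formulation may differ)."""
    def __init__(self, dim: int, window: int = 3):
        super().__init__()
        assert window % 2 == 1, "odd window so each token has a centre"
        self.window = window
        self.v = nn.Linear(dim, dim)
        # Attention weights are predicted directly from each token embedding
        # (one window-by-window matrix per position), not from query-key dots.
        self.attn = nn.Linear(dim, window * window)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, C)
        B, N, C = x.shape
        K = self.window
        v = self.v(x).transpose(1, 2)               # (B, C, N)
        v = F.pad(v, (K // 2, K // 2))              # pad sequence ends
        v = v.unfold(2, K, 1).permute(0, 2, 3, 1)   # (B, N, K, C) local windows
        a = self.attn(x).view(B, N, K, K).softmax(dim=-1)
        out = (a @ v).mean(dim=2)                   # simplified fold: average
        return self.proj(out)                       # (B, N, C)

# usage: y = Outlooker1D(dim=64)(torch.randn(2, 16, 64))
```

    Such a layer would sit on top of the transformer's self-attention stack, injecting local context before the task head.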

    From Static to Dynamic: A Continual Learning Framework for Large Language Models

    The vast number of parameters in large language models (LLMs) endows them with remarkable capabilities, allowing them to excel in a variety of natural language processing tasks. However, this scale also presents challenges: it makes LLMs difficult to train and inhibits their ability to continuously assimilate new knowledge, which may lead to inaccuracies in their outputs. To mitigate these issues, this paper presents DynaMind, a novel continual learning framework designed for LLMs. DynaMind incorporates memory mechanisms to assimilate new knowledge and modular operators to enhance the inference process with the newly assimilated knowledge, thereby improving the accuracy of the models' outputs. Benchmark experiments demonstrate DynaMind's effectiveness in overcoming these challenges. The code and demo of DynaMind are available on GitHub: https://github.com/Elfsong/DynaMind
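    As a rough illustration of the memory-plus-operators idea (not DynaMind's actual implementation; see the linked repository for that), a continual-learning loop can store newly arriving facts in an external memory and use a modular retrieval operator to inject them into the frozen model's prompt at inference time. All names below are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Toy knowledge memory: stores text snippets, retrieves by word overlap."""
    entries: list = field(default_factory=list)

    def assimilate(self, fact: str) -> None:
        self.entries.append(fact)           # continual knowledge intake

    def retrieve(self, query: str, k: int = 3) -> list:
        q = set(query.lower().split())
        scored = [(len(q & set(e.lower().split())), e) for e in self.entries]
        return [e for s, e in sorted(scored, reverse=True)[:k] if s > 0]

def answer(llm, memory: MemoryStore, question: str) -> str:
    """Modular inference operator: augment the prompt with retrieved
    knowledge before calling the (frozen) LLM callable."""
    context = "\n".join(memory.retrieve(question))
    return llm(f"Context:\n{context}\n\nQuestion: {question}")
```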

    A correlated motif approach for finding short linear motifs from protein interaction networks

    BACKGROUND: An important class of interaction switches for biological circuits and disease pathways are short binding motifs. However, the biological experiments to find these binding motifs are often laborious and expensive. With the availability of protein interaction data, novel binding motifs can be discovered computationally by applying standard motif-extraction algorithms to sets of protein sequences that each interact with either a common protein or a group of proteins with similar properties. The underlying assumption is that proteins with common interacting partners share some common binding motifs. Although novel binding motifs have been discovered with such an approach, it is not applicable when a protein interacts with very few other proteins, or when prior knowledge of protein groups is unavailable or erroneous. Experimental noise in the input interaction data further degrades the performance of such approaches.

    RESULTS: We propose a novel approach that finds correlated short sequence motifs from protein-protein interaction data, effectively circumventing the above limitations. Correlated motifs are motifs that consistently co-occur only in pairs of interacting protein sequences and could possibly interact with each other, directly or indirectly, to mediate interactions. We adopt the (l, d)-motif model and formulate finding correlated motifs as an (l, d)-motif pair finding problem. We present an exact algorithm, D-MOTIF, as well as an approximation algorithm, D-STAR, to solve this problem. Evaluation on extensive simulated data shows that our approach not only eliminates the need for any prior protein grouping, but is also more robust in extracting motifs from noisy interaction data. Application to two biological datasets (the SH3 interaction network and the TGFβ signaling network) demonstrates that the approach can extract correlated motifs that correspond to actual interacting subsequences.

    CONCLUSION: The correlated motif approach outlined in this paper finds correlated linear motifs from sparse and noisy interaction data. This, in turn, will expedite the discovery of novel linear binding motifs and facilitate the study of the biological pathways they mediate.
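    For intuition, the sketch below scores a single candidate (l, d)-motif pair by counting interacting sequence pairs in which the two motifs co-occur on opposite sides of an interaction. D-MOTIF and D-STAR search the space of motif pairs far more efficiently; this exhaustive scorer only illustrates the co-occurrence signal they optimize:

```python
def occurs(motif: str, seq: str, d: int) -> bool:
    """True if `motif` occurs in `seq` with at most d mismatches
    (the (l, d)-motif occurrence model)."""
    l = len(motif)
    return any(sum(a != b for a, b in zip(motif, seq[i:i + l])) <= d
               for i in range(len(seq) - l + 1))

def pair_support(m1: str, m2: str, interactions, d: int) -> int:
    """Count interacting pairs (s, t) where m1 occurs in one sequence and
    m2 in the other -- the correlated co-occurrence being searched for."""
    return sum(1 for s, t in interactions
               if (occurs(m1, s, d) and occurs(m2, t, d))
               or (occurs(m1, t, d) and occurs(m2, s, d)))

# toy usage with made-up sequences:
pairs = [("MKAPLPPRQ", "GWWESTV"), ("AAPLPPQA", "NKWWEL")]
print(pair_support("APLPPR", "WWE", pairs, d=1))   # -> 2
```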

    Solving Math Word Problems with Reexamination

    Math word problem (MWP) solving aims to understand a descriptive math problem and compute its result; previous efforts have mostly been devoted to upgrading individual technical modules. This paper takes a different perspective by introducing a reexamination process during training, realized as a pseudo-dual task that enhances MWP solving. We propose a pseudo-dual (PseDual) learning scheme to model this process; it is model-agnostic and can therefore be adapted to any existing MWP solver. The pseudo-dual task is defined as filling the numbers from the expression back into the original word problem with its numbers masked. To facilitate effective joint learning of the two tasks, we further design a scheduled fusion strategy for the number-infilling task, which smoothly switches its input from the ground-truth math expressions to the predicted ones. Empirical studies show that our pseudo-dual learning scheme is effective when equipped in several representative MWP solvers. The code and trained models are available at: https://github.com/steven640pixel/PsedualMWP
    Comment: To appear at the NeurIPS 2023 Workshop on MATH-AI.
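    A toy sketch of the two ingredients described above: masking the numbers of a word problem to create the infilling target, and a scheduled fusion that gradually switches the infilling input from ground-truth to predicted expressions. The regular expression and the linear schedule are assumptions, not details from the paper:

```python
import random
import re

NUM = re.compile(r"\d+(?:\.\d+)?")

def mask_numbers(problem: str):
    """Replace each number in the word problem with a [NUM] slot; the
    pseudo-dual task fills these slots back in from the math expression."""
    numbers = NUM.findall(problem)
    return NUM.sub("[NUM]", problem), numbers

def fused_expression(gt_expr: str, pred_expr: str, step: int, total: int) -> str:
    """Scheduled fusion (sketch): feed the ground-truth expression to the
    infilling task early in training, the solver's prediction later."""
    p_pred = min(1.0, step / total)     # linear schedule (assumed)
    return pred_expr if random.random() < p_pred else gt_expr

# usage: mask_numbers("Tom has 3 apples and buys 5 more.")
# -> ("Tom has [NUM] apples and buys [NUM] more.", ["3", "5"])
```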

    Privacy beyond single sensitive attribute


    Fair yet Asymptotically Equal Collaborative Learning

    In collaborative learning with streaming data, nodes (e.g., organizations) jointly and continuously learn a machine learning (ML) model by sharing the latest model updates computed from their latest streaming data. For the more resourceful nodes to be willing to share their model updates, they need to be fairly incentivized. This paper explores an incentive design that guarantees fairness, so that nodes receive rewards commensurate with their contributions. Our approach leverages an explore-then-exploit formulation to estimate the nodes' contributions (i.e., exploration) and realize our theoretically guaranteed fair incentives (i.e., exploitation). However, we observe a "rich get richer" phenomenon arising from existing approaches to guaranteeing fairness, which discourages the participation of the less resourceful nodes. To remedy this, we additionally preserve asymptotic equality: less resourceful nodes eventually achieve performance equal to that of the more resourceful, "rich" nodes. We empirically demonstrate, in two settings with real-world streaming data (federated online incremental learning and federated reinforcement learning), that our proposed approach outperforms existing baselines in fairness and learning performance while remaining competitive in preserving equality.
    Comment: Accepted to the 40th International Conference on Machine Learning (ICML 2023); 37 pages.
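    One way to picture the fairness-versus-equality trade-off (an illustrative reward rule, not the paper's actual mechanism) is a split of rewards that starts proportional to the estimated contributions and is annealed toward an equal split, so that less resourceful nodes catch up asymptotically:

```python
import numpy as np

def reward_shares(contributions, t: int, horizon: int) -> np.ndarray:
    """Illustrative only: blend a proportional-to-contribution split
    (fairness) with a uniform split (equality) as training progresses."""
    c = np.asarray(contributions, dtype=float)
    fair = c / c.sum()                     # fairness: reward ~ contribution
    equal = np.full_like(fair, 1.0 / len(fair))
    alpha = min(1.0, t / horizon)          # anneal toward equality
    return (1 - alpha) * fair + alpha * equal

# early round: shares follow contributions; late rounds: near-uniform
print(reward_shares([5.0, 1.0], t=0, horizon=100))    # ~[0.833, 0.167]
print(reward_shares([5.0, 1.0], t=100, horizon=100))  # [0.5, 0.5]
```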