172 research outputs found

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.

    Optimal Measurement Planning Using Fuzzy-set Theory

    In precision measurement, a measurement process is known to involve errors and factors of different kinds. Using prior knowledge of the relationship between the measured variables and these factors, the best measurement plan may be obtained by minimizing a target function of the errors. In many cases, however, the relationship is unclear because non-quantitative factors are involved, which makes finding the best plan with statistical methods quite difficult. A new method based on fuzzy-set theory is proposed to solve this problem. In this method, the membership grade is maximized. The concept of a quasi-perfect plan is presented. Mathematical models are established, and case studies are presented to demonstrate the feasibility of the proposed method.

    An Efficient Cross-lingual Model for Sentence Classification Using Convolutional Neural Network


    Estimation of Non-statistical Uncertainty Using Fuzzy-set Theory

    A novel method that uses a fuzzy practicable interval to characterize non-statistical uncertainty in dynamic measurement is proposed. The method permits the uncertainty to be estimated when the number of measurements is very small and the probability distribution is unknown. The feasibility of the method is validated by computer-simulation experiments.

    Novel technique of fusion for truth value based on poor information


    Fuzzy-set-based Optimal Selection of Measurement Plans

    In many cases there are several approaches to realising a precision measurement. It is very important to decide on an optimal measurement approach or plan in order to obtain high-fidelity results effectively. A measurement process may involve many kinds of error factors, which vary between measurement plans. This makes the selection of the most appropriate approach difficult. Currently, such selection typically relies on prior knowledge of the relationship between measured variables and error factors, together with statistical methods. In many cases, however, the relationship may not be clear, because non-quantitative factors are involved. To solve the problem, a new method based on fuzzy set theory is proposed. In this method, membership grades are established and the grades to the quasi-perfect plan are maximised. Mathematical models are established and the concept of a quasi-perfect measurement plan is proposed. Experimental testing is presented to demonstrate the effectiveness of the proposed method.