68 research outputs found

    DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

    Full text link
    In the era of large language models, Mixture-of-Experts (MoE) is a promising architecture for managing computational costs when scaling up model parameters. However, conventional MoE architectures like GShard, which activate the top-$K$ out of $N$ experts, face challenges in ensuring expert specialization, i.e. each expert acquires non-overlapping and focused knowledge. In response, we propose the DeepSeekMoE architecture towards ultimate expert specialization. It involves two principal strategies: (1) finely segmenting the experts into $mN$ ones and activating $mK$ from them, allowing for a more flexible combination of activated experts; (2) isolating $K_s$ experts as shared ones, aiming at capturing common knowledge and mitigating redundancy in routed experts. Starting from a modest scale with 2B parameters, we demonstrate that DeepSeekMoE 2B achieves comparable performance with GShard 2.9B, which has 1.5 times the expert parameters and computation. In addition, DeepSeekMoE 2B nearly approaches the performance of its dense counterpart with the same number of total parameters, which set the upper bound of MoE models. Subsequently, we scale up DeepSeekMoE to 16B parameters and show that it achieves comparable performance with LLaMA2 7B, with only about 40% of computations. Further, our preliminary efforts to scale up DeepSeekMoE to 145B parameters consistently validate its substantial advantages over the GShard architecture, and show its performance comparable with DeepSeek 67B, using only 28.5% (maybe even 18.2%) of computations.
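
The two strategies named in this abstract (fine-grained segmentation and shared-expert isolation) can be illustrated with a small routing sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the softmax gate, the scalar "experts", and the residual connection are hypothetical, and the choice to reserve `k_shared` of the `m * n_experts` slots for shared experts (so that `m * top_k - k_shared` routed experts are activated) is an assumption that the abstract itself does not spell out.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_experts(router_logits, n_experts, top_k, m, k_shared):
    """Pick the experts that process one token.

    Assumption: of the m * n_experts fine-grained experts, the first
    k_shared are always active, and the router chooses
    m * top_k - k_shared of the remaining (routed) experts.
    """
    n_routed = m * n_experts - k_shared
    n_active_routed = m * top_k - k_shared
    if len(router_logits) != n_routed:
        raise ValueError("need one router logit per routed expert")
    if not 0 < n_active_routed <= n_routed:
        raise ValueError("invalid combination of top_k, m and k_shared")

    shared = list(range(k_shared))
    scores = softmax(router_logits)
    ranked = sorted(range(n_routed), key=lambda i: scores[i], reverse=True)
    # Map routed index i to a global expert id, keeping its gate value.
    routed = {k_shared + i: scores[i] for i in ranked[:n_active_routed]}
    return shared, routed


def moe_forward(x, experts, router_logits, n_experts, top_k, m, k_shared):
    """Combine shared and gated routed expert outputs (scalar toy version)."""
    shared, routed = select_experts(router_logits, n_experts, top_k, m, k_shared)
    out = x  # residual connection (an assumption of this sketch)
    for expert_id in shared:
        out += experts[expert_id](x)
    for expert_id, gate in routed.items():
        out += gate * experts[expert_id](x)
    return out


# Toy setting: N=4 experts with top-K=2, split m=2 ways, K_s=1 shared expert.
logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, 1.0]  # one per routed expert
shared_ids, routed_gates = select_experts(logits, n_experts=4, top_k=2, m=2, k_shared=1)
print(shared_ids, sorted(routed_gates))
```

With these toy numbers, 8 fine-grained experts exist, 1 is always active, and 3 of the 7 routed experts are chosen per token, so the number of active experts stays at `m * top_k = 4`.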

    Predicting In Vivo Anti-Hepatofibrotic Drug Efficacy Based on In Vitro High-Content Analysis

    Get PDF
    Background/Aims: Many anti-fibrotic drugs with high in vitro efficacies fail to produce significant effects in vivo. The aim of this work is to use a statistical approach to design a numerical predictor that correlates better with in vivo outcomes. Methods: High-content analysis (HCA) was performed with 49 drugs on hepatic stellate cells (HSCs) LX-2 stained with 10 fibrotic markers. ~0.3 billion feature values from all cells in >150,000 images were quantified to reflect the drug effects. A systematic literature search on the in vivo effects of all 49 drugs on hepatofibrotic rats yielded 28 papers with histological scores. The in vivo and in vitro datasets were used to compute a single efficacy predictor (Epredict). Results: We used in vivo data from one context (CCl4 rats with drug treatments) to optimize the computation of Epredict. This optimized relationship was independently validated using in vivo data from two different contexts (treatment of DMN rats and prevention of CCl4 induction). A linear in vitro-in vivo correlation was consistently observed in all three contexts. We used Epredict values to cluster drugs according to efficacy and found that high-efficacy drugs tended to target proliferation, apoptosis and contractility of HSCs. Conclusions: The Epredict statistic, based on a prioritized combination of in vitro features, provides a better correlation between in vitro and in vivo drug response than any of the traditional in vitro markers considered.
    Funding: Institute of Bioengineering and Nanotechnology (Singapore); Singapore. Biomedical Research Council; Singapore. Agency for Science, Technology and Research; Singapore-MIT Alliance for Research and Technology Center (C-185-000-033-531); Janssen Cilag (R-185-000-182-592); Singapore-MIT Alliance Computational and Systems Biology Flagship Project (C-382-641-001-091); Mechanobiology Institute, Singapore (R-714-001-003-271

    DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

    Full text link
    The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed a dataset that currently consists of 2 trillion tokens and is continuously expanding. We further conduct supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) on DeepSeek LLM Base models, resulting in the creation of DeepSeek Chat models. Our evaluation results demonstrate that DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in the domains of code, mathematics, and reasoning. Furthermore, open-ended evaluations reveal that DeepSeek LLM 67B Chat exhibits superior performance compared to GPT-3.5

    Using the Modified Resistivity–Porosity Cross Plot Method to Identify Formation Fluid Types in Tight Sandstone with Variable Water Salinity

    No full text
    It is generally difficult to identify fluid types in low-porosity, low-permeability reservoirs, and the Chang 8 Member in the Ordos Basin is a typical example. In the Chang 8 Member of the Yanchang Formation in the Zhenyuan area of the Ordos Basin, lithology and physical properties make the resistivities of the oil layer and water layer similar, which makes fluid type identification very difficult. In this paper, we first analyzed the geological and petrophysical characteristics of the study area and found that high clay content is one of the reasons for the low-resistivity oil pay layer. Then, the formation water types and the characteristics of formation water salinity were studied. The water type was mainly CaCl2, and formation water salinity varied widely across the study area, from 7510 ppm to 72,590 ppm, which is the main cause of the low-resistivity oil pay layer. According to the reservoir fluid logging response characteristics, the water saturation boundaries of the oil layer, oil–water layer and water layer were determined to be 30%, 65% and 80%, respectively. We modified the traditional resistivity–porosity cross plot method based on Archie’s equations and established three basic plates with variable formation water salinity. The above method was used to identify the fluid types of the reservoirs, and the application results indicate that the modified method agrees well with the perforation test data and can effectively improve the accuracy of fluid identification. The accuracy of the plate is 88.1%. The findings of this study can contribute to a better understanding of fluid identification and formation evaluation.
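
The method in this abstract builds on Archie's equations, which relate water saturation to formation resistivity, porosity and formation water resistivity. The sketch below shows only the textbook form, Sw = ((a * Rw) / (phi^m * Rt))^(1/n), to illustrate why a varying formation water resistivity (which changes with salinity) shifts the computed saturation. The function name, the default parameters a = 1, m = 2, n = 2 and the sample values are assumptions for illustration. They are not taken from the paper, and the paper's modified cross plot and its saturation boundaries are not reproduced here.

```python
def archie_water_saturation(rt, porosity, rw, a=1.0, m=2.0, n=2.0):
    """Water saturation (fraction) from the textbook Archie equation.

    rt       : true formation resistivity, ohm-m
    porosity : fractional porosity, in (0, 1]
    rw       : formation water resistivity, ohm-m (lower for saltier water)
    a, m, n  : tortuosity factor, cementation and saturation exponents
               (generic defaults, assumed here for illustration only)
    """
    if rt <= 0 or rw <= 0 or not 0 < porosity <= 1:
        raise ValueError("rt and rw must be positive, porosity in (0, 1]")
    sw = ((a * rw) / (porosity ** m * rt)) ** (1.0 / n)
    return min(sw, 1.0)  # saturation cannot exceed 100%


# Same log readings, two hypothetical water resistivities:
# the saltier (lower-Rw) case yields a much lower computed saturation.
for rw in (0.05, 0.20):
    sw = archie_water_saturation(rt=20.0, porosity=0.10, rw=rw)
    print(f"Rw={rw:.2f} ohm-m -> Sw={sw:.2f}")
```

The same resistivity and porosity readings give very different saturations once Rw changes, which is consistent with the abstract's point that a single fixed-salinity cross plot is unreliable when salinity varies across the study area.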

    Ultra-large elongation and dislocation behavior of nano-sized tantalum single crystals

    No full text
    Although extensive simulations and experimental investigations have been carried out, the plastic deformation mechanism of body-centered-cubic (BCC) metals is still unclear. With our home-made device, the in situ tensile tests of single crystal tantalum (Ta) nanoplates with a lateral dimension of ∼200 nm in width and ∼100 nm in thickness were conducted inside a transmission electron microscope. We discovered an unusual ambient temperature (below ∼60°C) ultra-large elongation which could be as large as 63% on Ta nanoplates. The in situ observations revealed that the continuous and homogeneous dislocation nucleation and fast dislocation escape lead to the ultra-large elongation in BCC Ta nanoplates. Besides commonly believed screw dislocations, a large amount of mixed dislocation with $\mathbf{b} = \frac{1}{2}\langle 111 \rangle$ were also found during the tensile loading, indicating the dislocation process can be significantly influenced by the small sizes of BCC metals. These results provide basic understanding of plastic deformation in BCC metallic nanomaterials.