68 research outputs found
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
In the era of large language models, Mixture-of-Experts (MoE) is a promising
architecture for managing computational costs when scaling up model parameters.
However, conventional MoE architectures like GShard, which activate the top-K
out of N experts, face challenges in ensuring expert specialization, i.e.
each expert acquires non-overlapping and focused knowledge. In response, we
propose the DeepSeekMoE architecture towards ultimate expert specialization. It
involves two principal strategies: (1) finely segmenting the experts into mN
ones and activating mK from them, allowing for a more flexible combination of
activated experts; (2) isolating K_s experts as shared ones, aiming at
capturing common knowledge and mitigating redundancy in routed experts.
Starting from a modest scale with 2B parameters, we demonstrate that
DeepSeekMoE 2B achieves comparable performance with GShard 2.9B, which has 1.5
times the expert parameters and computation. In addition, DeepSeekMoE 2B nearly
approaches the performance of its dense counterpart with the same number of
total parameters, which sets the upper bound of MoE models. Subsequently, we
scale up DeepSeekMoE to 16B parameters and show that it achieves comparable
performance with LLaMA2 7B, with only about 40% of computations. Further, our
preliminary efforts to scale up DeepSeekMoE to 145B parameters consistently
validate its substantial advantages over the GShard architecture, and show its
performance comparable with DeepSeek 67B, using only 28.5% (maybe even 18.2%)
of computations
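
The two strategies in the abstract above can be illustrated with a minimal routing sketch. It is not the authors' implementation. It assumes N original experts with top-K activation, a segmentation factor m (giving mN finer experts), and K_s shared experts that are always active. The function name `route_token`, the softmax gate, the fixed weight of 1.0 for shared experts, and the choice to activate mK - K_s routed experts (so the total number of active experts stays at mK) are all assumptions made for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(routed_scores, num_shared, num_routed_active):
    """Return {expert_index: weight} for a single token.

    Experts 0..num_shared-1 are shared and always active (weight 1.0 is an
    assumption of this sketch). The remaining experts are routed: only the
    top `num_routed_active` by gate probability are activated.
    """
    probs = softmax(routed_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    weights = {idx: 1.0 for idx in range(num_shared)}
    for i in ranked[:num_routed_active]:
        weights[num_shared + i] = probs[i]
    return weights


# Hypothetical sizes: N = 4 experts with top-K = 2, segmented by m = 4.
N, K = 4, 2
m = 4
K_s = 1                          # shared experts, always active
total_experts = m * N            # 16 fine-grained experts
num_routed = total_experts - K_s
num_routed_active = m * K - K_s  # assumed, keeps total active at m * K

rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(num_routed)]
weights = route_token(scores, K_s, num_routed_active)
print(sorted(weights))           # indices of the active experts
```

With these hypothetical sizes, each token activates 8 of 16 small experts instead of 2 of 4 large ones, so the number of possible expert combinations grows while the activated parameter count stays roughly the same.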
Predicting In Vivo Anti-Hepatofibrotic Drug Efficacy Based on In Vitro High-Content Analysis
Background/Aims
Many anti-fibrotic drugs with high in vitro efficacies fail to produce significant effects in vivo. The aim of this work is to use a statistical approach to design a numerical predictor that correlates better with in vivo outcomes.
Methods
High-content analysis (HCA) was performed with 49 drugs on hepatic stellate cells (HSCs) LX-2 stained with 10 fibrotic markers. ~0.3 billion feature values from all cells in >150,000 images were quantified to reflect the drug effects. A systematic literature search on the in vivo effects of all 49 drugs on hepatofibrotic rats yielded 28 papers with histological scores. The in vivo and in vitro datasets were used to compute a single efficacy predictor (Epredict).
Results
We used in vivo data from one context (CCl4 rats with drug treatments) to optimize the computation of Epredict. This optimized relationship was independently validated using in vivo data from two different contexts (treatment of DMN rats and prevention of CCl4 induction). A linear in vitro-in vivo correlation was consistently observed in all three contexts. We used Epredict values to cluster drugs according to efficacy, and found that high-efficacy drugs tended to target proliferation, apoptosis and contractility of HSCs.
Conclusions
The Epredict statistic, based on a prioritized combination of in vitro features, provides a better correlation between in vitro and in vivo drug response than any of the traditional in vitro markers considered.
Institute of Bioengineering and Nanotechnology (Singapore); Singapore. Biomedical Research Council; Singapore. Agency for Science, Technology and Research; Singapore-MIT Alliance for Research and Technology Center (C-185-000-033-531); Janssen Cilag (R-185-000-182-592); Singapore-MIT Alliance Computational and Systems Biology Flagship Project (C-382-641-001-091); Mechanobiology Institute, Singapore (R-714-001-003-271
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
The rapid development of open-source large language models (LLMs) has been
truly remarkable. However, the scaling law described in previous literature
presents varying conclusions, which casts a dark cloud over scaling LLMs. We
delve into the study of scaling laws and present our distinctive findings that
facilitate scaling of large scale models in two commonly used open-source
configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek
LLM, a project dedicated to advancing open-source language models with a
long-term perspective. To support the pre-training phase, we have developed a
dataset that currently consists of 2 trillion tokens and is continuously
expanding. We further conduct supervised fine-tuning (SFT) and Direct
Preference Optimization (DPO) on DeepSeek LLM Base models, resulting in the
creation of DeepSeek Chat models. Our evaluation results demonstrate that
DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in
the domains of code, mathematics, and reasoning. Furthermore, open-ended
evaluations reveal that DeepSeek LLM 67B Chat exhibits superior performance
compared to GPT-3.5
Algorithms and performance analysis for narrowband internet of things (NB-IoT) and broadband LTE coexisting system
No abstract available
Using the Modified Resistivity–Porosity Cross Plot Method to Identify Formation Fluid Types in Tight Sandstone with Variable Water Salinity
It is generally difficult to identify fluid types in low-porosity and low-permeability reservoirs, and the Chang 8 Member in the Ordos Basin is a typical example. In the Chang 8 Member of the Yanchang Formation in the Zhenyuan area of the Ordos Basin, lithology and physical properties bring the resistivities of the oil layer and the water layer close together, which makes fluid type identification very difficult. In this paper, we first analyzed the geological and petrophysical characteristics of the study area and found that high clay content is one of the reasons for the low-resistivity oil pay layer. We then studied the formation water types and the characteristics of formation water salinity. The water type was mainly CaCl2, and formation water salinity varied widely across the study area, from 7510 ppm to 72,590 ppm, which is the main cause of the low-resistivity oil pay layer. According to the reservoir fluid logging response characteristics, the water saturation boundaries of the oil layer, oil–water layer and water layer were determined to be 30%, 65% and 80%, respectively. We modified the traditional resistivity–porosity cross plot method based on Archie’s equations and established three basic plates for variable formation water salinity. The modified method was used to identify the fluid types of the reservoirs, and the results agree well with the perforation test data, which shows that it can effectively improve the accuracy of fluid identification. The accuracy of the plate is 88.1%. The findings of this study contribute to a better understanding of fluid identification and formation evaluation
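
To make the role of formation water salinity concrete, here is a minimal sketch of the standard Archie water saturation equation, Sw = ((a · Rw) / (φ^m · Rt))^(1/n), which underlies resistivity–porosity cross plots. It is not the modified method of the paper. The function name, the default parameters a = 1, m = 2, n = 2, and all input values are assumptions chosen for illustration; the conversion from salinity in ppm to water resistivity Rw is not modeled.

```python
def archie_water_saturation(rt, porosity, rw, a=1.0, m=2.0, n=2.0):
    """Water saturation from Archie's equation, capped at 1.0.

    rt:       true formation resistivity (ohm·m)
    porosity: fractional porosity, 0 < porosity <= 1
    rw:       formation water resistivity (ohm·m)
    a, m, n:  tortuosity factor, cementation and saturation exponents
              (the defaults are common textbook values, assumed here)
    """
    if rt <= 0 or rw <= 0 or not 0 < porosity <= 1:
        raise ValueError("rt and rw must be positive; porosity in (0, 1]")
    sw = ((a * rw) / (porosity ** m * rt)) ** (1.0 / n)
    return min(sw, 1.0)


# Same log readings, two hypothetical water resistivities:
# fresher (less saline) water has a higher Rw.
rt, phi = 20.0, 0.10
sw_saline = archie_water_saturation(rt, phi, rw=0.05)
sw_fresh = archie_water_saturation(rt, phi, rw=0.20)
print(f"Sw with Rw=0.05: {sw_saline:.2f}, Sw with Rw=0.20: {sw_fresh:.2f}")
```

The same resistivity and porosity readings give very different water saturations depending on Rw, which is why a single cross plot with a fixed water salinity can misclassify layers when salinity varies across the study area.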
Ultra-large elongation and dislocation behavior of nano-sized tantalum single crystals
Although extensive simulations and experimental investigations have been carried out, the plastic deformation mechanism of body-centered-cubic (BCC) metals is still unclear. Using our home-made device, we conducted in situ tensile tests of single-crystal tantalum (Ta) nanoplates, ∼200 nm in width and ∼100 nm in thickness, inside a transmission electron microscope. We discovered an unusual ultra-large elongation at ambient temperature (below ∼60°C), which could be as large as 63%, in Ta nanoplates. The in situ observations revealed that continuous and homogeneous dislocation nucleation and fast dislocation escape lead to the ultra-large elongation in BCC Ta nanoplates. Besides the commonly expected screw dislocations, a large number of mixed dislocations with b = 1/2⟨111⟩ were also found during tensile loading, indicating that the dislocation process can be significantly influenced by the small size of BCC metals. These results provide a basic understanding of plastic deformation in BCC metallic nanomaterials
- …