
68 research outputs found

( E

Author: Amal Raj
Changquan Deng
Deli
Huang
Huang
Jie Zheng
Min Xie
Sheldrick
Yu-Lin Zhu
Zhu
Publication venue: 'International Union of Crystallography (IUCr)'

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Author: Chen Deli
Dai Damai
Deng Chengqi
Gao Huazuo
Huang Panpan
Li Jiashi
Li Y. K.
Liang Wenfeng
Luo Fuli
Ruan Chong
Sui Zhifang
Wu Y.
Xie Zhenda
Xu R. X.
Yu Xingkai
Zeng Wangding
Zhao Chenggang
Publication date: 11/01/2024

In the era of large language models, Mixture-of-Experts (MoE) is a promising architecture for managing computational costs when scaling up model parameters. However, conventional MoE architectures like GShard, which activate the top-K out of N experts, face challenges in ensuring expert specialization, i.e. each expert acquires non-overlapping and focused knowledge. In response, we propose the DeepSeekMoE architecture towards ultimate expert specialization. It involves two principal strategies: (1) finely segmenting the experts into mN ones and activating mK from them, allowing for a more flexible combination of activated experts; (2) isolating K_s experts as shared ones, aiming at capturing common knowledge and mitigating redundancy in routed experts. Starting from a modest scale with 2B parameters, we demonstrate that DeepSeekMoE 2B achieves comparable performance with GShard 2.9B, which has 1.5 times the expert parameters and computation. In addition, DeepSeekMoE 2B nearly approaches the performance of its dense counterpart with the same number of total parameters, which sets the upper bound of MoE models. Subsequently, we scale up DeepSeekMoE to 16B parameters and show that it achieves comparable performance with LLaMA2 7B, with only about 40% of computations. Further, our preliminary efforts to scale up DeepSeekMoE to 145B parameters consistently validate its substantial advantages over the GShard architecture, and show its performance comparable with DeepSeek 67B, using only 28.5% (maybe even 18.2%) of computations.
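
The two strategies named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the softmax gate, and the choice to count the K_s shared experts inside the mK active ones (so that mK - K_s routed experts are picked from the remaining mN - K_s) are assumptions made for illustration only.

```python
import math


def n_combinations(total_experts, active_experts):
    """Number of distinct sets of active experts a router can choose from."""
    return math.comb(total_experts, active_experts)


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def select_experts(routed_logits, n_experts, top_k, m, k_shared):
    """Pick the active experts for one token (hypothetical helper).

    Expert ids 0 .. k_shared-1 are treated as shared and are always active.
    The remaining m*n_experts - k_shared experts are routed: the token's
    gate scores decide which m*top_k - k_shared of them are used.
    """
    total = m * n_experts          # mN fine-grained experts
    active = m * top_k             # mK active experts per token
    n_routed = total - k_shared
    if len(routed_logits) != n_routed:
        raise ValueError("expected one logit per routed expert")
    if not 0 <= k_shared < active:
        raise ValueError("k_shared must be smaller than m * top_k")

    shared_ids = list(range(k_shared))
    gate = softmax(routed_logits)
    # Indices of the highest-scoring routed experts (assumed top-k rule).
    best = sorted(range(n_routed), key=lambda i: gate[i], reverse=True)
    best = best[: active - k_shared]
    # Map routed index -> global expert id, keeping the gate weight.
    routed = {k_shared + i: gate[i] for i in best}
    return shared_ids, routed


if __name__ == "__main__":
    # Finer segmentation enlarges the space of expert combinations.
    print(n_combinations(16, 2), n_combinations(64, 8))
    shared, routed = select_experts(
        [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, 1.0],
        n_experts=4, top_k=2, m=2, k_shared=1,
    )
    print(shared, sorted(routed))
```

Splitting each expert into m smaller ones leaves the active parameter count roughly unchanged while the router gains many more possible expert combinations, which is the flexibility the abstract refers to.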

arXiv.org e-Print Archive

Predicting In Vivo Anti-Hepatofibrotic Drug Efficacy Based on In Vitro High-Content Analysis

Author: A Pares
A Regev
AC de Gouville
AM Gressner
AM Jonker
B Schnabl
Baixue Zheng
C Raetsch
CJ Marek
CL Lai
CZ Chen
D Kershenobich
DC Rockey
DH Jeong
EC Butcher
F Mion
F Oakley
FJ Massey
GJ Yuan
GP Aithal
GS Li
H Yamaguchi
Hanry Yu
HL Bonkowsky
I Tasci
I Tasci
J Soma
JJ Maher
JM Dumont
L Dai
Lisa Tucker-Kellogg
Looling Tan
LW Chong
MA Hashem
Maria A. Deli
MC Zhen
MG Bachem
MK Lee
P Ferenci
P Langguth
Peter T. C. So
Q Hu
Q Xu
Roy E. Welsch
RT Hong
S Sale
S Singh
SJ Lee
SL Friedman
SW Schalm
TG van Rossum
TR Morgan
W Yu
Weimiao Yu
WJ Conover
XL Wu
Xuejun Mo
XY Zhao
Yan Wang
YC Hsu
ZY Deng
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/05/2011

Background/Aims: Many anti-fibrotic drugs with high in vitro efficacies fail to produce significant effects in vivo. The aim of this work is to use a statistical approach to design a numerical predictor that correlates better with in vivo outcomes. Methods: High-content analysis (HCA) was performed with 49 drugs on hepatic stellate cells (HSCs) LX-2 stained with 10 fibrotic markers. ~0.3 billion feature values from all cells in >150,000 images were quantified to reflect the drug effects. A systematic literature search on the in vivo effects of all 49 drugs on hepatofibrotic rats yielded 28 papers with histological scores. The in vivo and in vitro datasets were used to compute a single efficacy predictor (Epredict). Results: We used in vivo data from one context (CCl4 rats with drug treatments) to optimize the computation of Epredict. This optimized relationship was independently validated using in vivo data from two different contexts (treatment of DMN rats and prevention of CCl4 induction). A linear in vitro-in vivo correlation was consistently observed in all three contexts. We used Epredict values to cluster drugs according to efficacy and found that high-efficacy drugs tended to target proliferation, apoptosis and contractility of HSCs. Conclusions: The Epredict statistic, based on a prioritized combination of in vitro features, provides a better correlation between in vitro and in vivo drug response than any of the traditional in vitro markers considered.

Funding: Institute of Bioengineering and Nanotechnology (Singapore); Singapore. Biomedical Research Council; Singapore. Agency for Science, Technology and Research; Singapore-MIT Alliance for Research and Technology Center (C-185-000-033-531); Janssen Cilag (R-185-000-182-592); Singapore-MIT Alliance Computational and Systems Biology Flagship Project (C-382-641-001-091); Mechanobiology Institute, Singapore (R-714-001-003-271

Public Library of Science (PLOS)

DSpace@MIT

Crossref

Directory of Open Access Journals

PubMed Central

ScholarBank@NUS

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Author: Bi Xiao
Chen Deli
Chen Guanting
Chen Shanhuang
Dai Damai
DeepSeek-AI
Deng Chengqi
Ding Honghui
Dong Kai
Du Qiushi
Fu Zhe
Gao Huazuo
Gao Kaige
Gao Wenjun
Ge Ruiqi
Guan Kang
Guo Daya
Guo Jianzhong
Hao Guangbo
Hao Zhewen
He Ying
Hu Wenjie
Huang Panpan
Li Erhang
Li Guowei
Li Jiashi
Li Y. K.
Li Yao
Liang Wenfeng
Lin Fangyun
Liu A. X.
Liu Bo
Liu Wen
Liu Xiaodong
Liu Xin
Liu Yiyuan
Lu Haoyu
Lu Shanghao
Luo Fuli
Ma Shirong
Nie Xiaotao
Pei Tian
Piao Yishi
Qiu Junjie
Qu Hui
Ren Tongzheng
Ren Zehui
Ruan Chong
Sha Zhangli
Shao Zhihong
Song Junxiao
Su Xuecheng
Sun Jingxiang
Sun Yaofeng
Tang Minghui
Wang Bingxuan
Wang Peiyi
Wang Shiyu
Wang Yaohui
Wang Yongji
Wu Tong
Wu Y.
Xie Xin
Xie Zhenda
Xie Ziwei
Xiong Yiliang
Xu Hanwei
Xu R. X.
Xu Yanhong
Yang Dejian
You Yuxiang
Yu Shuiping
Yu Xingkai
Zhang B.
Zhang Haowei
Zhang Lecong
Zhang Liyue
Zhang Mingchuan
Zhang Minghua
Zhang Wentao
Zhang Yichao
Zhao Chenggang
Zhao Yao
Zhou Shangyan
Zhou Shunfeng
Zhu Qihao
Zou Yuheng
Publication date: 05/01/2024

The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling laws described in previous literature present varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate the scaling of large-scale models in two commonly used open-source configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed a dataset that currently consists of 2 trillion tokens and is continuously expanding. We further conduct supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) on DeepSeek LLM Base models, resulting in the creation of DeepSeek Chat models. Our evaluation results demonstrate that DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in the domains of code, mathematics, and reasoning. Furthermore, open-ended evaluations reveal that DeepSeek LLM 67B Chat exhibits superior performance compared to GPT-3.5.

arXiv.org e-Print Archive

Chilling acclimation provides immunity to stress by altering regulatory networks and inducing genes with protective functions in Cassava

Author: A Chawade
AJ Carroll
B Langmead
B Usadel
Bin Wang
C Deisenroth
C Li
C Trapnell
C-H Dong
C-H Dong
Changying Zeng
CJ Doherty
D An
Deli Deng
DM Goodstein
DR Hoagland
EJ Wurtmann
F Kong
G Koehler
GS Sanghera
Hai Peng
HI McKhann
HP Harding
J Levitt
J Medina
JB Smirnova
JC Preston
Jing Xia
Junfei Zhou
K Miura
K-Y Yun
Kevin Zhang
KP Bhat
L Zhang
LM Iyer
LR Costello
M Chen
M Ishitani
M-S Dai
MA Carvallo
MA El-Sharkawy
MA El-Sharkawy
MA Leyva-González
MF Thomashow
Ming Peng
ML Falcone Ferreyra
MQ Le
MV Gerashchenko
N Fahlgren
N Provart
OV Fursova
P Casati
Q Zhang
Q-H Zhu
R Shinkawa
R Sunkar
R Sunkar
R Uberto
S Bursać
S Kidokoro
Shun Song
SV Kumar
T Grousl
T Zhang
V Chinnusamy
V Chinnusamy
V Haake
W Yamori
Weiping Bo
Weixiong Zhang
Wenquan Wang
X Zhang
Xin Chen
Xin Guo
Y Benjamini
Y Choi
Y Guo
Yufei Zhou
Z Gong
Z Yang
Zheng Chen
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Algorithms and performance analysis for narrowband internet of things (NB-IoT) and broadband LTE coexisting system

Author: Deng Yansha
Imran Muhammad
Qiao Deli
Yang Bowen
Zhang Lei
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2018

No abstract available

Enlighten

Recent Progress of Palladium-Based Electrocatalysts for the Formic Acid Oxidation Reaction

Author: Deli Wang
Jingjing Zhang
Ke Chen
Shaofeng Deng
Tao Shen
Publication venue: 'American Chemical Society (ACS)'

Crossref

An intuitionistic fuzzy soft set method for stochastic decision-making applying prospect theory and grey relational analysis

Author: Aktas
Atanassov
Basu
Cağman
Cağman
Cağman
Cağman
Deli
Deli
Deli
Deng
Feng
Feng
Jiang
Li
Li
Li
Li
Li
Maji
Maji
Molodtsov
Pawlak
Rottenstreich
Salminen
Tamura
Tversky
Xie
Zadeh
Publication venue: 'IOS Press'

Crossref

Using the Modified Resistivity–Porosity Cross Plot Method to Identify Formation Fluid Types in Tight Sandstone with Variable Water Salinity

Author: Deli Li
Hucheng Deng
Jianhua He
Kesai Li
Yuanyuan Wang
Yufei Yang
Zehou Xiang
Publication venue: 'MDPI AG'
Publication date: 01/10/2021

It is generally difficult to identify fluid types in low-porosity, low-permeability reservoirs, and the Chang 8 Member in the Ordos Basin is a typical example. In the Chang 8 Member of the Yanchang Formation in the Zhenyuan area of the Ordos Basin, lithology and physical properties make the resistivities of the oil layer and the water layer similar, which makes fluid type identification very difficult. In this paper, we first analyzed the geological and petrophysical characteristics of the study area and found that high clay content is one of the reasons for the low-resistivity oil pay layer. Then, the formation water types and the characteristics of formation water salinity were studied. The water type was mainly CaCl2, and formation water salinity varied widely across the study area, from 7510 ppm to 72,590 ppm, which is the main cause of the low-resistivity oil pay layer. According to the reservoir fluid logging response characteristics, the water saturation boundaries of the oil layer, oil–water layer and water layer were determined to be 30%, 65% and 80%, respectively. We modified the traditional resistivity–porosity cross plot method based on Archie’s equations and established three basic plates with variable formation water salinity. The modified method was used to identify the fluid types of the reservoirs, and the application results agree well with the perforation test data, indicating that the method can effectively improve the accuracy of fluid identification. The accuracy of the plate is 88.1%. The findings of this study can contribute to a better understanding of fluid identification and formation evaluation.
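
For readers unfamiliar with Archie's equation, which this abstract builds on, the sketch below shows its standard textbook form and why variable formation water salinity matters. It is not the paper's modified cross plot method. The function name, the default constants (a = 1, m = 2, n = 2), and the example resistivity values are assumptions chosen for illustration.

```python
def archie_water_saturation(porosity, rt_ohm_m, rw_ohm_m, a=1.0, m=2.0, n=2.0):
    """Water saturation S_w (as a fraction) from Archie's equation.

    S_w = ((a * R_w) / (porosity**m * R_t)) ** (1 / n)

    porosity  : fractional porosity, 0 < porosity <= 1
    rt_ohm_m  : true formation resistivity R_t (ohm-m)
    rw_ohm_m  : formation water resistivity R_w (ohm-m); it falls as
                salinity rises
    a, m, n   : tortuosity factor, cementation exponent, saturation
                exponent (illustrative defaults, not values from the paper)
    """
    if not 0.0 < porosity <= 1.0:
        raise ValueError("porosity must be in (0, 1]")
    if rt_ohm_m <= 0.0 or rw_ohm_m <= 0.0:
        raise ValueError("resistivities must be positive")
    sw = ((a * rw_ohm_m) / (porosity ** m * rt_ohm_m)) ** (1.0 / n)
    # Saturation cannot physically exceed 100 %.
    return min(sw, 1.0)


if __name__ == "__main__":
    # Same porosity and measured resistivity, two assumed water resistivities.
    fresher = archie_water_saturation(0.10, 20.0, rw_ohm_m=0.05)
    saltier = archie_water_saturation(0.10, 20.0, rw_ohm_m=0.02)
    print(f"S_w with fresher water: {fresher:.2f}")
    print(f"S_w with saltier water: {saltier:.2f}")
```

With porosity and measured resistivity held fixed, the computed saturation still shifts with the assumed water resistivity, which is why a single cross plot built for one salinity can misclassify layers when salinity varies across the study area.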

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Ultra-large elongation and dislocation behavior of nano-sized tantalum single crystals

Author: Chen Yanhui
Deng Qingsong
Kong Deli
Lu Yan
Ma Ying
Shu Xinyu
Wang Lihua
Zhou Hao
Zou Jin
Publication venue: 'AIP Publishing'
Publication date: 01/04/2017

Although extensive simulations and experimental investigations have been carried out, the plastic deformation mechanism of body-centered-cubic (BCC) metals is still unclear. With our home-made device, in situ tensile tests of single-crystal tantalum (Ta) nanoplates, ∼200 nm in width and ∼100 nm in thickness, were conducted inside a transmission electron microscope. We discovered an unusual ambient-temperature (below ∼60°C) ultra-large elongation, which could be as large as 63%, in Ta nanoplates. The in situ observations revealed that continuous and homogeneous dislocation nucleation and fast dislocation escape lead to the ultra-large elongation in BCC Ta nanoplates. Besides the commonly believed screw dislocations, a large number of mixed dislocations with b = 1/2⟨111⟩ were also found during the tensile loading, indicating that the dislocation process can be significantly influenced by the small sizes of BCC metals. These results provide a basic understanding of plastic deformation in BCC metallic nanomaterials.

Directory of Open Access Journals

University of Queensland eSpace