MiChao-HuaFen 1.0: A Specialized Pre-trained Corpus Dataset for Domain-specific Large Models
With the advancement of deep learning technologies, general-purpose large
models such as GPT-4 have demonstrated exceptional capabilities across various
domains. Nevertheless, there remains a demand for high-quality, domain-specific
outputs in areas like healthcare, law, and finance. This paper first evaluates
the existing large models for specialized domains and discusses their
limitations. To cater to the specific needs of certain domains, we introduce
the "MiChao-HuaFen 1.0" pre-trained corpus dataset, tailored for the news and
governmental sectors. The dataset, sourced from publicly available internet
data from 2022, underwent multiple rounds of cleansing and processing to ensure
high quality and reliable origins, with provisions for consistent and stable
updates. This dataset not only supports the pre-training of large models for
Chinese vertical domains but also aids in propelling deep learning research and
applications in related fields. Comment: 4 pages, 2 figures
Impact of sequential (first- to third-generation) EGFR-TKI treatment on corrected QT interval in NSCLC patients
Objective: To evaluate the impact of sequential (first- to third-generation) epidermal growth factor receptor tyrosine kinase inhibitor (EGFR-TKI) treatment on top-corrected QT interval (top-QTc) in non-small cell lung cancer (NSCLC) patients. Methods: We retrospectively reviewed the medical records of NSCLC patients undergoing sequential EGFR-TKI treatment at Shanghai Chest Hospital between October 2016 and August 2021. The heart rate (HR), top-QT interval, and top-QTc of their ECGs were extracted from the institutional database and analyzed. Logistic regression was performed to identify predictors for top-QTc prolongation. Results: Overall, 228 patients were enrolled. Compared with baseline (median, 368 ms, same below), both first-generation (376 ms vs. 368 ms, p < 0.001) and sequential third-generation EGFR-TKIs (376 ms vs. 368 ms, p = 0.002) prolonged top-QT interval to a similar extent (p = 0.635). Top-QTc (438 ms vs. 423 ms, p < 0.001) and HR (81 bpm vs. 79 bpm, p = 0.008) increased after first-generation EGFR-TKI treatment. Further top-QTc prolongation (453 ms vs. 438 ms, p < 0.001) and HR increase (88 bpm vs. 81 bpm, p < 0.001) occurred after treatment advanced. Notably, as HR elevated during treatment, top-QT interval paradoxically increased rather than decreased, and the top-QTc increased rather than fluctuating slightly. Moreover, such phenomena were more significant after treatment advanced. After adjusting for confounding factors, pericardial effusion and lower serum potassium levels were independent predictors of additional QTc prolongation during sequential third-generation EGFR-TKI treatment. Conclusion: First-generation EGFR-TKI could prolong top-QTc, and sequential third-generation EGFR-TKI induced further prolongation. Top-QT interval paradoxically increased and top-QTc significantly increased as HR elevated, which was more significant after sequential EGFR-TKI treatment. Pericardial effusion and lower serum potassium levels were independent predictors of additional QTc prolongation after sequential EGFR-TKI treatment.
Transcriptome-Wide Identification and Expression Profiling of SPX Domain-Containing Members in Responses to Phosphorus Deprivation of Pinus massoniana
The SPX domain-containing proteins are believed to play important roles in phosphorus (Pi) homeostasis and signal transduction in plants. However, the overall information on SPXs and their responses to phosphorus deficiency in pines remain undefined. In this study, we screened the transcriptome data of Pinus massoniana in response to phosphorus deprivation. Ten SPX domain-containing genes were identified. Based on the conserved domains, the P. massoniana SPX genes were divided into four subfamilies: SPX, SPX-MFS, SPX-EXS, and SPX-RING. RNA-seq analysis revealed that PmSPX genes were differentially expressed in response to phosphorus deprivation. Furthermore, real-time quantitative PCR (RT-qPCR) showed that PmSPX1 and PmSPX4 had different expression patterns in different tissues under phosphorus stress. The 2284 bp promoter sequence upstream of PmSPX1 was obtained by the genome walking method. A cis-element analysis indicated that there were several phosphorus stress response-related elements (e.g., two P1BS elements, a PHO element, and a W-box) in the promoter of PmSPX1. In addition, the previously obtained PmSPX2 promoter sequence contained a W-box, and yeast one-hybrid analysis in this study showed that PmWRKY75 could directly bind to the PmSPX2 promoter. The results presented here reveal the foundational functions of PmSPXs in maintaining plant phosphorus homeostasis.
InternLM2 Technical Report
The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has
sparked discussions on the advent of Artificial General Intelligence (AGI).
However, replicating such advancements in open-source models has been
challenging. This paper introduces InternLM2, an open-source LLM that
outperforms its predecessors in comprehensive evaluations across 6 dimensions
and 30 benchmarks, long-context modeling, and open-ended subjective evaluations
through innovative pre-training and optimization techniques. The pre-training
process of InternLM2 is meticulously detailed, highlighting the preparation of
diverse data types including text, code, and long-context data. InternLM2
efficiently captures long-term dependencies, initially trained on 4k tokens
before advancing to 32k tokens in pre-training and fine-tuning stages,
exhibiting remarkable performance on the 200k "Needle-in-a-Haystack" test.
InternLM2 is further aligned using Supervised Fine-Tuning (SFT) and a novel
Conditional Online Reinforcement Learning from Human Feedback (COOL RLHF)
strategy that addresses conflicting human preferences and reward hacking. By
releasing InternLM2 models in different training stages and model sizes, we
provide the community with insights into the model's evolution.