Exploiting Multiple Embeddings for Chinese Named Entity Recognition
Identifying the named entities mentioned in text can benefit many downstream
semantic applications. However, due to the predominance of colloquial language
in microblogs, named entity recognition (NER) in Chinese microblogs suffers
significant performance deterioration compared with NER on formal Chinese
corpora. In this paper, we propose a
simple yet effective neural framework to derive the character-level embeddings
for NER in Chinese text, named ME-CNER. A character embedding is derived with
rich semantic information harnessed at multiple granularities, ranging from the
radical and character levels to the word level. The experimental results
demonstrate that the proposed approach achieves a large performance improvement
on the Weibo dataset and comparable performance on the MSRA news dataset, at
lower computational cost than existing state-of-the-art alternatives.
Comment: accepted at CIKM 201
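The abstract says only that each character embedding draws on radical-, character-, and word-level information. A minimal sketch of one way to combine them, assuming plain concatenation of three looked-up vectors (the lookup tables, dimensions, and function names below are hypothetical, and ME-CNER's actual composition may differ):

```python
from typing import Dict, List, Tuple

Vector = List[float]


def lookup(table: Dict[str, Vector], key: str, dim: int) -> Vector:
    """Return the vector stored for key, or a zero vector if key is unknown."""
    return table.get(key, [0.0] * dim)


def char_representation(
    char: str,
    radical: str,
    word: str,
    radical_emb: Dict[str, Vector],
    char_emb: Dict[str, Vector],
    word_emb: Dict[str, Vector],
    dims: Tuple[int, int, int],
) -> Vector:
    """Concatenate radical-, character-, and word-level vectors for one character (hypothetical scheme)."""
    r_dim, c_dim, w_dim = dims
    return (
        lookup(radical_emb, radical, r_dim)
        + lookup(char_emb, char, c_dim)
        + lookup(word_emb, word, w_dim)
    )


# Toy tables: 湖 ("lake") has the water radical 氵 and appears in the word 西湖.
radical_emb = {"氵": [0.1, 0.2]}
char_emb = {"湖": [0.3, 0.4, 0.5]}
word_emb = {"西湖": [0.6, 0.7, 0.8, 0.9]}
vec = char_representation("湖", "氵", "西湖", radical_emb, char_emb, word_emb, dims=(2, 3, 4))
```

Concatenation keeps each granularity in its own slice of the vector, so a downstream tagger can weight them separately; in this sketch, unknown radicals, characters, or words simply fall back to zero vectors.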
Skywork: A More Open Bilingual Foundation Model
In this technical report, we present Skywork-13B, a family of large language
models (LLMs) trained on a corpus of over 3.2 trillion tokens drawn from both
English and Chinese texts. This bilingual foundation model is the most
extensively trained and openly published LLM of comparable size to date. We
introduce a two-stage training methodology using a segmented corpus, targeting
general-purpose training and then domain-specific enhancement training,
respectively. We show that our model not only excels on popular benchmarks, but
also achieves state-of-the-art performance in Chinese language modeling
on diverse domains. Furthermore, we propose a novel leakage detection method,
demonstrating that test data contamination is a pressing issue warranting
further investigation by the LLM community. To spur future research, we release
Skywork-13B along with checkpoints obtained during intermediate stages of the
training process. We are also releasing part of our SkyPile corpus, a
collection of over 150 billion tokens of web text, which is the largest
high-quality open Chinese pre-training corpus to date. We hope Skywork-13B and our
open corpus will serve as a valuable open-source resource to democratize access
to high-quality LLMs.
Oral microbiome and risk of malignant esophageal lesions in a high-risk area of China: A nested case-control study.
OBJECTIVE: We aimed to prospectively evaluate the association of oral microbiome with malignant esophageal lesions and its predictive potential as a biomarker of risk. METHODS: We conducted a case-control study nested within a population-based cohort with up to 8 visits of oral swab collection for each subject over an 11-year period in a high-risk area for esophageal cancer in China. The oral microbiome was evaluated with 16S ribosomal RNA (rRNA) gene sequencing in 428 pre-diagnostic oral specimens from 84 cases with esophageal lesions of severe squamous dysplasia and above (SDA) and 168 matched healthy controls. DESeq analysis was performed to identify taxa of differential abundance. Differential oral species together with subject characteristics were evaluated for their potential in predicting SDA risk by constructing conditional logistic regression models. RESULTS: A total of 125 taxa including 37 named species showed significantly different abundance between SDA cases and controls (all P0.84. CONCLUSIONS: The oral microbiome may play an etiological and predictive role in esophageal cancer, and it holds promise as a non-invasive early warning biomarker for risk stratification for esophageal cancer screening programs
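In a matched case-control design, conditional logistic regression conditions on each matched set, so each set contributes exp(x_case·β) / Σ_j exp(x_j·β) to the likelihood, with the sum running over the case and its matched controls. A hedged sketch for a single covariate fitted by Newton-Raphson; the data, function names, and one-covariate setup are hypothetical, whereas the study's models combined several differential species with subject characteristics:

```python
import math
from typing import List, Tuple

# One matched set: (case covariate, [control covariates]), e.g. 1 case : 2 controls.
MatchedSet = Tuple[float, List[float]]


def loglik_score_info(beta: float, sets: List[MatchedSet]) -> Tuple[float, float, float]:
    """Conditional log-likelihood, score (first derivative), and observed information."""
    ll = score = info = 0.0
    for x_case, x_controls in sets:
        xs = [x_case] + x_controls
        m = max(x * beta for x in xs)  # subtract the largest exponent for numerical stability
        w = [math.exp(x * beta - m) for x in xs]
        total = sum(w)
        mean = sum(wi * x for wi, x in zip(w, xs)) / total
        mean_sq = sum(wi * x * x for wi, x in zip(w, xs)) / total
        # log of exp(x_case * beta) / sum_j exp(x_j * beta)
        ll += x_case * beta - (m + math.log(total))
        score += x_case - mean
        info += mean_sq - mean * mean
    return ll, score, info


def fit_clogit(sets: List[MatchedSet], beta: float = 0.0, tol: float = 1e-10, max_iter: int = 100) -> float:
    """Newton-Raphson with step-halving for a single-covariate conditional logit."""
    ll, score, info = loglik_score_info(beta, sets)
    for _ in range(max_iter):
        step = score / info
        new_ll, new_score, new_info = loglik_score_info(beta + step, sets)
        # Halve the step while it would lower the log-likelihood.
        while new_ll < ll and abs(step) > tol:
            step /= 2
            new_ll, new_score, new_info = loglik_score_info(beta + step, sets)
        beta += step
        ll, score, info = new_ll, new_score, new_info
        if abs(step) < tol:
            break
    return beta


# Hypothetical standardized abundances of one taxon in four 1:2 matched sets.
sets = [(1.2, [0.3, 0.8]), (0.4, [0.9, 0.1]), (1.5, [1.1, 0.2]), (0.2, [0.6, 0.5])]
beta_hat = fit_clogit(sets)
odds_ratio = math.exp(beta_hat)
```

Here `odds_ratio` is the estimated odds ratio per unit of the covariate. Because the matching factors cancel out of each set's likelihood, they need not be modeled explicitly, which is why this form suits nested case-control data.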
Joint analysis of three genome-wide association studies of esophageal squamous cell carcinoma in Chinese populations
We conducted a joint (pooled) analysis of three genome-wide association studies (GWAS) [1-3] of esophageal squamous cell carcinoma (ESCC) in ethnic Chinese (5,337 ESCC cases and 5,787 controls), with 9,654 ESCC cases and 10,058 controls for follow-up. In a logistic regression model adjusted for age, sex, study, and two eigenvectors, two new loci achieved genome-wide significance, marked by rs7447927 at 5q31.2 (per-allele odds ratio (OR) = 0.85, 95% CI 0.82-0.88; P = 7.72 × 10^-20) and rs1642764 at 17p13.1 (per-allele OR = 0.88, 95% CI 0.85-0.91; P = 3.10 × 10^-13). rs7447927 is a synonymous single nucleotide polymorphism (SNP) in TMEM173, and rs1642764 is an intronic SNP in ATP1B2, near TP53. Furthermore, a locus in the HLA class II region at 6p21.32 (rs35597309) achieved genome-wide significance in the two populations at highest risk for ESCC (OR = 1.33, 95% CI 1.22-1.46; P = 1.99 × 10^-10). Our joint analysis identified new ESCC susceptibility loci overall, as well as a new locus unique to the ESCC high-risk Taihang Mountain region.
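As a quick consistency check on the reported statistics, a Wald z value can be back-calculated from an odds ratio and its 95% CI, assuming (as is usual, though not stated in the abstract) that the interval was computed as log(OR) ± 1.96·SE on the log scale. The function names are illustrative:

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def z_from_or_ci(odds_ratio: float, ci_low: float, ci_high: float) -> float:
    """Wald z implied by an OR and its 95% CI, assuming the CI is log(OR) +/- Z_95 * SE."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return math.log(odds_ratio) / se


def two_sided_p(z: float) -> float:
    """Two-sided normal P value, 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


# Rounded OR and 95% CI values as reported in the abstract.
loci = {
    "rs7447927": (0.85, 0.82, 0.88),
    "rs1642764": (0.88, 0.85, 0.91),
    "rs35597309": (1.33, 1.22, 1.46),
}
recomputed = {snp: two_sided_p(z_from_or_ci(*vals)) for snp, vals in loci.items()}
```

For rs7447927 this gives |z| ≈ 9.0, far beyond the conventional genome-wide threshold of P < 5 × 10^-8 (|z| ≈ 5.45), and a P value within a factor of about three of the reported 7.72 × 10^-20. An exact match is not expected, because the OR and CI are rounded to two decimals and the published P value comes from the fitted model rather than from this back-calculation.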
Characterization of fermented Okara powder and its effect on lipid oxidation of emulsion-type pork sausage during cold storage
In the present study, Okara, a soybean by-product from the production of tofu an
Reliability Modelling of CNC Machine Tools Based on the Improved Maximum Likelihood Estimation Method
The existing standard reliability models for computerized numerical control (CNC) machine tools are unsatisfactory: they fall short in predicting the failure rates or lifetimes of key functional parts of CNC machine tools. This is attributed to two causes: the small sample size of failure data and the large truncation ratio of the censored failure data. The improved correction method (ICM), maximum likelihood estimation (MLE), and empirical maximum likelihood estimation (EMLE) are presented and compared in this study. To overcome the shortcomings of reliability models developed with these traditional methods, an improved maximum likelihood estimation method (IMLE) is proposed that enlarges the censored failure data. Correction factors based on the mean ratio are designed to extend the censored times, so that the censored failure data approach the true time between failures (TBF). Furthermore, a method for solving the correction factors is proposed that accounts for the amount of calculation while meeting the required calculation precision. Finally, the proposed model is verified through a simulated orthogonal experiment. The verification results show that the proposed method can be applied to reliability modelling of not only CNC machine tools but also their key functional parts.
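The abstract compares IMLE against conventional MLE but does not give the model formulas. For reference, a sketch of the standard (not improved) maximum likelihood fit of a two-parameter Weibull distribution to right-censored times between failures; the Weibull assumption, function name, and sample times are hypothetical, and the authors' correction factors for extending censored times are not reproduced. With r failures, the shape β solves Σ t^β ln t / Σ t^β − 1/β − (1/r) Σ_fail ln t = 0 (unsubscripted sums run over all failure and censored times), and the scale is η = (Σ t^β / r)^(1/β):

```python
import math
from typing import List, Tuple


def weibull_mle_censored(failures: List[float], censored: List[float]) -> Tuple[float, float]:
    """Standard Weibull MLE (shape, scale) from failure times plus right-censored times."""
    times = failures + censored
    t_max = max(times)
    # The shape estimate is scale-invariant, so work with times scaled into (0, 1]
    # to keep t**beta from overflowing during the search.
    u_all = [t / t_max for t in times]
    mean_log_fail = sum(math.log(t / t_max) for t in failures) / len(failures)

    def profile_eq(beta: float) -> float:
        """Left-hand side of the shape equation; increasing in beta."""
        weights = [u ** beta for u in u_all]
        weighted_log = sum(w * math.log(u) for w, u in zip(weights, u_all)) / sum(weights)
        return weighted_log - 1.0 / beta - mean_log_fail

    # Bisection on an assumed bracket for the shape parameter.
    lo, hi = 1e-3, 1e2
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if profile_eq(mid) < 0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    # Scale from the closed-form condition eta**beta = sum(t**beta) / r.
    r = len(failures)
    eta = t_max * (sum(u ** beta for u in u_all) / r) ** (1.0 / beta)
    return beta, eta


# Hypothetical times between failures (hours); censored units were still running.
failures = [120.0, 340.0, 560.0, 800.0, 1100.0]
censored = [900.0, 1500.0]
shape, scale = weibull_mle_censored(failures, censored)
```

Bisection works here because the left-hand side of the shape equation is strictly increasing in β, so the root is unique whenever the failure times are not all equal to the largest observed time.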
Effects of Sous Vide Cooking on the Physicochemical and Volatile Flavor Properties of Half-Shell Scallop (Chlamys farreri) during Chilled Storage
This study explored the effects of sous vide (SV) cooking treatments on the physicochemical quality and volatile flavor of half-shell scallop (Chlamys farreri) during 30 d of chilled storage. The vacuum-packed scallop samples were cooked at 70 °C (SV-70) and 75 °C (SV-75) and maintained for 30 min. The samples were compared with the positive control (cooked at 100 °C for 10 min, CK). The results indicate that the total volatile basic nitrogen (TVBN), pH, texture, and malondialdehyde (MDA) content gradually increased, while the myofibrillar protein (MP) extraction rate of the CK, SV-70, and SV-75 samples significantly decreased with increasing chilled storage time. Significantly, the SV cooking treatments maintained a much higher water-holding capacity of scallop muscle, compared with the conventional cooking process at 100 °C. Additionally, the SV-75 cooking treatment maintained relatively stable TVBN, pH, and MDA content, springiness, and shearing force properties of scallop samples, especially during 0–20 d of storage. Volatile flavor analysis showed that a total of 42 volatile organic compounds (VOCs) were detected in the scallop samples, and there were no considerable differences in these VOCs between the CK and SV-75 cooked samples (0 d). Overall, the SV cooking treatments effectively maintained acceptable and stable physicochemical and volatile flavor properties of half-shell scallop samples during chilled storage
Increased Phenolic Content and Enhanced Antioxidant Activity in Fermented Glutinous Rice Supplemented with Fu Brick Tea
Glutinous rice-based foods have a long history and are consumed worldwide. They are also in great demand among consumers seeking novel sensory qualities and natural health benefits. In this study, we developed a novel fermented glutinous rice product supplemented with Fu brick tea. In vitro antioxidant evaluation and phenolic compound analysis showed that fermentation with Fu brick tea increased the total phenolic content and enhanced the antioxidant activity of glutinous rice, including scavenging of 1,1-diphenyl-2-picrylhydrazyl (DPPH) radicals, 2,2′-azino-bis(3-ethylbenzothiazoline-6-sulphonic acid) (ABTS) radicals, and hydroxyl radicals, as well as ferric-reducing antioxidant power, ferric ion reducing power, and iron-chelating capability. In addition, compared with traditional fermented glutinous rice, this novel functional food exhibited stronger activity in protecting DNA against hydroxyl radical-induced oxidative damage. Quantitative HPLC analysis identified 14 compounds, including catechins and phenolic acids, which were considered to be positively related to the enhanced antioxidant capability. Furthermore, we found that 80% ethanol was a more suitable extraction solvent than water because of its higher extraction efficiency and stronger functional activities. Our results suggest that this novel fermented glutinous rice could serve as a nutraceutical food or ingredient with special sensory and functional activities.