29 research outputs found

    Exploiting Multiple Embeddings for Chinese Named Entity Recognition

    Identifying the named entities mentioned in text would enrich many downstream semantic applications. However, because colloquial language predominates in microblogs, named entity recognition (NER) in Chinese microblogs performs significantly worse than NER on formal Chinese corpora. In this paper, we propose a simple yet effective neural framework, named ME-CNER, to derive character-level embeddings for NER in Chinese text. A character embedding is derived with rich semantic information harnessed at multiple granularities, ranging from the radical and character levels to the word level. The experimental results demonstrate that the proposed approach achieves a large performance improvement on the Weibo dataset and comparable performance on the MSRA news dataset, at lower computational cost than the existing state-of-the-art alternatives. Comment: accepted at CIKM 201

    Skywork: A More Open Bilingual Foundation Model

    In this technical report, we present Skywork-13B, a family of large language models (LLMs) trained on a corpus of over 3.2 trillion tokens drawn from both English and Chinese texts. This bilingual foundation model is the most extensively trained and openly published LLM of comparable size to date. We introduce a two-stage training methodology using a segmented corpus, targeting general-purpose training and then domain-specific enhancement training, respectively. We show that our model not only excels on popular benchmarks, but also achieves state-of-the-art performance in Chinese language modeling on diverse domains. Furthermore, we propose a novel leakage detection method, demonstrating that test data contamination is a pressing issue warranting further investigation by the LLM community. To spur future research, we release Skywork-13B along with checkpoints obtained during intermediate stages of the training process. We are also releasing part of our SkyPile corpus, a collection of over 150 billion tokens of web text, which is the largest high-quality open Chinese pre-training corpus to date. We hope Skywork-13B and our open corpus will serve as a valuable open-source resource to democratize access to high-quality LLMs.

    Oral microbiome and risk of malignant esophageal lesions in a high-risk area of China: A nested case-control study.

    OBJECTIVE: We aimed to prospectively evaluate the association of oral microbiome with malignant esophageal lesions and its predictive potential as a biomarker of risk. METHODS: We conducted a case-control study nested within a population-based cohort with up to 8 visits of oral swab collection for each subject over an 11-year period in a high-risk area for esophageal cancer in China. The oral microbiome was evaluated with 16S ribosomal RNA (rRNA) gene sequencing in 428 pre-diagnostic oral specimens from 84 cases with esophageal lesions of severe squamous dysplasia and above (SDA) and 168 matched healthy controls. DESeq analysis was performed to identify taxa of differential abundance. Differential oral species together with subject characteristics were evaluated for their potential in predicting SDA risk by constructing conditional logistic regression models. RESULTS: A total of 125 taxa including 37 named species showed significantly different abundance between SDA cases and controls (all P0.84. CONCLUSIONS: The oral microbiome may play an etiological and predictive role in esophageal cancer, and it holds promise as a non-invasive early warning biomarker for risk stratification for esophageal cancer screening programs

    Reliability Modelling of CNC Machine Tools Based on the Improved Maximum Likelihood Estimation Method

    The existing standard reliability models for computerized numerical control (CNC) machine tools are unsatisfactory: they fall short of predicting the failure rates or lifetimes of key functional parts of CNC machine tools. This is attributed to two reasons: the small sample size of failure data and a large truncated ratio of the censored failure data. The improved correction method (ICM), maximum likelihood estimation (MLE), and empirical maximum likelihood estimation (EMLE) are presented and compared with each other in this study. To overcome the shortcomings of reliability models developed by the traditional methods, an improved maximum likelihood estimation method (IMLE) is proposed that enlarges the censored failure data. Moreover, mean-ratio correction factors are designed to extend the censored time, bringing the censored failure data close to the true time between failures (TBF). Furthermore, a method for solving for the correction factors is proposed that takes the amount of calculation into account while meeting the required calculation precision. Finally, the proposed model is verified through a simulated orthogonal experiment. The verification results show that the proposed method can be applied in reliability modelling not only for CNC machine tools but also for the key functional parts of CNC machine tools.
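
To illustrate why censored failure data complicates maximum likelihood estimation, the following is a minimal sketch of the textbook MLE for a constant failure rate with right-censored observations. It is not the IMLE method from the abstract: the exponential lifetime assumption, the function name `exponential_mle_censored`, and the sample data are all assumptions made for illustration only.

```python
def exponential_mle_censored(times, observed):
    """Textbook MLE for an exponential lifetime model with right censoring.

    times:    running time of each unit (to failure or to censoring)
    observed: True if the unit failed, False if it was censored

    Illustrative assumption: lifetimes are exponential. The abstract does
    not state which distribution the authors fit.
    """
    if len(times) != len(observed):
        raise ValueError("times and observed must have the same length")
    failures = sum(1 for flag in observed if flag)
    total_time = sum(times)
    if failures == 0 or total_time <= 0:
        raise ValueError("need at least one failure and positive total time")
    # Censored units add running time to the denominator but no failure
    # to the numerator.
    rate = failures / total_time
    mean_time_between_failures = total_time / failures
    return rate, mean_time_between_failures


# Hypothetical data: two failures, two units censored at 500 hours.
times = [120.0, 340.0, 500.0, 500.0]
observed = [True, True, False, False]
rate, mtbf = exponential_mle_censored(times, observed)
```

With few failures and many censored units, the estimate rests on a very small numerator, which is the small-sample, heavy-censoring situation the abstract describes.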

    Effects of Sous Vide Cooking on the Physicochemical and Volatile Flavor Properties of Half-Shell Scallop (Chlamys farreri) during Chilled Storage

    This study explored the effects of sous vide (SV) cooking treatments on the physicochemical quality and volatile flavor of half-shell scallop (Chlamys farreri) during 30 d of chilled storage. The vacuum-packed scallop samples were cooked at 70 °C (SV-70) or 75 °C (SV-75) and held for 30 min. The samples were compared with the positive control (cooked at 100 °C for 10 min, CK). The results indicate that the total volatile basic nitrogen (TVBN), pH, texture, and malondialdehyde (MDA) content gradually increased, while the myofibrillar protein (MP) extraction rate of the CK, SV-70, and SV-75 samples significantly decreased with increasing chilled storage time. Notably, the SV cooking treatments maintained a much higher water-holding capacity of scallop muscle than the conventional cooking process at 100 °C. Additionally, the SV-75 cooking treatment maintained relatively stable TVBN, pH, MDA content, springiness, and shearing force properties of scallop samples, especially during 0–20 d of storage. Volatile flavor analysis showed that a total of 42 volatile organic compounds (VOCs) were detected in the scallop samples, and there were no considerable differences in these VOCs between the CK and SV-75 cooked samples (0 d). Overall, the SV cooking treatments effectively maintained acceptable and stable physicochemical and volatile flavor properties of half-shell scallop samples during chilled storage.

    Increased Phenolic Content and Enhanced Antioxidant Activity in Fermented Glutinous Rice Supplemented with Fu Brick Tea

    Glutinous rice-based foods have a long history and are consumed worldwide. They are also in great demand in the pursuit of novel sensory and natural health benefits. In this study, we developed a novel fermented glutinous rice product supplemented with Fu brick tea. In vitro antioxidant evaluation and phenolic compound analysis showed that fermentation with Fu brick tea increased the total phenolic content and enhanced the antioxidant activity of glutinous rice, including scavenging of 1,1-diphenyl-2-picryl-hydrazyl (DPPH) radical, 2,2′-azino-bis-3-ethylbenzthiazoline-6-sulphonic acid (ABTS) radical, and hydroxyl radical, ferric-reducing antioxidant power, ferric ion reducing power, and iron chelating capability. In addition, compared with traditional fermented glutinous rice, this novel functional food exhibited stronger activity in protecting DNA against hydroxyl radical-induced oxidative damage. Quantitative analysis by HPLC identified 14 compounds covering catechins and phenolic acids, which were considered to be positively related to the enhanced antioxidant capability. Furthermore, we found that 80% ethanol was a more suitable extraction solvent than water because of its higher extraction efficiency and stronger functional activities. Our results suggest that this novel fermented glutinous rice could serve as a nutraceutical food or ingredient with special sensory and functional activities.