184 research outputs found

    A Comparative Study of Subject Pro-drop in Old Chinese and Modern Chinese

    Extrinsic Factors Affecting the Accuracy of Biomedical NER

    Biomedical named entity recognition (NER) is a critical task that aims to identify structured information in clinical text, which is often replete with complex technical terms and a high degree of variability. Accurate and reliable NER can facilitate the extraction and analysis of important biomedical information, which can be used to improve downstream applications, including those in the healthcare system. However, NER in the biomedical domain is challenging because data are limited: annotation requires considerable expertise, time, and expense. In this paper, using the limited data available, we explore various extrinsic factors, including the corpus annotation scheme, data augmentation techniques, semi-supervised learning, and Brill transformation, to improve the performance of an NER model on a clinical text dataset (i2b2 2012; Sun, Rumshisky, and Uzuner, 2013). Our experiments demonstrate that these approaches can significantly improve the model's F1 score from the original 73.74 to 77.55. Our findings suggest that considering different extrinsic factors and combining these techniques is a promising approach for improving NER performance in the biomedical domain, where the size of data is limited.

    Positional Encoding-based Resident Identification in Multi-resident Smart Homes

    We propose a novel resident identification framework to identify residents in a multi-occupant smart environment. The proposed framework employs a feature extraction model based on the concept of positional encoding. The feature extraction model represents the locations within a home as a graph. We design a novel algorithm to build such graphs from layout maps of smart environments. The Node2Vec algorithm is used to transform the graph into high-dimensional node embeddings. A Long Short-Term Memory (LSTM) model is introduced to predict the identities of residents using temporal sequences of sensor events together with the node embeddings. Extensive experiments show that our proposed scheme effectively identifies residents in a multi-occupant environment. Evaluation results on two real-world datasets demonstrate that our proposed approach achieves 94.5% and 87.9% accuracy, respectively. Comment: 27 pages, 11 figures, 2 tables

    Economic Dispatch of an Integrated Microgrid Based on the Dynamic Process of CCGT Plant

    Intra-day economic dispatch of an integrated microgrid is a fundamental requirement for integrating distributed generators. The dynamic energy flows in cogeneration units present challenges to the energy management of the microgrid. In this paper, a novel approximate dynamic programming (ADP) approach based on value function approximation (VFA) is proposed to solve this problem; it is distinguished by its consideration of the dynamic process constraints of the combined-cycle gas turbine (CCGT) plant. First, we mathematically formulate the multi-period decision problem as a finite-horizon Markov decision process. To deal with the thermodynamic process, an augmented state vector of the CCGT is introduced. Second, the proposed VFA-ADP algorithm is employed to derive near-optimal real-time operation strategies. In addition, to guarantee the monotonicity of the piecewise linear function, we apply the SPAR algorithm in the update process. To validate the effectiveness of the proposed method, we conduct experiments comparing it with several traditional optimization methods. The results indicate that our proposed ADP method achieves better performance on the economic dispatch of the microgrid. Comment: This paper has won the Zhang Si-Ying (CCDC) Outstanding Youth Paper Award in the 33rd Chinese Control and Decision Conference (CCDC 2021)

    Towards Low-Latency Batched Stream Processing by Pre-Scheduling

    A Deep-learning Real-time Bias Correction Method for Significant Wave Height Forecasts in the Western North Pacific

    Significant wave height (SWH) is one of the most important parameters characterizing ocean waves, and accurate numerical ocean wave forecasting is crucial for coastal protection and shipping. However, due to the randomness and nonlinearity of the wind fields that generate ocean waves and the complex interaction between wave and wind fields, current numerical ocean wave forecasts have biases. In this study, a spatiotemporal deep-learning method was employed to correct gridded SWH forecasts from the ECMWF-IFS. This method was built on the trajectory gated recurrent unit deep neural network, and it conducts real-time rolling correction for the 0-240 h SWH forecasts from ECMWF-IFS. The correction model is co-driven by wave and wind fields, providing better results than those based on wave fields alone. A novel pixel-switch loss function was developed. The pixel-switch loss function can dynamically fine-tune the pre-trained correction model, focusing on pixels with large biases in SWH forecasts. According to the seasonal characteristics of SWH, four correction models were constructed separately, for spring, summer, autumn, and winter. The experimental results show that, compared with the original ECMWF SWH predictions, the correction was most effective in spring, when the mean absolute error decreased by 12.972% to 46.237%. Although winter had the worst performance, the mean absolute error still decreased by 13.794% to 38.953%. The corrected results improved the original ECMWF SWH forecasts under both normal and extreme weather conditions, indicating that our SWH correction model is robust and generalizable. Comment: 21 pages

    TWIN: TWo-stage Interest Network for Lifelong User Behavior Modeling in CTR Prediction at Kuaishou

    Life-long user behavior modeling, i.e., extracting a user's hidden interests from rich historical behaviors over months or even years, plays a central role in modern CTR prediction systems. Conventional algorithms mostly follow two cascading stages: a simple General Search Unit (GSU) for fast and coarse search over tens of thousands of long-term behaviors and an Exact Search Unit (ESU) for effective Target Attention (TA) over the small number of finalists from GSU. Although efficient, existing algorithms mostly suffer from a crucial limitation: the inconsistent target-behavior relevance metrics between GSU and ESU. As a result, their GSU usually misses highly relevant behaviors but retrieves ones considered irrelevant by ESU. In such cases, the TA in ESU, no matter how attention is allocated, mostly deviates from the real user interests and thus degrades the overall CTR prediction accuracy. To address such inconsistency, we propose TWo-stage Interest Network (TWIN), where our Consistency-Preserved GSU (CP-GSU) adopts the identical target-behavior relevance metric as the TA in ESU, making the two stages twins. Specifically, to break TA's computational bottleneck and extend it from ESU to GSU, namely from behavior length $10^2$ to length $10^4$-$10^5$, we build a novel attention mechanism by behavior feature splitting. For the video inherent features of a behavior, we calculate their linear projection by efficient pre-computing and caching strategies. For the user-item cross features, we compress each into a one-dimensional bias term in the attention score calculation to save computational cost. The consistency between the two stages, together with the effective TA-based relevance metric in CP-GSU, contributes to a significant performance gain in CTR prediction. Comment: Accepted by KDD 202

    Research on accessibility of port collection and distribution system from the perspective of carbon emissions

    Port accessibility is an important factor in the efficiency of a port collection and distribution system. The carbon emissions of such a system are large and cannot be ignored when the system is constructed. To analyze the carbon emission characteristics of the port collection and distribution system, the paper incorporates a carbon emission factor into the accessibility measurement of the system. To address the unbalanced demand across logistics nodes, logistics demand in the system is distributed by a method based on the appropriate freight volume. A carbon emission cost factor is introduced, and an accessibility measurement model based on the generalized cost impedance function is constructed. The model is verified using the collection and distribution system of Douala Port in West Africa as an example. The results show that, after the carbon emission factor is added, the accessibility of each logistics node declines to a different degree, which indicates that including the carbon emission factor gives a more comprehensive reflection of the accessibility of the system.

    CMB: A Comprehensive Medical Benchmark in Chinese

    Large Language Models (LLMs) offer the possibility of a major breakthrough in medicine. A standardized medical benchmark is a fundamental cornerstone for measuring progress. However, medical environments in different regions have their own local characteristics, e.g., the ubiquity and significance of traditional Chinese medicine within China. Therefore, merely translating English-based medical evaluations may result in contextual incongruities for a local region. To solve this issue, we propose a localized medical benchmark called CMB, a Comprehensive Medical Benchmark in Chinese, designed and rooted entirely within the native Chinese linguistic and cultural framework. While traditional Chinese medicine is integral to this evaluation, it does not constitute its entirety. Using this benchmark, we have evaluated several prominent large-scale LLMs, including ChatGPT, GPT-4, dedicated Chinese LLMs, and LLMs specialized in the medical domain. It is worth noting that our benchmark is not devised as a leaderboard competition but as an instrument for self-assessment of model advancements. We hope this benchmark can facilitate the widespread adoption and enhancement of medical LLMs within China. Details are available at https://cmedbenchmark.llmzoo.com/