610 research outputs found

    Fault-Tolerant Learning for Term Extraction

    A computational approach to the syntax of displacement and the semantics of scope

    A Rule-based Methodology and Feature-based Methodology for Effect Relation Extraction in Chinese Unstructured Text

    The Chinese language differs significantly from English in both lexical representation and grammatical structure. These differences lead to problems in Chinese NLP, such as word segmentation and flexible syntactic structure. Many conventional methods and approaches in Natural Language Processing (NLP) based on English text have been shown to be ineffective when applied to these language-specific problems in Chinese NLP, a field that started later. Relation Extraction (RE) is an area of NLP that seeks to identify semantic relationships between entities in text. The term “Effect Relation” (ER) is introduced in this research to refer to a specific type of relationship between two entities, where one entity has a certain “effect” on the other. In this research project, a case study on Chinese text from Traditional Chinese Medicine (TCM) journal publications is built to closely examine the forms of Effect Relation in this text domain. The case study targets the effect of a prescription or herb in the treatment of a disease, symptom or body part. A rule-based methodology is introduced in this thesis. It utilises predetermined rules and templates derived from the characteristics and patterns observed in the dataset. This methodology achieves an F-score of 0.85 in its Named Entity Recognition (NER) module, 0.79 in its Semantic Relationship Extraction (SRE) module, and an overall performance of 0.46. A second methodology taking a feature-based approach is also introduced in this thesis. It views the RE task as a classification problem and utilises a mathematical classification model with features consisting of contextual information and rules. It achieves F-scores of 0.73 (NER) and 0.88 (SRE), and an overall performance of 0.41. The role of functional words in the contemporary Chinese language, and in relation to the ERs in this research, is explored. Functional words have been found to be effective, as rules in the rule-based methodology, in detecting ER entities with complex structure.

    A Factoid Question Answering System for Vietnamese

    In this paper, we describe the development of an end-to-end factoid question answering system for the Vietnamese language. This system combines statistical models and ontology-based methods in a chain of processing modules to provide high-quality mappings from natural language text to entities. We present the challenges in developing such an intelligent user interface for an isolating language like Vietnamese and show that techniques developed for inflectional languages cannot be applied "as is". Our question answering system can answer a wide range of general knowledge questions with promising accuracy on a test set. Comment: In the proceedings of the HQA'18 workshop, The Web Conference Companion, Lyon, France.

    LawBench: Benchmarking Legal Knowledge of Large Language Models

    Large language models (LLMs) have demonstrated strong capabilities in various aspects. However, when applying them to the highly specialized, safety-critical legal domain, it is unclear how much legal knowledge they possess and whether they can reliably perform legal-related tasks. To address this gap, we propose a comprehensive evaluation benchmark, LawBench. LawBench has been meticulously crafted to provide a precise assessment of LLMs' legal capabilities at three cognitive levels: (1) Legal knowledge memorization: whether LLMs can memorize the needed legal concepts, articles and facts; (2) Legal knowledge understanding: whether LLMs can comprehend entities, events and relationships within legal text; (3) Legal knowledge applying: whether LLMs can properly utilize their legal knowledge and make the necessary reasoning steps to solve realistic legal tasks. LawBench contains 20 diverse tasks covering 5 task types: single-label classification (SLC), multi-label classification (MLC), regression, extraction and generation. We perform extensive evaluations of 51 LLMs on LawBench, including 20 multilingual LLMs, 22 Chinese-oriented LLMs and 9 legal-specific LLMs. The results show that GPT-4 remains the best-performing LLM in the legal domain, surpassing the others by a significant margin. While fine-tuning LLMs on legal-specific text brings certain improvements, we are still a long way from obtaining usable and reliable LLMs for legal tasks. All data, model predictions and evaluation code are released at https://github.com/open-compass/LawBench/. We hope this benchmark provides an in-depth understanding of LLMs' domain-specific capabilities and speeds up the development of LLMs in the legal domain.