30 research outputs found

    A Study of Chinese Named Entity and Relation Identification in a Specific Domain

    Get PDF
    This thesis aims at investigating automatic identification of Chinese named entities (NEs) and their relations (NERs) in a specific domain. We have proposed a three-stage pipeline computational model for the error correction of word segmentation and POS tagging, NE recognition and NER identification. In this model, an error repair module utilizing machine learning techniques is developed in the first stage. At the second stage, a new algorithm that can automatically construct Finite State Cascades (FSC) from given sets of rules is designed. As a supplement, the recognition strategy without NE trigger words can identify the special linguistic phenomena. In the third stage, a novel approach - positive and negative case-based learning and identification (PNCBL&I) is implemented. It pursues the improvement of the identification performance for NERs through simultaneously learning two opposite cases and automatically selecting effective multi-level linguistic features for NERs and non-NERs. Further, two other strategies, resolving relation conflicts and inferring missing relations, are also integrated in the identification procedure.Diese Dissertation ist der Forschung zur automatischen Erkennung von chinesischen Begriffen (named entities, NE) und ihrer Relationen (NER) in einer spezifischen Domäne gewidmet. Wir haben ein Pipelinemodell mit drei aufeinanderfolgenden Verarbeitungsschritten für die Korrektur der Fehler der Wortsegmentation und Wortartmarkierung, NE-Erkennung, und NER-Identifizierung vorgeschlagen. In diesem Modell wird eine Komponente zur Fehlerreparatur im ersten Verarbeitungsschritt verwirklicht, die ein machinelles Lernverfahren einsetzt. Im zweiten Stadium wird ein neuer Algorithmus, der die Kaskaden endlicher Transduktoren aus den Mengen der Regeln automatisch konstruieren kann, entworfen. Zusätzlich kann eine Strategie für die Erkennung von NE, die nicht durch das Vorkommen bestimmer lexikalischer Trigger markiert sind, die spezielle linguistische Phänomene identifizieren. Im dritten Verarbeitungsschritt wird ein neues Verfahren, das auf dem Lernen und der Identifizierung positiver und negativer Fälle beruht, implementiert. Es verfolgt die Verbesserung der NER-Erkennungsleistung durch das gleichzeitige Lernen zweier gegenüberliegenden Fälle und die automatische Auswahl der wirkungsvollen linguistischen Merkmale auf mehreren Ebenen für die NER und Nicht-NER. Weiter werden zwei andere Strategien, die Lösung von Konflikten in der Relationenerkennung und die Inferenz von fehlenden Relationen, auch in den Erkennungsprozeß integriert

    A Study of Chinese Named Entity and Relation Identification in a Specific Domain

    Get PDF
    This thesis aims at investigating automatic identification of Chinese named entities (NEs) and their relations (NERs) in a specific domain. We have proposed a three-stage pipeline computational model for the error correction of word segmentation and POS tagging, NE recognition and NER identification. In this model, an error repair module utilizing machine learning techniques is developed in the first stage. At the second stage, a new algorithm that can automatically construct Finite State Cascades (FSC) from given sets of rules is designed. As a supplement, the recognition strategy without NE trigger words can identify the special linguistic phenomena. In the third stage, a novel approach - positive and negative case-based learning and identification (PNCBL&I) is implemented. It pursues the improvement of the identification performance for NERs through simultaneously learning two opposite cases and automatically selecting effective multi-level linguistic features for NERs and non-NERs. Further, two other strategies, resolving relation conflicts and inferring missing relations, are also integrated in the identification procedure.Diese Dissertation ist der Forschung zur automatischen Erkennung von chinesischen Begriffen (named entities, NE) und ihrer Relationen (NER) in einer spezifischen Domäne gewidmet. Wir haben ein Pipelinemodell mit drei aufeinanderfolgenden Verarbeitungsschritten für die Korrektur der Fehler der Wortsegmentation und Wortartmarkierung, NE-Erkennung, und NER-Identifizierung vorgeschlagen. In diesem Modell wird eine Komponente zur Fehlerreparatur im ersten Verarbeitungsschritt verwirklicht, die ein machinelles Lernverfahren einsetzt. Im zweiten Stadium wird ein neuer Algorithmus, der die Kaskaden endlicher Transduktoren aus den Mengen der Regeln automatisch konstruieren kann, entworfen. Zusätzlich kann eine Strategie für die Erkennung von NE, die nicht durch das Vorkommen bestimmer lexikalischer Trigger markiert sind, die spezielle linguistische Phänomene identifizieren. Im dritten Verarbeitungsschritt wird ein neues Verfahren, das auf dem Lernen und der Identifizierung positiver und negativer Fälle beruht, implementiert. Es verfolgt die Verbesserung der NER-Erkennungsleistung durch das gleichzeitige Lernen zweier gegenüberliegenden Fälle und die automatische Auswahl der wirkungsvollen linguistischen Merkmale auf mehreren Ebenen für die NER und Nicht-NER. Weiter werden zwei andere Strategien, die Lösung von Konflikten in der Relationenerkennung und die Inferenz von fehlenden Relationen, auch in den Erkennungsprozeß integriert

    Semantically intelligent semi-automated ontology integration

    Get PDF
    An ontology is a way of information categorization and storage. Web Ontologies provide help in retrieving the required and precise information over the web. However, the problem of heterogeneity between ontologies may occur in the use of multiple ontologies of the same domain. The integration of ontologies provides a solution for the heterogeneity problem. Ontology integration is a solution to problem of interoperability in the knowledge based systems. Ontology integration provides a mechanism to find the semantic association between a pair of reference ontologies based on their concepts. Many researchers have been working on the problem of ontology integration; however, multiple issues related to ontology integration are still not addressed. This dissertation involves the investigation of the ontology integration problem and proposes a layer based enhanced framework as a solution to the problem. The comparison between concepts of reference ontologies is based on their semantics along with their syntax in the concept matching process of ontology integration. The semantic relationship of a concept with other concepts between ontologies and the provision of user confirmation (only for the problematic cases) are also taken into account in this process. The proposed framework is implemented and validated by providing a comparison of the proposed concept matching technique with the existing techniques. The test case scenarios are provided in order to compare and analyse the proposed framework in the analysis phase. The results of the experiments completed demonstrate the efficacy and success of the proposed framework

    Natural Language Processing: Emerging Neural Approaches and Applications

    Get PDF
    This Special Issue highlights the most recent research being carried out in the NLP field to discuss relative open issues, with a particular focus on both emerging approaches for language learning, understanding, production, and grounding interactively or autonomously from data in cognitive and neural systems, as well as on their potential or real applications in different domains

    An Energy-Efficient and Reliable Data Transmission Scheme for Transmitter-based Energy Harvesting Networks

    Get PDF
    Energy harvesting technology has been studied to overcome a limited power resource problem for a sensor network. This paper proposes a new data transmission period control and reliable data transmission algorithm for energy harvesting based sensor networks. Although previous studies proposed a communication protocol for energy harvesting based sensor networks, it still needs additional discussion. Proposed algorithm control a data transmission period and the number of data transmission dynamically based on environment information. Through this, energy consumption is reduced and transmission reliability is improved. The simulation result shows that the proposed algorithm is more efficient when compared with previous energy harvesting based communication standard, Enocean in terms of transmission success rate and residual energy.This research was supported by Basic Science Research Program through the National Research Foundation by Korea (NRF) funded by the Ministry of Education, Science and Technology(2012R1A1A3012227)

    A semi-automated FAQ retrieval system for HIV/AIDS

    Get PDF
    This thesis describes a semi-automated FAQ retrieval system that can be queried by users through short text messages on low-end mobile phones to provide answers on HIV/AIDS related queries. First we address the issue of result presentation on low-end mobile phones by proposing an iterative interaction retrieval strategy where the user engages with the FAQ retrieval system in the question answering process. At each iteration, the system returns only one question-answer pair to the user and the iterative process terminates after the user's information need has been satisfied. Since the proposed system is iterative, this thesis attempts to reduce the number of iterations (search length) between the users and the system so that users do not abandon the search process before their information need has been satisfied. Moreover, we conducted a user study to determine the number of iterations that users are willing to tolerate before abandoning the iterative search process. We subsequently used the bad abandonment statistics from this study to develop an evaluation measure for estimating the probability that any random user will be satisfied when using our FAQ retrieval system. In addition, we used a query log and its click-through data to address three main FAQ document collection deficiency problems in order to improve the retrieval performance and the probability that any random user will be satisfied when using our FAQ retrieval system. Conclusions are derived concerning whether we can reduce the rate at which users abandon their search before their information need has been satisfied by using information from previous searches to: Address the term mismatch problem between the users' SMS queries and the relevant FAQ documents in the collection; to selectively rank the FAQ document according to how often they have been previously identified as relevant by users for a particular query term; and to identify those queries that do not have a relevant FAQ document in the collection. In particular, we proposed a novel template-based approach that uses queries from a query log for which the true relevant FAQ documents are known to enrich the FAQ documents with additional terms in order to alleviate the term mismatch problem. These terms are added as a separate field in a field-based model using two different proposed enrichment strategies, namely the Term Frequency and the Term Occurrence strategies. This thesis thoroughly investigates the effectiveness of the aforementioned FAQ document enrichment strategies using three different field-based models. Our findings suggest that we can improve the overall recall and the probability that any random user will be satisfied by enriching the FAQ documents with additional terms from queries in our query log. Moreover, our investigation suggests that it is important to use an FAQ document enrichment strategy that takes into consideration the number of times a term occurs in the query when enriching the FAQ documents. We subsequently show that our proposed enrichment approach for alleviating the term mismatch problem generalise well on other datasets. Through the evaluation of our proposed approach for selectively ranking the FAQ documents, we show that we can improve the retrieval performance and the probability that any random user will be satisfied when using our FAQ retrieval system by incorporating the click popularity score of a query term t on an FAQ document d into the scoring and ranking process. Our results generalised well on a new dataset. However, when we deploy the click popularity score of a query term t on an FAQ document d on an enriched FAQ document collection, we saw a decrease in the retrieval performance and the probability that any random user will be satisfied when using our FAQ retrieval system. Furthermore, we used our query log to build a binary classifier for detecting those queries that do not have a relevant FAQ document in the collection (Missing Content Queries (MCQs))). Before building such a classifier, we empirically evaluated several feature sets in order to determine the best combination of features for building a model that yields the best classification accuracy in identifying the MCQs and the non-MCQs. Using a different dataset, we show that we can improve the overall retrieval performance and the probability that any random user will be satisfied when using our FAQ retrieval system by deploying a MCQs detection subsystem in our FAQ retrieval system to filter out the MCQs. Finally, this thesis demonstrates that correcting spelling errors can help improve the retrieval performance and the probability that any random user will be satisfied when using our FAQ retrieval system. We tested our FAQ retrieval system with two different testing sets, one containing the original SMS queries and the other containing the SMS queries which were manually corrected for spelling errors. Our results show a significant improvement in the retrieval performance and the probability that any random user will be satisfied when using our FAQ retrieval system
    corecore