11,207 research outputs found

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems, such as text summarization, information extraction, and information retrieval, including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.

    Lexicon-corpus Based Korean Unknown Foreign Word Extraction and Updating Using Syllable Identification

    Abstract: This paper presents an efficient text mining method that extracts and updates unknown words (specifically, unknown foreign words) to improve data classification and POS tagging. The proposed methods can also help improve the accuracy of mining frequent patterns and association rules from unstructured (textual) data. Much research has been done on estimating and segmenting unknown words, but it is limited to grammatical and linguistic rules with limited vocabulary. Our project starts from the fact that no language is free from the influence of foreign languages. This is especially true in a country like Korea, where culture and media are developing rapidly and the frequent use of foreign languages has mixed different languages and styles, along with slang and abbreviated words, into daily life and conversation. The main characteristic of our system is that it finds such unknown foreign words and updates them to appropriate words, depending on the information available in dictionaries. We also explain the essential natural language processing (NLP) tools used for data processing. Our proposed method uses simple but efficient techniques. First, it converts the data into structured form using data preprocessing techniques. In this phase, the data passes through several stages, such as cleaning, integration, and selection of important data, and is then organized into a database structure for further analysis and processing. This database consists of different kinds of dictionaries, on which our system relies heavily. With the help of our team members, we manually created various kinds of dictionaries for processing and analyzing different kinds of unknown foreign words.
    Our proposed method for discovering and updating unknown foreign words first discovers the foreign word using morphological analysis with the help of automatically and manually created dictionaries, then applies suffix trimming and word segmentation. Next, our algorithm uses the dictionaries to check the word's different written patterns according to its spelling and its synonym in the native language (Korean), and also updates the POS tags. We tested the method on different collections of data from economics news, beauty and fashion, and college student blogs; the results showed great efficiency and improvement, and were adequate to justify further research.
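The dictionary-driven steps named in this abstract (suffix trimming, lookup of alternative spellings, POS tag update) can be illustrated with a minimal sketch. It is not the authors' implementation: the particle list, the variant dictionary, the lexicon, and the function names `trim_suffix` and `normalize` are all hypothetical toy stand-ins, since the abstract does not give the contents of the real dictionaries.

```python
# Hypothetical toy resources; the paper's actual dictionaries are not given.
SUFFIXES = ["에서", "를", "을", "는", "은", "가", "이"]  # a few Korean particles
VARIANTS = {"컴퓨타": "컴퓨터", "콤퓨터": "컴퓨터"}  # variant spelling -> canonical form
LEXICON = {"컴퓨터": "NNG"}  # canonical form -> POS tag


def trim_suffix(token):
    """Strip the longest matching particle, keeping at least one syllable."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if token.endswith(suffix) and len(token) > len(suffix):
            return token[: -len(suffix)], suffix
    return token, ""


def normalize(token):
    """Return (canonical_form, pos_tag); pos_tag is None if still unknown."""
    if token in LEXICON:  # already a known word, nothing to update
        return token, LEXICON[token]
    stem, _ = trim_suffix(token)
    canonical = VARIANTS.get(stem, stem)  # map a variant spelling if we know one
    return canonical, LEXICON.get(canonical)


print(normalize("컴퓨타를"))
print(normalize("블로그"))
```

In this sketch a token that survives all lookups is returned unchanged with a `None` tag, which is one simple way to flag it as a candidate for a later dictionary update.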

    MKEM: a Multi-level Knowledge Emergence Model for mining undiscovered public knowledge

    Abstract
    Background: Since Swanson proposed the Undiscovered Public Knowledge (UPK) model, there have been many approaches to uncovering UPK by mining the biomedical literature. These earlier works, however, required substantial manual intervention to reduce the number of possible connections and were mainly applied to the disease-effect relation. With advances in biomedical science, it has become imperative to extract and combine information from multiple disjoint studies and articles to infer new hypotheses and expand knowledge.
    Methods: We propose MKEM, a Multi-level Knowledge Emergence Model, to discover implicit relationships using Natural Language Processing techniques such as Link Grammar and ontologies such as the Unified Medical Language System (UMLS) MetaMap. The contribution of MKEM is as follows. First, we propose a flexible knowledge emergence model to extract implicit relationships across different levels, such as the molecular level for genes and proteins and the phenomic level for diseases and treatments. Second, we employ MetaMap for tagging biological concepts. Third, we provide an empirical and systematic approach to discovering novel relationships.
    Results: We applied our system to 5000 abstracts downloaded from the PubMed database. We performed the performance evaluation, as a gold standard is not yet available. Our system performed with good precision and recall, and we generated 24 hypotheses.
    Conclusions: Our experiments show that MKEM is a powerful tool for discovering hidden relationships residing in extracted entities that were represented by our Substance-Effect-Process-Disease-Body Part (SEPDB) model.
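The general idea behind Swanson-style discovery, on which this abstract builds, is that two concepts never linked directly may still be connected through shared intermediate concepts. The sketch below illustrates only that idea on a toy set of hand-written relations. It is not MKEM (it uses no Link Grammar, MetaMap, or SEPDB model), and the function name `candidate_links` and the input pairs are assumptions made for illustration.

```python
from collections import defaultdict


def candidate_links(pairs, start):
    """Find concepts reachable from `start` only through an intermediate.

    pairs: iterable of (concept, concept) relations stated in the literature.
    Returns {candidate_concept: set of linking intermediate concepts}.
    """
    neighbors = defaultdict(set)
    for a, b in pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    direct = set(neighbors[start])
    hypotheses = defaultdict(set)
    for b in direct:
        for c in neighbors[b]:
            # Keep only concepts with no direct stated link to `start`.
            if c != start and c not in direct:
                hypotheses[c].add(b)
    return dict(hypotheses)


# Toy relations modelled on Swanson's fish oil / Raynaud's disease example.
relations = [
    ("fish oil", "blood viscosity"),
    ("fish oil", "platelet aggregation"),
    ("blood viscosity", "Raynaud's disease"),
    ("platelet aggregation", "Raynaud's disease"),
]
print(candidate_links(relations, "fish oil"))
```

Each returned candidate is only a hypothesis to be checked, which is why approaches of this kind need a filtering step to keep the number of possible connections manageable.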

    Cooperation between expert knowledge and data mining discovered knowledge: Lessons learned

    Expert systems are built from knowledge traditionally elicited from the human expert. It is precisely knowledge elicitation from the expert that is the bottleneck in expert system construction. On the other hand, a data mining system, which automatically extracts knowledge, needs expert guidance on the successive decisions to be made in each of the system phases. In this context, expert knowledge and data mining discovered knowledge can cooperate, maximizing their individual capabilities: data mining discovered knowledge can be used as a complementary source of knowledge for the expert system, whereas expert knowledge can be used to guide the data mining process. This article summarizes different examples of systems where there is cooperation between expert knowledge and data mining discovered knowledge, and reports our experience of such cooperation gathered from a medical diagnosis project called Intelligent Interpretation of Isokinetics Data, which we developed. From that experience, a series of lessons were learned throughout project development. Some of these lessons are generally applicable and others pertain exclusively to certain project types.