6 research outputs found

    Inducing a Semantically Annotated Lexicon via EM-Based Clustering

    We present a technique for automatic induction of slot annotations for subcategorization frames, based on induction of hidden classes in the EM framework of statistical estimation. The models are empirically evaluated by a general decision test. Induction of slot labeling for subcategorization frames is accomplished by a further application of EM, and applied experimentally on frame observations derived from parsing large corpora. We outline an interpretation of the learned representations as theoretical-linguistic decompositional lexical entries. Comment: 8 pages, uses colacl.sty. Proceedings of the 37th Annual Meeting of the ACL, 199

    Acquiring Word-Meaning Mappings for Natural Language Interfaces

    This paper focuses on a system, WOLFIE (WOrd Learning From Interpreted Examples), that acquires a semantic lexicon from a corpus of sentences paired with semantic representations. The lexicon learned consists of phrases paired with meaning representations. WOLFIE is part of an integrated system that learns to transform sentences into representations such as logical database queries. Experimental results are presented demonstrating WOLFIE's ability to learn useful lexicons for a database interface in four different natural languages. The usefulness of the lexicons learned by WOLFIE is compared to that of lexicons acquired by a similar system, with results favorable to WOLFIE. A second set of experiments demonstrates WOLFIE's ability to scale to larger and more difficult, albeit artificially generated, corpora. In natural language acquisition, it is difficult to gather the annotated data needed for supervised learning; however, unannotated data is fairly plentiful. Active learning methods attempt to select for annotation and training only the most informative examples, and therefore are potentially very useful in natural language applications. However, most results to date for active learning have only considered standard classification tasks. To reduce annotation effort while maintaining accuracy, we apply active learning to semantic lexicons. We show that active learning can significantly reduce the number of annotated examples required to achieve a given level of performance.

    Knowledge-based methods for automatic extraction of domain-specific ontologies

    Semantic web technology aims at developing methodologies for representing large amounts of knowledge in web-accessible form. The semantics of knowledge should be easy for computer programs to interpret and understand, so that sharing and utilizing knowledge across the Web would be possible. Domain-specific ontologies form the basis for knowledge representation in the semantic web. Research on automated development of ontologies from texts has become increasingly important because manual construction of ontologies is labor intensive and costly, and, at the same time, large amounts of text for individual domains are already available in electronic form. However, automatic extraction of domain-specific ontologies is challenging due to the unstructured nature of texts and inherent semantic ambiguities in natural language. Moreover, the large size of the texts to be processed renders full-fledged natural language processing methods infeasible. In this dissertation, we develop a set of knowledge-based techniques for automatic extraction of ontological components (concepts, taxonomic and non-taxonomic relations) from domain texts. The proposed methods combine information retrieval metrics, lexical knowledge bases (such as WordNet), machine learning techniques, heuristics, and statistical approaches to meet the challenge of the task. These methods are domain-independent and automatic. For extraction of concepts, the proposed WNSCA+{PE, POP} method utilizes the lexical knowledge base WordNet to improve precision and recall over the traditional information retrieval metrics. A WordNet-based approach, the compound term heuristic, and a supervised learning approach are developed for taxonomy extraction. We also developed a weighted word-sense disambiguation method for use with the WordNet-based approach. An unsupervised approach using log-likelihood ratios is proposed for extracting non-taxonomic relations. Furthermore, a supervised approach is investigated to learn the semantic constraints for identifying relations from prepositional phrases. The proposed methods are validated by experiments with the Electronic Voting and the Tender Offers, Mergers, and Acquisitions domain corpora. Experimental results and comparisons with some existing approaches clearly indicate the superiority of our methods. In summary, a good combination of information retrieval, lexical knowledge base, statistics and machine learning methods in this study has led to techniques that are efficient and effective for extracting ontological components automatically.

    A natural language based indexing technique for Chinese information retrieval.

    Pang Chun Kiu. Thesis (M.Phil.)--Chinese University of Hong Kong, 1997. Includes bibliographical references (leaves 101-107).

    Chapter 1. Introduction
        1.1 Chinese Indexing using Noun Phrases
        1.2 Objectives
        1.3 An Overview of the Thesis
    Chapter 2. Background
        2.1 Technology Influences on Information Retrieval
        2.2 Related Work
            2.2.1 Statistical/Keyword Approaches
            2.2.2 Syntactical approaches
            2.2.3 Semantic approaches
            2.2.4 Noun Phrases Approach
            2.2.5 Chinese Information Retrieval
        2.3 Our Approach
    Chapter 3. Chinese Noun Phrases
        3.1 Different types of Chinese Noun Phrases
        3.2 Ambiguous noun phrases
            3.2.1 Ambiguous English Noun Phrases
            3.2.2 Ambiguous Chinese Noun Phrases
            3.2.3 Statistical data on the three NPs
    Chapter 4. Index Extraction from De-de Conj. NP
        4.1 Word Segmentation
        4.2 Part-of-speech tagging
        4.3 Noun Phrase Extraction
        4.4 The Chinese noun phrase partial parser
        4.5 Handling Parsing Ambiguity
        4.6 Index Building Strategy
        4.7 The cross-set generation rules
        4.8 Example 1: Indexing De-de NP
        4.9 Example 2: Indexing Conjunctive NP
        4.10 Experimental results and Discussion
    Chapter 5. Indexing Compound Nouns
        5.1 Previous Researches on Compound Nouns
        5.2 Indexing two-term Compound Nouns
            5.2.1 About the thesaurus《同義詞詞林》
        5.3 Indexing Compound Nouns of three or more terms
        5.4 Corpus learning approach
            5.4.1 An Example
            5.4.2 Experimental Setup
            5.4.3 An Experiment using the third level of the Cilin
            5.4.4 An Experiment using the second level of the Cilin
        5.5 Contextual Approach
            5.5.1 The algorithm
            5.5.2 An Illustrative Example
            5.5.3 Experiments on compound nouns
            5.5.4 Experiment I: Word Distance Based Extraction
            5.5.5 Experiment II: Semantic Class Based Extraction
            5.5.6 Experiments III: On different boundaries
            5.5.7 The Final Algorithm
            5.5.8 Experiments on other compounds
            5.5.9 Discussion
    Chapter 6. Overall Effectiveness
        6.1 Illustrative Example for the Integrated Algorithm
        6.2 Experimental Setup
        6.3 Experimental Results & Discussion
    Chapter 7. Conclusion
        7.1 Summary
        7.2 Contributions
        7.3 Future Directions
            7.3.1 Word-sense determination
            7.3.2 Hybrid approach for compound noun indexing
    Appendix A. Cross-set Generation Rules
    Appendix B. Tag set by Tsinghua University
    Appendix C. Noun Phrases Test Set
    Appendix D. Compound Nouns Test Set
        D.1 Three-term Compound Nouns
            D.1.1 NVN
            D.1.2 Other three-term compound nouns
        D.2 Four-term Compound Nouns
        D.3 Five-term and six-term Compound Nouns