
    A Corpus-Based Approach for Building Semantic Lexicons

    Semantic knowledge can be a great asset to natural language processing systems, but it is usually hand-coded for each application. Although some semantic information is available in general-purpose knowledge bases such as WordNet and Cyc, many applications require domain-specific lexicons that represent words and categories for a particular topic. In this paper, we present a corpus-based method that can be used to build semantic lexicons for specific categories. The input to the system is a small set of seed words for a category and a representative text corpus. The output is a ranked list of words that are associated with the category. A user then reviews the top-ranked words and decides which ones should be entered in the semantic lexicon. In experiments with five categories, users typically found about 60 words per category in 10-15 minutes to build a core semantic lexicon. Comment: 8 pages - to appear in Proceedings of EMNLP-
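
    The method sketched in this abstract is essentially a ranking loop: collect words that co-occur with the seed words in narrow context windows, score them against their overall corpus frequency, and hand the ranked list to a reviewer. A minimal, hypothetical sketch of that idea follows; the window size, frequency cutoff, scoring formula, and the rank_candidates helper are illustrative assumptions rather than the paper's exact procedure.

```python
# Hypothetical sketch of seed-word-based lexicon bootstrapping (illustrative
# assumptions: whitespace tokenization, a +/-2 word window, a minimum corpus
# frequency of 3, and a conditional-probability style score).
from collections import Counter

def rank_candidates(corpus_sentences, seed_words, window=2, top_n=200):
    seeds = {w.lower() for w in seed_words}
    total = Counter()      # overall frequency of each word
    near_seed = Counter()  # frequency of each word inside the seed windows

    for sentence in corpus_sentences:
        tokens = [t.lower() for t in sentence.split()]
        total.update(tokens)
        for i, tok in enumerate(tokens):
            if tok in seeds:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                near_seed.update(t for j, t in enumerate(tokens[lo:hi], lo)
                                 if j != i and t not in seeds)

    # Score: how often a word appears near a seed, relative to how often it
    # appears at all; rare words are filtered out before ranking.
    scores = {w: near_seed[w] / total[w] for w in near_seed if total[w] >= 3}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

    A user would review the top of the returned list, keep the words that truly belong to the category, and could fold them back into the seed set for another pass.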

    Experimental clean combustor program: Noise study

    Under a Noise Addendum to the NASA Experimental Clean Combustor Program (ECCP), internal pressure fluctuations were measured during tests of JT9D combustor designs conducted in a burner test rig. Measurements were correlated with burner operating parameters using an expression relating far-field noise to these parameters. For a given combustor, the variation of internal noise with operating parameters was reasonably well predicted by this expression, but the levels were higher than the far-field predictions and differed significantly among the combustors. For two burners, discharge-stream temperature fluctuations were measured with fast-response thermocouples to allow calculation of the indirect combustion noise that would be generated by passage of the temperature inhomogeneities through the high-pressure turbine stages of a JT9D turbofan engine. The indirect combustion noise computed with a previously developed analysis was significantly lower than the total low-frequency core noise observed on this and several other engines.

    Two-stage, low noise advanced technology fan. 5: Acoustic final report

    The NASA Q2S (quiet two-stage) fan is a 0.836 m (32.9 in.) diameter model of the STF 433 engine fan, selected in a 1972 study for an Advanced Technology Transport (ATT) airplane. Noise-control features include low tip speed, moderate stage pressure rise, large blade-vane spacings, no inlet guide vanes, and optimum blade and vane numbers. Tests were run on the baseline Q2S fan with standard inlet and discharge ducts. Further tests were made with a translating-centerbody sonic inlet device and acoustically treated discharge ducts. Results were scaled to JT8D and JT3D engine fan size for comparison with current two-stage fans, and were also scaled to STF 433 fan size to compare calculated ATT flyover noise with FAR 36 limits. Baseline Q2S results scaled to JT8D and JT3D engine fan sizes showed substantial noise reductions. Calculated unsuppressed baseline ATT flyovers averaged about 2.5 EPNdB below FAR 36 limits. Using measured sonic inlet results, scaled baseline Q2S fan results, and calculated attenuations for a 1975-technology duct liner, projected ATT flyover noise averaged about 10 EPNdB below FAR 36 limits. Advances in suppression technology required to meet the 1985 goal of 20 EPNdB below FAR 36 limits are discussed.

    Empirical study of automated dictionary construction for information extraction in three domains

    A primary goal of natural language processing researchers is to develop a knowledge-based natural language processing (NLP) system that is portable across domains. However, most knowledge-based NLP systems rely on a domain-specific dictionary of concepts, which represents a substantial knowledge-engineering bottleneck. We have developed a system called AutoSlog that addresses the knowledge-engineering bottleneck for a task called information extraction. AutoSlog automatically creates domain-specific dictionaries for information extraction, given an appropriate training corpus. We have used AutoSlog to create a dictionary of extraction patterns for terrorism, which achieved 98% of the performance of a handcrafted dictionary that required approximately 1500 person-hours to build. In this paper, we describe experiments with AutoSlog in two additional domains: joint ventures and microelectronics. We compare the performance of AutoSlog across the three domains, discuss the lessons learned about the generality of this approach, and present results from two experiments which demonstrate that novice users can generate effective dictionaries using AutoSlog.
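
    To make the dictionary-construction step concrete, the sketch below shows the general shape of proposing extraction patterns from annotated training text: for each sentence with a marked target noun phrase, a small set of templates suggests a candidate pattern, and the pooled proposals are reviewed by a person. The templates, the crude passive/active checks, and the propose_pattern and build_dictionary helpers are illustrative assumptions; AutoSlog itself relies on a sentence analyzer and a richer set of heuristics.

```python
# Much-simplified, hypothetical illustration of proposing extraction patterns
# from training text.  These string checks are stand-ins for real syntactic
# heuristics and are assumptions made for illustration only.
def propose_pattern(tokens, target_start, target_end):
    """Return a candidate extraction pattern for the marked noun phrase."""
    before = tokens[target_start - 1].lower() if target_start > 0 else None
    after = tokens[target_end].lower() if target_end < len(tokens) else None

    if before == "by":                      # "... was bombed by <NP>"
        return "passive-verb by <np>"
    if after and after.endswith("ed"):      # "<NP> was kidnapped ..." (crude verb check)
        return f"<subj> {after}"
    if before and before.endswith("ed"):    # "... destroyed <NP>"
        return f"{before} <dobj>"
    return None

def build_dictionary(training_instances):
    """training_instances: iterable of (tokens, target_start, target_end)."""
    proposals = set()
    for tokens, start, end in training_instances:
        pattern = propose_pattern(tokens, start, end)
        if pattern:
            proposals.add(pattern)
    return proposals  # a person then reviews and prunes the proposed patterns

tokens = "terrorists destroyed the embassy".split()
print(build_dictionary([(tokens, 2, 4)]))   # {'destroyed <dobj>'}
```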

    Learning subjective nouns using extraction pattern bootstrapping

    We explore the idea of creating a subjectivity classifier that uses lists of subjective nouns learned by bootstrapping algorithms. The goal of our research is to develop a system that can distinguish subjective sentences from objective sentences. First, we use two bootstrapping algorithms that exploit extraction patterns to learn sets of subjective nouns. Then we train a Naive Bayes classifier using the subjective nouns, discourse features, and subjectivity clues identified in prior research. The bootstrapping algorithms learned over 1000 subjective nouns, and the subjectivity classifier performed well, achieving 77% recall with 81% precision.
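
    The classification stage described here is a standard Naive Bayes setup over sentence-level features, one of which is the presence of bootstrapped subjective nouns. The sketch below is a minimal illustration under assumed details: the tiny word lists, the three features, the toy training sentences, and the use of scikit-learn's BernoulliNB are stand-ins for the paper's actual feature set and training data.

```python
# Minimal sketch of a Naive Bayes subjectivity classifier.  Word lists,
# features, and training data below are illustrative assumptions.
import numpy as np
from sklearn.naive_bayes import BernoulliNB

SUBJECTIVE_NOUNS = {"outrage", "hope", "nonsense", "concern"}  # bootstrapped nouns
STRONG_CLUES = {"terrible", "wonderful", "should", "must"}     # clues from prior work

def featurize(sentence):
    tokens = {t.lower().strip(".,!?") for t in sentence.split()}
    return [
        int(bool(tokens & SUBJECTIVE_NOUNS)),  # contains a learned subjective noun
        int(bool(tokens & STRONG_CLUES)),      # contains a known subjectivity clue
        int("!" in sentence),                  # crude punctuation/discourse feature
    ]

train_sentences = ["What an outrage this decision is!",
                   "The committee met on Tuesday.",
                   "We should reject this terrible plan.",
                   "The report was released at noon."]
train_labels = [1, 0, 1, 0]  # 1 = subjective, 0 = objective

clf = BernoulliNB().fit(np.array([featurize(s) for s in train_sentences]), train_labels)
print(clf.predict([featurize("What utter nonsense!")]))  # -> [1], i.e. subjective
```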

    Corpus-based approach for building semantic lexicons

    Semantic knowledge can be a great asset to natural language processing systems, but it is usually hand-coded for each application. Although some semantic information is available in general-purpose knowledge bases such as WordNet and Cyc, many applications require domain-specific lexicons that represent words and categories for a particular topic. In this paper, we present a corpus-based method that can be used to build semantic lexicons for specific categories. The input to the system is a small set of seed words for a category and a representative text corpus. The output is a ranked list of words that are associated with the category. A user then reviews the top-ranked words and decides which ones should be entered in the semantic lexicon. In experiments with five categories, users typically found about 60 words per category in 10-15 minutes to build a core semantic lexicon.

    Learning and evaluating the content and structure of a term taxonomy

    In this paper, we describe a weakly supervised bootstrapping algorithm that reads Web texts and learns taxonomy terms. The bootstrapping algorithm starts with two seed words, a seed hypernym (the Root concept) and a seed hyponym, that are inserted into a doubly anchored hyponym pattern. In alternating rounds, the algorithm learns new hyponym terms and new hypernym terms that are subordinate to the Root concept. We conducted an extensive evaluation with human annotators to assess the learned hyponym and hypernym terms for two categories: animals and people.
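
    The doubly anchored pattern can be pictured as a lexical template with two fixed slots and one open slot. A hypothetical sketch is given below: anchoring the Root concept and a known hyponym in a "such as X and Y" pattern harvests new hyponyms, and anchoring two known hyponyms harvests candidate hypernyms, which is the alternation the abstract describes. The regular expressions, plural handling, and example sentences are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of the doubly anchored "such as" pattern.
import re

def learn_hyponyms(texts, root, known_hyponym):
    """Harvest Z from '<root> such as <known_hyponym> and Z'."""
    pat = re.compile(rf"\b{root}s?\s+such\s+as\s+{known_hyponym}s?\s+and\s+(\w+)",
                     re.IGNORECASE)
    return {m.group(1).lower() for text in texts for m in pat.finditer(text)}

def learn_hypernyms(texts, hyponym_a, hyponym_b):
    """Harvest Z from 'Z such as <hyponym_a> and <hyponym_b>'."""
    pat = re.compile(rf"(\w+)\s+such\s+as\s+{hyponym_a}s?\s+and\s+{hyponym_b}s?",
                     re.IGNORECASE)
    return {m.group(1).lower() for text in texts for m in pat.finditer(text)}

web_texts = ["Animals such as dogs and wolves hunt in packs.",
             "Predators such as wolves and dogs were spotted nearby."]

print(learn_hyponyms(web_texts, "animal", "dog"))    # {'wolves'}
print(learn_hypernyms(web_texts, "wolves", "dog"))   # {'predators'}
```

    As the abstract describes, newly learned terms can anchor the pattern in later rounds, and candidate hypernyms are kept only when they remain subordinate to the Root concept.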