7,125 research outputs found

    Lexical Adaptation of Link Grammar to the Biomedical Sublanguage: a Comparative Evaluation of Three Approaches

    Get PDF
    We study the adaptation of Link Grammar Parser to the biomedical sublanguage with a focus on domain terms not found in a general parser lexicon. Using two biomedical corpora, we implement and evaluate three approaches to addressing unknown words: automatic lexicon expansion, the use of morphological clues, and disambiguation using a part-of-speech tagger. We evaluate each approach separately for its effect on parsing performance and consider combinations of these approaches. In addition to a 45% increase in parsing efficiency, we find that the best approach, incorporating information from a domain part-of-speech tagger, offers a statistically signicant 10% relative decrease in error. The adapted parser is available under an open-source license at http://www.it.utu.fi/biolg

    An XML-based Tool for Tracking English Inclusions in German Text

    Get PDF
    The use of lexicons and corpora advances both linguistic research and performances of current natural language processing (NLP) systems. We present a tool that exploits such resources, specifically English and German lexical databases and the World Wide Web to recognise English inclusions in German newspaper articles. The output of the tool can assist lexical resource developers in monitoring changing patterns of English inclusion usage. The corpus used for the classification covers three different domains. We report the classification results and illustrate their value to linguistic and NLP research

    A Web Smart Space Framework for Intelligent Search Engines

    Get PDF
    A web smart space is an intelligent environment which has additional capability of searching the information smartly and efficiently. New advancements like dynamic web contents generation has increased the size of web repositories. Among so many modern software analysis requirements, one is to search information from the given repository. But useful information extraction is a troublesome hitch due to the multi-lingual; base of the web data collection. The issue of semantic based information searching has become a standoff due to the inconsistencies and variations in the characteristics of the data. In the accomplished research, a web smart space framework has been proposed which introduces front end processing for a search engine to make the information retrieval process more intelligent and accurate. In orthodox searching anatomies, searching is performed only by using pattern matching technique and consequently a large number of irrelevant results are generated. The projected framework has insightful ability to improve this drawback and returns efficient outcomes. Designed framework gets text input from the user in the form complete question, understands the input and generates the meanings. Search engine searches on the basis of the information provided

    Using NLP tools in the specification phase

    Get PDF
    The software quality control is one of the main topics in the Software Engineering area. To put the effort in the quality control during the specification phase leads us to detect possible mistakes in an early steps and, easily, to correct them before the design and implementation steps start. In this framework the goal of SAREL system, a knowledge-based system, is twofold. On one hand, to help software engineers in the creation of quality Software Requirements Specifications. On the other hand, to analyze the correspondence between two different conceptual representations associated with two different Software Requirements Specification documents. For the first goal, a set of NLP and Knowledge management tools is applied to obtain a conceptual representation that can be validated and managed by the software engineer. For the second goal we have established some correspondence measures in order to get a comparison between two conceptual representations. This information will be useful during the interaction.Postprint (published version
    corecore