2,974 research outputs found

    The linguistics of gender

    This chapter explores grammatical gender as a linguistic phenomenon. First, I define gender in terms of agreement, and look at the parts of speech that can take gender agreement. Because it relates to assumptions underlying much psycholinguistic gender research, I also examine the reasons why gender systems are thought to emerge, change, and disappear. Then, I describe the gender system of Dutch. The frequent confusion about the number of genders in Dutch will be resolved by looking at the history of the system, and the role of pronominal reference therein. In addition, I report on three lexical-statistical analyses of the distribution of genders in the language. After having dealt with Dutch, I look at whether the genders of Dutch and other languages are more or less randomly assigned, or whether there is some system to it. In contrast to what many people think, regularities do indeed exist. Native speakers could in principle exploit such regularities to compute rather than memorize gender, at least in part. Although this should be taken into account as a possibility, I will also argue that it is by no means a necessary implication.

    Technologies in computerized lexicography

    Since the early eighties, computer technology has become increasingly relevant to lexicography. Computer science will probably not be the only technological discipline which may have implications for future computerized lexicography. Some developments in the fields of language technology, information technology and knowledge engineering may support lexicographical practice and enhance the quality of the resulting dictionary. The present paper discusses how the analysis and interpretation of electronic corpus data by the lexicographer may be improved by automatic linguistic analysis, by better access to the corpus, and by a more flexible communication with the computer system. As a frame of reference, first an indication of the state of the art in computerized lexicography will be given, by a concise discussion of three projects at the Institute for Dutch Lexicology (INL) considered in an international context: the conversion of the Woordenboek der Nederlandsche Taal (WNT, Dictionary of the Dutch Language Based on Historical Principles) to electronic form, the compilation of the Vroegmiddelnederlands Woordenboek (Dictionary of Early Middle Dutch) in a computerized lexicographer's workbench, and the INL Taalbank (INL Language Database). Although the topic of this paper is technology, focus is on functional rather than technical aspects of computerized lexicography. Keywords: computerized lexicography, electronic dictionary, electronic text corpus, lexicographer's workbench, integrated language database, automatic linguistic analysis, information retrieval, user interface

    Legal documentation with the computer-aided indexing system CTX

    The article deals with linguistic methods in information and documentation, in particular for processing large text collections for the purpose of information retrieval (automatic indexing).

    KARL: A Knowledge-Assisted Retrieval Language

    Data classification and storage are tasks typically performed by application specialists. In contrast, information users are primarily non-computer specialists who use information in their decision-making and other activities. Interaction efficiency between such users and the computer is often reduced by machine requirements and the resulting user reluctance to use the system. This thesis examines the problems associated with information retrieval for non-computer specialist users, and proposes a method for communicating in restricted English that uses knowledge of the entities involved, relationships between entities, and basic English language syntax and semantics to translate the user requests into formal queries. The proposed method includes an intelligent dictionary, syntax and semantic verifiers, and a formal query generator. In addition, the proposed system has a learning capability that can improve portability and performance. With the increasing demand for efficient human-machine communication, the significance of this thesis becomes apparent. As human resources become more valuable, software systems that will assist in improving the human-machine interface will be needed and research addressing new solutions will be of utmost importance. This thesis presents an initial design and implementation as a foundation for further research and development into the emerging field of natural language database query systems.

    Advancing CASE Productivity by Using Natural Language Processing and Computerized Ontologies: The ACAPULCO system

    We present a new approach to software engineering which reduces the knowledge gap between user and development methodology by explicitly supporting concepts expressed in natural language. The tool uses a natural language description of a business process as input and transforms it into a process model. The system recognizes actors, objects, locations, relationships etc. referred to in the description and distinguishes different types of actions and conditions. The system uses multi-pass parsing and disambiguation NLP techniques and relies upon a custom-built dictionary of 23,000 English root words. The dictionary includes information about syntactic (e.g. noun, verb...) and semantic categories as well as word frequency. Currently 15 different semantic categories such as 'tangible object', 'person', 'event', etc. are distinguished. The ACAPULCO prototype, which runs on a standard PC under Windows 3.1 with 16 Mbytes of RAM, demonstrates a) that natural language processing for software engineering is feasible, b) that this approach has the potential to redefine the interaction and relationships between users, analysts and developers and c) that this approach is a powerful extension to traditional methods because it uses explicit knowledge about real-world business concepts.

    Natural language software registry (second edition)


    Diagnosing Reading strategies: Paraphrase Recognition

    Paraphrase recognition is a form of natural language processing used in tutoring, question answering, and information retrieval systems. The context of the present work is an automated reading strategy trainer called iSTART (Interactive Strategy Trainer for Active Reading and Thinking). The ability to recognize the use of paraphrase (a complete, partial, or inaccurate paraphrase, with or without extra information) in the student's input is essential if the trainer is to give appropriate feedback. I analyzed the most common patterns of paraphrase and developed a means of representing the semantic structure of sentences. Paraphrases are recognized by transforming sentences into this representation and comparing them. To construct a precise semantic representation, it is important to understand the meaning of prepositions. Adding preposition disambiguation to the original system improved its accuracy by 20%. The preposition sense disambiguation module itself achieves about 80% accuracy for the top 10 most frequently used prepositions. The main contributions of this work to the research community are the preposition classification and generalized preposition disambiguation processes, which are integrated into the paraphrase recognition system and are shown to be quite effective. The recognition model also forms a significant part of this contribution. The present effort includes the modeling of the paraphrase recognition process, featuring the Syntactic-Semantic Graph as a sentence representation, the implementation of a significant portion of this design demonstrating its effectiveness, the modeling of an effective preposition classification based on prepositional usage, the design of the generalized preposition disambiguation module, and the integration of the preposition disambiguation module into the paraphrase recognition system so as to gain significant improvement.

    Benchmarking the performance of two automated term-extraction systems: LOGOS and ATAO

    Thesis digitized by the Direction des bibliothèques of the Université de Montréal. To consult the document accompanying the thesis, please contact the Centre de conservation Lionel-Groulx of the Université de Montréal ([email protected]).