
    Parsing for prosody: What a text-to-speech system needs from syntax

    The authors describe an experimental text-to-speech system that uses a syntactic parser and prosody rules to determine prosodic phrasing for synthesized speech. They show that many aspects of sentence analysis required for other parsing applications, e.g., machine translation and question answering, become unnecessary in parsing for text-to-speech. Natural-sounding prosodic phrasing can be generated by relying on information about syntactic category type, partial constituency, and length; information about clausal and verb-phrase constituency, predicate-argument relations, and prepositional-phrase attachment can be bypassed.
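    The phrasing strategy described above, relying only on category type and length rather than full syntactic analysis, can be illustrated with a minimal sketch. The break-inducing POS tags and the length threshold below are invented for illustration and are not the authors' actual rules.

```python
# Illustrative sketch of length- and category-based prosodic phrasing.
# BREAK_TAGS and MAX_PHRASE_LEN are assumptions for this example; the
# paper's real rules also use partial-constituency information.

BREAK_TAGS = {"CC", "IN", "WDT"}  # conjunctions, prepositions, relativizers
MAX_PHRASE_LEN = 4                # assumed maximum words per prosodic phrase

def prosodic_phrases(tagged_tokens):
    """Group (word, POS) pairs into prosodic phrases using only
    category type and accumulated length."""
    phrases, current = [], []
    for word, tag in tagged_tokens:
        # Open a new phrase before a break-inducing category, or when
        # the current phrase has grown past the length threshold.
        if current and (tag in BREAK_TAGS or len(current) >= MAX_PHRASE_LEN):
            phrases.append(current)
            current = []
        current.append(word)
    if current:
        phrases.append(current)
    return phrases
```

    For example, "the cat sat on the mat" splits into the phrases "the cat sat" and "on the mat" at the preposition, without any clausal or attachment analysis.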

    Neural Techniques for German Dependency Parsing

    Syntactic parsing is the task of analyzing the structure of a sentence according to some predefined formal assumption. It is a key component in many natural language processing (NLP) pipelines and is of great benefit for natural language understanding (NLU) tasks such as information retrieval or sentiment analysis. Despite the very strong results achieved with neural network techniques, most syntactic parsing research attends to only a few prominent languages (such as English or Chinese) or to language-agnostic settings. We therefore still lack studies that focus on a single language and design parsing strategies tailored to its linguistic properties. In this thesis, we take German as the language of interest and develop more accurate methods for German dependency parsing by combining state-of-the-art neural network methods with techniques that address the specific challenges posed by German's language-specific properties. Compared to English, German has richer morphology, semi-free word order, and case syncretism. It is the combination of these characteristics that makes parsing German an interesting and challenging task. Because syntactic parsing requires many levels of language understanding, we propose to study and improve the knowledge of parsing models at each level: the (sub)word level, the syntactic level, the semantic level, and the sentence level.
    At the (sub)word level, we examine the surge in out-of-vocabulary words in German data caused by compounding. We propose a new type of embedding for compounds that is a compositional model of the embeddings of the individual components. Our experiments show that character-based embeddings are superior to word and compound embeddings in dependency parsing, and that compound embeddings outperform word embeddings only when part-of-speech (POS) information is unavailable. We conclude that it is the morpho-syntactic information of unknown compounds, not the semantic information, that is crucial for parsing German.
    At the syntactic level, we investigate the challenges that case syncretism poses for local grammatical function labelers. Specifically, we augment the grammatical function labeling component in a neural dependency parser, which labels each head-dependent pair independently, with a new labeler that incorporates a decision history using Long Short-Term Memory networks (LSTMs). All our proposed models significantly outperformed the baseline on three languages: English, German, and Czech. However, the impact of the new models differs across languages: the improvement for English is smaller than for the non-configurational languages (German and Czech). Our analysis suggests that the success of the history-based models is due not to better handling of long dependencies but to better handling of uncertainty in head direction.
    We study the interaction of syntactic parsing with the semantic level via the problem of PP attachment disambiguation. Our motivation is to provide a realistic evaluation of the task in which gold information is not available, and to compare the results of disambiguation systems against the output of a strong neural parser. To the best of our knowledge, this is the first time that PP attachment disambiguation has been evaluated against neural dependency parsing on predicted information. In addition, we present a novel approach to PP attachment disambiguation that uses biaffine attention and pre-trained contextualized word embeddings as semantic knowledge. Our end-to-end system outperformed the previous pipeline approach on German by a large margin, simply by avoiding the error propagation caused by predicted information. Finally, we show that parsing systems (with the same semantic knowledge) are in general superior to systems specialized for PP attachment disambiguation.
    Lastly, we improve dependency parsing at the sentence level using reranking techniques. So far, neural reranking has been evaluated only on English and Chinese, both languages with a configurational word order and poor morphology. We re-assess the potential of successful neural reranking models from the literature on English and on two morphologically rich(er) languages, German and Czech. In addition, we introduce a new variant of a discriminative reranker based on graph convolutional networks (GCNs). Our proposed reranker not only outperforms previous models on English but is also the only model able to improve over the baselines on German and Czech. Our analysis shows that the failure of the previous models is due to the lower quality of the k-best lists, in which the gold tree ratio and the diversity of the list play an important role.
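    The compositional compound embedding mentioned at the (sub)word level can be sketched as a simple combination of component embeddings. The toy vectors and the use of an unweighted mean are assumptions for illustration; the thesis's actual composition function may differ.

```python
import numpy as np

# Toy embedding table; in practice these would be pre-trained vectors.
# Entries and dimensions are invented for this sketch.
EMB = {
    "haus": np.array([1.0, 0.0]),
    "tür":  np.array([0.0, 1.0]),
}

def compound_embedding(components, emb=EMB):
    """Compose an embedding for an out-of-vocabulary compound
    (e.g. German 'Haustür' -> ['haus', 'tür']) from the embeddings
    of its known components, here by taking their mean."""
    vecs = [emb[c] for c in components if c in emb]
    if not vecs:
        raise KeyError("no known components")
    return np.mean(vecs, axis=0)
```

    An unknown compound thus receives a vector even when the full form never occurred in training, which is the out-of-vocabulary problem the abstract describes.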

    Research in the Language, Information and Computation Laboratory of the University of Pennsylvania

    This report takes its name from the Computational Linguistics Feedback Forum (CLiFF), an informal discussion group for students and faculty. However, the scope of the research covered in this report is broader than the title might suggest; this is the yearly report of the LINC Lab, the Language, Information and Computation Laboratory of the University of Pennsylvania. It may at first be hard to see the threads that bind together the work presented here, work by faculty, graduate students, and postdocs in the Computer Science and Linguistics Departments and the Institute for Research in Cognitive Science. It includes prototypical natural language fields such as Combinatory Categorial Grammars, Tree Adjoining Grammars, syntactic parsing, and the syntax-semantics interface, but it extends to statistical methods, plan inference, instruction understanding, intonation, causal reasoning, free word order languages, geometric reasoning, medical informatics, connectionism, and language acquisition. Naturally, this introduction cannot spell out all the connections between these abstracts; we invite you to explore them on your own. In fact, with this issue it’s easier than ever to do so: this document is accessible on the “information superhighway”. Just call up http://www.cis.upenn.edu/~cliff-group/94/cliffnotes.html In addition, you can find many of the papers referenced in the CLiFF Notes on the net. Most can be obtained by following links from the authors’ abstracts in the web version of this report. The abstracts describe the researchers’ many areas of investigation, explain their shared concerns, and present some interesting work in Cognitive Science. We hope the new online format makes the CLiFF Notes a more useful and interesting guide to Computational Linguistics activity at Penn.

    Making AI Meaningful Again

    Artificial intelligence (AI) research enjoyed an initial period of enthusiasm in the 1970s and 80s. But this enthusiasm was tempered by a long interlude of frustration when genuinely useful AI applications failed to materialize. Today, we are once again experiencing a period of enthusiasm, fired above all by the successes of the technology of deep neural networks, or deep machine learning. In this paper we draw attention to what we take to be serious problems underlying current views of artificial intelligence encouraged by these successes, especially in the domain of language processing. We then present an alternative approach to language-centric AI, in which we identify a role for philosophy.

    The VERBMOBIL domain model version 1.0

    This report describes the domain model used in the German machine translation project VERBMOBIL. In order to make explicit the design principles underlying the modeling, we begin with a brief sketch of the VERBMOBIL demonstrator architecture from the perspective of the domain model. We then present some rather general considerations on the nature of domain modeling and its relationship to semantics. We claim that the semantic information contained in the model mainly serves two tasks: on the one hand, it provides the basis for a conceptual transfer from German to English; on the other, it provides information needed for disambiguation. We argue that these tasks pose different requirements, and that domain modeling in general is highly task-dependent. A brief overview of domain models and ontologies used in existing NLP systems confirms this position. Finally, we describe the different parts of the domain model, explain our design decisions, and present examples of how the information contained in the model can actually be used in the VERBMOBIL demonstrator. In doing so, we also point out the main functionality of FLEX, the Description Logic system used for the modeling.
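    The disambiguation role of such a domain model can be sketched with a toy subsumption check over a small concept hierarchy. The concepts, readings, and hierarchy below are illustrative assumptions, not VERBMOBIL's actual model, which is implemented in the FLEX Description Logic system.

```python
# Toy concept hierarchy standing in for a Description Logic domain model.
# All concept names and readings here are invented for illustration.

SUBCONCEPTS = {
    "event":       {"meeting", "trip"},
    "time_region": {"date", "weekday"},
}

def is_a(concept, super_concept):
    """Minimal subsumption check over the toy hierarchy."""
    return concept == super_concept or concept in SUBCONCEPTS.get(super_concept, set())

def disambiguate(readings, selectional_restriction):
    """Keep only the (gloss, concept) readings whose concept satisfies
    the restriction -- the kind of filtering a domain model supplies."""
    return [gloss for gloss, concept in readings
            if is_a(concept, selectional_restriction)]
```

    For instance, if a word has both an event reading and a calendar-date reading, a verb that selects for an event leaves only the event reading after filtering.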

    Designing Service-Oriented Chatbot Systems Using a Construction Grammar-Driven Natural Language Generation System

    Service-oriented chatbot systems are used to inform users in a conversational manner about a particular service or product on a website. Our research shows that current systems are time-consuming to build and not very accurate or satisfying to users. We find that natural language understanding and natural language generation methods are central to creating an efficient and useful system. In this thesis we investigate current and past methods in this research area, placing particular emphasis on Construction Grammar and its computational implementation. Our research shows that users have strong emotive reactions to how these systems behave, so we also investigate the human-computer interaction component. We present three systems (KIA, John, and KIA2) and carry out extensive user tests on all of them, as well as comparative tests. KIA is built using existing methods, John is built with the user in mind, and KIA2 is built using the construction grammar method. We found that the construction grammar approach performs well in service-oriented chatbot systems, and that users preferred it over the other systems.
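    At its very simplest, a construction-driven generation step of the kind this line of work builds on pairs a meaning frame with a form template. The frames, templates, and slot names below are invented for illustration; the thesis's actual Construction Grammar implementation is far richer.

```python
# Minimal sketch of construction-based generation: each construction
# pairs a meaning frame with a form template. Names are hypothetical.

CONSTRUCTIONS = [
    {"frame": "opening_hours", "form": "We are open from {start} to {end}."},
    {"frame": "greeting",      "form": "Hello! How can I help you?"},
]

def generate(frame, **slots):
    """Pick the construction matching the meaning frame and fill its slots."""
    for construction in CONSTRUCTIONS:
        if construction["frame"] == frame:
            return construction["form"].format(**slots)
    raise ValueError(f"no construction for frame {frame!r}")
```

    Treating form and meaning as paired in this way, rather than generating from syntax rules alone, is the design choice that distinguishes the construction grammar approach.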

    Language processing in bilingual children and adults: evidence from filler-gap dependencies and garden path sentences

    The present thesis examines morphosyntactic processing in bilingual children and adults. It bridges gaps in the existing literature in three ways. First, previous work on bilingual children has focused on inflectional morphology but has not examined the time course of processing in terms of misinterpretation and real-time reanalysis, or the use of information from different sources to aid disambiguation. Second, it extends the use of the visual-world eye-tracking paradigm to research on morphosyntactic processing in bilingual children. Third, for the adults, it compares early/native bilinguals to monolinguals and to late bilinguals/L2 learners. Two linguistic phenomena were investigated with adults and children: which-questions and garden-path sentences in English. Overall, both bilingual children and adults showed qualitatively similar patterns of processing to their monolingual counterparts. All groups experienced greater difficulty with structures involving ambiguity and a need for syntactic reanalysis, i.e. object which-questions and garden-path sentences, suggesting incremental processing. The main difference between monolinguals and bilinguals is one of speed: bilinguals appeared to process sentences more slowly than monolinguals even when their comprehension accuracy was equally high. This difference was found for both children and adults, as evidenced by reaction times or changes in the gaze data, and was generally not more pronounced in sentences where reanalysis is required. With regard to the bilingual adults, the early/native bilinguals clustered with the L2 learners in terms of processing speed but more so with the monolingual adults in terms of accuracy. The bilingual groups showed reduced utilisation of information from various sources to aid processing.
    Bilingual adults and children made use of number mismatch between the two noun phrases in the study on which-questions to facilitate disambiguation; however, they showed this effect only for off-line comprehension accuracy and not for real-time processing, i.e. in the gaze data. The bilinguals also did not show consistent use of referential context to disambiguate in the study on garden-path sentences, although this was also the case for the monolingual adults and children. In sum, the results from the studies in this thesis suggest that both bilingual adults and children were as able as their monolingual counterparts at the end stage but differed from the monolinguals on more fine-grained measures of real-time processing. These measures point to qualitatively similar but more protracted processing for bilinguals, with more limited use of facilitatory information to disambiguate.