
    Punctuation in Quoted Speech

    Quoted speech is often set off by punctuation marks, in particular quotation marks. Thus, it might seem that quotation marks would be extremely useful for identifying these structures in texts. Unfortunately, the situation is not so clear. In this work, I argue that quotation marks are adequate neither for identifying quoted speech nor for constraining its syntax. More useful information comes from the presence of a quoting verb, which is either a verb of saying or a punctual verb, and from the presence of other punctuation marks, usually commas. Using a lexicalized grammar, we can license most quoting clauses as text adjuncts. The distinction drawn is not between direct and indirect quoted speech, but between adjunct and non-adjunct quoting clauses.
    Comment: 11 pages, 11 ps figures, Proceedings of SIGPARSE 96 - Punctuation in Computational Linguistics
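    The abstract's central claim, that quoting verbs and commas carry more identifying signal than quotation marks, can be illustrated with a toy heuristic. The following is a minimal sketch, not the paper's lexicalized grammar: the verb list and the comma-splitting rule are assumptions made purely for illustration.

```python
import re

# Hypothetical lexicon of quoting verbs (verbs of saying plus punctual
# verbs, per the abstract); the actual lexicon is an assumption here.
QUOTING_VERBS = {"said", "says", "replied", "asked", "shouted", "added",
                 "exclaimed", "whispered", "snapped", "began"}

def find_quoting_clauses(sentence: str) -> list[str]:
    """Heuristically flag candidate quoting clauses: stretches of text
    containing a quoting verb, delimited by commas rather than by
    quotation marks alone."""
    candidates = []
    # Split on commas, the punctuation the paper identifies as informative.
    for chunk in sentence.split(","):
        tokens = re.findall(r"[A-Za-z']+", chunk.lower())
        if any(tok in QUOTING_VERBS for tok in tokens):
            candidates.append(chunk.strip())
    return candidates

print(find_quoting_clauses("This is fine, she said, and left."))
# ['she said']
```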

    Information structure in linguistic theory and in speech production: validation of a cross-linguistic data set

    The aim of this paper is to validate a dataset collected by means of production experiments that are part of the Questionnaire on Information Structure. The experiments generate a range of information structure contexts that the literature reports to induce specific constructions. This paper compares the speech production results from a subset of these experiments with specific claims about the reflexes of information structure in four different languages. The results allow us to evaluate, and in most cases validate, the efficacy of our elicitation paradigms, to identify potentially fruitful avenues for future research, and to highlight issues involved in interpreting speech production data of this kind.

    ANNIS: a linguistic database for exploring information structure

    In this paper, we discuss the design and implementation of the first version of the database "ANNIS" (ANNotation of Information Structure). For research based on empirical data, ANNIS provides a uniform environment for storing the data together with its linguistic annotations. A central database promotes standardized annotation, which facilitates interpretation and comparison of the data. ANNIS is used through a standard web browser and offers tier-based visualization of data and annotations, as well as search facilities that allow for cross-level and cross-sentential queries. The paper motivates the design of the system, characterizes its user interface, and provides an initial technical evaluation of ANNIS with respect to data size and query processing.
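    To make the idea of a cross-level query concrete, here is a minimal sketch of tier-based annotation storage with a query that combines conditions on two annotation levels. The data model and field names are illustrative assumptions, not the ANNIS schema or its query language.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    tier: str    # annotation level, e.g. "pos" or "infostruct"
    start: int   # token span start
    end: int     # token span end (exclusive)
    value: str

# Toy document: one annotation list covering several tiers.
doc = [
    Annotation("pos", 0, 1, "NN"),
    Annotation("infostruct", 0, 3, "topic"),
    Annotation("pos", 3, 4, "VB"),
    Annotation("infostruct", 3, 6, "focus"),
]

def cross_level(doc, tier_a, value_a, tier_b, value_b):
    """Find pairs of annotations on two different tiers whose spans
    overlap -- the kind of condition a cross-level query must express."""
    hits = []
    for a in doc:
        for b in doc:
            if (a.tier == tier_a and a.value == value_a and
                    b.tier == tier_b and b.value == value_b and
                    a.start < b.end and b.start < a.end):
                hits.append((a, b))
    return hits

# Which nouns fall inside a topic constituent?
print(cross_level(doc, "pos", "NN", "infostruct", "topic"))
```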

    Preliminary Work on Speech Unit Selection Using Syntax Phonology Interface

    This paper proposes an approach that uses a syntax-phonology interface to select the most appropriate speech units for a target sentence. The selection is performed by constructing the syntax-phonology tree structure of the target sentence; the construction of this tree is adapted from the example-based parsing of the UTMK machine translation system.
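    A rough sketch of how a syntax-phonology tree can guide unit selection follows: prefer a whole recorded phrase when the tree offers one, and fall back to smaller units otherwise. The tree shape and the unit inventory are illustrative assumptions, not the UTMK system's actual data structures.

```python
# Hypothetical recorded units indexed by (phrase_type, words).
phrase_units = {
    ("NP", ("the", "cat")): "np_the_cat.wav",
    ("VP", ("sat", "down")): "vp_sat_down.wav",
}
word_units = {"the": "the.wav", "cat": "cat.wav",
              "sat": "sat.wav", "down": "down.wav"}

def select_units(tree):
    """Walk a (label, children-or-words) tree; prefer a whole recorded
    phrase, fall back to word-sized units."""
    label, content = tree
    if isinstance(content[0], str):            # leaf phrase of words
        key = (label, tuple(content))
        if key in phrase_units:
            return [phrase_units[key]]
        return [word_units[w] for w in content]
    units = []
    for child in content:                      # recurse into subtrees
        units.extend(select_units(child))
    return units

sentence = ("S", [("NP", ["the", "cat"]), ("VP", ["sat", "down"])])
print(select_units(sentence))  # ['np_the_cat.wav', 'vp_sat_down.wav']
```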

    Multi-modal meaning – An empirically-founded process algebra approach

    Humans communicate through different modalities. We offer an account of multi-modal meaning coordination, taking speech-gesture meaning coordination as the prototypical case. We argue that temporal synchrony (plus prosody) does not determine how speech meaning and gesture meaning are to be coordinated. The challenging cases are asynchrony and broadcasting, which we illustrate with empirical data. We propose that a process algebra account satisfies the desiderata: it models gesture and speech as independent but concurrent processes that can communicate flexibly with each other and exchange the same information more than once. The account uses the psi-calculus, which provides agents, input/output channels, concurrent processes, and transport of data in the form of typed lambda terms. A multi-modal meaning is produced by integrating speech meaning and gesture meaning into one semantic package. Two cases of meaning coordination are handled in some detail: the asynchrony between gesture and speech, and the broadcasting of gesture meaning across several dialogue contributions. The account generalizes to other cases of multi-modal meaning.
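    The core intuition, two independent but concurrent processes exchanging meaning over channels, can be mimicked with ordinary concurrency primitives. The sketch below is an asyncio analogy, emphatically not the psi-calculus: the message format and the "broadcast twice" behaviour are invented to echo the broadcasting cases the abstract mentions.

```python
import asyncio

async def gesture(channel: asyncio.Queue):
    # Broadcast the same gesture meaning more than once, as in the
    # broadcasting cases the paper describes.
    for _ in range(2):
        await channel.put(("gesture", "pointing-at-map"))
    await channel.put(("gesture", None))  # done

async def speech(channel: asyncio.Queue):
    while True:
        modality, meaning = await channel.get()
        if meaning is None:
            break
        # Integrate gesture meaning with the current speech meaning
        # into one semantic package.
        print(f"integrated: 'over there' + {meaning}")

async def main():
    channel = asyncio.Queue()
    await asyncio.gather(gesture(channel), speech(channel))

asyncio.run(main())
```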

    Natural Sounding Standard Malay Speech Synthesis Based On UTMK EBMT Architecture System

    In this research, natural-sounding speech synthesis is the main goal. This goal was chosen to match the kind of speech synthesis application the industrial market demands: limited-domain systems. A limited-domain speech synthesis application has a restricted vocabulary (it is less flexible) but requires highly natural-sounding output. From the evolution of speech synthesis techniques, one can conclude that using natural speech units without applying any signal processing is the technique that produces the most natural-sounding synthetic speech.
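    The technique named at the end, joining natural recorded units with no signal processing, is straightforward to sketch. The unit inventory below is a stand-in for real recordings (sine bursts replace waveform files), and the Malay words are only placeholders; nothing here is the UTMK system itself.

```python
import numpy as np

SR = 16_000  # sample rate in Hz

def fake_unit(seconds: float, freq: float) -> np.ndarray:
    """Stand-in for a recorded speech unit (a sine burst here)."""
    t = np.linspace(0, seconds, int(SR * seconds), endpoint=False)
    return 0.3 * np.sin(2 * np.pi * freq * t)

inventory = {
    "selamat": fake_unit(0.4, 220.0),
    "pagi": fake_unit(0.3, 260.0),
}

def synthesize(words):
    """Concatenate stored waveforms directly: no smoothing, no PSOLA,
    no other signal processing, to preserve naturalness."""
    return np.concatenate([inventory[w] for w in words])

audio = synthesize(["selamat", "pagi"])
print(audio.shape)  # total samples for the utterance
```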

    Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech

    Polyphone disambiguation aims to capture accurate pronunciation knowledge from natural text sequences for reliable text-to-speech (TTS) systems. However, previous approaches require substantial annotated training data and additional effort from language experts, making it difficult to extend high-quality neural TTS systems to out-of-domain daily conversations and the countless languages worldwide. This paper tackles the polyphone disambiguation problem from a concise and novel perspective: we propose Dict-TTS, a semantic-aware generative text-to-speech model that uses an online dictionary, i.e., prior information already present in natural language. Specifically, we design a semantics-to-pronunciation attention (S2PA) module that matches the semantic patterns of the input text sequence against the prior semantics in the dictionary and retrieves the corresponding pronunciations. The S2PA module can be trained end-to-end with the TTS model without any annotated phoneme labels. Experimental results in three languages show that our model outperforms several strong baselines in pronunciation accuracy and improves the prosody modeling of TTS systems. Further extensive analyses demonstrate that each design choice in Dict-TTS is effective. The code is available at https://github.com/Zain-Jiang/Dict-TTS.
    Comment: Accepted by NeurIPS 2022
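    A toy rendering of the S2PA idea: attend from a context vector over dictionary sense embeddings and read off the pronunciation of the best-matching sense. The dimensions, the scoring function, and the dictionary entries are illustrative assumptions, not the Dict-TTS architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding size

# Hypothetical dictionary for one polyphonic character: each sense has
# a semantic embedding and a pronunciation.
senses = [
    {"emb": rng.normal(size=d), "pron": "le4"},
    {"emb": rng.normal(size=d), "pron": "yue4"},
]

def s2pa(context_vec: np.ndarray) -> str:
    """Attend over dictionary senses; return the pronunciation of the
    highest-weighted sense (a hard pick, for clarity)."""
    keys = np.stack([s["emb"] for s in senses])
    scores = keys @ context_vec / np.sqrt(d)   # scaled dot-product
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over senses
    return senses[int(weights.argmax())]["pron"]

# Context built near the second sense, so attention should land there.
context = senses[1]["emb"] + 0.1 * rng.normal(size=d)
print(s2pa(context))  # almost surely 'yue4'
```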

    An XML Coding Scheme for Multimodal Corpus Annotation

    Multimodality has become one of today's most crucial challenges for both linguistics and computer science, entailing theoretical as well as practical issues (verbal interaction description, human-machine dialogue, virtual reality, etc.). Understanding interaction processes is one of the main targets of these sciences, and it requires taking into account the whole set of modalities and the way they interact.

    From a linguistic standpoint, language and speech analysis are based on studies from distinct research fields, such as phonetics, phonemics, syntax, semantics, pragmatics, and gesture studies. Each of them has been investigated in the past either separately or in relation to another field considered closely connected (e.g. syntax and semantics, prosody and syntax). The perspective adopted by modern linguistics is considerably broader: even though each domain shows a certain degree of autonomy, it cannot be accounted for independently of its interactions with the other domains. Accordingly, the study of the interactions between fields appears to be as important as the study of each field on its own; this is a prerequisite for elaborating a valid theory of language.

    However important the needs in this area may be, high-level multimodal resources, and adequate methods for constructing them, are scarce and unevenly developed. Ongoing projects mainly focus on one modality as the main target, with an alternate modality as an optional complement. Moreover, coding standards in this field remain very partial and do not cover all the needs of multimodal annotation. One of the first issues we have to face is the definition of a coding scheme that responds adequately to the needs of the various levels involved, from phonetics to pragmatics or syntax. While working within the general context of international coding standards, we plan to create a specific coding scheme designed to answer the specific needs of multimodal annotation, as the available solutions in this area do not seem to be fully satisfactory.
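    As a concrete illustration of what such a coding scheme might need to encode, here is a minimal sketch that serializes one utterance with aligned speech and gesture tiers plus a cross-modal link. All element and attribute names are invented for illustration; they are not the scheme the paper defines.

```python
import xml.etree.ElementTree as ET

# One utterance with two time-aligned tiers and a cross-modal link.
utt = ET.Element("utterance", id="u1", start="0.00", end="2.10")

speech = ET.SubElement(utt, "tier", type="speech")
ET.SubElement(speech, "token", id="t1", start="0.00", end="0.45").text = "over"
ET.SubElement(speech, "token", id="t2", start="0.45", end="0.90").text = "there"

gesture = ET.SubElement(utt, "tier", type="gesture")
ET.SubElement(gesture, "stroke", id="g1", start="0.30", end="1.10",
              hand="right", form="point")

# Cross-modal link: the pointing stroke is aligned with "there".
ET.SubElement(utt, "link", source="g1", target="t2", relation="deictic")

print(ET.tostring(utt, encoding="unicode"))
```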