42 research outputs found

    The Verbmobil semantic database

    The distributed development of the modules of a large natural language processing system at different sites makes interface definitions a vital issue. It becomes even more urgent when several modules with the same intended functionality are developed in parallel and should be indistinguishable with respect to their input/output behaviour. Another important issue is the acquisition and maintenance of lexical information, which should be stored independently of an application to make it (re)usable for different purposes. This paper describes the design and use of the Verbmobil Semantic Database, which we developed in order to deal with these issues in the area of lexical semantics in Verbmobil.

    Workshop Proceedings of the 12th edition of the KONVENS conference

    The 2014 issue of KONVENS is even more a forum for exchange: its main topic is the interaction between Computational Linguistics and Information Science, and the synergies such interaction, cooperation and integrated views can produce. This topic, at the crossroads of different research traditions which deal with natural language as a container of knowledge and with methods to extract and manage knowledge that is linguistically represented, is close to the heart of many researchers at the Institut für Informationswissenschaft und Sprachtechnologie of Universität Hildesheim: it has long been one of the institute's research topics, and it has received even more attention over the last few years.

    Auditory gaydar: perception of sexual orientation based on female voice

    We investigated auditory gaydar (i.e., the ability to recognize sexual orientation) in female speakers, addressing three related issues: whether auditory gaydar is (1) accurate, (2) language-dependent (i.e., occurs only in some languages, but not in others), and (3) ingroup-specific (i.e., occurs only when listeners judge speakers of their own language, but not when they judge foreign language speakers). In three experiments, we asked Italian, Portuguese, and German participants (total N = 466) to listen to voices of Italian, Portuguese, and German women, and to rate their sexual orientation. Our results showed that auditory gaydar was not accurate; listeners were not able to identify speakers' sexual orientation correctly. The same pattern emerged consistently across all three languages and when listeners rated foreign-language speakers.

    Vorhersage und Wahrnehmung deutscher Betonungsmuster

    Motivation for this thesis was the observation that phonological theories for predicting stress patterns tend to rely on questionable, introspectively gathered data and are tested only on small fragments of the language in question. To overcome this deficit of phonological theory-building, an evaluation method was developed and applied that relies on a formal representation and implementation of the rules and tests the automatic predictions on larger, objectively labelled data sets. The central insights of the thesis are the following:
    - Syntactic phrasing plays only a minor role in German stress assignment at utterance level.
    - Word class information that goes beyond a simple distinction between content and function words predicts prominence at utterance level well.
    - The frequency of occurrence of a word in German is not directly related to its degree of stress.
    - Deaccentuation and stress shift play only a marginal role in German, both at utterance level and in word-internal stress clash environments.
    - Long sequences of unstressed syllables, which arise particularly in long words, are avoided in German.
    - Syllable weight plays a major role in word-level stress placement: if the final syllable is significantly heavier than the penultimate one, stress falls on the final syllable. The syllable weight hierarchy had to be extended, however, in order to explain all word stress phenomena.
    - If the final syllable of a word is light, stress usually falls on the stressable syllable closest to the right edge of the word.
    - The influence of syllable weight is considerably weaker in the stress assignment of proper names than of other words, so a separate set of rules had to be developed for them.
    All results were formalised to enable their integration into grammar formalisms and speech technology applications. Finally, it was shown that the insights gained from the evaluations can also be integrated into the non-generative formalism of Optimality Theory.

    Fast Speech in Unit Selection Speech Synthesis

    Moers-Prinz D. Fast Speech in Unit Selection Speech Synthesis. Bielefeld: Universität Bielefeld; 2020. Speech synthesis is part of the everyday life of many people with severe visual disabilities. For those who rely on assistive speech technology, the possibility of choosing a fast speaking rate is reported to be essential. Expressive speech synthesis and other spoken language interfaces may also require an integration of fast speech. Architectures like formant or diphone synthesis are able to produce synthetic speech at fast speech rates, but the generated speech does not sound very natural. Unit selection synthesis systems, however, are capable of delivering more natural output. Nevertheless, fast speech has not been adequately implemented in such systems to date. Thus, the goal of the work presented here was to determine an optimal strategy for modeling fast speech in unit selection speech synthesis to provide potential users with a more natural sounding alternative for fast speech output.

    Part-of-speech Tagging: A Machine Learning Approach based on Decision Trees

    The study and application of general Machine Learning (ML) algorithms to the classical ambiguity problems in the area of Natural Language Processing (NLP) is currently a very active area of research. This trend is sometimes called Natural Language Learning. Within this framework, the present work explores the application of a concrete machine-learning technique, namely decision-tree induction, to a very basic NLP problem, namely part-of-speech disambiguation (POS tagging). Its main contributions fall in the NLP field, while the topics that arise are addressed from an artificial intelligence perspective rather than from a linguistic point of view. A relevant property of the system we propose is the clear separation between the acquisition of the language model and its application within a concrete disambiguation algorithm, with the aim of constructing two components which are as independent as possible. Such an approach has many advantages. For instance, the language models obtained can be easily adapted into previously existing tagging formalisms; the two modules can be improved and extended separately; etc. As a first step, we have experimentally proven that decision trees (DT) provide a flexible (by allowing a rich feature representation), efficient and compact way of acquiring, representing and accessing the information about POS ambiguities. In addition, DTs provide proper estimations of conditional probabilities for tags and words in their particular contexts. Additional machine learning techniques, based on the combination of classifiers, have been applied to address some particular weaknesses of our tree-based approach, and to further improve the accuracy in the most difficult cases. As a second step, the acquired models have been used to construct simple, accurate and effective taggers, based on different paradigms. In particular, we present three different taggers that include the tree-based models: RTT, STT, and RELAX, which have shown different properties regarding speed, flexibility, accuracy, etc. The idea is that the particular user needs and environment will define which is the most appropriate tagger in each situation. Although we have observed slight differences, the accuracy results for the three taggers, tested on the WSJ test bench corpus, are uniformly very high, and, if not better, they are at least as good as those of a number of current taggers based on automatic acquisition (a qualitative comparison with the most relevant current work is also reported). Additionally, our approach has been adapted to annotate a general Spanish corpus, with the particular limitation of learning from small training sets. A new technique, based on tagger combination and bootstrapping, has been proposed to address this problem and to improve accuracy. Experimental results showed that very high accuracy is possible for Spanish tagging, with a relatively low manual effort. Additionally, the success in this real application has confirmed the validity of our approach, and the validity of the previously presented portability argument in favour of automatically acquired taggers.

    Can human association norm evaluate latent semantic analysis?

    This paper compares a word association norm created by a psycholinguistic experiment with association lists generated by algorithms operating on text corpora. We compare lists generated by the Church and Hanks algorithm with lists generated by the LSA algorithm. An argument is presented on how those automatically generated lists reflect real semantic relations.

    Complex predicates: an LFG+glue analysis


    Proceedings of the 3rd Swiss conference on barrier-free communication (BfC 2020)
