979 research outputs found
Phonotactic learning with neural language models
Computational models of phonotactics share much in common with language models, which assign probabilities to sequences of words. While state-of-the-art language models are implemented using neural networks, phonotactic models have not followed suit. We present several neural models of phonotactics and show that they perform favorably when compared to existing models. In addition, they provide useful insights into the role of representations in phonotactic learning and generalization. This work provides a promising starting point for future modeling of human phonotactic knowledge.
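To make the analogy with language models concrete, here is a minimal sketch of a phonotactic model that assigns probabilities to sequences of segments rather than words. It is a smoothed bigram model, not one of the neural models the abstract refers to. The toy lexicon, the `#` boundary symbol, the function names and the add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter

BOUNDARY = "#"  # hypothetical word-edge symbol, not taken from the abstract


def train_bigram(words):
    """Count segment bigrams over words padded with boundary symbols."""
    bigrams, contexts, alphabet = Counter(), Counter(), {BOUNDARY}
    for word in words:
        segs = [BOUNDARY] + list(word) + [BOUNDARY]
        alphabet.update(segs)
        for prev, cur in zip(segs, segs[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, alphabet


def log_prob(word, model):
    """Add-one smoothed log probability of a word under the bigram model.

    Segments never seen in training are not handled specially, so scores
    are only properly normalized for words built from the training alphabet.
    """
    bigrams, contexts, alphabet = model
    segs = [BOUNDARY] + list(word) + [BOUNDARY]
    size = len(alphabet)
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + size))
        for prev, cur in zip(segs, segs[1:])
    )


# Toy lexicon (one character per segment), purely illustrative.
model = train_bigram(["blik", "brik", "blap", "klip"])
attested_like = log_prob("blip", model)
unattested_like = log_prob("lbki", model)
```

In this toy setting, a form built from attested bigrams such as "blip" scores higher than a scrambled form such as "lbki", which is the kind of gradient well-formedness judgment a phonotactic model is meant to capture.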
Computer vision methods for unconstrained gesture recognition in the context of sign language annotation
This PhD thesis studies computer vision methods for the automatic recognition of unconstrained gestures in the context of sign language annotation. Sign Language (SL) is a visual-gestural language developed by deaf communities. Continuous SL consists of a sequence of signs performed one after another, involving manual and non-manual features (facial expressions and upper-body movements) that convey information simultaneously. Even though standard signs are defined in dictionaries, their realization varies greatly with context. In addition, signs are often linked by movement epenthesis, the meaningless transitional gesture between signs. This variability and the co-articulation effect make automatic SL processing challenging. Numerous annotated video corpora are needed to study the language and to train statistical machine translators. SL video corpora are generally annotated manually by linguists or computer scientists experienced in SL, but manual annotation is error-prone, unreproducible and time-consuming, and the quality of the results depends on the annotator's knowledge of SL. Combining annotator knowledge with image processing techniques facilitates the annotation task, increasing robustness and reducing the time required. The goal of this research is to study and develop image processing techniques that assist the annotation of SL video corpora: body tracking, hand segmentation, temporal segmentation and gloss recognition.
This thesis addresses the problem of gloss annotation of SL video corpora. We first detect the limits corresponding to the beginning and end of each sign. This annotation method requires several low-level approaches for performing temporal segmentation and for extracting motion and hand shape features. First, we propose a particle-filter-based approach for tracking the hands and face that is robust to occlusions. Then, a segmentation method is developed to extract the hand region even when it is in front of the face. Motion features are used to produce a first temporal segmentation of the signs, which is then improved using hand shape features: hand shape makes it possible to delete limits detected in the middle of a sign. Once the signs have been segmented, we proceed to gloss recognition using lexical descriptions of signs. We have evaluated our algorithms on international corpora in order to show their advantages and limitations. The evaluation shows the robustness of the proposed methods with respect to high dynamics and numerous occlusions between body parts. The resulting annotation is independent of the annotator and represents a gain in annotation consistency.
Cued Speech Automatic Recognition in Normal Hearing and Deaf Subjects
This article discusses the automatic recognition of Cued Speech in French based on hidden Markov models (HMMs).
Extending Hidden Structure Learning: Features, Opacity, and Exceptions
This dissertation explores new perspectives in phonological hidden structure learning (inferring structure not present in the speech signal that is necessary for phonological analysis; Tesar 1998, Jarosz 2013a, Boersma and Pater 2016), and extends this type of learning towards the domain of phonological features, towards derivations in Stratal OT (Bermúdez-Otero 1999), and towards exceptionality indices in probabilistic OT. Two more specific themes also come out: the possibility of inducing instead of pre-specifying the space of possible hidden structures, and the importance of cues in the data for triggering the use of hidden structure. In chapters 2 and 4, phonological features and exception groupings are induced by an unsupervised procedure that finds units not explicitly given to the learner. In chapters 2 and 3, there is an effect of non-specification or underspecification on the hidden level whenever the data does not give enough cues for that hidden level to be used. When features are hidden structure (chapter 2), they are only used for patterns that generalize across multiple segments. When intermediate derivational levels are hidden structure (chapter 3), the hidden structure necessary for opaque interactions is found more often when additional cues for the stratal affiliation of the opaque process are present in the data.
Chapter 1 motivates and explains the central questions in this dissertation. Chapter 2 shows that phonological features can be induced from groupings of segments (which is motivated by phonetic non-transparency of feature assignment, see, e.g., Anderson 1981), and that patterns that do not generalize across segments are formulated in terms of segments in such a model. Chapter 3 implements a version of Stratal OT (Bermúdez-Otero 1999), and confirms Kiparsky’s (2000) hypothesis that evidence for an opaque process’ stratal affiliation makes it easier to learn an opaque interaction, even when opaque interactions are more difficult to learn than their transparent counterparts. Chapter 4 proposes a probabilistic (instead of non-probabilistic; e.g. Pater 2010) learner for lexically indexed constraints (Pater 2000) in Expectation Driven Learning (Jarosz submitted), and demonstrates its effectiveness on Dutch stress (van der Hulst 1984, Kager 1989, Nouveau 1994, van Oostendorp 1997).
Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing
Linguistic typology aims to capture structural and semantic variation across the world's languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from the lack of human labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this is due to both intrinsic limitations of databases (in terms of coverage and feature granularity) and under-employment of the typological features included in them. We advocate for a new approach that adapts the broad and discrete nature of typological categories to the contextual and continuous nature of machine learning algorithms used in contemporary NLP. In particular, we suggest that such an approach could be facilitated by recent developments in data-driven induction of typological knowledge.
The Use of Segmentation Cues in Second Language Learners of English
This dissertation project examined the influence of language typology on the use of segmentation cues by second language (L2) learners of English. Previous research has shown that native English speakers rely more on sentence context and lexical knowledge than on segmental cues (i.e., phonotactic or acoustic-phonetic) or prosodic cues (e.g., word stress) in native language (L1) segmentation. However, L2 learners may rely more on segmental and prosodic cues to identify word boundaries in L2 speech, since using lexical cues efficiently may require high lexical and syntactic proficiency. The goal of this dissertation was to provide empirical evidence for the Revised Framework for L2 Segmentation (RFL2), which describes the relative importance of different levels of segmentation cues. Four experiments were carried out to test the hypotheses made by RFL2. Participants consisted of four language groups: native English speakers and L2 learners of English with Mandarin, Korean, or Spanish as their L1. Experiment 1 compared the use of stress cues and lexical knowledge, while Experiment 2 compared the use of phonotactic cues and lexical knowledge. Experiment 3 compared the use of phonotactic cues and semantic cues, while Experiment 4 compared the use of stress cues and sentence context. Results showed that L2 learners rely more on segmental cues than on lexical knowledge or semantic cues. L2 learners showed cue interaction at both the lexical and sublexical levels, whereas native speakers appeared to use the cues independently. In general, L2 learners appeared to have acquired sensitivity to the segmentation cues used in the L2, although they still showed difficulty with specific aspects of each cue based on L1 characteristics. The results provided partial support for RFL2, in that L2 learners' use of sublexical cues was influenced by L1 typology. The dissertation has important pedagogical implications, as the findings may help identify cues that can facilitate L2 speech segmentation and comprehension.
Exploring the adaptive structure of the mental lexicon
The mental lexicon is a complex structure organised in terms of phonology, semantics and syntax, among other levels. In this thesis I propose that this structure can be explained in terms of the pressures acting on it: every aspect of the organisation of the lexicon is an adaptation ultimately related to the function of language as a tool for human communication, or to the fact that language has to be learned by subsequent generations of people. A collection of methods, most of which are applied to a Spanish speech corpus, reveal structure at different levels of the lexicon.
• The patterns of intra-word distribution of phonological information may be a consequence of pressures for optimal representation of the lexicon in the brain, and of the pressure to facilitate speech segmentation.
• An analysis of perceived phonological similarity between words shows that the sharing of different aspects of phonological similarity is related to different functions. Phonological similarity perception sometimes relates to morphology (the stressed final vowel determines verb tense and person) and at other times shows processing biases (similarity in the word initial and final segments is more readily perceived than in word-internal segments).
• Another similarity analysis focuses on cooccurrence in speech to create a representation of the lexicon where the position of a word is determined by the words that tend to occur in its close vicinity. Variations of context-based lexical space naturally categorise words syntactically and semantically.
• A higher level of lexicon structure is revealed by examining the relationships between the phonological and the cooccurrence similarity spaces. A study in Spanish supports the universality of the small but significant correlation between these two spaces found in English by Shillcock, Kirby, McDonald and Brew (2001). This systematicity across levels of representation adds an extra layer of structure that may help lexical acquisition and recognition. I apply it to a new paradigm to determine the function of parameters of phonological similarity based on their relationships with the syntactic-semantic level. I find that while some aspects of a language's phonology maintain systematicity, others work against it, perhaps responding to the opposed pressure for word identification.
This thesis is an exploratory approach to the study of the mental lexicon structure that uses existing and new methodology to deepen our understanding of the relationships between language use and language structure.
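The comparison between the phonological and cooccurrence similarity spaces described in this abstract can be sketched roughly as follows. The abstract does not say which measures were used, so everything here is an assumption for illustration: edit distance stands in for phonological distance, cosine distance over hypothetical context-count vectors stands in for cooccurrence distance, and a plain Pearson correlation over word pairs stands in for the cross-level comparison. Significance testing is left out.

```python
import math
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (assumed phonological measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def cosine_distance(u, v):
    """One minus the cosine similarity of two non-zero context vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cross_level_correlation(context_vectors):
    """Correlate form distance with context distance over all word pairs."""
    pairs = list(combinations(sorted(context_vectors), 2))
    phon = [edit_distance(a, b) for a, b in pairs]
    cooc = [cosine_distance(context_vectors[a], context_vectors[b])
            for a, b in pairs]
    return pearson(phon, cooc)


# Hypothetical context-count vectors for three Spanish words.
toy_vectors = {"casa": [1.0, 0.0], "cosa": [1.0, 0.1], "perro": [0.0, 1.0]}
r = cross_level_correlation(toy_vectors)
```

A positive `r` means that words closer in form also tend to occur in more similar contexts. The toy data are constructed to show a strong relationship; the abstract reports only a small correlation on real lexicons.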