201 research outputs found

    The Role Of Morphological Structure In Phonetic Variation

    This dissertation is situated in broad debates about the architecture of the phonological grammar, and the sensitivity of gradient phonetic parameters to morphological structure. It takes, as its primary case study, a linguistic variable that is of prevailing interest to sociolinguists and phonologists alike: English Coronal Stop Deletion (old~ol'; CSD). While CSD is robustly sensitive to the morphological class of words in which coronal stops are contained, its alignment with the small class of other morphology-phonetics interactions is not straightforward. I approach this problem from several angles, incorporating diverse methodologies. In the first place, I provide new articulatory evidence suggesting that CSD does indeed have its primary locus in the gradient phonetics, demonstrating that the magnitude of tongue tip raising to a coronal stop constriction is gradiently conditioned by morphology. Moreover, this variation is typologically distinct from the majority of other examples of phonetic phenomena conditioned by morphology, which primarily concern durational parameters. In the rest of the dissertation, I problematise CSD's status as exceptional in this way, probing how well explanations for other morphology-sensitive phonetic phenomena (i.e. effects of prosody and word predictability) account for CSD patterns. In two perception experiments, listeners do not show perceptual sensitivity to the covert tongue tip raising observed in articulation, but do reflect an association between morphological complexity and increased duration. Finally, a large-scale corpus study shows that only measures of word frequency that are relative to a word’s larger morphological paradigm predict CSD patterns accurately. This suggests that morphological structure was a key missing element in predictability accounts of the variable. Ultimately, surface CSD may amount to the confluence of more than one type of morphologically conditioned phonetic phenomenon.
This dissertation sets the stage for continued progress towards an account integrating these different factors, and generates new puzzles in the asymmetry between production and perception for variable phonology and phonetics.

    A Survey on Semantic Processing Techniques

    Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, the advancement of research in this domain appears to be decelerating. However, the study of semantics is multi-dimensional in linguistics. The research depth and breadth of computational semantic processing can be largely improved with new technologies. In this survey, we analyze five semantic processing tasks, namely word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions. Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal contribution mark is missing from the published version due to the publication policies. Please contact Prof. Erik Cambria for details.

    Cognition-based approaches for high-precision text mining

    This research improves the precision of information extraction from free-form text via the use of cognitive-based approaches to natural language processing (NLP). Cognitive-based approaches are an important, and relatively new, area of research in NLP and search, as well as linguistics. Cognitive approaches enable significant improvements in both the breadth and depth of knowledge extracted from text. This research has made contributions in the areas of a cognitive approach to automated concept recognition in. Cognitive approaches to search, also called concept-based search, have been shown to improve search precision. Given the tremendous amount of electronic text generated in our digital and connected world, cognitive approaches enable substantial opportunities in knowledge discovery. The generation and storage of electronic text is ubiquitous; hence opportunities for improved knowledge discovery span virtually all knowledge domains. While cognition-based search offers superior approaches, challenges exist due to the need to mimic, even in the most rudimentary way, the extraordinary powers of human cognition. This research addresses these challenges in the key area of a cognition-based approach to automated concept recognition. In addition, it resulted in a semantic processing system framework for use in applications in any knowledge domain. Confabulation theory was applied to the problem of automated concept recognition. This is a relatively new theory of cognition using a non-Bayesian measure, called cogency, for predicting the results of human cognition. An innovative distance measure, derived from cogent confabulation and called inverse cogency, was used to rank-order candidate concepts during the recognition process. When used with a multilayer perceptron, it improved the precision of concept recognition by 5% over published benchmarks. Additional precision improvements are anticipated.
These research steps build a foundation for cognition-based, high-precision text mining. In the long term, it is anticipated that this foundation will enable a cognitive-based approach to automated ontology learning. Such automated ontology learning will mimic human language cognition, and will, in turn, enable the practical use of cognitive-based approaches in virtually any knowledge domain. --Abstract, page iii

    Gesture in Automatic Discourse Processing

    Computers cannot fully understand spoken language without access to the wide range of modalities that accompany speech. This thesis addresses the particularly expressive modality of hand gesture, and focuses on building structured statistical models at the intersection of speech, vision, and meaning. My approach is distinguished in two key respects. First, gestural patterns are leveraged to discover parallel structures in the meaning of the associated speech. This differs from prior work that attempted to interpret individual gestures directly, an approach that was prone to a lack of generality across speakers. Second, I present novel, structured statistical models for multimodal language processing, which enable learning about gesture in its linguistic context, rather than in the abstract. These ideas find successful application in a variety of language processing tasks: resolving ambiguous noun phrases, segmenting speech into topics, and producing keyframe summaries of spoken language. In all three cases, the addition of gestural features -- extracted automatically from video -- yields significantly improved performance over a state-of-the-art text-only alternative. This marks the first demonstration that hand gesture improves automatic discourse processing.

    On the Analysis of DNA Methylation

    Recent genome-wide studies lend support to the idea that the patterns of DNA methylation are in some way related, either causally or as a readout, to cell-type specific protein binding. We lay the groundwork for a framework to test whether the pattern of DNA methylation levels in a cell combined with protein binding models is sufficient to completely describe the location of the component of proteins binding to its genome in an assayed context. There is only one method, whole-genome bisulfite sequencing (WGBS), available to study DNA methylation genome-wide at such high resolution; however, its accuracy has not been determined on the scale of individual binding locations. We address this with a two-fold approach. First, we developed methylCRF, an alternative high-resolution, whole-genome assay that combines an enrichment-based and a restriction-enzyme-based assay of methylation. While both assays are considered inferior to WGBS, by using two distinct assays, this method has the advantage that each assay in part cancels out the biases of the other. Additionally, this method is up to 15 times lower in cost than WGBS. By formulating the estimation of methylation from the two methods as a structured prediction problem using a conditional random field, this work will also address the general problem of incorporating data of varying qualities (a common characteristic of biological data) for the purpose of prediction. We show that methylCRF is concordant with WGBS within the range of two WGBS methylomes. Due to the lower cost, we were able to analyze methylation at high resolution across more cell types than previously possible, and we estimate that 28% of CpGs, in regions comprising 11% of the genome, show variable methylation and are enriched in regulatory regions.
Secondly, we show that WGBS has inherent resolution limitations in a read-count-dependent manner and that the identification of unmethylated regions is highly affected by GC-bias in the underlying protocol, suggesting that simple estimation procedures may not be sufficient for high-resolution analysis. To address this, we propose a novel approach to DNA methylation analysis using change-point detection instead of estimating methylation level directly. However, we show that current change-point detection methods are not robust to methylation signal; we therefore explore how to extend current non-parametric methods to simultaneously find change-points as well as characteristic methylation levels. We believe this framework may have the power to examine the connection between changes in methylation and transcription factor binding in the context of cell-type specific behaviors.

    Proceedings of the VIIth GSCP International Conference

    The 7th International Conference of the Gruppo di Studi sulla Comunicazione Parlata, dedicated to the memory of Claire Blanche-Benveniste, chose as its main theme Speech and Corpora. The wide international origin of the 235 authors from 21 countries and 95 institutions led to papers on many different languages. The 89 papers of this volume reflect the themes of the conference: spoken corpora compilation and annotation, with the connected technological fields; the relation between prosody and pragmatics; speech pathologies; and different papers on phonetics, speech and linguistic analysis, pragmatics and sociolinguistics. Many papers are also dedicated to speech and second language studies. The online publication with FUP allows direct access to sound and video linked to papers (when downloaded).

    Illuminating variation: Individual differences in entrenchment of multi-word units

    Illuminating variation: individual differences in entrenchment of multi-word units - Véronique Verhagen. A large part of our language use consists of word combinations that recur again and again. The more often you use a word combination, the more strongly it becomes entrenched in your mental lexicon, which makes it easier to activate and process that construction. Discovering patterns and creating routines are cognitive skills that we use in all kinds of domains. If you have typed a password repeatedly, that specific combination of keystrokes becomes a routine (and deviating from it actually takes some effort). If you always cycle the same route to work, you no longer make a conscious choice at every intersection. A similar process occurs with frequent use of a combination of words: the combination becomes ever more strongly entrenched as a unit. Since people differ from one another in how frequently they use particular word combinations, the degree to which a word combination is entrenched is expected to differ from person to person. Since a language user gains new experiences with language over time, a person's mental representations are expected to change along with them. However, empirical data on these forms of variation between and within adult native speakers are scarce. Many researchers make use of corpus data: a large collection of texts that gives insight into the frequencies with which word combinations occur in our language use. A corpus is a rich source of information, but it is unlikely that corpus frequencies are equally representative for everyone. Furthermore, the outcomes of experiments are usually presented as averages, without addressing the variation between participants.
When data from different experiments are compared, those experiments have often been conducted with different groups of participants. In her dissertation, Véronique Verhagen reports on research into variation between and within adults in metalinguistic judgments about, and processing of, word combinations. She investigated to what extent this variation is meaningful and what the added value of personalized measures is when the goal is to gain insight into mental representations of language. Her research contributes to refining the theory and research methods of usage-based linguistics, which holds that mental representations of language are based on experiences with language and on general cognitive skills such as pattern recognition, categorization, and chunking. To investigate variation, Verhagen used a test-retest design in two studies. Participants assigned familiarity judgments to 80 word combinations (e.g. op de bank 'on the couch', in de lucht 'in the air'). Variation within an individual was examined by having participants perform the task twice within a period of one to three weeks. The correlation between the aggregated values at time 1 and time 2 was nearly perfect. At the same time, there was considerable variation in judgments between and within participants. Rather than dismissing this variation as noise, we should examine whether it can tell us something about the dynamic character of mental representations. In experiments that try to provide insight into this, the degree of entrenchment is always expressed as a single value (e.g. a familiarity score or a reaction time). If a mental representation is not a single point but a cluster of exemplars that vary in strength, depending on how frequently and recently particular constructions have been used, then a single value is only part of the picture.
A more complete and more faithful picture requires multiple measurements per individual. In another study, recruiters, job seekers, and students who had read (virtually) no job advertisements yet carried out three experiments. These demonstrated a relationship between the amount of experience with a particular register and (i) the expectations people generate about words that may follow when they see word sequences characteristic of that register (e.g. werving en … 'recruitment and …', met behoud van … 'while retaining …'); (ii) the speed with which they process such word combinations (e.g. werving en selectie 'recruitment and selection'); and (iii) how familiar these word combinations are to them. Moreover, a person's scores on one task proved to be a statistical predictor of that person's performance on the other tasks. Individual scores explained variation that group scores (e.g. corpus frequencies) did not explain. This shows that there are systematic differences between people in their knowledge and processing of word combinations. If we want to arrive at accurate theories of the cognitive representation of language, it is important to map out which factors determine the variation between language users, and this requires that we do not restrict ourselves to aggregated data but zoom in on the level of individuals.

    Integrating meta-information into exemplar-based speech recognition with segmental conditional random fields

    Exemplar-based recognition systems are characterized by the fact that, instead of abstracting large amounts of data into compact models, they store the observed data enriched with some annotations and infer on-the-fly from the data by finding those exemplars that resemble the input speech best. One advantage of exemplar-based systems is that, next to deriving what the current phone or word is, one can easily derive a wealth of meta-information concerning the chunk of audio under investigation. In this work we harvest, from the set of best-matching exemplars, meta-information that is thought to be relevant for recognition, such as word boundary predictions and speaker entropy. Integrating this meta-information into the recognition framework using segmental conditional random fields reduced the WER of the exemplar-based system on the WSJ Nov92 20k task from 8.2% to 7.6%. Adding the HMM-score and multiple HMM phone detectors as features further reduced the error rate to 6.6%. © 2011 IEEE. Demuynck K., Seppi D., Van Compernolle D., Nguyen P., Zweig G., "Integrating meta-information into exemplar-based speech recognition with segmental conditional random fields", 36th international conference on acoustics, speech and signal processing - ICASSP’2011, pp. 5048-5051, May 22-27, 2011, Prague, Czech Republic. Status: published

    Decoding speech comprehension from continuous EEG recordings

    Human language is a remarkable manifestation of our cognitive abilities which is unique to our species. It is key to communication, but also to our faculty of generating complex thoughts. We organise, conceptualise, and share ideas through language. Neuroscience has shed light on how language is processed by the brain, although the exact neural organisation, structural or functional, underpinning this processing remains poorly understood. This project aims to employ new methodology to understand speech comprehension under naturalistic listening conditions. One achievement of this thesis lies in providing evidence for putative predictive processing mechanisms in language comprehension and contrasting those with rule-based grammar processing. Namely, we looked on the one hand at cortical responses to information-theoretic measures that are relevant for predictive coding in the context of language processing, and on the other hand at the response to syntactic tree structures. We successfully recorded responses to linguistic features from continuous EEG recordings during naturalistic speech listening. The use of ecologically valid stimuli allowed us to embed neural responses in the context in which they naturally occur when hearing speech. This fostered the development of new analysis tools adapted for such experimental designs. Finally, we demonstrate the ability to decode comprehension from the EEG signals of participants with above-chance accuracy. This could be used as a better indicator of the severity and specificity of language disorders, and also to assess whether a patient in a vegetative state understands speech without the need for any behavioural response. Hence a primary outcome is our contribution to the neurobiology of language comprehension. Furthermore, our results pave the way to the development of a new range of diagnostic tools to measure speech comprehension of patients with language impairment. Open Access