Applying the Levenshtein Distance to Catalan dialects: A brief comparison of two dialectometric approaches
Abstract. In recent years, dialectometry has gained interest among Catalan dialectologists. As a consequence, a specific dialectometric approach has been developed at the University of Barcelona, which aims to increase the accuracy of the final groupings by separating the predictable components of the language from the unpredictable ones. Another popular method for obtaining dialect distances is the Levenshtein Distance (LD), which has not previously been applied to a Catalan corpus. The goal of this paper is to present the results of applying the LD to a corpus of Catalan linguistic data, and to compare these results both with the results from Barcelona and with the traditional classifications of Catalan dialectology.
Studying dialects to understand human language
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Includes bibliographical references (leaves 65-71). This thesis investigates the study of dialect variations as a way to understand how humans might process speech. It evaluates some of the important research in dialect identification and draws conclusions about how the results of that research can give insights into human speech processing. A study that clusters dialects using k-means clustering is carried out. Self-organizing maps are proposed as a tool for dialect research, and a self-organizing map is implemented to test this proposal. Several areas for further research are identified, including how dialects are stored in the brain, more detailed descriptions of how dialects vary, including contextual effects, and more sophisticated visualization tools. Keywords: dialect, accent, identification, recognition, self-organizing maps, words, lexical sets, clustering. By Akua Afriyie Nti. M.Eng.
Norwegian Dialects Examined Perceptually and Acoustically
Gooskens (2003) described an experiment which determined linguistic distances between 15 Norwegian dialects as perceived by Norwegian listeners. The results are compared to Levenshtein distances, calculated on the basis of transcriptions (of the words) of the same recordings as used in the perception experiment. The Levenshtein distance is equal to the sum of the weights of the insertions, deletions and substitutions needed to change one pronunciation into another. The success of the method depends on the reliability of the transcriber. The aim of this paper is to find an acoustic distance measure between dialects which approximates the perceptual distance measure. We use and compare different representations of the acoustic signal: Barkfilter spectrograms, cochleagrams and formant tracks. We then apply the Levenshtein algorithm to spectra or formant value bundles instead of transcription segments. Of these acoustic representations, the formant track representation gave the best results. However, the transcription-based Levenshtein distances still correlate more closely with the perceptual distances. In the acoustic signal the speaker-dependent influence is kept to some extent, while a transcriber abstracts from voice quality. Using more samples per dialect word (instead of only one, as in our research) should improve the accuracy of the measurements.
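The definition of the Levenshtein distance given in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `levenshtein`, the parameter names `ins_cost`, `del_cost` and `sub_cost`, and the default weight of 1 for every operation are assumptions made for illustration, since the abstract does not state which weights were used.

```python
from typing import Sequence


def levenshtein(source: Sequence[str], target: Sequence[str],
                ins_cost: float = 1.0, del_cost: float = 1.0,
                sub_cost: float = 1.0) -> float:
    """Return the minimum total weight of insertions, deletions and
    substitutions needed to turn `source` into `target`.

    The unit default weights are an illustrative assumption.
    """
    # prev[j] = cost of turning the source prefix seen so far into target[:j].
    # Before any source segment is read, that means inserting j segments.
    prev = [j * ins_cost for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        # Turning i source segments into an empty target takes i deletions.
        curr = [i * del_cost]
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + del_cost,        # delete s
                curr[j - 1] + ins_cost,    # insert t
                prev[j - 1] + (0.0 if s == t else sub_cost),  # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Strings are treated as sequences of one-character segments.
    print(levenshtein("kitten", "sitting"))
```

Because the function accepts any sequence of segments, a transcription can also be passed as a list of multi-character phonetic symbols rather than as a plain string.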
Subsidia: Tools and Resources for Speech Sciences
This book, the result of a collaboration among researchers who are experts in their respective fields, aims to help the scientific community by compiling and describing a range of highly useful materials for continuing to advance research
concepts - methods - visualization
While Darwin’s grand view of evolution has undergone many changes and shown up
in many facets, there remains one outstanding common feature in its 150-year
history: since the very beginning, branching trees have been the dominant
scheme for representing evolutionary processes. Only recently, network models
have gained ground reflecting contact-induced mixing or hybridization in
evolutionary scenarios. In biology, research on prokaryote evolution indicates
that lateral gene transfer is a major feature in the evolution of bacteria. In
the field of linguistics, the mutual lexical and morphosyntactic borrowing
between languages seems to be much more central for language evolution than
the family tree model is likely to concede. In the humanities, networks are
employed as an alternative to established phylogenetic models, to express the
hybridization of cultural phenomena, concepts or the social structure of
science. However, an interdisciplinary display of network analyses for
evolutionary processes remains lacking. Therefore, this volume includes
approaches studying the evolutionary dynamics of science, languages and
genomes, all of which are based on methods incorporating network approaches.
Gaelic dialects present and past: a study of modern and medieval dialect relationships in the Gaelic languages
This thesis focuses on the historical development of dialectal variation in the Gaelic
languages with special reference to Irish. As a point of departure, competing
scholarly theories concerning the historical relationships between Goidelic dialects
are laid out. Next, these theories are tested using dialectometric methods of linguistic
analysis. Dialectometry clearly suggests the Irish of Ulster is the most linguistically
distinctive of Irish dialects. This perspective on the modern dialects is utilised in
subsequent chapters to clarify our understanding of the history of Gaelic dialectal
variation, especially during the Old Irish period (AD 600–900).
Theoretical and methodological frameworks that have been used in the study of the
historical dialectology of Gaelic are next outlined. It is argued that these frameworks
may not be the most appropriate for investigating dialectal variation during the Old
Irish period. For the first time, principles from historical sociolinguistics are here
applied in investigating the language of the Old Irish period. In particular, the social
and institutional structures which supported the stability of Old Irish as a text
language during the 8th and 9th centuries are scrutinised from this perspective. The
role of the ecclesiastical and political centre of Armagh as the principal and central
actor in the relevant network structures is highlighted.
Focus then shifts to the processes through which ‘standard’ languages emerge, with
special reference to Old Irish. The evidence of a small number of texts upon which
modern understandings of Old Irish are based is assessed; it is argued that these
texts most likely emerged from monasteries in the northeast of Ireland and the
southwest of Scotland. Secondly, the processes through which the standard of the
Old Irish period is likely to have come about are investigated. It is concluded that the
standard language of the period arose primarily through the agency of monastic
schools in the northeast of Ireland, particularly Armagh and Bangor. It is argued that
this fact, and the subsequent prominence of Armagh as a stable and supremely
prestigious centre of learning throughout the period, offers a sociolinguistically
robust explanation for the apparent lack of dialectal variation in the language.
Finally, the socio-political situation of the Old Irish period is discussed. Models of
new-dialect formation are applied to historical evidence, and combined with later
linguistic evidence, in an attempt to enunciate dialectal divisions which may have
existed during the period.