Big words, small phrases: Mismatches between pause units and the polysynthetic word in Dalabon
This article uses instrumental data from natural speech to examine the phenomenon of pause placement within the verbal word in Dalabon, a polysynthetic Australian language of Arnhem Land. Though the phenomenon is incipient and in two sample texts occurs in only around 4% of verbs, there are clear possibilities for interrupting the grammatical word by pause after the pronominal prefix and some associated material at the left edge, though these within-word pauses are significantly shorter, on average, than those between words. Within-word pause placement is not random, but is restricted to certain affix boundaries; it requires that the paused-after material be at least dimoraic, and that the remaining material in the verbal word be at least disyllabic. Bininj Gun-wok, another polysynthetic language closely related to Dalabon, does not allow pauses to interrupt the verbal word, and the Dalabon development appears to be tied up with certain morphological innovations that have increased the proportion of closed syllables in the pronominal prefix zone of the verb. Though only incipient and not yet phonologized, pause placement in Dalabon verbs suggests a phonology-driven route by which polysynthetic languages may ultimately become less morphologically complex by fracturing into smaller units.
Structure Theorem and Strict Alternation Hierarchy for FO^2 on Words
It is well-known that every first-order property on words is expressible
using at most three variables. The subclass of properties expressible with only
two variables is also quite interesting and well-studied. We prove precise
structure theorems that characterize the exact expressive power of first-order
logic with two variables on words. Our results apply to both the case with and
without a successor relation. For both languages, our structure theorems show
exactly what is expressible using a given quantifier depth, n, and using m
blocks of alternating quantifiers, for any m \leq n. Using these
characterizations, we prove, among other results, that there is a strict
hierarchy of alternating quantifiers for both languages. The question whether
there was such a hierarchy had been completely open. As another consequence of
our structural results, we show that satisfiability for first-order logic with
two variables without successor, which is NEXP-complete in general, becomes
NP-complete once we only consider alphabets of a bounded size.
A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments
Most speech and language technologies are trained with massive amounts of
speech and text information. However, most of the world's languages do not have
such resources or stable orthography. Systems constructed under these almost
zero resource conditions are not only promising for speech technology but also
for computational language documentation. The goal of computational language
documentation is to help field linguists to (semi-)automatically analyze and
annotate audio recordings of endangered and unwritten languages. Example tasks
are automatic phoneme discovery or lexicon discovery from the speech signal.
This paper presents a speech corpus collected during a realistic language
documentation process. It is made up of 5k speech utterances in Mboshi (Bantu
C25) aligned to French text translations. Speech transcriptions are also made
available: they correspond to a non-standard graphemic form close to the
language phonology. We present how the data was collected, cleaned and
processed and we illustrate its use through a zero-resource task: spoken term
discovery. The dataset is made available to the community for reproducible
computational language documentation experiments and their evaluation.
Comment: accepted to LREC 201
On periodic points of free inverse monoid endomorphisms
It is proved that the periodic point submonoid of a free inverse monoid
endomorphism is always finitely generated. Using Chomsky's hierarchy of
languages, we prove that the fixed point submonoid of an endomorphism of a free
inverse monoid can be represented by a context-sensitive language but, in
general, it cannot be represented by a context-free language.
Comment: 18 pages