30 research outputs found

    Unsupervised learning of allomorphs in Turkish

    © 2017 The Author. Published by The Scientific and Technological Research Council of Turkey. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://journals.tubitak.gov.tr/elektrik/issues/elk-17-25-4/elk-25-4-57-1605-216.pdf
    One morpheme may have several surface forms that correspond to allomorphs. In English, ed and d are surface forms of the past tense morpheme, and s, es, and ies are surface forms of the plural or present tense morpheme. Turkish has a large number of allomorphs due to its morphophonemic processes: one morpheme can have tens of different surface forms. This leads to a sparsity problem in natural language processing tasks in Turkish. Detection of allomorphs has not been studied much because of its difficulty. For example, tü and di are Turkish allomorphs of the past tense morpheme, yet they share no letters. This paper presents an unsupervised model to extract the allomorphs in Turkish. We obtain an F-measure of 73.71% in the detection of allomorphs, and our model outperforms previous unsupervised models on morpheme clustering.

    Fluid Morphing for 2D Animations

    Creating professional animations is expensive and time-consuming, especially for independent game developers. It is therefore worthwhile to find a method that programmatically increases the frame rate of any two-dimensional raster animation. Experimenting with a fluid simulator gave the authors the insight that elements of fluid dynamics can be used to achieve visually pleasant and smooth transitions between frames. As a result, fluid image morphing was developed, allowing animators to produce considerably more frames than they could with classic methods. The authors believe that this discovery could reintroduce hand-drawn animations to modern computer games.

    Conversational Arabic Automatic Speech Recognition

    Colloquial Arabic (CA) is the set of spoken variants of modern Arabic that exist in the form of regional dialects and are generally considered to be mother tongues in those regions. CA has limited textual resources because it exists only as a spoken language, without a standardised written form. Normally the modern standard Arabic (MSA) writing convention is employed, which has limitations in representing CA phonetically. Without phonetic dictionaries the pronunciation of CA words is ambiguous and can only be obtained through word and/or sentence context. Moreover, CA inherits the complex word structure of MSA, in which words can be created by attaching affixes to a word. In automatic speech recognition (ASR), the commonly used approaches to modelling acoustic, pronunciation and word variability are language independent. However, one can observe significant differences in performance between English and CA, with the latter yielding up to three times higher error rates. This thesis investigates the main causes of the under-performance of CA ASR systems. The work focuses on two directions: first, the impact on language modelling of limited lexical coverage and insufficient training data for written CA is investigated; second, better models for the acoustics and pronunciations are obtained by learning to transfer between written and spoken forms. Several original contributions result from each direction. Data-driven classes from decomposed text are shown to reduce the out-of-vocabulary rate. A novel colloquialisation system to import additional data is introduced; automatic diacritisation to restore the missing short vowels was found to yield good performance; and a new acoustic set for describing CA was defined. The proposed methods improved ASR performance in terms of word error rate in a CA conversational telephone speech ASR task.

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes

    An investigation into deviant morphology : issues in the implementation of a deep grammar for Indonesian

    This thesis investigates deviant morphology in Indonesian for the implementation of a deep grammar. In particular we focus on the implementation of the verbal suffix -kan. This suffix has been described as having many functions, which alter the kinds of arguments and the number of arguments the verb takes (Dardjowidjojo 1971; Chung 1976; Arka 1993; Vamarasi 1999; Kroeger 2007; Son and Cole 2008). Deep grammars or precision grammars (Butt et al. 1999a; Butt et al. 2003; Bender et al. 2011) have been shown to be useful for natural language processing (NLP) tasks, such as machine translation and generation (Oepen et al. 2004; Cahill and Riester 2009; Graham 2011), and information extraction (MacKinlay et al. 2012), demonstrating the need for linguistically rich information to aid NLP tasks. Although these linguistically motivated grammars are invaluable resources to the NLP community, their biggest drawback is the time required for the manual creation and curation of the lexicon. Our work aims to expedite this process by applying methods to assign syntactic information to kan-affixed verbs automatically. The method we employ exploits the hypothesis that semantic similarity is tightly connected with syntactic behaviour (Levin 1993). Our endeavour in automatically acquiring verbal information for an Indonesian deep grammar poses a number of linguistic challenges. First of all, Indonesian verbs exhibit voice marking that is characteristic of the subgrouping of its language family. In order to characterise verbal behaviour in Indonesian, we first need to devise a detailed analysis of voice for implementation. Another challenge we face is the claim that all open class words in Indonesian, at least as it is spoken in some varieties (Gil 1994; Gil 2010), cannot linguistically be analysed as being distinct from each other. 
That is, there is no distinction between nouns, verbs or adjectives in Indonesian, and all words from the open class categories should be analysed uniformly. This poses difficulties in implementing a grammar in a linguistically motivated way, as well as in discovering the syntactic behaviour of verbs, if verbs cannot be distinguished from nouns. As part of our investigation we conduct experiments to verify the need to employ word class categories, and we find that these are indeed linguistically motivated labels in Indonesian. Through our investigation into deviant morphological behaviour, we gain a better characterisation of the morphosyntactic effects of -kan, and we discover that, although Indonesian has been labelled as a language with no open word class distinctions, word classes can be established as being linguistically motivated.

    Applications of machine learning to problems in biomolecular function and dynamics

    Biomolecules such as proteins and nucleic acids are involved in all biological processes. As they take part in these processes, biomolecules often undergo motions and changes in their conformation that are related to their function. This thesis presents research into and development of methods to support the study of the dynamics of these changes and their relationship to the biomolecular function. Due to the scale of the structures and speed of the changes, common methods of determining (or “solving”) the structures of biomolecules cannot capture the change in conformation. Detail of the changes must be extrapolated from changes observed between multiple solved states of the same structure. We present a novel method of visualising potential motions of atoms comprising biomolecules, estimated from solved structures at the start and end of the trajectory. Comparisons show that our method produces atomic coordinates that pass closer to known intermediates than those produced by similar existing methods. Our visualisations treat each atom as an individual body, but the conformational changes of proteins can be broken down into the motions of “dynamic domains”, which are sections of proteins that move semi-rigidly, controlled by flexible hinge bending regions. Tools such as the DynDom program identify and analyse the motions of these dynamic domains displayed between pairs of solved structures. We designed and developed DynDom6D, a new version of the DynDom program for very large macromolecules that assigns atoms to domains or hinge bending regions using 6-dimensional k-means clustering

    Phylogeography and Population Structure in Highly Mobile Marine Taxa in the Western Indian Ocean: Bottlenose Dolphins (Tursiops spp.) and Common Dolphins (Delphinus sp.)

    In the marine environment, where barriers to dispersal are limited, taxa normally exhibit genetic homogeneity across large spatial scales. Extraordinarily, marine mammals regularly exhibit genetic differentiation within their cruising range. Furthermore, recent radiation in Delphininae has resulted in several closely related species that remain taxonomically unresolved, particularly bottlenose dolphins (BND) Tursiops spp. and common dolphins (CD) Delphinus spp., making these taxa interesting for studying evolutionary processes. Using mitogenomes and a multi-locus dataset, BNDs from the northwest Indian Ocean (IO) were compared with other recognized species/ecotypes around the world. A new (third) lineage of Indo-Pacific BND, T. aduncus, was identified from the region. Reconstructions of ancestral biogeography and divergence date estimates suggest a divergence mechanism within T. aduncus that coincides with climate change over the Pleistocene. Reconstructions of ancestral morphology suggest a coastal ancestry for BNDs. Significant population structure was exhibited between T. aduncus populations in the western IO based on mtDNA control region sequences and 14 microsatellite loci. Genetic subdivision appears to correlate with habitat heterogeneity across the study area, which may be driving differentiation through local adaptation. Traditional and geometric morphometric techniques were used to investigate congruency between genetic and phenotypic differentiation of three BND lineages in the northwest IO. Strong differences in morphology were exhibited between common BNDs, T. truncatus, and T. aduncus. The T. aduncus lineages were similar to each other; however, significant differences in morphology were evident. Significant genetic structure was evident between CD populations off Portugal, South Africa and Oman, based on mtDNA sequences and 14 microsatellites. Further analyses support the taxonomic designation of D. capensis tropicalis in the northwest IO. 
Both genera exhibit significant population structure over spatial scales outdistanced by their dispersal abilities. Contemporary and historic environmental heterogeneity are suggested as drivers for this structure. Further evidence is provided for the northwest/northern IO as a region of evolutionary endemism, which will inform regional conservation initiatives

    Evolutionary relationships of East African soda lake cichlid fish

    This thesis examines the evolutionary relationships of the Alcolapia soda lake cichlid fishes of East Africa. The introduction presents background on the soda lakes in which the cichlids are found, the taxonomy and biology of the fishes, as well as the theoretical background to the study. Chapter two discusses the methods used in the thesis, addressing the benefits and limitations of each, as well as their suitability to the study in hand. Chapter three investigates the phylogenetics and phylogeography of soda lake cichlids sampled at several populations around the soda lakes and a single transplanted population outside of the focal lakes, employing a large genomic dataset generated through restriction site associated DNA (RAD) sequencing, and demonstrates low levels of interspecific genomic differentiation with high levels of ongoing gene flow. Chapter four uses the RAD dataset to test for signals of selection between Alcolapia species, employing genome-wide scans and outlier detection to characterise peaks of genomic divergence between species. Chapter five combines morphological (geometric morphometrics) and ecological (stable isotope, stomach contents) data with the RAD dataset from chapter three to consider biologically relevant diversification between Alcolapia species, testing for convergence and niche adaptation. Chapter six examines the ecomorphology of the soda lake fishes at an intraspecific level, testing for effects of geography and environment on morphological differentiation between populations. Finally, chapter seven draws together the conclusions inferred from the thesis, and discusses possible future directions for research in this system

    The appendicular skeleton variability of the Sauropoda Titanosauria from the Upper Cretaceous of Lo Hueco (Cuenca, Spain)

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología. Date of defence: 24-01-2020. Access to the full text of this thesis is embargoed until 24-07-2021.
    This dissertation presents new data on the appendicular skeleton of the titanosaurs from the Campanian-Maastrichtian fossil site of Lo Hueco (Cuenca, Spain). The site has yielded an abundant sample of specimens referable to titanosaur sauropods, with several partially articulated individuals and tens of isolated specimens. The sample shows high morphological variability in each type of appendicular element and includes several small-sized specimens. Until now, only one titanosaur form exclusive to the site has been described, Lohuecotitan pandafilandi. However, studies of the abundant isolated specimens from the site have identified two main tooth morphotypes, two types of braincase, three possible morphotypes in the dorsal region of the axial skeleton, and four morphotypes among the caudal vertebrae. The current study explores the high variability found in the sample of appendicular elements. To this end, a series of analytical techniques related to machine learning and 3D geometric morphometrics are used to identify probable morphotypes that help explain the morphological variance. A workflow is proposed for 3D digitization of the specimens, virtual restoration of fragmentary elements, and their incorporation into statistical analyses. Using these techniques, two main appendicular morphotypes are identified. Based on these morphotypes, the intraspecific variability within each of them is quantified, and ontogenetic sequences and the variability related to changes during titanosaur ontogenetic development are identified. Previous studies indicate that the two titanosaur morphotypes from Lo Hueco could have belonged to two different guilds with two different feeding niche exploitation strategies. In the current study, the implications of several morphological differences between the two main morphotypes are discussed under the hypothesis of differences in ecomorphological specialization. A statistical proxy model was created to test the relationship between general appendicular morphology and ecomorphological specialization related to the height of the feeding envelope among neosauropods. The results relate the two main morphotypes to two different feeding niche exploitation strategies, congruent with previous analyses of the cranial material. The intraspecific variability observed in each morphotype allows its impact on morphological character scoring to be determined. Several ontogenetic sequences are identified in each morphotype. These ontogenetic sequences are comprehensively described for the first time in this group, together with the ontogenetic stage and relative timing of the morphological character changes, with implications for character scoring.
    This thesis was supported by the Ayuda para Contratos Predoctorales para la Formación de Doctores (predoctoral contract grant) BES-2013-065509, Ministerio de Economía y Competitividad. This doctoral grant is associated with Research Project CGL2012-35199, Ministerio de Economía y Competitividad.