
    Survey on Publicly Available Sinhala Natural Language Processing Tools and Research

    Sinhala is the native language of the Sinhalese people, who make up the largest ethnic group of Sri Lanka. The language belongs to the globe-spanning Indo-European language family. However, due to poverty in both linguistic and economic capital, Sinhala, from the perspective of Natural Language Processing tools and research, remains a resource-poor language that has neither the economic drive its cousin English has nor the sheer push of the law of numbers that a language such as Chinese has. A number of research groups in Sri Lanka have noticed this dearth and the resulting dire need for proper tools and research for Sinhala natural language processing. However, for various reasons, these attempts seem to lack coordination and awareness of each other. The objective of this paper is to fill that gap with a comprehensive literature survey of the publicly available Sinhala natural language tools and research, so that researchers working in this field can better utilize their peers' contributions. As such, we shall upload this paper to arXiv and update it periodically to reflect the advances made in the field.

    Data Mining Ancient Script Image Data Using Convolutional Neural Networks

    The recent surge in ancient scripts has resulted in huge image libraries of ancient texts. Data mining of the collected images enables the study of the evolution of these ancient scripts. In particular, the origin of the Indus Valley script is highly debated. We use convolutional neural networks to test which Phoenician alphabet letters and Brahmi symbols are closest to the Indus Valley script symbols. Surprisingly, our analysis shows that overall the Phoenician alphabet is much closer than the Brahmi script to the Indus Valley script symbols
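    The abstract does not say how per-symbol CNN outputs are combined into an overall comparison of scripts. A minimal sketch of one possible aggregation step follows, assuming a CNN has already been trained on labeled Phoenician and Brahmi letters and returns a probability for each candidate letter. The CNN itself is omitted; the letter labels, the helpers `closest_script` and `tally_scripts`, and the toy probabilities are hypothetical illustrations, not the paper's code or data.

```python
from collections import Counter

# Hypothetical mapping from candidate letter label to the script it belongs to.
script_of = {"aleph": "Phoenician", "beth": "Phoenician", "ka": "Brahmi", "ga": "Brahmi"}

# Toy softmax outputs for three query symbols (illustrative values only, not results).
toy_probs = [
    {"aleph": 0.6, "beth": 0.1, "ka": 0.2, "ga": 0.1},
    {"aleph": 0.1, "beth": 0.2, "ka": 0.5, "ga": 0.2},
    {"aleph": 0.3, "beth": 0.4, "ka": 0.2, "ga": 0.1},
]


def closest_script(probs, script_of):
    """Return the script of the single most probable candidate letter."""
    best_letter = max(probs, key=probs.get)
    return script_of[best_letter]


def tally_scripts(all_probs, script_of):
    """Count, over all query symbols, which script supplies the top-scoring letter."""
    return Counter(closest_script(p, script_of) for p in all_probs)


counts = tally_scripts(toy_probs, script_of)
print(counts)  # Counter({'Phoenician': 2, 'Brahmi': 1})
```

    Summing probability mass per script instead of taking the argmax would be an equally plausible design; which aggregation the authors used is not stated in the abstract.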

    A computer-assisted approach to the comparison of mainland southeast Asian languages

    This cumulative thesis is based on three separate projects that use a computer-assisted language comparison (CALC) framework to address common obstacles to studying the history of Mainland Southeast Asian (MSEA) languages, such as sparse and non-standardized lexical data and inadequate methods of cognate judgment, and to provide caveats for scholars who use Bayesian phylogenetic analysis. The first project provides a format that standardizes sound inventories, regulates language labels, and clarifies lexical items. This standardized format allows us to merge various forms of raw data. The format also summarizes information to assist linguists in researching the relatedness among words and inferring relationships among languages. The second project focuses on increasing the transparency of lexical data and cognate judgments with regard to compound words. The method enables the annotation of each part of a word with semantic meanings and syntactic features. In addition, four different conversion methods were developed to convert morpheme cognates into word cognates for input into Bayesian phylogenetic analysis. The third project applies the methods of the first project to create a workflow that merges linguistic datasets and infers a language tree using a Bayesian phylogenetic algorithm. Furthermore, the project addresses the importance of integrating cross-disciplinary studies into historical linguistic research. Finally, the methods we proposed for managing lexical data for MSEA languages are discussed and summarized from six perspectives. The work can be seen as a milestone in reconstructing human prehistory in an area of high linguistic and cultural diversity.
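    The abstract mentions four methods for converting morpheme cognates into word cognates but does not describe them. As a hedged illustration only, the sketch below implements two hypothetical rules that are not claimed to be the thesis's methods: a strict rule that treats words as cognate only when they carry exactly the same set of morpheme cognate IDs, and a loose rule that links words sharing any morpheme cognate ID, transitively, using union-find. All word and cognate-set labels are invented.

```python
def strict_word_cognates(words):
    """Hypothetical strict rule: same set of morpheme cognate IDs -> same word cognate class."""
    classes = {}
    labels = {}
    for word, morphemes in words.items():
        key = frozenset(morphemes)
        # A new morpheme-set gets the next free class number.
        labels[word] = classes.setdefault(key, len(classes))
    return labels


def loose_word_cognates(words):
    """Hypothetical loose rule: words linked by any chain of shared morpheme IDs are cognate."""
    parent = {w: w for w in words}

    def find(w):
        # Union-find root lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    first_word_with = {}  # morpheme cognate ID -> first word seen carrying it
    for word, morphemes in words.items():
        for m in morphemes:
            if m in first_word_with:
                parent[find(word)] = find(first_word_with[m])
            else:
                first_word_with[m] = word

    roots = {}
    return {w: roots.setdefault(find(w), len(roots)) for w in words}


# Invented compound words: each maps to the cognate IDs of its morphemes.
toy_words = {
    "langA:sun": ["m1", "m2"],
    "langB:sun": ["m1", "m2"],
    "langC:sun": ["m1", "m3"],
    "langD:sun": ["m4"],
}
print(strict_word_cognates(toy_words))  # langA, langB -> 0; langC -> 1; langD -> 2
print(loose_word_cognates(toy_words))   # langA, langB, langC -> 0; langD -> 1
```

    The two rules bracket the problem: the strict rule splits partially shared compounds into separate cognate sets, while the loose rule merges them, which is why the choice of conversion can matter for downstream Bayesian tree inference.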

    A global phylogenomic analysis of the shiitake genus Lentinula

    Lentinula is a broadly distributed group of fungi that contains the cultivated shiitake mushroom, L. edodes. We sequenced 24 genomes representing eight described species and several unnamed lineages of Lentinula from 15 countries on four continents. Lentinula comprises four major clades that arose in the Oligocene, three in the Americas and one in Asia–Australasia. To expand sampling of shiitake mushrooms, we assembled 60 genomes of L. edodes from China that were previously published as raw Illumina reads and added them to our dataset. Lentinula edodes sensu lato (s. lat.) contains three lineages that may warrant recognition as species: one including a single isolate from Nepal that is the sister group to the rest of L. edodes s. lat., a second with 20 cultivars and 12 wild isolates from China, Japan, Korea, and the Russian Far East, and a third with 28 wild isolates from China, Thailand, and Vietnam. Two additional lineages in China have arisen by hybridization between the second and third groups. Genes encoding cysteine sulfoxide lyase (lecsl) and γ-glutamyl transpeptidase (leggt), which are implicated in biosynthesis of the organosulfur flavor compound lenthionine, have diversified in Lentinula. Paralogs of both genes that are unique to Lentinula (lecsl 3 and leggt 5b) are coordinately up-regulated in fruiting bodies of L. edodes. The pangenome of L. edodes s. lat. contains 20,308 groups of orthologous genes, but only 6,438 orthogroups (32%) are shared among all strains, whereas 3,444 orthogroups (17%) are found only in wild populations, which should be targeted for conservation.
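    The pangenome figures above come down to partitioning an orthogroup presence/absence table. A minimal sketch, assuming a hypothetical table that maps each orthogroup to the set of strains carrying it: core orthogroups occur in every strain, and wild-only orthogroups occur only in wild strains. The strain and orthogroup names are invented; only the totals 20,308, 6,438, and 3,444 come from the abstract, and the percentage helper simply checks the rounding reported there.

```python
def partition_orthogroups(presence, strains, wild):
    """Split orthogroups into core (in every strain) and wild-only (only in wild strains)."""
    core, wild_only = set(), set()
    for og, present_in in presence.items():
        if present_in == strains:
            core.add(og)
        elif present_in and present_in <= wild:
            wild_only.add(og)
    return core, wild_only


def percent_of_pangenome(n, total=20308):
    """Share of the pangenome as a whole-number percentage (total from the abstract)."""
    return round(100 * n / total)


# Invented toy table: one cultivar and two wild isolates.
strains = {"cult1", "wild1", "wild2"}
wild = {"wild1", "wild2"}
presence = {
    "OG1": {"cult1", "wild1", "wild2"},  # core
    "OG2": {"wild1"},                    # wild-only
    "OG3": {"cult1", "wild2"},           # accessory, seen in a cultivar
    "OG4": {"wild1", "wild2"},           # wild-only
}

core, wild_only = partition_orthogroups(presence, strains, wild)
print(sorted(core), sorted(wild_only))                          # ['OG1'] ['OG2', 'OG4']
print(percent_of_pangenome(6438), percent_of_pangenome(3444))   # 32 17
```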

    Evolution of the northern Australian flora: role of the Sunda-Sahul Floristic Exchange

    Elizabeth Joyce investigated the Sunda-Sahul Floristic Exchange (SSFE) using floristic, phylogeographic and phylogenomic approaches. She compiled the first preliminary regional plant checklist, found that the SSFE had a substantial impact on floristic composition, identified two exchange tracks from Southeast Asia into Australia, and found that in Anacardiaceae (Sapindales), extinction affected SSFE dynamics.

    Molecular phylogenetic and biogeography of Sphenomorphini (Squamata: Scincidae)

    Sphenomorphini consists of 549 species in 34 genera, making it the most diverse skink tribe. Species diversity is highest in Southeast Asia, with species found in the Middle East, Asia, Australia, and North and Central America. Taxonomic relationships among many of the genera, and among species within genera, are contentious due to poor morphological diagnoses. This dissertation resolves many of these issues through the examination of multiple independent molecular markers. Using traditional and new phylogenetic approaches, an estimate of the relationships in Sphenomorphini is obtained. Additionally, the biogeographic history of Sphenomorphini and certain subgroups is examined under a variety of approaches. A new taxonomy is defined for portions of Sphenomorphini, and new species are described from the Philippines. These taxonomic changes and the new estimate of phylogenetic relationships within Sphenomorphini represent a substantial step forward in the understanding of skink relationships.

    The Australasian dingo archetype: de novo chromosome-length genome assembly, DNA methylome, and cranial morphology

    BACKGROUND: One difficulty in testing the hypothesis that the Australasian dingo is a functional intermediate between wild wolves and domesticated breed dogs is that there is no reference specimen. Here we link a high-quality de novo long-read chromosomal assembly with epigenetic footprints and morphology to describe the Alpine dingo female named Cooinda. It was critical to establish an Alpine dingo reference because this ecotype occurs throughout coastal eastern Australia, where the first drawings and descriptions were completed. FINDINGS: We generated a high-quality chromosome-level reference genome assembly (Canfam_ADS) using a combination of Pacific Biosciences, Oxford Nanopore, 10X Genomics, Bionano, and Hi-C technologies. Compared to the previously published Desert dingo assembly, there are large structural rearrangements on chromosomes 11, 16, 25, and 26. Phylogenetic analyses of chromosomal data from Cooinda the Alpine dingo and 9 previously published de novo canine assemblies show that dingoes are monophyletic and basal to domestic dogs. Network analyses show that the mitochondrial DNA genome clusters within the southeastern lineage, as expected for an Alpine dingo. Comparison of regulatory regions identified 2 differentially methylated regions within the glucagon receptor GCGR and histone deacetylase HDAC4 genes that are unmethylated in the Alpine dingo genome but hypermethylated in the Desert dingo. Morphologic data, comprising geometric morphometric assessment of cranial morphology, place dingo Cooinda within population-level variation for Alpine dingoes. Magnetic resonance imaging of brain tissue shows she had a larger cranial capacity than a similar-sized domestic dog. CONCLUSIONS: These combined data support the hypothesis that the dingo Cooinda fits the spectrum of genetic and morphologic characteristics typical of the Alpine ecotype. We propose that she be considered the archetype specimen for future research investigating the evolutionary history, morphology, physiology, and ecology of dingoes. The female has been taxidermically prepared and is now at the Australian Museum, Sydney.

    Convergent patterns suggest parallel processes of insular anuran diversification between oceanic archipelagos of the Southwest Pacific and the sky islands of the continental Western Ghats

    The unprecedented surge in frog species descriptions over the last two decades has been attributed to increasing access to remote regions, more advanced technology and techniques, and greater interest in these groups. The advent of genetic methods has been welcomed by practitioners as a boon in identifying species and their relationships. It was suggested that significant diversity remained unrecognized and that genetic tools would help uncover “cryptic” species that are not obvious. This notion, however, is contentious and has been debated. As part of my dissertation, I re-evaluate groups of frogs from two highly biodiverse tropical regions: the Western Ghats of India and the Philippine Archipelago of the Western Pacific. In my first chapter, I revisit a clade of Nyctibatrachus Nightfrogs in the Southern Western Ghats with an integrative approach utilizing morphologic, molecular, bioacoustic, developmental and life-history data, and reveal that species diversity may be inflated in that group (the Nyctibatrachus aliciae group). In my second chapter, I similarly re-evaluate a clade of Philippine Limnonectes Fanged Frogs and find evidence to reconfigure species boundaries in the Limnonectes magnus clade. The third chapter addresses the same Limnonectes clade with genomic data from the newly developed FrogCap protocol and finds gene flow between some groups identified in the previous chapter but not between others, reinforcing some species boundaries while questioning others. My fourth chapter evaluates a species complex of Philippine endemic Pulchrana Spotted Frogs in the eastern islands of the archipelago with genomic data. The results show that Pulchrana grandocula and P. similis cluster together as one group, with the remaining Philippine species of Pulchrana forming another. I also find that two formerly recognized rare species, each represented by a singleton specimen, have highly admixed genotypes, calling into question whether these are indeed unique taxa. My final chapter explores a higher-level genomic dataset of frogs of the superfamily Ranoidea that includes the three paleoendemic Indian ranoid families Nyctibatrachidae, Micrixalidae and Ranixalidae. My results show for the first time that these three families form a single clade representing an ancient in-situ radiation across the Indian subcontinent. Additionally, preliminary biogeographic results based on this dataset support a “ferry India” model, which suggests that several non-African crown groups of ranoids may have evolved on an insular India during its transit from Gondwana to Eurasia.

    Reassessing Colugo Phylogeny, Taxonomy, and Biogeography by Genome Wide Comparisons and DNA Capture Hybridization from Museum Specimens

    The ability to uncover the phylogenetic history of archived museum material with molecular techniques has rapidly improved due to the reduced cost and increased sequencing capacity of next-generation sequencing technologies. However, it remains difficult to isolate large, orthologous DNA regions across multiple divergent species. Here we describe the use of cross-species DNA capture hybridization techniques and next-generation sequencing to selectively isolate and sequence mitochondrial DNA genomes and nuclear DNA from the degraded DNA of museum specimens, using probes generated from the DNA of an extant species. Colugos are among the most poorly understood of all living mammals despite their central role in our understanding of higher-level primate relationships. The two described species of these extreme gliders are the sole living members of a unique mammalian order, Dermoptera, distributed throughout Southeast Asia. We generated a draft genome sequence for a Sunda colugo and a reference alignment for the Philippine colugo, and used these to identify colugo-specific enrichment in sensory and musculoskeletal genes that likely underlies their nocturnal and gliding adaptations. Phylogenomic analysis and catalogs of rare genomic changes overwhelmingly support the hypothesis that colugos are the sister group to primates (Primatomorpha), to the exclusion of treeshrews. We also captured ~140 kb of orthologous sequence data from colugo museum specimens sampled across their range, and identified deep genetic structure between many geographically isolated populations of the two named species, suggesting that colugo diversity is considerably greater than currently recognized. Our results identify conservation units to mitigate future losses of this enigmatic mammalian order. Examining multiple distantly related mammals, we identified a consistent pattern of early diversification between east and west Borneo in colugos, the lesser mouse deer, and pangolins. This strongly parallel biogeographic pattern is not common in mammals, and we see no evidence for it in the greater mouse deer. Colugos on West Borneo diverged from those in Indochina in the late Pliocene; however, most other mammals across this same geographic region diverged from their common ancestor much more recently, in the Pleistocene. Low genetic divergence between colugos on large landmasses and those on neighboring islands indicates that forest distributions in the recent past were much larger than present refugial distributions.