5 research outputs found

    Unsupervised Induction of Frame-Based Linguistic Forms

    Get PDF
    This thesis studies the use of bulk, structured, linguistic annotations in order to perform unsupervised induction of meaning for three kinds of linguistic forms: words, sentences, and documents. The primary linguistic annotation I consider throughout this thesis are frames, which encode core linguistic, background or societal knowledge necessary to understand abstract concepts and real-world situations. I begin with an overview of linguistically-based structured meaning representation; I then analyze available large-scale natural language processing (NLP) and linguistic resources and corpora for their abilities to accommodate bulk, automatically-obtained frame annotations. I then proceed to induce meanings of the different forms, progressing from the word level, to the sentence level, and finally to the document level. I first show how to use these bulk annotations in order to better encode linguistic- and cognitive science backed semantic expectations within word forms. I then demonstrate a straightforward approach for learning large lexicalized and refined syntactic fragments, which encode and memoize commonly used phrases and linguistic constructions. Next, I consider two unsupervised models for document and discourse understanding; one is a purely generative approach that naturally accommodates layer annotations and is the first to capture and unify a complete frame hierarchy. The other conditions on limited amounts of external annotations, imputing missing values when necessary, and can more readily scale to large corpora. These discourse models help improve document understanding and type-level understanding

    Object-oriented data mining

    Get PDF
    EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    Semantic Domains in Akkadian Text

    Get PDF
    The article examines the possibilities offered by language technology for analyzing semantic fields in Akkadian. The corpus of data for our research group is the existing electronic corpora, Open richly annotated cuneiform corpus (ORACC). In addition to more traditional Assyriological methods, the article explores two language technological methods: Pointwise mutual information (PMI) and Word2vec.Peer reviewe

    CyberResearch on the Ancient Near East and Eastern Mediterranean

    Get PDF
    CyberResearch on the Ancient Near East and Neighboring Regions provides case studies on archaeology, objects, cuneiform texts, and online publishing, digital archiving, and preservation. Eleven chapters present a rich array of material, spanning the fifth through the first millennium BCE, from Anatolia, the Levant, Mesopotamia, and Iran. Customized cyber- and general glossaries support readers who lack either a technical background or familiarity with the ancient cultures. Edited by Vanessa Bigot Juloux, Amy Rebecca Gansell, and Alessandro Di Ludovico, this volume is dedicated to broadening the understanding and accessibility of digital humanities tools, methodologies, and results to Ancient Near Eastern Studies. Ultimately, this book provides a model for introducing cyber-studies to the mainstream of humanities research

    Combining LAPIS and WordNet for the Learning of LR Parsers with Optimal Semantic Constraints

    No full text
    There is a history of research focussed on learning of shiftreduce parsers from syntactically annotated corpora by the means of machine learning techniques based on logic. The presence of lexical semantic tags in the treebank has proved useful for the learning of semantic constraints limiting the amount of nondeterminism in the parsers. The grain of the semantic tags used is of direct importance to that task
    corecore