122 research outputs found

    Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan languages

    Get PDF
    Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages publishes 17 papers that were presented at the conference organised in Dubrovnik, Croatia, 4-6 Octobre 2010

    Semantic Representation and Inference for NLP

    Full text link
    Semantic representation and inference is essential for Natural Language Processing (NLP). The state of the art for semantic representation and inference is deep learning, and particularly Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and transformer Self-Attention models. This thesis investigates the use of deep learning for novel semantic representation and inference, and makes contributions in the following three areas: creating training data, improving semantic representations and extending inference learning. In terms of creating training data, we contribute the largest publicly available dataset of real-life factual claims for the purpose of automatic claim verification (MultiFC), and we present a novel inference model composed of multi-scale CNNs with different kernel sizes that learn from external sources to infer fact checking labels. In terms of improving semantic representations, we contribute a novel model that captures non-compositional semantic indicators. By definition, the meaning of a non-compositional phrase cannot be inferred from the individual meanings of its composing words (e.g., hot dog). Motivated by this, we operationalize the compositionality of a phrase contextually by enriching the phrase representation with external word embeddings and knowledge graphs. Finally, in terms of inference learning, we propose a series of novel deep learning architectures that improve inference by using syntactic dependencies, by ensembling role guided attention heads, incorporating gating layers, and concatenating multiple heads in novel and effective ways. This thesis consists of seven publications (five published and two under review).Comment: PhD thesis, the University of Copenhage

    Representation and parsing of multiword expressions

    Get PDF
    This book consists of contributions related to the definition, representation and parsing of MWEs. These reflect current trends in the representation and processing of MWEs. They cover various categories of MWEs such as verbal, adverbial and nominal MWEs, various linguistic frameworks (e.g. tree-based and unification-based grammars), various languages including English, French, Modern Greek, Hebrew, Norwegian), and various applications (namely MWE detection, parsing, automatic translation) using both symbolic and statistical approaches

    Current trends

    Get PDF
    Deep parsing is the fundamental process aiming at the representation of the syntactic structure of phrases and sentences. In the traditional methodology this process is based on lexicons and grammars representing roughly properties of words and interactions of words and structures in sentences. Several linguistic frameworks, such as Headdriven Phrase Structure Grammar (HPSG), Lexical Functional Grammar (LFG), Tree Adjoining Grammar (TAG), Combinatory Categorial Grammar (CCG), etc., offer different structures and combining operations for building grammar rules. These already contain mechanisms for expressing properties of Multiword Expressions (MWE), which, however, need improvement in how they account for idiosyncrasies of MWEs on the one hand and their similarities to regular structures on the other hand. This collaborative book constitutes a survey on various attempts at representing and parsing MWEs in the context of linguistic theories and applications

    requirements and use cases

    Get PDF
    In this report, we introduce our initial vision of the Corporate Semantic Web as the next step in the broad field of Semantic Web research. We identify requirements of the corporate environment and gaps between current approaches to tackle problems facing ontology engineering, semantic collaboration, and semantic search. Each of these pillars will yield innovative methods and tools during the project runtime until 2013. Corporate ontology engineering will improve the facilitation of agile ontology engineering to lessen the costs of ontology development and, especially, maintenance. Corporate semantic collaboration focuses the human-centered aspects of knowledge management in corporate contexts. Corporate semantic search is settled on the highest application level of the three research areas and at that point it is a representative for applications working on and with the appropriately represented and delivered background knowledge. We propose an initial layout for an integrative architecture of a Corporate Semantic Web provided by these three core pillars

    A Bigger Fish to Fry:Scaling up the Automatic Understanding of Idiomatic Expressions

    Get PDF
    In this thesis, we are concerned with idiomatic expressions and how to handle them within NLP. Idiomatic expressions are a type of multiword phrase which have a meaning that is not a direct combination of the meaning of its parts, e.g. 'at a crossroads' and 'move the goalposts'.In Part I, we provide a general introduction to idiomatic expressions and an overview of observations regarding idioms based on corpus data. In addition, we discuss existing research on idioms from an NLP perspective, providing an overview of existing tasks, approaches, and datasets. In Part II, we focus on the building of a large idiom corpus, consisting of developing a system for the automatic extraction of potentially idiom expressions and building a large corpus of idiom using crowdsourced annotation. Finally, in Part III, we improve an existing unsupervised classifier and compare it to other existing classifiers. Given the relatively poor performance of this unsupervised classifier, we also develop a supervised deep neural network-based system and find that a model involving two separate modules looking at different information sources yields the best performance, surpassing previous state-of-the-art approaches.In conclusion, this work shows the feasibility of building a large corpus of sense-annotated potentially idiomatic expressions, and the benefits such a corpus provides for further research. It provides the possibility for quick testing of hypotheses about the distribution and usage of idioms, it enables the training of data-hungry machine learning methods for PIE disambiguation systems, and it permits fine-grained, reliable evaluation of such systems

    Eesti keele üldvaldkonna tekstide laia kattuvusega automaatne sündmusanalüüs

    Get PDF
    Seoses tekstide suuremahulise digitaliseerimisega ning digitaalse tekstiloome järjest laiema levikuga on tohutul hulgal loomuliku keele tekste muutunud ja muutumas masinloetavaks. Masinloetavus omab potentsiaali muuta tekstimassiivid inimeste jaoks lihtsamini hallatavaks, nt lubada rakendusi nagu automaatne sisukokkuvõtete tegemine ja tekstide põhjal küsimustele vastamine, ent paraku ei ulatu praegused automaatanalüüsi võimalused tekstide sisu tegeliku mõistmiseni. Oletatakse, tekstide sisu mõistvale automaatanalüüsile viib meid lähemale sündmusanalüüs – kuna paljud tekstid on narratiivse ülesehitusega, tõlgendatavad kui „sündmuste kirjeldused”, peaks tekstidest sündmuste eraldamine ja formaalsel kujul esitamine pakkuma alust mitmete „teksti mõistmist” nõudvate keeletehnoloogia rakenduste loomisel. Käesolevas väitekirjas uuritakse, kuivõrd saab eestikeelsete tekstide sündmusanalüüsi käsitleda kui avatud sündmuste hulka ja üldvaldkonna tekste hõlmavat automaatse lingvistilise analüüsi ülesannet. Probleemile lähenetakse eesti keele automaatanalüüsi kontekstis uudsest, sündmuste ajasemantikale keskenduvast perspektiivist. Töös kohandatakse eesti keelele TimeML märgendusraamistik ja luuakse raamistikule toetuv automaatne ajaväljendite tuvastaja ning ajasemantilise märgendusega (sündmusviidete, ajaväljendite ning ajaseoste märgendusega) tekstikorpus; analüüsitakse korpuse põhjal inimmärgendajate kooskõla sündmusviidete ja ajaseoste määramisel ning lõpuks uuritakse võimalusi ajasemantika-keskse sündmusanalüüsi laiendamiseks geneeriliseks sündmusanalüüsiks sündmust väljendavate keelendite samaviitelisuse lahendamise näitel. Töö pakub suuniseid tekstide ajasemantika ja sündmusstruktuuri märgenduse edasiarendamiseks tulevikus ning töös loodud keeleressurssid võimaldavad nii konkreetsete lõpp-rakenduste (nt automaatne ajaküsimustele vastamine) katsetamist kui ka automaatsete märgendustööriistade edasiarendamist.  Due to massive scale digitalisation processes and a switch from traditional means of written communication to digital written communication, vast amounts of human language texts are becoming machine-readable. Machine-readability holds a potential for easing human effort on searching and organising large text collections, allowing applications such as automatic text summarisation and question answering. However, current tools for automatic text analysis do not reach for text understanding required for making these applications generic. It is hypothesised that automatic analysis of events in texts leads us closer to the goal, as many texts can be interpreted as stories/narratives that are decomposable into events. This thesis explores event analysis as broad-coverage and general domain automatic language analysis problem in Estonian, and provides an investigation starting from time-oriented event analysis and tending towards generic event analysis. We adapt TimeML framework to Estonian, and create an automatic temporal expression tagger and a news corpus manually annotated for temporal semantics (event mentions, temporal expressions, and temporal relations) for the language; we analyse consistency of human annotation of event mentions and temporal relations, and, finally, provide a preliminary study on event coreference resolution in Estonian news. The current work also makes suggestions on how future research can improve Estonian event and temporal semantic annotation, and the language resources developed in this work will allow future experimentation with end-user applications (such as automatic answering of temporal questions) as well as provide a basis for developing automatic semantic analysis tools
    corecore