14 research outputs found

    Measuring Frame Instance Relatedness

    Get PDF

    Estonian football specific corpora automatic semantic role labeling with football specific Framenet

    Get PDF
    Käesoleva töö eesmärgiks on uurida ning üritada lahendada eestikeelse teksti automaatse freimidega märgendamise probleemi. Üldine eestikeelne Framenet on alles algusjärgus, kuid olemas on terviklik jalgpalli-alane freimide ressurss, mille abil üritame tõestada hüpoteesi, et jalgpalli-alase teksti märgendamiseks piisab vaid morfoloogilisest ning süntaktilisest infost. Sellele hüpoteesile me siiski kinnitust ei saanud, kuna sama tähendust kandvat lauset on võimalik esitada liiga paljudel erinevatel viisidel. Lisaks täiendasime jalgpalli-alaste sõnadega Eesti suurimat leksikaal-semantilist andmebaasi, Wordnetti.Research and a possible solution to the problem of automatic semantic role labeling of text in Estonian is carried out in this paper. A general Estonian Framenet is in the starting phase, but there is also available a football specific Framenet. We try to prove the hypothesis that morphological and syntactical information is enough for automatic semantic role labeling in football related corpora. Unfortunately, we did not achieve a confirmation for the hypothesis, because there are too many ways to present sentences that have the same meaning. In addition, we supplemented Estonian biggest lexical-syntactic database with football related words

    Semantic argument classification and semantic categorization of Turkish existential sentences using support vector learning

    Get PDF
    Cataloged from PDF version of article.There are three types of sentences that form all existing natural languages: verbal sentences (e.g. “I read the book.”), copulative sentences (e.g. “The book is on the table.”), and existential sentences (e.g. “There is a book on the table.”). Syntactic and semantic recognition of these sentence types are crucially important in computational linguistics although there has not been any significant work towards this end. This thesis, in an attempt to fill this evident gap, is on identifying and assigning semantic categories of Turkish existential sentences in print. Existential sentences in Turkish are minimally characterized by the two existential particles var, meaning there is/are, and yok, meaning there is/are no. In addition to these most basic meanings, other senses of existential particles are possible, which can be categorized into groups such as case existentials and possession existentials. Our system does shallow semantic parsing in defining the predicate-argument relationships in an existential sentence on a word-byword basis, via utilizing Support Vector Machines, after which it proceeds with the semantic categorization of the whole sentence. For both of these tasks, our system produces promising results, in terms of accuracy and precision/recall, respectively. Part of this research contributes to the annotation of the METU-Sabancı Turkish Treebank with semantic information.Koca, AylinM.S

    Doctor of Philosophy

    Get PDF
    dissertationEvents are one important type of information throughout text. Event extraction is an information extraction (IE) task that involves identifying entities and objects (mainly noun phrases) that represent important roles in events of a particular type. However, the extraction performance of current event extraction systems is limited because they mainly consider local context (mostly isolated sentences) when making each extraction decision. My research aims to improve both coverage and accuracy of event extraction performance by explicitly identifying event contexts before extracting individual facts. First, I introduce new event extraction architectures that incorporate discourse information across a document to seek out and validate pieces of event descriptions within the document. TIER is a multilayered event extraction architecture that performs text analysis at multiple granularities to progressively \zoom in" on relevant event information. LINKER is a unied discourse-guided approach that includes a structured sentence classier to sequentially read a story and determine which sentences contain event information based on both the local and preceding contexts. Experimental results on two distinct event domains show that compared to previous event extraction systems, TIER can nd more event information while maintaining a good extraction accuracy, and LINKER can further improve extraction accuracy. Finding documents that describe a specic type of event is also highly challenging because of the wide variety and ambiguity of event expressions. In this dissertation, I present the multifaceted event recognition approach that uses event dening characteristics (facets), in addition to event expressions, to eectively resolve the complexity of event descriptions. I also present a novel bootstrapping algorithm to automatically learn event expressions as well as facets of events, which requires minimal human supervision. Experimental results show that the multifaceted event recognition approach can eectively identify documents that describe a particular type of event and make event extraction systems more precise

    Concept Mining: A Conceptual Understanding based Approach

    Get PDF
    Due to the daily rapid growth of the information, there are considerable needs to extract and discover valuable knowledge from data sources such as the World Wide Web. Most of the common techniques in text mining are based on the statistical analysis of a term either word or phrase. These techniques consider documents as bags of words and pay no attention to the meanings of the document content. In addition, statistical analysis of a term frequency captures the importance of the term within a document only. However, two terms can have the same frequency in their documents, but one term contributes more to the meaning of its sentences than the other term. Therefore, there is an intensive need for a model that captures the meaning of linguistic utterances in a formal structure. The underlying model should indicate terms that capture the semantics of text. In this case, the model can capture terms that present the concepts of the sentence, which leads to discover the topic of the document. A new concept-based model that analyzes terms on the sentence, document and corpus levels rather than the traditional analysis of document only is introduced. The concept-based model can effectively discriminate between non-important terms with respect to sentence semantics and terms which hold the concepts that represent the sentence meaning. The proposed model consists of concept-based statistical analyzer, conceptual ontological graph representation, concept extractor and concept-based similarity measure. The term which contributes to the sentence semantics is assigned two different weights by the concept-based statistical analyzer and the conceptual ontological graph representation. These two weights are combined into a new weight. The concepts that have maximum combined weights are selected by the concept extractor. The similarity between documents is calculated based on a new concept-based similarity measure. The proposed similarity measure takes full advantage of using the concept analysis measures on the sentence, document, and corpus levels in calculating the similarity between documents. Large sets of experiments using the proposed concept-based model on different datasets in text clustering, categorization and retrieval are conducted. The experiments demonstrate extensive comparison between traditional weighting and the concept-based weighting obtained by the concept-based model. Experimental results in text clustering, categorization and retrieval demonstrate the substantial enhancement of the quality using: (1) concept-based term frequency (tf), (2) conceptual term frequency (ctf), (3) concept-based statistical analyzer, (4) conceptual ontological graph, (5) concept-based combined model. In text clustering, the evaluation of results is relied on two quality measures, the F-Measure and the Entropy. In text categorization, the evaluation of results is relied on three quality measures, the Micro-averaged F1, the Macro-averaged F1 and the Error rate. In text retrieval, the evaluation of results relies on three quality measures, the precision at 10 documents retrieved P(10), the preference measure (bpref), and the mean uninterpolated average precision (MAP). All of these quality measures are improved when the newly developed concept-based model is used to enhance the quality of the text clustering, categorization and retrieval

    Joint learning of syntactic and semantic dependencies

    Get PDF
    In this master’s thesis we designed, implemented and evaluated a novel joint syntactic and semantic parsing model
    corecore