
    Predicate Matrix: an interoperable lexical knowledge base for predicates

    183 p. The Predicate Matrix (in Spanish, Matriz de Predicados) is a new lexical-semantic resource resulting from the integration of multiple knowledge sources, including FrameNet, VerbNet, PropBank and WordNet. The Predicate Matrix provides an extensive and robust lexicon that improves the interoperability of the aforementioned semantic resources. Its construction is based on the integration of SemLink and on new mappings obtained with automatic methods that link semantic knowledge at the lexical and role levels. We have also extended the Predicate Matrix to cover nominal predicates (English, Spanish) and predicates in other languages (Spanish, Catalan and Basque). As a result, the Predicate Matrix provides a multilingual lexicon that enables interoperable semantic analysis across multiple languages.
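    Below is a minimal Python sketch of the kind of cross-resource role lookup such a matrix enables. The row fields and example values are illustrative assumptions, not the actual Predicate Matrix distribution format.

```python
# Illustrative sketch (not the actual Predicate Matrix file format): each row links
# a predicate sense across VerbNet, FrameNet, PropBank and WordNet at the lexical
# and role level, so a role label from one resource can be translated into another.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PredicateRow:
    lemma: str                    # e.g. "buy"
    verbnet_class: str            # e.g. "get-13.5.1"
    verbnet_role: str             # e.g. "Agent"
    framenet_frame: str           # e.g. "Commerce_buy"
    framenet_role: str            # e.g. "Buyer"
    propbank_roleset: str         # e.g. "buy.01"
    propbank_arg: str             # e.g. "ARG0"
    wordnet_sense: Optional[str]  # e.g. "buy%2:40:00::"

MATRIX = [
    PredicateRow("buy", "get-13.5.1", "Agent", "Commerce_buy", "Buyer",
                 "buy.01", "ARG0", "buy%2:40:00::"),
    PredicateRow("buy", "get-13.5.1", "Theme", "Commerce_buy", "Goods",
                 "buy.01", "ARG1", "buy%2:40:00::"),
]

def map_role(lemma: str, propbank_arg: str) -> list:
    """Translate a PropBank argument label into (FrameNet role, VerbNet role) pairs."""
    return [(row.framenet_role, row.verbnet_role)
            for row in MATRIX
            if row.lemma == lemma and row.propbank_arg == propbank_arg]

print(map_role("buy", "ARG0"))  # -> [('Buyer', 'Agent')]
```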

    Proceedings

    Proceedings of the Workshop on Annotation and Exploitation of Parallel Corpora AEPC 2010. Editors: Lars Ahrenberg, Jörg Tiedemann and Martin Volk. NEALT Proceedings Series, Vol. 10 (2010), 98 pages. © 2010 The editors and contributors. Published by the Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt. Electronically published at Tartu University Library (Estonia): http://hdl.handle.net/10062/15893

    Joint learning of syntactic and semantic dependencies

    In this master’s thesis we designed, implemented and evaluated a novel joint syntactic and semantic parsing model.

    Re-evaluating and Exploring the Contributions of Constituency Grammar to Semantic Role Labeling

    Since the seminal work of Gildea and Jurafsky (2000), semantic role labeling (SRL) researchers have been trying to determine the appropriate syntactic/semantic knowledge and statistical algorithms to tackle the challenges in SRL. In search of the appropriate knowledge, SRL researchers shifted from constituency grammar to dependency grammar around 2007, after improvements in systems relying on constituency-grammar-based features had stalled. However, the results of the CoNLL-2008 SRL systems, all of which used dependency-grammar-based features, did not support the hypothesis that dependency grammar was more suitable for SRL. Determining the right syntactic/semantic knowledge for SRL therefore remains an open question, and so does finding features that generalize across the syntactic variations a verb appears in, including those involving argument movement or displacement. The current dissertation continues the effort to discover the appropriate syntactic/semantic knowledge for SRL. Specifically, while seeking the proper features to solve the SRL problem in general, the present work focuses on tackling the syntactic variation challenge by integrating three types of less thoroughly explored knowledge into constituency-grammar-based SRL systems: context dependence among the semantic roles of core arguments, syntactic structures involving argument movement or displacement, and dependency grammar relations. Integrating such knowledge yields the following novel approach. The system identifies the core and non-core semantic arguments of a verb. To classify non-core arguments, the system uses generic features. For core arguments, the system relies on the preceding knowledge to extract a base argument configuration (BAC) feature in which the core arguments' positions overlap with those of an argument structure of the verb. BAC features thus generalize across the syntactic variations a verb appears in. Together with two levels of backoff features dealing with unrealized core arguments and unknown verbs respectively, BAC features effectively solve the argument classification task and handle the preceding challenge. However, the experimental results indicate that the overall performance is limited by the argument identification module; the immediate future work would be to improve argument identification. Ph.D. Linguistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/64814/1/lyshane_1.pd
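    A hypothetical, simplified sketch of the BAC idea follows: it encodes the linear configuration of a verb's candidate core arguments relative to the verb, so that different syntactic realizations produce comparable configuration strings. The function name and encoding are illustrative assumptions; the dissertation's feature additionally matches configurations against the verb's known argument structures and uses backoff for unrealized arguments and unknown verbs.

```python
# A hypothetical, simplified reading of the base argument configuration (BAC)
# idea: encode where a verb's candidate core arguments sit relative to the verb,
# so that different syntactic realizations yield comparable configuration strings.
# This is an illustration, not the dissertation's implementation.
def base_argument_configuration(verb_index: int,
                                candidate_spans: list) -> str:
    """Return a feature string such as 'ARG_<V>_ARG_ARG' describing the linear
    positions of candidate core-argument spans relative to the verb."""
    slots = [("ARG", start) for start, _end in candidate_spans]
    slots.append(("<V>", verb_index))
    slots.sort(key=lambda pair: pair[1])
    return "_".join(label for label, _ in slots)

# "John gave Mary a book": verb at token 1, candidate argument spans at
# (0, 0) "John", (2, 2) "Mary", (3, 4) "a book".
print(base_argument_configuration(1, [(0, 0), (2, 2), (3, 4)]))  # ARG_<V>_ARG_ARG
```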

    Event extraction from biomedical texts using trimmed dependency graphs

    This thesis explores the automatic extraction of information from biomedical publications. Such techniques are urgently needed because the biosciences are publishing continually increasing numbers of texts. The focus of this work is on events. Information about events is currently manually curated from the literature by biocurators. Biocuration, however, is time-consuming and costly, so automatic methods are needed for information extraction from the literature. This thesis is dedicated to modeling, implementing and evaluating an advanced event extraction approach based on the analysis of syntactic dependency graphs. This work presents the proposed event extraction approach and its implementation, the JReX (Jena Relation eXtraction) system. This system was used by the University of Jena (JULIE Lab) team in the "BioNLP 2009 Shared Task on Event Extraction" competition and was ranked second among 24 competing teams. Thereafter, JReX was the highest scorer on the worldwide shared U-Compare event extraction server, outperforming the competing systems from the challenge. This success was made possible, among other things, by extensive research on event extraction solutions carried out during this thesis, e.g., exploring the effects of syntactic and semantic processing procedures on solving the event extraction task. The evaluations executed on standard and community-wide accepted competition data were complemented by a real-life evaluation of large-scale biomedical database reconstruction. This work showed that considerable parts of manually curated databases can be automatically re-created with the help of the event extraction approach developed. Successful re-creation was possible for parts of RegulonDB, the world's largest database for E. coli. In summary, the event extraction approach justified, developed and implemented in this thesis meets the needs of a large community of human curators and thus helps in the acquisition of new knowledge in the biosciences.
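    As a minimal illustration of the general idea behind trimming dependency graphs for event extraction, the sketch below keeps only the part of a parse that connects an event trigger to a candidate argument (here, simply the shortest path). It assumes the networkx library and made-up example dependencies; it is not the JReX trimming procedure itself.

```python
# Minimal sketch of the general idea behind dependency-graph trimming for event
# extraction: keep only the part of the parse that connects an event trigger to a
# candidate argument (here, the shortest dependency path). This illustrates the
# principle only, not JReX's trimming rules. Assumes the networkx library.
import networkx as nx

def trimmed_path(dependencies, trigger, argument):
    """dependencies: (head, relation, dependent) triples from a dependency parser."""
    graph = nx.Graph()
    for head, rel, dep in dependencies:
        graph.add_edge(head, dep, rel=rel)
    return nx.shortest_path(graph, source=trigger, target=argument)

# Invented example sentence: "p53 strongly inhibits expression of MDM2."
deps = [
    ("inhibits", "nsubj", "p53"),
    ("inhibits", "advmod", "strongly"),
    ("inhibits", "dobj", "expression"),
    ("expression", "nmod", "MDM2"),
]
print(trimmed_path(deps, "inhibits", "MDM2"))  # ['inhibits', 'expression', 'MDM2']
```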

    Statistical semantic processing using Markov logic

    Markov Logic (ML) is a novel approach to Natural Language Processing tasks [Richardson and Domingos, 2006; Riedel, 2008]. It is a Statistical Relational Learning language based on First Order Logic (FOL) and Markov Networks (MN). It allows one to treat a task as structured classification. In this work, we investigate ML for the semantic processing tasks of Spoken Language Understanding (SLU) and Semantic Role Labelling (SRL). Both tasks consist of identifying a semantic representation for the meaning of a given utterance/sentence. However, they differ in nature: SLU is in the field of dialogue systems, where the domain is closed and language is spoken [He and Young, 2005], while SRL is for open domains and traditionally for written text [Márquez et al., 2008]. Robust SLU is a key component of spoken dialogue systems. This component consists of identifying the meaning of the user utterances addressed to the system. Recent statistical approaches to SLU depend on additional resources (e.g., gazetteers, grammars, syntactic treebanks) which are expensive and time-consuming to produce and maintain. On the other hand, simple datasets annotated only with slot-values are commonly used in dialogue system development, and are easy to collect, automatically annotate, and update. However, slot-values leave out some of the fine-grained long-distance dependencies present in other semantic representations. In this work we investigate the development of SLU modules with minimum resources, using slot-values as their semantic representation. We propose to use ML to capture long-distance dependencies which are not explicitly available in the slot-value semantic representation. We test the adequacy of the ML framework by comparing it against a set of baselines using state-of-the-art approaches to semantic processing. The results of this research have been published in Meza-Ruiz et al. [2008a,b]. Furthermore, we address the question of the scalability of the ML approach to other NLP tasks involving the identification of semantic representations. In particular, we focus on SRL: the task of identifying predicates and arguments within sentences, together with their semantic roles. The semantic representation built during SRL is more complex than the slot-values used in dialogue systems, in the sense that it includes the notion of predicate/argument scope. SRL is defined in the context of open domains under the premise that there are several levels of extra resources (lemmas, POS tags, constituent or dependency parses). In this work, we propose an ML model of SRL and experiment with the different architectures we can describe for the model, which gives us an insight into the types of correlations that the ML model can express [Riedel and Meza-Ruiz, 2008; Meza-Ruiz and Riedel, 2009]. Additionally, we tested our minimal-resources setup in a state-of-the-art dialogue system: the TownInfo system. In this case, we were given a small dataset of gold-standard, system-dependent semantic representations, and we rapidly developed an SLU module used in the functioning dialogue system. No extra resources were necessary in order to reach state-of-the-art results.
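    The toy sketch below illustrates the core Markov Logic scoring idea the abstract relies on: a configuration of ground atoms receives an unnormalized score of exp(sum of the weights of the weighted formulas it satisfies). The predicates, clauses and weights are invented for illustration; this is neither the thesis's SLU/SRL model nor a full Markov Logic engine.

```python
# Toy illustration of the Markov Logic scoring idea: a world (an assignment of
# truth values to ground atoms) gets an unnormalized score exp(sum of weights of
# the weighted ground formulas it satisfies). Predicates, clauses and weights are
# invented; this is neither the thesis's model nor a full Markov Logic engine.
import math

# Ground atoms that are true in this toy world (everything else is false).
world = {
    ("predicate", "gave"): True,
    ("argument", "gave", "John"): True,
    ("role", "gave", "John", "A0"): True,
}

def holds(atom):
    return world.get(atom, False)

# Weighted ground clauses: (weight, [(atom, required_truth_value), ...]).
# A clause is satisfied if at least one literal has its required truth value.
formulas = [
    # "If a token fills a role for a predicate, it is an argument of that predicate."
    (2.0, [(("role", "gave", "John", "A0"), False),
           (("argument", "gave", "John"), True)]),
    # "Roles are only assigned relative to words marked as predicates."
    (1.5, [(("role", "gave", "John", "A0"), False),
           (("predicate", "gave"), True)]),
]

def unnormalized_score():
    total = 0.0
    for weight, clause in formulas:
        if any(holds(atom) == value for atom, value in clause):
            total += weight
    return math.exp(total)

print(unnormalized_score())  # exp(3.5), since both clauses are satisfied
```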

    Formal Linguistic Models and Knowledge Processing. A Structuralist Approach to Rule-Based Ontology Learning and Population

    2013 - 2014. The main aim of this research is to propose a structuralist approach to knowledge processing by means of ontology learning and population, starting from unstructured and structured texts. The suggested method combines distributional semantic approaches and NL formalization theories in order to develop a framework that relies upon deep linguistic analysis... [edited by author] XIII n.s.

    Důležitá slova. Podklady ke kolokačnímu švédsko-českému slovníku základních sloves (Important words: materials for a collocational Swedish-Czech dictionary of basic verbs)

    Basic verbs, i.e. very common verbs that typically denote physical movements, locations, states or actions, undergo various semantic shifts and acquire different secondary uses. In extreme cases, the distribution of secondary uses grows so general that they are regarded as auxiliary verbs (go and to be going to), phase verbs (turn, grow), etc. These uses are usually well documented by grammars and language textbooks, and so are idiomatic expressions (phraseologisms) in dictionaries. There is, however, a grey area in between, which is extremely difficult for non-native speakers to learn. It consists of secondary uses with limited collocability, in particular light verb constructions, and secondary meanings that only become activated under particular morphosyntactic conditions (e.g. only under negation), such as intensifying the meaning of another predicate or implying that the action has already been going on for a long time. Basic-verb secondary uses and constructions are usually semantically transparent, so they do not pose comprehension problems, but they are generally unpredictable and language-specific, so they easily become an issue in non-native text production. In this thesis, Swedish basic verbs are approached from the contrastive point of view of an advanced Czech learner of Swedish. A selection of Swedish constructions with basic verbs is explored. The observations result in a proposal for the structure of a machine-readable Swedish-Czech dictionary of basic verbs. Institute of Germanic Studies, Faculty of Arts.
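    Purely as a hypothetical illustration of what a machine-readable entry for such a construction might look like, the sketch below defines an entry record for a Swedish light verb construction with its Czech equivalent. The field names and the example construction are assumptions, not the entry structure actually proposed in the thesis.

```python
# Hypothetical sketch of what one machine-readable entry for a Swedish basic-verb
# construction with its Czech equivalent could look like. The field names and the
# example are illustrative assumptions, not the structure proposed in the thesis.
from dataclasses import dataclass, field

@dataclass
class ConstructionEntry:
    swedish_verb: str          # basic verb, e.g. "ta" ('take')
    pattern: str               # collocational pattern, e.g. "ta ställning till NP"
    construction_type: str     # e.g. "light verb construction"
    czech_equivalent: str      # e.g. "zaujmout stanovisko k NP"
    constraints: list = field(default_factory=list)  # e.g. morphosyntactic restrictions
    example: str = ""

entry = ConstructionEntry(
    swedish_verb="ta",
    pattern="ta ställning till NP",
    construction_type="light verb construction",
    czech_equivalent="zaujmout stanovisko k NP",
    example="Regeringen har inte tagit ställning till förslaget.",
)
print(entry.pattern, "->", entry.czech_equivalent)
```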