2,684 research outputs found

    Automatic Accuracy Prediction for AMR Parsing

    Full text link
    Abstract Meaning Representation (AMR) represents sentences as directed, acyclic and rooted graphs, aiming at capturing their meaning in a machine readable format. AMR parsing converts natural language sentences into such graphs. However, evaluating a parser on new data by means of comparison to manually created AMR graphs is very costly. Also, we would like to be able to detect parses of questionable quality, or preferring results of alternative systems by selecting the ones for which we can assess good quality. We propose AMR accuracy prediction as the task of predicting several metrics of correctness for an automatically generated AMR parse - in absence of the corresponding gold parse. We develop a neural end-to-end multi-output regression model and perform three case studies: firstly, we evaluate the model's capacity of predicting AMR parse accuracies and test whether it can reliably assign high scores to gold parses. Secondly, we perform parse selection based on predicted parse accuracies of candidate parses from alternative systems, with the aim of improving overall results. Finally, we predict system ranks for submissions from two AMR shared tasks on the basis of their predicted parse accuracy averages. All experiments are carried out across two different domains and show that our method is effective.Comment: accepted at *SEM 201

    Irish treebanking and parsing: a preliminary evaluation

    Get PDF
    Language resources are essential for linguistic research and the development of NLP applications. Low- density languages, such as Irish, therefore lack significant research in this area. This paper describes the early stages in the development of new language resources for Irish – namely the first Irish dependency treebank and the first Irish statistical dependency parser. We present the methodology behind building our new treebank and the steps we take to leverage upon the few existing resources. We discuss language specific choices made when defining our dependency labelling scheme, and describe interesting Irish language characteristics such as prepositional attachment, copula and clefting. We manually develop a small treebank of 300 sentences based on an existing POS-tagged corpus and report an inter-annotator agreement of 0.7902. We train MaltParser to achieve preliminary parsing results for Irish and describe a bootstrapping approach for further stages of development

    General methods for fine-grained morphological and syntactic disambiguation

    Get PDF
    We present methods for improved handling of morphologically rich languages (MRLS) where we define MRLS as languages that are morphologically more complex than English. Standard algorithms for language modeling, tagging and parsing have problems with the productive nature of such languages. Consider for example the possible forms of a typical English verb like work that generally has four four different forms: work, works, working and worked. Its Spanish counterpart trabajar has 6 different forms in present tense: trabajo, trabajas, trabaja, trabajamos, trabajáis and trabajan and more than 50 different forms when including the different tenses, moods (indicative, subjunctive and imperative) and participles. Such a high number of forms leads to sparsity issues: In a recent Wikipedia dump of more than 400 million tokens we find that 20 of these forms occur only twice or less and that 10 forms do not occur at all. This means that even if we only need unlabeled data to estimate a model and even when looking at a relatively common and frequent verb, we do not have enough data to make reasonable estimates for some of its forms. However, if we decompose an unseen form such as trabajaréis `you will work', we find that it is trabajar in future tense and second person plural. This allows us to make the predictions that are needed to decide on the grammaticality (language modeling) or syntax (tagging and parsing) of a sentence. In the first part of this thesis, we develop a morphological language model. A language model estimates the grammaticality and coherence of a sentence. Most language models used today are word-based n-gram models, which means that they estimate the transitional probability of a word following a history, the sequence of the (n - 1) preceding words. The probabilities are estimated from the frequencies of the history and the history followed by the target word in a huge text corpus. If either of the sequences is unseen, the length of the history has to be reduced. This leads to a less accurate estimate as less context is taken into account. Our morphological language model estimates an additional probability from the morphological classes of the words. These classes are built automatically by extracting morphological features from the word forms. To this end, we use unsupervised segmentation algorithms to find the suffixes of word forms. Such an algorithm might for example segment trabajaréis into trabaja and réis and we can then estimate the properties of trabajaréis from other word forms with the same or similar morphological properties. The data-driven nature of the segmentation algorithms allows them to not only find inflectional suffixes (such as -réis), but also more derivational phenomena such as the head nouns of compounds or even endings such as -tec, which identify technology oriented companies such as Vortec, Memotec and Portec and would not be regarded as a morphological suffix by traditional linguistics. Additionally, we extract shape features such as if a form contains digits or capital characters. This is important because many rare or unseen forms are proper names or numbers and often do not have meaningful suffixes. Our class-based morphological model is then interpolated with a word-based model to combine the generalization capabilities of the first and the high accuracy in case of sufficient data of the second. We evaluate our model across 21 European languages and find improvements between 3% and 11% in perplexity, a standard language modeling evaluation measure. Improvements are highest for languages with more productive and complex morphology such as Finnish and Estonian, but also visible for languages with a relatively simple morphology such as English and Dutch. We conclude that a morphological component yields consistent improvements for all the tested languages and argue that it should be part of every language model. Dependency trees represent the syntactic structure of a sentence by attaching each word to its syntactic head, the word it is directly modifying. Dependency parsing is usually tackled using heavily lexicalized (word-based) models and a thorough morphological preprocessing is important for optimal performance, especially for MRLS. We investigate if the lack of morphological features can be compensated by features induced using hidden Markov models with latent annotations (HMM-LAs) and find this to be the case for German. HMM-LAs were proposed as a method to increase part-of-speech tagging accuracy. The model splits the observed part-of-speech tags (such as verb and noun) into subtags. An expectation maximization algorithm is then used to fit the subtags to different roles. A verb tag for example might be split into an auxiliary verb and a full verb subtag. Such a split is usually beneficial because these two verb classes have different contexts. That is, a full verb might follow an auxiliary verb, but usually not another full verb. For German and English, we find that our model leads to consistent improvements over a parser not using subtag features. Looking at the labeled attachment score (LAS), the number of words correctly attached to their head, we observe an improvement from 90.34 to 90.75 for English and from 87.92 to 88.24 for German. For German, we additionally find that our model achieves almost the same performance (88.24) as a model using tags annotated by a supervised morphological tagger (LAS of 88.35). We also find that the German latent tags correlate with morphology. Articles for example are split by their grammatical case. We also investigate the part-of-speech tagging accuracies of models using the traditional treebank tagset and models using induced tagsets of the same size and find that the latter outperform the former, but are in turn outperformed by a discriminative tagger. Furthermore, we present a method for fast and accurate morphological tagging. While part-of-speech tagging annotates tokens in context with their respective word categories, morphological tagging produces a complete annotation containing all the relevant inflectional features such as case, gender and tense. A complete reading is represented as a single tag. As a reading might consist of several morphological features the resulting tagset usually contains hundreds or even thousands of tags. This is an issue for many decoding algorithms such as Viterbi which have runtimes depending quadratically on the number of tags. In the case of morphological tagging, the problem can be avoided by using a morphological analyzer. A morphological analyzer is a manually created finite-state transducer that produces the possible morphological readings of a word form. This analyzer can be used to prune the tagging lattice and to allow for the application of standard sequence labeling algorithms. The downside of this approach is that such an analyzer is not available for every language or might not have the coverage required for the task. Additionally, the output tags of some analyzers are not compatible with the annotations of the treebanks, which might require some manual mapping of the different annotations or even to reduce the complexity of the annotation. To avoid this problem we propose to use the posterior probabilities of a conditional random field (CRF) lattice to prune the space of possible taggings. At the zero-order level the posterior probabilities of a token can be calculated independently from the other tokens of a sentence. The necessary computations can thus be performed in linear time. The features available to the model at this time are similar to the features used by a morphological analyzer (essentially the word form and features based on it), but also include the immediate lexical context. As the ambiguity of word types varies substantially, we just fix the average number of readings after pruning by dynamically estimating a probability threshold. Once we obtain the pruned lattice, we can add tag transitions and convert it into a first-order lattice. The quadratic forward-backward computations are now executed on the remaining plausible readings and thus efficient. We can now continue pruning and extending the lattice order at a relatively low additional runtime cost (depending on the pruning thresholds). The training of the model can be implemented efficiently by applying stochastic gradient descent (SGD). The CRF gradient can be calculated from a lattice of any order as long as the correct reading is still in the lattice. During training, we thus run the lattice pruning until we either reach the maximal order or until the correct reading is pruned. If the reading is pruned we perform the gradient update with the highest order lattice still containing the reading. This approach is similar to early updating in the structured perceptron literature and forces the model to learn how to keep the correct readings in the lower order lattices. In practice, we observe a high number of lower updates during the first training epoch and almost exclusively higher order updates during later epochs. We evaluate our CRF tagger on six languages with different morphological properties. We find that for languages with a high word form ambiguity such as German, the pruning results in a moderate drop in tagging accuracy while for languages with less ambiguity such as Spanish and Hungarian the loss due to pruning is negligible. However, our pruning strategy allows us to train higher order models (order > 1), which give substantial improvements for all languages and also outperform unpruned first-order models. That is, the model might lose some of the correct readings during pruning, but is also able to solve more of the harder cases that require more context. We also find our model to substantially and significantly outperform a number of frequently used taggers such as Morfette and SVMTool. Based on our morphological tagger we develop a simple method to increase the performance of a state-of-the-art constituency parser. A constituency tree describes the syntactic properties of a sentence by assigning spans of text to a hierarchical bracket structure. developed a language-independent approach for the automatic annotation of accurate and compact grammars. Their implementation -- known as the Berkeley parser -- gives state-of-the-art results for many languages such as English and German. For some MRLS such as Basque and Korean, however, the parser gives unsatisfactory results because of its simple unknown word model. This model maps unknown words to a small number of signatures (similar to our morphological classes). These signatures do not seem expressive enough for many of the subtle distinctions made during parsing. We propose to replace rare words by the morphological reading generated by our tagger instead. The motivation is twofold. First, our tagger has access to a number of lexical and sublexical features not available during parsing. Second, we expect the morphological readings to contain most of the information required to make the correct parsing decision even though we know that things such as the correct attachment of prepositional phrases might require some notion of lexical semantics. In experiments on the SPMRL 2013 dataset of nine MRLS we find our method to give improvements for all languages except French for which we observe a minor drop in the Parseval score of 0.06. For Hebrew, Hungarian and Basque we find substantial absolute improvements of 5.65, 11.87 and 15.16, respectively. We also performed an extensive evaluation on the utility of word representations for morphological tagging. Our goal was to reduce the drop in performance that is caused when a model trained on a specific domain is applied to some other domain. This problem is usually addressed by domain adaption (DA). DA adapts a model towards a specific domain using a small amount of labeled or a huge amount of unlabeled data from that domain. However, this procedure requires us to train a model for every target domain. Instead we are trying to build a robust system that is trained on domain-specific labeled and domain-independent or general unlabeled data. We believe word representations to be key in the development of such models because they allow us to leverage unlabeled data efficiently. We compare data-driven representations to manually created morphological analyzers. We understand data-driven representations as models that cluster word forms or map them to a vectorial representation. Examples heavily used in the literature include Brown clusters, Singular Value Decompositions of count vectors and neural-network-based embeddings. We create a test suite of six languages consisting of in-domain and out-of-domain test sets. To this end we converted annotations for Spanish and Czech and annotated the German part of the Smultron treebank with a morphological layer. In our experiments on these data sets we find Brown clusters to outperform the other data-driven representations. Regarding the comparison with morphological analyzers, we find Brown clusters to give slightly better performance in part-of-speech tagging, but to be substantially outperformed in morphological tagging

    A Survey of Paraphrasing and Textual Entailment Methods

    Full text link
    Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment and methods from the two areas are often similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. We summarize key ideas from the two areas by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources.Comment: Technical Report, Natural Language Processing Group, Department of Informatics, Athens University of Economics and Business, Greece, 201

    Parsing and Evaluation. Improving Dependency Grammars Accuracy. Anàlisi Sintàctica Automàtica i Avaluació. Millora de qualitat per a Gramàtiques de Dependències

    Get PDF
    Because parsers are still limited in analysing specific ambiguous constructions, the research presented in this thesis mainly aims to contribute to the improvement of parsing performance when it has knowledge integrated in order to deal with ambiguous linguistic phenomena. More precisely, this thesis intends to provide empirical solutions to the disambiguation of prepositional phrase attachment and argument recognition in order to assist parsers in generating a more accurate syntactic analysis. The disambiguation of these two highly ambiguous linguistic phenomena by the integration of knowledge about the language necessarily relies on linguistic and statistical strategies for knowledge acquisition. The starting point of this research proposal is the development of a rule-based grammar for Spanish and for Catalan following the theoretical basis of Dependency Grammar (Tesnière, 1959; Mel’čuk, 1988) in order to carry out two experiments about the integration of automatically- acquired knowledge. In order to build two robust grammars that understand a sentence, the FreeLing pipeline (Padró et al., 2010) has been used as a framework. On the other hand, an eclectic repertoire of criteria about the nature of syntactic heads is proposed by reviewing the postulates of Generative Grammar (Chomsky, 1981; Bonet and Solà, 1986; Haegeman, 1991) and Dependency Grammar (Tesnière, 1959; Mel’čuk, 1988). Furthermore, a set of dependency relations is provided and mapped to Universal Dependencies (Mcdonald et al., 2013). Furthermore, an empirical evaluation method has been designed in order to carry out both a quantitative and a qualitative analysis. In particular, the dependency parsed trees generated by the grammars are compared to real linguistic data. The quantitative evaluation is based on the Spanish Tibidabo Treebank (Marimon et al., 2014), which is large enough to carry out a real analysis of the grammars performance and which has been annotated with the same formalism as the grammars, syntactic dependencies. Since the criteria between both resources are differ- ent, a process of harmonization has been applied developing a set of rules that automatically adapt the criteria of the corpus to the grammar criteria. With regard to qualitative evaluation, there are no available resources to evaluate Spanish and Catalan dependency grammars quali- tatively. For this reason, a test suite of syntactic phenomena about structure and word order has been built. In order to create a representative repertoire of the languages observed, descriptive grammars (Bosque and Demonte, 1999; Solà et al., 2002) and the SenSem Corpus (Vázquez and Fernández-Montraveta, 2015) have been used for capturing relevant structures and word order patterns, respectively. Thanks to these two tools, two experiments have been carried out in order to prove that knowl- edge integration improves the parsing accuracy. On the one hand, the automatic learning of lan- guage models has been explored by means of statistical methods in order to disambiguate PP- attachment. More precisely, a model has been learned with a supervised classifier using Weka (Witten and Frank, 2005). Furthermore, an unsupervised model based on word embeddings has been applied (Mikolov et al., 2013a,b). The results of the experiment show that the supervised method is limited in predicting solutions for unseen data, which is resolved by the unsupervised method since provides a solution for any case. However, the unsupervised method is limited if it Parsing and Evaluation Improving Dependency Grammars Accuracy only learns from lexical data. For this reason, training data needs to be enriched with the lexical value of the preposition, as well as semantic and syntactic features. In addition, the number of patterns used to learn language models has to be extended in order to have an impact on the grammars. On the other hand, another experiment is carried out in order to improve the argument recog- nition in the grammars by the acquisition of linguistic knowledge. In this experiment, knowledge is acquired automatically from the extraction of verb subcategorization frames from the SenSem Corpus (Vázquez and Fernández-Montraveta, 2015) which contains the verb predicate and its arguments annotated syntactically. As a result of the information extracted, subcategorization frames have been classified into subcategorization classes regarding the patterns observed in the corpus. The results of the subcategorization classes integration in the grammars prove that this information increases the accuracy of the argument recognition in the grammars. The results of the research of this thesis show that grammars’ rules on their own are not ex- pressive enough to resolve complex ambiguities. However, the integration of knowledge about these ambiguities in the grammars may be decisive in the disambiguation. On the one hand, sta- tistical knowledge about PP-attachment can improve the grammars accuracy, but syntactic and semantic information, and new patterns of PP-attachment need to be included in the language models in order to contribute to disambiguate this phenomenon. On the other hand, linguistic knowledge about verb subcategorization acquired from annotated linguistic resources show a positive influence positively on grammars’ accuracy.Aquesta tesi vol tractar les limitacions amb què es troben els analitzadors sintàctics automàtics actualment. Tot i els progressos que s’han fet en l’àrea del Processament del Llenguatge Nat- ural en els darrers anys, les tecnologies del llenguatge i, en particular, els analitzadors sintàc- tics automàtics no han pogut traspassar el llindar de certes ambiguïtats estructurals com ara l’agrupació del sintagma preposicional i el reconeixement d’arguments. És per aquest motiu que la recerca duta a terme en aquesta tesi té com a objectiu aportar millores signiflcatives de quali- tat a l’anàlisi sintàctica automàtica per mitjà de la integració de coneixement lingüístic i estadístic per desambiguar construccions sintàctiques ambigües. El punt de partida de la recerca ha estat el desenvolupament de d’una gramàtica en espanyol i una altra en català basades en regles que segueixen els postulats de la Gramàtica de Dependèn- dencies (Tesnière, 1959; Mel’čuk, 1988) per tal de dur a terme els experiments sobre l’adquisició de coneixement automàtic. Per tal de crear dues gramàtiques robustes que analitzin i entenguin l’oració en profunditat, ens hem basat en l’arquitectura de FreeLing (Padró et al., 2010), una lli- breria de Processament de Llenguatge Natural que proveeix una anàlisi lingüística automàtica de l’oració. Per una altra banda, s’ha elaborat una proposta eclèctica de criteris lingüístics per determinar la formació dels sintagmes i les clàusules a la gramàtica per mitjà de la revisió de les propostes teòriques de la Gramàtica Generativa (Chomsky, 1981; Bonet and Solà, 1986; Haege- man, 1991) i de la Gramàtica de Dependències (Tesnière, 1959; Mel’čuk, 1988). Aquesta proposta s’acompanya d’un llistat de les etiquetes de relació de dependència que fan servir les regles de les gramàtques. A més a més de l’elaboració d’aquest llistat, s’han establert les correspondències amb l’estàndard d’anotació de les Dependències Universals (Mcdonald et al., 2013). Alhora, s’ha dissenyat un sistema d’avaluació empíric que té en compte l’anàlisi quantitativa i qualitativa per tal de fer una valoració completa dels resultats dels experiments. Precisament, es tracta una tasca empírica pel fet que es comparen les anàlisis generades per les gramàtiques amb dades reals de la llengua. Per tal de dur a terme l’avaluació des d’una perspectiva quan- titativa, s’ha fet servir el corpus Tibidabo en espanyol (Marimon et al., 2014) disponible només en espanyol que és prou extens per construir una anàlisi real de les gramàtiques i que ha estat anotat amb el mateix formalisme que les gramàtiques. En concret, per tal com els criteris de les gramàtiques i del corpus no són coincidents, s’ha dut a terme un procés d’harmonització de cri- teris per mitjà d’unes regles creades manualment que adapten automàticament l’estructura i la relació de dependència del corpus al criteri de les gramàtiques. Pel que fa a l’avaluació qualitativa, pel fet que no hi ha recursos disponibles en espanyol i català, hem dissenyat un reprertori de test de fenòmens sintàctics estructurals i relacionats amb l’ordre de l’oració. Amb l’objectiu de crear un repertori representatiu de les llengües estudiades, s’han fet servir gramàtiques descriptives per fornir el repertori d’estructures sintàctiques (Bosque and Demonte, 1999; Solà et al., 2002) i el Corpus SenSem (Vázquez and Fernández-Montraveta, 2015) per capturar automàticament l’ordre oracional. Gràcies a aquestes dues eines, s’han pogut dur a terme dos experiments per provar que la integració de coneixement en l’anàlisi sintàctica automàtica en millora la qualitat. D’una banda, Parsing and Evaluation Improving Dependency Grammars Accuracy s’ha explorat l’aprenentatge de models de llenguatge per mitjà de models estadístics per tal de proposar solucions a l’agrupació del sintagma preposicional. Més concretament, s’ha desen- volupat un model de llenguatge per mitjà d’un classiflcador d’aprenentatge supervisat de Weka (Witten and Frank, 2005). A més a més, s’ha après un model de llenguatge per mitjà d’un mètode no supervisat basat en l’aproximació distribucional anomenat word embeddings (Mikolov et al., 2013a,b). Els resultats de l’experiment posen de manifest que el mètode supervisat té greus lim- itacions per fer donar una resposta en dades que no ha vist prèviament, cosa que és superada pel mètode no supervisat pel fet que és capaç de classiflcar qualsevol cas. De tota manera, el mètode no supervisat que s’ha estudiat és limitat si aprèn a partir de dades lèxiques. Per aquesta raó, és necessari que les dades utilitzades per entrenar el model continguin el valor de la preposi- ció, trets sintàctics i semàntics. A més a més, cal ampliar el número de patrons apresos per tal d’ampliar la cobertura dels models i tenir un impacte en els resultats de les gramàtiques. D’una altra banda, s’ha proposat una manera de millorar el reconeixement d’arguments a les gramàtiques per mitjà de l’adquisició de coneixement lingüístic. En aquest experiment, s’ha op- tat per extreure automàticament el coneixement en forma de classes de subcategorització verbal d’el Corpus SenSem (Vázquez and Fernández-Montraveta, 2015), que conté anotats sintàctica- ment el predicat verbal i els seus arguments. A partir de la informació extreta, s’ha classiflcat les diverses diàtesis verbals en classes de subcategorització verbal en funció dels patrons observats en el corpus. Els resultats de la integració de les classes de subcategorització a les gramàtiques mostren que aquesta informació determina positivament el reconeixement dels arguments. Els resultats de la recerca duta a terme en aquesta tesi doctoral posen de manifest que les regles de les gramàtiques no són prou expressives per elles mateixes per resoldre ambigüitats complexes del llenguatge. No obstant això, la integració de coneixement sobre aquestes am- bigüitats pot ser decisiu a l’hora de proposar una solució. D’una banda, el coneixement estadístic sobre l’agrupació del sintagma preposicional pot millorar la qualitat de les gramàtiques, però per aflrmar-ho cal incloure informació sintàctica i semàntica en els models d’aprenentatge automàtic i capturar més patrons per contribuir en la desambiguació de fenòmens complexos. D’una al- tra banda, el coneixement lingüístic sobre subcategorització verbal adquirit de recursos lingüís- tics anotats influeix decisivament en la qualitat de les gramàtiques per a l’anàlisi sintàctica au- tomàtica

    Analysing Errors of Open Information Extraction Systems

    Full text link
    We report results on benchmarking Open Information Extraction (OIE) systems using RelVis, a toolkit for benchmarking Open Information Extraction systems. Our comprehensive benchmark contains three data sets from the news domain and one data set from Wikipedia with overall 4522 labeled sentences and 11243 binary or n-ary OIE relations. In our analysis on these data sets we compared the performance of four popular OIE systems, ClausIE, OpenIE 4.2, Stanford OpenIE and PredPatt. In addition, we evaluated the impact of five common error classes on a subset of 749 n-ary tuples. From our deep analysis we unreveal important research directions for a next generation of OIE systems.Comment: Accepted at Building Linguistically Generalizable NLP Systems at EMNLP 201

    A Robust Transformation-Based Learning Approach Using Ripple Down Rules for Part-of-Speech Tagging

    Full text link
    In this paper, we propose a new approach to construct a system of transformation rules for the Part-of-Speech (POS) tagging task. Our approach is based on an incremental knowledge acquisition method where rules are stored in an exception structure and new rules are only added to correct the errors of existing rules; thus allowing systematic control of the interaction between the rules. Experimental results on 13 languages show that our approach is fast in terms of training time and tagging speed. Furthermore, our approach obtains very competitive accuracy in comparison to state-of-the-art POS and morphological taggers.Comment: Version 1: 13 pages. Version 2: Submitted to AI Communications - the European Journal on Artificial Intelligence. Version 3: Resubmitted after major revisions. Version 4: Resubmitted after minor revisions. Version 5: to appear in AI Communications (accepted for publication on 3/12/2015
    corecore