16 research outputs found

    Transition-based combinatory categorial grammar parsing for English and Hindi

    Given a natural language sentence, parsing is the task of assigning it a grammatical structure according to the rules of a particular grammar formalism. Grammar formalisms such as Dependency Grammar, Phrase Structure Grammar, Combinatory Categorial Grammar, and Tree Adjoining Grammar have been explored in the literature for parsing. For example, given a sentence like “John ate an apple”, parsers based on the widely used dependency grammars find grammatical relations, such as that ‘John’ is the subject and ‘apple’ is the object of the action ‘ate’. This thesis focuses mainly on Combinatory Categorial Grammar (CCG). We present an incremental algorithm for parsing CCG for two diverse languages: English and Hindi. English is a fixed word order, SVO (Subject-Verb-Object), morphologically simple language, whereas Hindi, though predominantly an SOV (Subject-Object-Verb) language, is a free word order and morphologically rich language. Developing an incremental parser for Hindi is particularly challenging because the predicate needed to resolve dependencies comes at the end. As previously available shift-reduce CCG parsers use English CCGbank derivations, which are mostly right branching and non-incremental, we design our algorithm based on the dependencies resolved rather than the derivation. Our novel algorithm builds a dependency graph in parallel with the CCG derivation, and this graph is used to reveal unbuilt structure without backtracking. Though we use dependencies for meaning representation and CCG for parsing, our revealing technique can be applied to other meaning representations, such as lambda expressions, and to non-CCG parsing, such as phrase structure parsing. Any statistical parser requires three major modules: data, a parsing algorithm, and a learning algorithm. This thesis is broadly divided into three parts, each dealing with one major module of the statistical parser. 
In Part I, we design a novel algorithm for converting a dependency treebank to a CCGbank. Using this algorithm, we create a Hindi CCGbank with a coverage of 96%. We also carry out a cross-formalism experiment in which we show that CCG supertags can improve widely used dependency parsers. We experiment with two popular dependency parsers (Malt and MST) for two diverse languages: English and Hindi. For both languages, CCG categories improve the overall accuracy of both parsers by around 0.3-0.5% in all experiments. For both parsers, we see larger improvements specifically on the dependencies at which they are known to be weak: long distance dependencies for Malt, and verbal arguments for MST. The result is particularly interesting in the case of the fast greedy parser (Malt), since improving its accuracy without significantly compromising speed is relevant for large scale applications such as parsing the web. In Part II, we present a novel algorithm for incremental transition-based CCG parsing for English and Hindi. Incremental parsers have potential advantages for applications like language modeling for machine translation and speech recognition. We introduce two new actions in the shift-reduce paradigm for revealing the required information during parsing. We also analyze the impact of a beam and look-ahead on parsing. In general, using a beam and/or look-ahead gives better results than not using them. We also show that the incremental CCG parser is more useful than a non-incremental version for predicting relative sentence complexity. Given a pair of sentences from Wikipedia and Simple Wikipedia, we build a classifier which predicts whether one sentence is simpler or more complex than the other. We show that features from a CCG parser in general, and from an incremental CCG parser in particular, are more useful than those from a chart-based phrase structure parser, in terms of both speed and accuracy. In Part III, we develop the first neural network based training algorithm for parsing CCG. 
We also study the impact of neural network based tagging models, and of greedy versus beam-search parsing, by using a structured neural network model. In greedy settings, neural network models give significantly better results than the perceptron models and are also over three times faster. With a narrow beam, the structured neural network model gives consistently better results than the basic neural network model. For English, the structured neural network gives performance similar to that of the structured perceptron parser. For Hindi, however, the structured perceptron still performs best.
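
As an illustration of why the abstract calls standard shift-reduce CCG derivations right branching and non-incremental, here is a minimal sketch of a greedy shift-reduce loop over the example sentence “John ate an apple”. It is not the thesis's algorithm: it uses only forward and backward application, leaves out the two revealing actions, and relies on a hand-written toy lexicon. The names `LEXICON`, `combine`, and `parse` are hypothetical.

```python
from typing import List, Optional, Tuple, Union

# A category is an atomic string ("NP") or a triple (result, slash, argument).
Category = Union[str, tuple]

# Toy lexicon (an assumption for illustration, not taken from CCGbank).
LEXICON = {
    "John": "NP",
    "ate": (("S", "\\", "NP"), "/", "NP"),   # (S\NP)/NP
    "an": ("NP", "/", "N"),                  # NP/N
    "apple": "N",
}

def combine(left: Category, right: Category) -> Optional[Category]:
    """Try forward, then backward application on two adjacent categories."""
    # Forward application: X/Y  Y  =>  X
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    # Backward application: Y  X\Y  =>  X
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None

def parse(words: List[str]) -> Tuple[List[Category], List[str]]:
    """Greedy shift-reduce: shift a word, then reduce while the top two combine."""
    stack: List[Category] = []
    actions: List[str] = []
    for word in words:
        stack.append(LEXICON[word])
        actions.append("shift")
        while len(stack) >= 2:
            result = combine(stack[-2], stack[-1])
            if result is None:
                break
            stack[-2:] = [result]
            actions.append("reduce")
    return stack, actions

stack, actions = parse(["John", "ate", "an", "apple"])
print(stack)    # final stack of categories
print(actions)  # sequence of shift/reduce actions
```

With this toy lexicon, no reduction is possible until the last word has been shifted, so ‘John’ stays unattached on the stack for the whole sentence. That delay is the kind of non-incrementality the abstract refers to.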

    An Incremental Algorithm for Transition-based CCG Parsing

    Incremental parsers have potential advantages for applications like language modeling for machine translation and speech recognition. We describe a new algorithm for incremental transition-based Combinatory Categorial Grammar parsing. As English CCGbank derivations are mostly right branching and non-incremental, we design our algorithm based on the dependencies resolved rather than the derivation. We introduce two new actions in the shift-reduce paradigm based on the idea of 'revealing' (Pareschi and Steedman, 1987) the required information during parsing. On the standard CCGbank test data, our algorithm achieved improvements of 0.88% in labeled and 2.0% in unlabeled F-score over a greedy non-incremental shift-reduce parser. 11 pages.
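
The labeled and unlabeled F-scores reported above compare predicted dependencies against gold dependencies. The sketch below shows one way such scores can be computed, under simplifying assumptions: each dependency is a hypothetical `(head, dependent, label)` triple, whereas CCG evaluation typically uses richer tuples, and the example data is invented for illustration rather than drawn from the paper.

```python
from typing import Set, Tuple

Dependency = Tuple[str, str, str]  # (head, dependent, label); a simplified assumption

def f_score(gold: Set[tuple], predicted: Set[tuple]) -> float:
    """Harmonic mean of precision and recall over two sets of dependencies."""
    correct = len(gold & predicted)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)

def labeled_f(gold: Set[Dependency], predicted: Set[Dependency]) -> float:
    """A dependency counts as correct only if head, dependent, and label all match."""
    return f_score(gold, predicted)

def unlabeled_f(gold: Set[Dependency], predicted: Set[Dependency]) -> float:
    """Labels are dropped, so only the head-dependent attachment has to match."""
    strip = lambda deps: {(head, dep) for head, dep, _ in deps}
    return f_score(strip(gold), strip(predicted))

# Invented example: the parser attaches 'apple' correctly but mislabels it.
gold = {("ate", "John", "subj"), ("ate", "apple", "obj"), ("apple", "an", "det")}
pred = {("ate", "John", "subj"), ("ate", "apple", "subj"), ("apple", "an", "det")}

print(labeled_f(gold, pred))    # two of three labeled dependencies match
print(unlabeled_f(gold, pred))  # all three attachments match
```

Because the unlabeled score ignores labels, it is never lower than the labeled score for the same pair of dependency sets.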

    Sähkönlaadun selvitys Ponsse Oyj:n Vieremän tehtaassa (Power quality survey at the Ponsse Plc plant in Vieremä)

    The purpose of this thesis was to assess the power quality at the Vieremä production plant of Ponsse Plc, a forestry machinery manufacturer. The previous power quality measurement was carried out in 2000, and the plant has since been expanded several times, so the current state of the network needed to be established. Some equipment suppliers also wanted to know the plant's power quality, so that they know what kind of electrical network their equipment will operate in. The thesis first covers power quality and the factors that affect it, how compensation affects the network, and what compensation options exist. The practical part covers the operation of the power quality meter and how it is connected to the measured target. Power quality measurements were carried out at several distribution boards in the plant using a Fluke 435 II network analyzer. Each measurement period lasted four hours, and the measurements were taken during work shifts. Fluke PowerLog 5.2 software was used to analyze the results. These measurements established the actual state of the network. Alongside the measurements, the riser diagrams were checked and updated. Based on the measurement results, the power quality at the point of connection and the problems occurring there were assessed. This work focuses on the measurements at two separate main distribution boards and on the analysis of their results. The measurements and results from the other distribution boards were compiled for Ponsse's own future use. The extensive power quality measurements made it possible to establish the state of the plant's network and to locate potential problem areas. Because of a confidentiality agreement with the commissioner of the thesis, the electrical network and its equipment are covered only to a limited extent.