162 research outputs found
Complexity of Lexical Descriptions and its Relevance to Partial Parsing
In this dissertation, we have proposed novel methods for robust parsing that integrate the flexibility of linguistically motivated lexical descriptions with the robustness of statistical techniques. Our thesis is that the computation of linguistic structure can be localized if lexical items are associated with rich descriptions (supertags) that impose complex constraints in a local context. However, increasing the complexity of descriptions makes the number of different descriptions for each lexical item much larger and hence increases the local ambiguity for a parser. This local ambiguity can be resolved by using supertag co-occurrence statistics collected from parsed corpora. We have explored these ideas in the context of the Lexicalized Tree-Adjoining Grammar (LTAG) framework, wherein supertag disambiguation provides a representation that is an almost parse. We have used the disambiguated supertag sequence in conjunction with a lightweight dependency analyzer to compute noun groups, verb groups, dependency linkages and even partial parses. We have shown that a trigram-based supertagger achieves an accuracy of 92.1% on Wall Street Journal (WSJ) texts. Furthermore, we have shown that the lightweight dependency analysis on the output of the supertagger identifies 83% of the dependency links accurately. We have exploited the representation of supertags with Explanation-Based Learning to improve parsing efficiency. In this approach, parsing in limited domains can be modeled as a Finite-State Transduction. We have implemented such a system for the ATIS domain which improves parsing efficiency by a factor of 15. We have used the supertagger in a variety of applications to provide lexical descriptions at an appropriate granularity. In an information retrieval application, we show that the supertag-based system performs at higher levels of precision compared to a system based on part-of-speech tags.
In an information extraction task, supertags are used in specifying extraction patterns. For language modeling applications, we view supertags as syntactically motivated class labels in a class-based language model. The distinction between recursive and non-recursive supertags is exploited in a sentence simplification application.
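The abstract above mentions a trigram-based supertagger that resolves local ambiguity with supertag co-occurrence statistics. The following is only a minimal sketch of that general idea, not the dissertation's implementation: the supertag labels (`DET`, `N_HEAD`, `V_INTRANS`), the toy corpus, the add-one smoothing, and the function names `train` and `supertag` are all assumptions made for illustration.

```python
import math
from collections import defaultdict

START = "<s>"  # padding symbol for the first two trigram positions

def train(tagged_sentences):
    """Collect trigram, context, and emission counts from (word, supertag) pairs."""
    tri, ctx = defaultdict(int), defaultdict(int)
    emit, tag_count = defaultdict(int), defaultdict(int)
    lexicon = defaultdict(set)  # word -> supertags seen with it
    for sent in tagged_sentences:
        prev2, prev1 = START, START
        for word, tag in sent:
            tri[(prev2, prev1, tag)] += 1
            ctx[(prev2, prev1)] += 1
            emit[(tag, word)] += 1
            tag_count[tag] += 1
            lexicon[word].add(tag)
            prev2, prev1 = prev1, tag
    return tri, ctx, emit, tag_count, lexicon

def supertag(words, model):
    """Viterbi search over (previous supertag, current supertag) states."""
    tri, ctx, emit, tag_count, lexicon = model
    n_tags, n_words = len(tag_count), len(lexicon)

    def log_trans(t2, t1, t):
        # Add-one smoothed trigram probability (an illustrative choice).
        return math.log((tri[(t2, t1, t)] + 1) / (ctx[(t2, t1)] + n_tags))

    def log_emit(t, w):
        # Add-one smoothed emission probability with one slot for unseen words.
        return math.log((emit[(t, w)] + 1) / (tag_count[t] + n_words + 1))

    best = {(START, START): (0.0, [])}
    for w in words:
        # Known words are restricted to their observed supertags;
        # unknown words may take any supertag.
        candidates = lexicon.get(w) or set(tag_count)
        new_best = {}
        for (t2, t1), (score, path) in best.items():
            for t in candidates:
                s = score + log_trans(t2, t1, t) + log_emit(t, w)
                if (t1, t) not in new_best or s > new_best[(t1, t)][0]:
                    new_best[(t1, t)] = (s, path + [t])
        best = new_best
    return max(best.values(), key=lambda entry: entry[0])[1]

# Hypothetical toy corpus: "rose" is ambiguous between a noun and a verb supertag.
toy_corpus = [
    [("the", "DET"), ("price", "N_HEAD"), ("rose", "V_INTRANS")],
    [("a", "DET"), ("rose", "N_HEAD"), ("wilted", "V_INTRANS")],
    [("the", "DET"), ("stock", "N_HEAD"), ("rose", "V_INTRANS")],
]
model = train(toy_corpus)
print(supertag(["the", "rose", "wilted"], model))
print(supertag(["the", "price", "rose"], model))
```

In this toy setting the same word receives different supertags depending on its neighbours, which is the kind of local disambiguation the abstract describes. Real LTAG supertag inventories are far larger, so a practical tagger would need better smoothing and pruning.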
Extracting Temporal and Causal Relations between Events
Structured information resulting from temporal information processing is
crucial for a variety of natural language processing tasks, for instance
generating timeline summaries of events from news documents, or answering
temporal and causal questions about events. In this thesis we present
a framework for an integrated temporal and causal relation extraction system.
We first develop a robust extraction component for each type of relation, i.e.
temporal order and causality. We then combine the two extraction components
into an integrated relation extraction system, CATENA (CAusal and Temporal
relation Extraction from NAtural language texts), by exploiting the
presumption about event precedence in causality: causing events must
happen BEFORE resulting events. Several resources and techniques to improve
our relation extraction systems are also discussed, including word embeddings
and training data expansion. Finally, we report our efforts to adapt
temporal information processing to languages other than English, namely
Italian and Indonesian. Comment: PhD Thesis
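The precedence presumption stated above (a causing event happens BEFORE its resulting event) can be illustrated with a small sketch. This is not CATENA's actual integration logic, which the abstract does not spell out. The function name `apply_causal_precedence`, the data layout, and the choice to let a causal link override a conflicting temporal label are all assumptions.

```python
def apply_causal_precedence(temporal, causal):
    """Return temporal links made consistent with causal links.

    temporal: dict mapping (event_a, event_b) -> temporal label
    causal:   set of (cause_event, effect_event) pairs
    Assumption: a causal link overrides any existing temporal label.
    """
    merged = dict(temporal)
    for cause, effect in causal:
        if (effect, cause) in merged:
            # The pair is stored in the opposite direction, so the effect
            # comes AFTER the cause.
            merged[(effect, cause)] = "AFTER"
        else:
            merged[(cause, effect)] = "BEFORE"
    return merged

# Hypothetical event identifiers and labels.
temporal_links = {("e1", "e2"): "SIMULTANEOUS", ("e3", "e2"): "BEFORE"}
causal_links = {("e1", "e2"), ("e2", "e3")}
print(apply_causal_precedence(temporal_links, causal_links))
```

Here both temporal links are rewritten because each conflicts with a causal link. A real system might instead weigh classifier confidence before overriding.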
Recognition and normalization of temporal expressions in Serbian medical narratives
The temporal dimension emerges as one of the essential concepts in the field of medicine, providing a basis for the proper interpretation and understanding of medically relevant information, often recorded only in unstructured texts. Automatic processing of temporal expressions involves their identification and formalization in a language understandable to computers. This paper aims to apply the existing system for automatic processing of temporal expressions in Serbian natural language texts to medical narrative texts, to evaluate the system’s efficiency in recognition and normalization of temporal expressions and to determine the degree of necessary adaptation according to the characteristics and requirements of the medical domain.
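As a rough illustration of what "recognition and normalization" of a temporal expression involves, the sketch below finds numeric dates and maps them to ISO 8601 values. It is not the system evaluated in the paper. The day.month.year pattern, the function name `find_dates`, and the sample sentence are assumptions, and a real system would also handle relative and textual expressions.

```python
import re
from datetime import date

# Assumed surface pattern: day.month.year with an optional space after each dot.
DATE_RE = re.compile(r"\b(\d{1,2})\.\s?(\d{1,2})\.\s?(\d{4})\b")

def find_dates(text):
    """Return (span, matched_text, iso_value) for each valid numeric date."""
    results = []
    for match in DATE_RE.finditer(text):
        day, month, year = (int(group) for group in match.groups())
        try:
            value = date(year, month, day).isoformat()
        except ValueError:
            # Recognized by the pattern but not a real calendar date.
            continue
        results.append((match.span(), match.group(0), value))
    return results

# Hypothetical note: the second date is invalid and is skipped.
note = "Admitted 5.3.2014. and discharged 31.02.2014."
for span, surface, value in find_dates(note):
    print(span, surface, value)
```

Recognition (the regular expression) and normalization (the conversion to a canonical value) are kept as separate steps here, mirroring the two tasks named in the abstract.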
List Construction in Finland-Swedish Sign Language
Finland-Swedish Sign Language (FinSSL) is an endangered minority signed language used by approximately 90 deaf and 100 hearing persons in Finland and a smaller group of users in Sweden. Finland-Swedish Sign Language is in need of revitalization, and this study contributes to this with a detailed description of the form and usage of list construction in informational monologues published between 2014 and 2019.
This study examines the use of list constructions in FinSSL. In the basic form of the list construction, the non-dominant hand “counts” and its fingertips are associated with entities, while the dominant hand points at the fingers of the non-dominant hand. In previous studies, list constructions have been called, for example, digital enumeration, finger(tip) loci or enumeration, and list buoys. List constructions have been described as a simultaneous expression involving the use of a numeral sign, since the non-dominant list hand often borrows its handshape from a corresponding numeral sign. The list hand can be held in place throughout a stretch of discourse (perseverating) or the hand can alternate between perseveration, simultaneous presentation of list fingers, sequential presentation of the list fingers, and various mixed versions of these.
This study focuses on how FinSSL signers use list constructions in informational videos published on Teckeneko (www.teckeneko.fi). Teckeneko is a web-based information channel and broadcast service administered by the association Finlandssvenska teckenspråkiga rf and the media company Moxio AB.
The data for this study consists of 48 videos (2 hours and 16 minutes) where the list construction was used 241 times by seven different signers. The data was first annotated with the ELAN annotation program, and then the usage-events were analyzed by using Cognitive Grammar as the theoretical framework. In this analysis, list constructions consist of a list hand and a pointing device. The list hand and its fingers represent the different listed entities. The other hand acts as the pointing device and directs attention to the referents on the list hand fingers.
The result of this study is a detailed description of list construction usage in informational videos signed in FinSSL. The signers were found to use the list construction, for example, to enumerate the topics of a video or a project, events, dates, program numbers, participants, and organizations. The signers also used the list construction to group the enumerated entities and to refer to the group of entities instead of individual entities. The results offer a more nuanced understanding of the use of list constructions in FinSSL and in signed languages in general, but also show a need for further research on list constructions in other types of data.
Listakonstruktio suomenruotsalaisessa viittomakielessä [List construction in Finland-Swedish Sign Language]
This doctoral dissertation examines the list construction in Finland-Swedish Sign Language. Finland-Swedish Sign Language is one of the two sign languages used in Finland. It is a severely endangered language with approximately one hundred native users in Finland. This is the first doctoral dissertation on the grammar of Finland-Swedish Sign Language. The study is descriptive, and its theoretical framework is Cognitive Grammar. The theoretical part of the work brings together a wide range of descriptions of the list construction, its use, and research on it in other sign languages around the world.
In the list construction, the signer extends one to five fingers of one hand (the list hand) and with the other hand (the pointer) points to one or more of these extended fingers. The list construction is used when the signer lists items and locates the listed items on the fingers of the list hand. The number of extended fingers on the list hand depends on how many items are on the signer's list. The required fingers can be extended either all at once (a simultaneous list) or one at a time as the list progresses (a sequential list). The list hand can also either remain visible, with the list fingers extended, for the whole time that the list construction is produced (a permanent list), or take part in signing the listed items and assume the list form again when it is time for the next pointing to the list fingers.
The dissertation describes how users of Finland-Swedish Sign Language make use of the list construction in informational monologues published on the Teckeneko website (teckeneko.fi), and finds that the use is versatile and creative. Specifically, the handshape of the pointing hand and the movement it makes often differ from the prototypical pointing, which is made with the index finger alone and touches a single fingertip of the list hand. The pointing handshape can include both the index and the middle finger, which makes it possible to touch two fingertips of the list hand at the same time and thus refer to two list items simultaneously. Instead of an almost straight movement and a light touch, the pointing hand can also make a circular movement around the extended fingers of the list hand, or a linear movement over or beside the extended fingertips. With this circular or linear movement the signer refers to the list items as one group or, if the linear movement does not touch all the extended fingertips of the list hand, divides the list items into two groups: those that were touched and those that were not.
Listkonstruktionen i finlandssvenskt teckenspråk [The list construction in Finland-Swedish Sign Language]
This doctoral dissertation deals with the list construction in Finland-Swedish Sign Language, which is one of the two sign languages in Finland. Finland-Swedish Sign Language is a severely endangered language with about one hundred native users in Finland. This is the first doctoral dissertation to focus on the grammar of Finland-Swedish Sign Language. The study is descriptive and has cognitive linguistics as its theoretical framework. The theoretical part of the dissertation gathers descriptions of how the list construction has been described and how it is used in several sign languages around the world.
When a person signs a list construction, they show one to five extended fingers with one hand (the list hand) and point with the other hand (the pointing hand) at one or more of the list hand's fingers. The list construction is used when listing different things or entities, and these list entities are placed on the fingers of the list hand. The number of extended list fingers depends on the number of entities on the list. These fingers can be extended either all at once (a simultaneous list) or in turn as the list proceeds (a sequential list). The list hand can also be held in the list form throughout the time the list construction is produced (a permanent list), or it can lose the list form while it takes part in signing the listed items and resume the list form when it is time for the next pointing at the list fingers.
This doctoral dissertation describes how signers of Finland-Swedish Sign Language make use of the list construction in informational monologues published on Teckeneko (teckeneko.fi). The study shows that the use is versatile and creative. Specifically, the handshape of the pointing hand and the movement it makes often differ from prototypical pointing. Prototypical pointing is made with one index finger; the movement is directed toward a list finger and ends in contact between one of the list hand's fingers and the pointing hand. The study shows that the handshape of the pointing hand can include both the index and the middle finger, which makes it possible to touch two list fingers at once and thereby refer to two list items simultaneously. The pointing hand can also make a circular movement around, or an almost straight linear movement over or near, the extended list fingers instead of a movement to a single finger. With this circular or linear movement the signer can refer to the listed entities as a group, or divide the entities into two groups if some finger is left outside the linear movement: those the pointing hand touched and those it did not.
Adjectivization in Russian: Analyzing participles by means of lexical frequency and constraint grammar
This dissertation explores the factors that restrict and facilitate adjectivization in Russian, an affixless part-of-speech change leading to ambiguity between participles and adjectives. I develop a theoretical framework based on major approaches to adjectivization, and assess the effect of the factors on ambiguity in the empirical data. I build a linguistic model using the Constraint Grammar formalism. The model utilizes the factors of adjectivization and corpus frequencies as formal constraints for differentiating between participles and adjectives in a disambiguation task.
The main question that is explored in this dissertation is which linguistic factors allow for the differentiation between adjectivized and unambiguous participles. Another question concerns which factors, syntactic or morphological, predict ambiguity in the corpus data and resolve it in the disambiguation model. In the theoretical framework, the syntactic context signals whether a participle is adjectivized, whereas internal morphosemantic properties (that is, tense, voice, and lexical meaning) cause or prevent adjectivization. The exploratory analysis of these factors in the corpus data reveals diverse results. The syntactic factor, the adverb of measure and degree očenʹ ‘very’, which is normally used with adjectives, also combines with participles, and is strongly associated with semantic classes of their base verbs. Nonetheless, the use of očenʹ with a participle only indicates ambiguity when other syntactic factors of adjectivization are in place. The lexical frequency (including the ranks of base verbs and the ratios of participles to other verbal forms) and several morphological types of participles strongly predict ambiguity. Furthermore, past passive and transitive perfective participles not only have the highest mean ratios among the other morphological types of participles, but are also strong predictors of ambiguity.
The linguistic model using weighted syntactic rules shows the highest accuracy in disambiguation compared to the models with weighted morphological rules or the rule based on weights only. All of the syntactic, morphological, and weighted rules combined show the best performance results. Weights are the most effective for removing residual ambiguity (similar to the statistical baseline model), but are outperformed by the models that use factors of adjectivization as constraints.
Eesti keele üldvaldkonna tekstide laia kattuvusega automaatne sündmusanalüüs [Broad-coverage automatic event analysis of Estonian general-domain texts]
With the large-scale digitisation of texts and the ever wider spread of digital text production, a vast number of natural language texts have become, and are becoming, machine-readable. Machine-readability has the potential to make large text collections easier for people to manage, for example by enabling applications such as automatic summarisation and text-based question answering. Unfortunately, current automatic analysis does not reach an actual understanding of text content. It is hypothesised that event analysis brings us closer to automatic analysis that understands text content: since many texts have a narrative structure and can be interpreted as “descriptions of events”, extracting events from texts and representing them formally should provide a basis for building a number of language technology applications that require “text understanding”.
This thesis investigates to what extent event analysis of Estonian texts can be treated as an automatic linguistic analysis task that covers an open set of events and general-domain texts. The problem is approached from a perspective that is novel in the context of automatic analysis of Estonian, one that focuses on the temporal semantics of events. The work adapts the TimeML annotation framework to Estonian and creates an automatic temporal expression tagger based on the framework, as well as a text corpus annotated for temporal semantics (event mentions, temporal expressions, and temporal relations). Inter-annotator agreement on event mentions and temporal relations is analysed on the basis of the corpus. Finally, possibilities for extending event analysis centred on temporal semantics to generic event analysis are explored, using coreference resolution of event-denoting expressions as an example.
The work offers guidelines for the future development of the annotation of temporal semantics and event structure in texts, and the language resources created in the work enable both experimentation with concrete end-user applications (e.g. automatic answering of temporal questions) and further development of automatic annotation tools.
Due to massive-scale digitalisation processes and a switch from traditional means of written communication to digital written communication, vast amounts of human language texts are becoming machine-readable. Machine-readability holds a potential for easing human effort on searching and organising large text collections, allowing applications such as automatic text summarisation and question answering. However, current tools for automatic text analysis do not reach the level of text understanding required for making these applications generic. It is hypothesised that automatic analysis of events in texts leads us closer to the goal, as many texts can be interpreted as stories/narratives that are decomposable into events.
This thesis explores event analysis as a broad-coverage and general-domain automatic language analysis problem in Estonian, and provides an investigation starting from time-oriented event analysis and tending towards generic event analysis. We adapt the TimeML framework to Estonian, and create an automatic temporal expression tagger and a news corpus manually annotated for temporal semantics (event mentions, temporal expressions, and temporal relations) for the language; we analyse the consistency of human annotation of event mentions and temporal relations, and, finally, provide a preliminary study on event coreference resolution in Estonian news.
The current work also makes suggestions on how future research can improve Estonian event and temporal semantic annotation, and the language resources developed in this work will allow future experimentation with end-user applications (such as automatic answering of temporal questions) as well as provide a basis for developing automatic semantic analysis tools.