598 research outputs found
Eesti keele üldvaldkonna tekstide laia kattuvusega automaatne sündmusanalüüs (Broad-coverage automatic event analysis of general-domain Estonian texts)
With the large-scale digitisation of texts and the ever wider spread of digital text production, an enormous quantity of natural language text has become, and is becoming, machine-readable. Machine-readability has the potential to make text collections easier for people to manage, for example by enabling applications such as automatic summarisation and answering questions on the basis of texts, but current automatic analysis capabilities do not extend to actually understanding text content. It is hypothesised that event analysis brings us closer to automatic analysis that understands text content: since many texts have a narrative structure and can be interpreted as "descriptions of events", extracting events from texts and representing them in a formal form should provide a basis for building a number of language technology applications that require "text understanding".
This thesis investigates to what extent event analysis of Estonian texts can be treated as an automatic linguistic analysis task that covers an open set of events and general-domain texts. The problem is approached from a perspective that is novel in the context of automatic analysis of Estonian, one that focuses on the temporal semantics of events. The thesis adapts the TimeML annotation framework to Estonian and creates an automatic temporal expression tagger based on the framework, together with a text corpus annotated for temporal semantics (event mentions, temporal expressions, and temporal relations); it analyses, on the basis of the corpus, inter-annotator agreement in determining event mentions and temporal relations; and finally it explores ways of extending temporal-semantics-centred event analysis to generic event analysis, using coreference resolution of event-denoting expressions as an example.
The thesis offers guidelines for the future development of temporal semantic and event structure annotation of texts, and the language resources created in it enable both experimentation with concrete end-user applications (e.g. automatic answering of temporal questions) and further development of automatic annotation tools.
Due to large-scale digitalisation and a shift from traditional written communication to digital written communication, vast amounts of human language text are becoming machine-readable. Machine-readability has the potential to reduce the human effort of searching and organising large text collections, enabling applications such as automatic text summarisation and question answering. However, current tools for automatic text analysis do not reach the level of text understanding required to make these applications generic. It is hypothesised that automatic analysis of events in texts leads us closer to that goal, as many texts can be interpreted as stories or narratives that are decomposable into events.
This thesis explores event analysis as a broad-coverage, general-domain automatic language analysis problem in Estonian, and provides an investigation that starts from time-oriented event analysis and moves towards generic event analysis. We adapt the TimeML framework to Estonian, and create an automatic temporal expression tagger and a news corpus manually annotated for temporal semantics (event mentions, temporal expressions, and temporal relations) for the language; we analyse the consistency of human annotation of event mentions and temporal relations; and, finally, we provide a preliminary study on event coreference resolution in Estonian news.
The work also suggests how future research can improve Estonian event and temporal semantic annotation. The language resources developed in this work will allow future experimentation with end-user applications (such as automatic answering of temporal questions) and provide a basis for developing automatic semantic analysis tools.
Advances in statistical script learning
When humans encode information into natural language, they do so with the
clear assumption that the reader will be able to seamlessly make inferences
based on world knowledge. For example, given the sentence "Mrs. Dalloway said
she would buy the flowers herself," one can make a number of probable
inferences based on event co-occurrences: she bought flowers, she went to a
store, she took the flowers home, and so on.
Observing this, it is clear that many different useful natural language
end-tasks could benefit from models of events as they typically co-occur
(so-called script models).
Robust question-answering systems must be able to infer highly probable implicit
events from what is explicitly stated in a text, as must robust
information-extraction systems that map from unstructured text to formal
assertions about relations expressed in the text. Coreference resolution
systems, semantic role labeling, and even syntactic parsing systems could, in
principle, benefit from event co-occurrence models.
To this end, we present a number of contributions related to statistical
event co-occurrence models. First, we investigate a method of incorporating
multiple entities into events in a count-based co-occurrence model. We find that
modeling multiple entities interacting across events allows for improved
empirical performance on the task of modeling sequences of events in documents.
Second, we give a method of applying Recurrent Neural Network sequence models
to the task of predicting held-out predicate-argument structures from documents.
This model allows us to easily incorporate entity noun information, and can
allow for more complex, higher-arity events than a count-based co-occurrence
model. We find the neural model improves performance considerably over the
count-based co-occurrence model.
Third, we investigate the performance of a sequence-to-sequence encoder-decoder
neural model on the task of predicting held-out predicate-argument events from
text. This model does not explicitly model any external syntactic information,
and does not require a parser. We find the text-level model to be competitive in
predictive performance with an event-level model directly mediated by an
external syntactic analysis.
Finally, motivated by this result, we investigate incorporating features derived
from these models into a baseline noun coreference resolution system. We find
that, while our additional features do not appreciably improve top-level
performance, we can nonetheless provide empirical improvement on a number of
restricted classes of difficult coreference decisions.
Computer Science
University of Sheffield: Description of the LaSIE System as Used for MUC-6
more generally, natural language engineering. LaSIE is a single, integrated system that builds up a unified model of a text which is then used t