Who, What, When, Where, Why? Comparing Multiple Approaches to the Cross-Lingual 5W Task
Cross-lingual tasks are especially difficult due to the compounding effect of errors in language processing and errors in machine translation (MT). In this paper, we present an error analysis of a new cross-lingual task: the 5W task, a sentence-level understanding task which seeks to return the English 5W's (Who, What, When, Where and Why) corresponding to a Chinese sentence. We analyze systems that we developed, identifying specific problems in language processing and MT that cause errors. The best cross-lingual 5W system was still 19% worse than the best monolingual 5W system, which shows that MT significantly degrades sentence-level understanding. Neither source-language nor target-language analysis was able to circumvent problems in MT, although each approach had advantages relative to the other. A detailed error analysis across multiple systems suggests directions for future research on the problem.
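The slot-filling step of a 5W system can be illustrated by mapping predicate-argument roles onto the five question slots. A minimal sketch, assuming PropBank-style role labels; the mapping and helper function below are illustrative assumptions, not the systems analyzed in the paper:

```python
# Illustrative mapping from PropBank-style semantic-role labels to 5W slots.
ROLE_TO_W = {
    "ARG0": "Who",
    "V": "What",
    "ARGM-TMP": "When",
    "ARGM-LOC": "Where",
    "ARGM-CAU": "Why",
}

def extract_5w(labeled_spans):
    """labeled_spans: list of (role, text) pairs for one predicate."""
    answer = {w: None for w in ("Who", "What", "When", "Where", "Why")}
    for role, text in labeled_spans:
        w = ROLE_TO_W.get(role)
        if w and answer[w] is None:  # keep the first span per slot
            answer[w] = text
    return answer

spans = [("ARG0", "the minister"), ("V", "announced"),
         ("ARGM-TMP", "on Tuesday"), ("ARGM-LOC", "in Beijing")]
print(extract_5w(spans))  # Who→'the minister', What→'announced', Why→None
```

In a cross-lingual setting the same mapping can be applied either before or after MT, which is exactly the source-language vs. target-language analysis choice the paper compares.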
Estonian football specific corpora automatic semantic role labeling with football specific Framenet
The goal of this work is to investigate and attempt to solve the problem of automatic frame-semantic annotation of Estonian text. A general Estonian Framenet is still in its early stages, but a complete football-specific frame resource is available, with which we try to prove the hypothesis that morphological and syntactic information alone is sufficient for semantic role labeling of football-related corpora. We could not confirm this hypothesis, because a sentence with the same meaning can be expressed in too many different ways. In addition, we supplemented Estonia's largest lexical-semantic database, Wordnet, with football-related words.
Semantic argument classification and semantic categorization of Turkish existential sentences using support vector learning
There are three types of sentences that form all existing natural languages: verbal sentences (e.g. “I read the book.”), copulative sentences (e.g. “The book is on the table.”), and existential sentences (e.g. “There is a book on the table.”). Syntactic and semantic recognition of these sentence types is crucially important in computational linguistics, although there has not been any significant work toward this end. This thesis, in an attempt to fill this evident gap, identifies and assigns semantic categories to Turkish existential sentences in print. Existential sentences in Turkish are minimally characterized by the two existential particles var, meaning there is/are, and yok, meaning there is/are no. In addition to these most basic meanings, other senses of the existential particles are possible, and these can be categorized into groups such as case existentials and possession existentials. Our system does shallow semantic parsing, defining the predicate-argument relationships in an existential sentence on a word-by-word basis via Support Vector Machines, after which it proceeds with the semantic categorization of the whole sentence. For both of these tasks, our system produces promising results, in terms of accuracy and precision/recall, respectively. Part of this research contributes to the annotation of the METU-Sabancı Turkish Treebank with semantic information.
Koca, Aylin (M.S. thesis)
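The word-by-word argument classification step described above can be sketched with a linear SVM over simple per-word features. This is a toy illustration assuming scikit-learn; the features and role labels below are invented for the example and are not the METU-Sabancı annotation scheme:

```python
# Sketch of word-by-word semantic argument classification with an SVM.
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# One feature dict per word: the word form, its case marking, and which
# existential particle (var/yok) governs the sentence.
train_X = [
    {"word": "masada", "case": "LOC", "particle": "var"},
    {"word": "kitap",  "case": "NOM", "particle": "var"},
    {"word": "evde",   "case": "LOC", "particle": "yok"},
    {"word": "kimse",  "case": "NOM", "particle": "yok"},
]
train_y = ["LOCATION", "THEME", "LOCATION", "THEME"]

clf = make_pipeline(DictVectorizer(), LinearSVC())
clf.fit(train_X, train_y)

# An unseen word form: the case feature still drives the decision.
print(clf.predict([{"word": "bahçede", "case": "LOC", "particle": "var"}]))
```

After each word is assigned an argument label, a second classifier over the whole label sequence can perform the sentence-level semantic categorization.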
Doctor of Philosophy dissertation
Events are one important type of information throughout text. Event extraction is an information extraction (IE) task that involves identifying entities and objects (mainly noun phrases) that represent important roles in events of a particular type. However, the extraction performance of current event extraction systems is limited because they mainly consider local context (mostly isolated sentences) when making each extraction decision. My research aims to improve both coverage and accuracy of event extraction performance by explicitly identifying event contexts before extracting individual facts. First, I introduce new event extraction architectures that incorporate discourse information across a document to seek out and validate pieces of event descriptions within the document. TIER is a multilayered event extraction architecture that performs text analysis at multiple granularities to progressively "zoom in" on relevant event information. LINKER is a unified discourse-guided approach that includes a structured sentence classifier to sequentially read a story and determine which sentences contain event information based on both the local and preceding contexts. Experimental results on two distinct event domains show that compared to previous event extraction systems, TIER can find more event information while maintaining a good extraction accuracy, and LINKER can further improve extraction accuracy. Finding documents that describe a specific type of event is also highly challenging because of the wide variety and ambiguity of event expressions. In this dissertation, I present the multifaceted event recognition approach that uses event-defining characteristics (facets), in addition to event expressions, to effectively resolve the complexity of event descriptions. I also present a novel bootstrapping algorithm to automatically learn event expressions as well as facets of events, which requires minimal human supervision. Experimental results show that the multifaceted event recognition approach can effectively identify documents that describe a particular type of event and make event extraction systems more precise.
Concept Mining: A Conceptual Understanding based Approach
Due to the rapid daily growth of information, there is a considerable need to extract and discover valuable knowledge from data sources such as the World Wide Web. Most common techniques in text mining are based on the statistical analysis of a term, either a word or a phrase. These techniques treat documents as bags of words and pay no attention to the meanings of the document content. In addition, statistical analysis of term frequency captures the importance of a term within a document only. However, two terms can have the same frequency in their documents while one term contributes more to the meaning of its sentences than the other. Therefore, there is an intensive need for a model that captures the meaning of linguistic utterances in a formal structure. The underlying model should indicate terms that capture the semantics of text. In this case, the model can capture the terms that present the concepts of a sentence, which leads to discovering the topic of the document.
A new concept-based model is introduced that analyzes terms at the sentence, document, and corpus levels, rather than at the document level only as in traditional analysis. The concept-based model can effectively discriminate between terms that are unimportant to sentence semantics and terms that hold the concepts representing the sentence meaning.
The proposed model consists of a concept-based statistical analyzer, a conceptual ontological graph representation, a concept extractor, and a concept-based similarity measure. A term that contributes to the sentence semantics is assigned two different weights, one by the concept-based statistical analyzer and one by the conceptual ontological graph representation. These two weights are combined into a new weight, and the concept extractor selects the concepts with the maximum combined weights. The similarity between documents is calculated with a new concept-based similarity measure, which takes full advantage of the concept analysis measures at the sentence, document, and corpus levels.
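The selection step described above can be sketched as follows: each candidate term carries a statistical weight and a graph-based weight, and the concept extractor keeps the terms with the highest combined weight. This is a minimal sketch; the combination rule (a simple product) is an illustrative assumption, not the thesis's actual formula:

```python
# Sketch of the combined-weight concept-selection step.
def select_concepts(weights, top_k=2):
    """weights: {term: (statistical_weight, graph_weight)} -> top-k terms."""
    combined = {t: s * g for t, (s, g) in weights.items()}
    return sorted(combined, key=combined.get, reverse=True)[:top_k]

weights = {
    "market": (0.8, 0.9),   # strong in both analyses
    "growth": (0.6, 0.7),
    "the":    (0.9, 0.05),  # frequent, but semantically weak
}
print(select_concepts(weights))  # → ['market', 'growth']
```

Note how the product penalizes "the": high raw frequency alone is not enough, which is exactly the discrimination between frequent terms and concept-bearing terms that the model aims for.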
Large sets of experiments using the proposed concept-based model on different datasets in text clustering, categorization, and retrieval are conducted. The experiments provide an extensive comparison between traditional weighting and the concept-based weighting obtained by the concept-based model. Experimental results in text clustering, categorization, and retrieval demonstrate a substantial enhancement of quality using: (1) concept-based term frequency (tf), (2) conceptual term frequency (ctf), (3) the concept-based statistical analyzer, (4) the conceptual ontological graph, and (5) the concept-based combined model.
In text clustering, the evaluation of results relies on two quality measures: the F-measure and the Entropy. In text categorization, the evaluation relies on three quality measures: the Micro-averaged F1, the Macro-averaged F1, and the Error rate. In text retrieval, the evaluation relies on three quality measures: precision at 10 documents retrieved P(10), the binary preference measure (bpref), and the mean uninterpolated average precision (MAP). All of these quality measures improve when the newly developed concept-based model is used to enhance the quality of text clustering, categorization, and retrieval.
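The categorization metrics named above differ in how they aggregate per-class performance: micro-averaged F1 pools all classification decisions (so frequent classes dominate), while macro-averaged F1 averages per-class F1 scores (so every class counts equally). A short sketch using scikit-learn on an invented toy prediction:

```python
# Toy illustration of micro- vs. macro-averaged F1 and error rate.
from sklearn.metrics import f1_score

y_true = ["sports", "sports", "politics", "tech", "tech", "tech"]
y_pred = ["sports", "politics", "politics", "tech", "tech", "sports"]

micro = f1_score(y_true, y_pred, average="micro")  # pools all decisions
macro = f1_score(y_true, y_pred, average="macro")  # mean of per-class F1
error_rate = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

print(round(micro, 3), round(macro, 3), round(error_rate, 3))
```

For single-label classification, micro-averaged F1 equals accuracy, so the error rate is simply 1 minus the micro-averaged F1.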
Joint learning of syntactic and semantic dependencies
In this master's thesis we designed, implemented, and evaluated a novel joint syntactic and semantic parsing model.