6,387 research outputs found
Learning Language from a Large (Unannotated) Corpus
A novel approach to the fully automated, unsupervised extraction of
dependency grammars and associated syntax-to-semantic-relationship mappings
from large text corpora is described. The suggested approach builds on the
authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well
as on a number of prior papers and approaches from the statistical language
learning literature. If successful, this approach would enable the mining of
all the information needed to power a natural language comprehension and
generation system, directly from a large, unannotated corpus.
Comment: 29 pages, 5 figures, research proposal
Acquiring Word-Meaning Mappings for Natural Language Interfaces
This paper focuses on a system, WOLFIE (WOrd Learning From Interpreted
Examples), that acquires a semantic lexicon from a corpus of sentences paired
with semantic representations. The lexicon learned consists of phrases paired
with meaning representations. WOLFIE is part of an integrated system that
learns to transform sentences into representations such as logical database
queries. Experimental results are presented demonstrating WOLFIE's ability to
learn useful lexicons for a database interface in four different natural
languages. The usefulness of the lexicons learned by WOLFIE is compared to
that of lexicons acquired by a similar system, with results favorable to WOLFIE. A second
set of experiments demonstrates WOLFIE's ability to scale to larger and more
difficult, albeit artificially generated, corpora. In natural language
acquisition, it is difficult to gather the annotated data needed for supervised
learning; however, unannotated data is fairly plentiful. Active learning
methods attempt to select for annotation and training only the most informative
examples, and therefore are potentially very useful in natural language
applications. However, most results to date for active learning have only
considered standard classification tasks. To reduce annotation effort while
maintaining accuracy, we apply active learning to semantic lexicons. We show
that active learning can significantly reduce the number of annotated examples
required to achieve a given level of performance
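The selection strategy described above can be sketched as a least-confidence loop: score each unlabeled example with the current model and annotate only the examples the model is least sure about. This is an illustrative sketch, not WOLFIE's actual code; `model_confidence` is a hypothetical stand-in for the lexicon learner's certainty about an example.

```python
def model_confidence(sentence):
    # Hypothetical proxy: treat longer sentences as harder for the learner.
    # A real system would use the model's own certainty about its analysis.
    return 1.0 / (1.0 + len(sentence.split()))

def select_for_annotation(pool, k):
    """Return the k pool examples the model is least confident about."""
    # Sorting by ascending confidence puts the most informative
    # (least certain) examples first.
    return sorted(pool, key=model_confidence)[:k]

pool = [
    "show me flights from boston to denver",
    "list all rivers",
    "what states border the state with the largest population of texas",
]
chosen = select_for_annotation(pool, 1)
```

Only the selected examples are then handed to an annotator, so the total labeling effort grows with the number of queries rather than the size of the corpus.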
Multi-Level Modeling of Quotation Families Morphogenesis
This paper investigates cultural dynamics in social media by examining the
proliferation and diversification of clearly-cut pieces of content: quoted
texts. In line with the pioneering work of Leskovec et al. and Simmons et al.
on meme dynamics, we investigate in depth the transformations that quotations
published online undergo during their diffusion. We deliberately put aside the
structure of the social network as well as the dynamical patterns pertaining to
the diffusion process to focus on the way quotations are changed, how often
they are modified and how these changes shape more or less diverse families and
sub-families of quotations. Following a biological metaphor, we try to
understand in which way mutations can transform quotations at different scales
and how mutation rates depend on various properties of the quotations.
Comment: Published in the Proceedings of the ASE/IEEE 4th Intl. Conf. on
Social Computing "SocialCom 2012", Sep. 3-5, 2012, Amsterdam, Netherlands
A topic modeling based approach to novel document automatic summarization
© 2017 Elsevier Ltd. Most existing automatic text summarization algorithms target multi-document collections of relatively short texts, and are therefore difficult to apply directly to novels, which are long and freely structured. In this paper, aiming at novel documents, we propose a topic-modeling-based approach to extractive automatic summarization that achieves a good balance among compression ratio, summarization quality, and machine readability. First, based on topic modeling, we extract the candidate sentences associated with topic words from a preprocessed novel document. Second, with the goals of high compression ratio and topic diversity, we design an importance evaluation function to select the most important sentences from the candidates and thus generate an initial summary. Finally, we smooth the initial summary to overcome the semantic confusion caused by ambiguous or synonymous words, improving its readability. We evaluate our proposed approach experimentally on a real novel dataset. The experimental results show that, compared to those from other candidate algorithms, each automatic summary generated by our approach has not only a higher compression ratio but also better summarization quality
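The extract-then-select pipeline the abstract describes can be sketched minimally as below. This is an illustrative assumption, not the paper's implementation: the topic words would normally come from a fitted topic model (e.g. LDA), here they are given by hand, and the importance function is reduced to a greedy coverage heuristic that rewards sentences adding uncovered topic words.

```python
def candidate_sentences(sentences, topic_words):
    """Step 1: keep only sentences containing at least one topic word."""
    return [s for s in sentences if set(s.lower().split()) & topic_words]

def summarize(sentences, topic_words, max_sentences):
    """Step 2: greedily pick sentences that cover new topic words,
    up to a sentence budget (the compression-ratio constraint)."""
    covered, summary = set(), []
    for s in candidate_sentences(sentences, topic_words):
        # Topic words in this sentence that the summary does not cover yet.
        gain = (set(s.lower().split()) & topic_words) - covered
        if gain and len(summary) < max_sentences:
            summary.append(s)
            covered |= gain
    return summary

doc = [
    "The ship sailed at dawn.",
    "A storm wrecked the ship near the island.",
    "They talked for hours.",
    "Survivors built a raft on the island.",
]
topics = {"ship", "storm", "island", "raft"}
summary = summarize(doc, topics, max_sentences=2)
```

The paper's third step, smoothing the summary to resolve ambiguous or synonymous words, is omitted here since it depends on lexical resources beyond this sketch.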
A constraint-based approach to noun phrase coreference resolution in German newspaper text
In this paper, we investigate a wide range of features for their usefulness in the resolution of nominal coreference, both as hard constraints (i.e. completely removing elements from the list of possible candidates) and as soft constraints (where an accumulation of soft-constraint violations makes it less likely that a candidate is chosen as the antecedent). We present a state-of-the-art system based on such constraints, with weights estimated by a maximum entropy model, that uses lexical information to resolve cases of coreferent bridging
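The hard/soft constraint scheme sketched above can be made concrete as follows. This is a hypothetical toy, not the paper's system: hard constraints prune candidate antecedents outright, while each violated soft constraint subtracts a weight from a candidate's score (the paper estimates these weights with a maximum entropy model; here they are hand-set).

```python
def resolve(mention, candidates, hard, soft):
    """Pick an antecedent: hard constraints filter, soft constraints rank."""
    # Hard constraints: a candidate failing any check is removed entirely.
    viable = [c for c in candidates if all(ok(mention, c) for ok in hard)]
    if not viable:
        return None
    # Soft constraints: each violation costs its weight; highest score wins.
    def score(c):
        return -sum(w for check, w in soft if not check(mention, c))
    return max(viable, key=score)

# Hypothetical constraints for German nominal coreference.
hard = [lambda m, c: m["gender"] == c["gender"]]          # gender agreement
soft = [(lambda m, c: m["number"] == c["number"], 2.0),   # number agreement
        (lambda m, c: c["distance"] <= 2, 1.0)]           # recency preference

mention = {"gender": "f", "number": "sg"}
candidates = [
    {"text": "die Regierung", "gender": "f", "number": "sg", "distance": 3},
    {"text": "die Parteien",  "gender": "f", "number": "pl", "distance": 1},
    {"text": "der Minister",  "gender": "m", "number": "sg", "distance": 1},
]
antecedent = resolve(mention, candidates, hard, soft)
```

Here "der Minister" is pruned by the gender hard constraint, and among the rest the number-agreement violation outweighs the recency penalty.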
Exploring Metaphorical Senses and Word Representations for Identifying Metonyms
A metonym is a word with a figurative meaning, similar to a metaphor. Because
metonyms are closely related to metaphors, we apply features that are used
successfully for metaphor recognition to the task of detecting metonyms. On the
ACL SemEval 2007 Task 8 data with gold standard metonym annotations, our system
achieved 86.45% accuracy on the location metonyms. Our code can be found on
GitHub.
Comment: 9 pages, 8 pages content
Towards an explication and description of synonymy in English
The thesis begins by arguing for an a posteriori approach to
synonymy, according to which synonymy should be treated as an
empirical phenomenon which it is the task of linguistic semantics to
explicate. Arguments are presented against the a priori approach
often underlying treatments of synonymy, which makes it possible to
define synonymy out of existence. A distinction is then drawn
between three possible levels of synonymy (i.e. lexeme-synonymy,
sense-synonymy and occurrence-synonymy), and it is argued that all
three should be treated as legitimate levels - occurrence-synonymy
as the basic level and the other two chiefly as a means of stating
synonymy-relations more economically, where appropriate. This is
followed by the establishment of two criteria of synonymy for all
three levels. After discussion and (in some cases) re-definition
of various types of acceptability and anomaly, the interchangeability criterion is defined as the mutual substitutability of words
without causing either grammatical or collocational anomaly.
The sameness of meaning criterion is based on the distinctions
between pragmatic and analytic equivalence and between performance
and judgement equivalence, and is defined in terms of the first
alternative in each case. While my concern up to this point is
with the explication of synonymy, the remainder of the thesis is
devoted to its description. A distinction is drawn between two
types of case where two senses are synonymous in some contexts but
not in others. Two types of explanation are provided accordingly.
The thesis ends by discussing various types of communicatively
relevant difference between synonyms
- …