
    Word Activation Forces Map Word Networks

    Words associate with each other in intricate clusters^1-3^. Yet the brain encodes these complex relations into workable networks^4-7^, such that the onset of a word automatically and selectively activates its associates, facilitating language understanding and generation^8-10^. It is believed that the activation strength from one word to another forges, and accounts for, the latent structure of these word networks. This implies that mapping the word networks from brains to computers^11,12^, which is necessary for various purposes^1,2,13-15^, may be achieved by modeling the activation strengths. However, although many investigations of word activation effects have been carried out^8-10,16-20^, modeling the activation strengths remains an open problem; consequently, substantial manual labor is still required to perform such mappings^11,12^. Here we show that our word activation forces, statistically defined by a formula of the same form as universal gravitation, capture essential information about word networks, leading to a superior approach to the mappings. The approach compatibly encodes syntactic and semantic information into sparse-coding directed networks and comprehensively highlights the features of individual words. We find that, based on these directed networks, sensible word clusters and hierarchies can be discovered efficiently. Our striking results strongly suggest that word activation forces might reveal how word networks are encoded in the brain.
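    The abstract does not reproduce the formula itself. As a reading aid, the sketch below implements one plausible gravitation-style definition, waf(i→j) = (f_ij/f_i)(f_ij/f_j)/d_ij², where f_i and f_j are unigram frequencies, f_ij counts co-occurrences of i before j within a window, and d_ij is their mean token distance. The formula, window size, and thresholding step are assumptions for illustration, not taken from the paper.

```python
from collections import Counter, defaultdict

def word_activation_forces(tokens, window=5):
    """Gravitation-style activation force between word pairs.

    Assumed form: waf(i -> j) = (f_ij / f_i) * (f_ij / f_j) / d_ij**2,
    where f_ij counts co-occurrences of i before j within `window`,
    f_i and f_j are unigram counts, and d_ij is the mean token distance
    over those co-occurrences. The abstract only says the formula
    mirrors universal gravitation; this instantiation is a guess.
    """
    freq = Counter(tokens)
    co_count = Counter()
    dist_sum = defaultdict(float)
    for pos, w in enumerate(tokens):
        for off in range(1, window + 1):
            if pos + off >= len(tokens):
                break
            pair = (w, tokens[pos + off])
            co_count[pair] += 1
            dist_sum[pair] += off
    waf = {}
    for (i, j), f_ij in co_count.items():
        d = dist_sum[(i, j)] / f_ij  # mean co-occurrence distance
        waf[(i, j)] = (f_ij / freq[i]) * (f_ij / freq[j]) / d ** 2
    return waf

# A directed word network keeps only pairs whose force exceeds a
# (hypothetical) threshold:
# edges = {pair: f for pair, f in waf.items() if f > 1e-4}
```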

    Russian word sense induction by clustering averaged word embeddings

    The paper reports our participation in the shared task on word sense induction and disambiguation for the Russian language (RUSSE-2018). Our team was ranked 2nd for the wiki-wiki dataset (containing mostly homonyms) and 5th for the bts-rnc and active-dict datasets (containing mostly polysemous words) among all 19 participants. The method we employed was deliberately naive: contexts of ambiguous words were represented as averaged word embedding vectors, using off-the-shelf pre-trained distributional models. These vector representations were then clustered with mainstream clustering techniques, producing groups corresponding to the senses of the ambiguous word. As a side result, we show that word embedding models trained on small but balanced corpora can be superior to those trained on large but noisy data, not only in intrinsic evaluation but also in downstream tasks such as word sense induction. Comment: Proceedings of the 24th International Conference on Computational Linguistics and Intellectual Technologies (Dialogue-2018).
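    The described pipeline is simple enough to sketch directly: average the embeddings of a target word's context, then cluster the resulting context vectors. The sketch below uses gensim and scikit-learn; the model file name is hypothetical, and Affinity Propagation stands in for whatever "mainstream clustering techniques" the authors actually applied.

```python
import numpy as np
from gensim.models import KeyedVectors
from sklearn.cluster import AffinityPropagation

# Hypothetical model path; any word2vec-format Russian model works.
wv = KeyedVectors.load_word2vec_format("ruscorpora.bin", binary=True)

def context_vector(context_words):
    """Average the embeddings of all in-vocabulary context words."""
    vecs = [wv[w] for w in context_words if w in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

def induce_senses(contexts):
    """Cluster context vectors; each cluster ~ one sense of the word.

    `contexts` is a list of token lists, one per occurrence of the
    ambiguous target word.
    """
    X = np.vstack([context_vector(c) for c in contexts])
    return AffinityPropagation(random_state=0).fit_predict(X)
```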

    SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization

    We study unsupervised multi-document summarization evaluation metrics, which require neither human-written reference summaries nor human annotations (e.g. preferences or ratings). We propose SUPERT, which rates the quality of a summary by measuring its semantic similarity with a pseudo-reference summary, i.e. salient sentences selected from the source documents, using contextualized embeddings and soft token alignment techniques. Compared to state-of-the-art unsupervised evaluation metrics, SUPERT correlates better with human ratings by 18-39%. Furthermore, we use SUPERT as a reward to guide a neural reinforcement-learning-based summarizer, yielding favorable performance compared to state-of-the-art unsupervised summarizers. All source code is available at https://github.com/yg211/acl20-ref-free-eval. Comment: ACL 2020.
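    The released code lives at the URL above; the sketch below only approximates the idea under stated assumptions: leading sentences stand in for the paper's salient-sentence selection, vanilla BERT stands in for its contextualized encoder, and soft token alignment is rendered as BERTScore-style greedy cosine matching.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")

def token_embeddings(text):
    """Contextualized token embeddings for one text (CLS/SEP dropped)."""
    batch = tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state[0]
    return hidden[1:-1]

def soft_alignment_score(summary, pseudo_ref):
    """BERTScore-style recall: each pseudo-reference token is softly
    aligned to its closest summary token by cosine similarity."""
    s = torch.nn.functional.normalize(token_embeddings(summary), dim=1)
    r = torch.nn.functional.normalize(token_embeddings(pseudo_ref), dim=1)
    return (r @ s.T).max(dim=1).values.mean().item()

def supert_like_score(summary, source_docs, n_sents=3):
    """Pseudo-reference = leading sentences of each source document
    (a crude salience heuristic; the paper selects salient sentences)."""
    pseudo_ref = " ".join(
        " ".join(doc.split(". ")[:n_sents]) for doc in source_docs
    )
    return soft_alignment_score(summary, pseudo_ref)
```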

    Fighting with the Sparsity of Synonymy Dictionaries

    Graph-based synset induction methods, such as MaxMax and Watset, induce synsets by performing a global clustering of a synonymy graph. However, such methods are sensitive to the structure of the input synonymy graph: sparseness of the input dictionary can substantially reduce the quality of the extracted synsets. In this paper, we propose two approaches designed to alleviate the incompleteness of the input dictionaries. The first performs a pre-processing of the graph by adding missing edges, while the second performs a post-processing by merging similar synset clusters. We evaluate these approaches on two datasets for the Russian language and discuss their impact on the performance of synset induction methods. Finally, we perform an extensive error analysis of each approach and discuss prominent alternative methods for coping with the sparsity of synonymy dictionaries. Comment: In Proceedings of the 6th Conference on Analysis of Images, Social Networks, and Texts (AIST'2017), Springer Lecture Notes in Computer Science (LNCS).
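    Neither approach is specified in detail in the abstract, so the sketch below is a generic stand-in: pre-processing adds an edge between any two words that share enough neighbors, and post-processing greedily merges synsets with high Jaccard overlap. The similarity criteria and thresholds are assumptions, not the paper's.

```python
import networkx as nx
from itertools import combinations

def add_missing_edges(g, min_common=2):
    """Pre-processing: connect word pairs that share enough neighbors,
    densifying a sparse synonymy graph before clustering."""
    new_edges = [
        (u, v) for u, v in combinations(g.nodes, 2)
        if not g.has_edge(u, v) and len(set(g[u]) & set(g[v])) >= min_common
    ]
    g.add_edges_from(new_edges)
    return g

def merge_similar_synsets(synsets, min_jaccard=0.5):
    """Post-processing: greedily merge induced synsets whose word
    overlap (Jaccard index) exceeds a threshold."""
    merged = []
    for s in map(set, synsets):
        for m in merged:
            if len(s & m) / len(s | m) >= min_jaccard:
                m |= s
                break
        else:
            merged.append(s)
    return merged
```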

    Hedonic and Transcendent Conceptions of Value

    In this paper we introduce a conceptual distinction between hedonic and transcendent conceptions of value. We posit three linguistic earmarks by which these conceptions of value can be distinguished. We seek validation for the conceptual distinctions by examining the language contained in reviews of cars and reviews of paintings. In undertaking the empirical examination, we draw on the work of M.A.K. Halliday to identify clauses as fundamental units of meaning and to specify process types that can be mapped onto the theoretical distinctions between the two conceptions of value. Extensions of this research are discussed.