
    Concept-based Text Clustering

    Thematic organization of text is a natural practice of humans and a crucial task for today's vast repositories. Clustering automates this by assessing the similarity between texts and organizing them accordingly, grouping like ones together and separating those with different topics. Clusters provide a comprehensive logical structure that facilitates exploration, search and interpretation of current texts, as well as organization of future ones. Automatic clustering is usually based on words. Text is represented by the words it mentions, and thematic similarity is based on the proportion of words that texts have in common. The resulting bag-of-words model is semantically ambiguous and undesirably orthogonal: it ignores the connections between words. This thesis claims that using concepts as the basis of clustering can significantly improve effectiveness. Concepts are defined as units of knowledge. When organized according to the relations among them, they form a concept system. Two concept systems are used here: WordNet, which focuses on word knowledge, and Wikipedia, which encompasses world knowledge. We investigate a clustering procedure with three components: using concepts to represent text; taking the semantic relations among them into account during clustering; and learning a text similarity measure from concepts and their relations. First, we demonstrate that concepts provide a succinct and informative representation of the themes in text, exemplifying this with the two concept systems. Second, we define methods for utilizing concept relations to enhance clustering by making the representation models more discriminative and extending thematic similarity beyond surface overlap. Third, we present a similarity measure based on concepts and their relations that is learned from a small number of examples, and show that it both predicts similarity consistently with human judgement and improves clustering. The thesis provides strong support for the use of concept-based representations instead of the classic bag-of-words model.
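
    The contrast drawn here between surface word overlap and concept-level similarity can be illustrated with a small sketch. The snippet below is not the thesis's actual procedure; it is a minimal illustration, assuming Python with NLTK and a downloaded WordNet corpus, that compares a bag-of-words Jaccard score with a WordNet path-similarity check for two words that share a meaning but not a surface form.

```python
# Illustrative sketch only (not the thesis's pipeline): contrast a
# bag-of-words overlap measure with a WordNet-based concept similarity.
# Assumes NLTK is installed and the 'wordnet' corpus has been fetched
# via nltk.download('wordnet').
from nltk.corpus import wordnet as wn

def bow_jaccard(text_a, text_b):
    """Surface similarity: proportion of words the two texts share."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0

def concept_similarity(word_a, word_b):
    """Best WordNet path similarity over all sense pairs of the two words."""
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(word_a)
              for s2 in wn.synsets(word_b)]
    return max(scores, default=0.0)

# Bag-of-words treats "car" and "automobile" as unrelated tokens, so their
# shared meaning contributes nothing to the overlap score; the WordNet
# measure recognises them as the same concept (path similarity 1.0).
print(bow_jaccard("the car sped away", "the automobile sped away"))  # 0.6
print(concept_similarity("car", "automobile"))                       # 1.0
```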

    Neurocognitive Informatics Manifesto.

    Informatics studies all aspects of the structure of natural and artificial information systems. Theoretical and abstract approaches to information have made great advances, but human information processing is still unmatched in many areas, including information management, representation and understanding. Neurocognitive informatics is a new, emerging field that should help to improve the matching of artificial and natural systems, and inspire better computational algorithms to solve problems that are still beyond the reach of machines. In this position paper, examples of neurocognitive inspirations and promising directions in this area are given.

    Using Sparse Semantic Embeddings Learned from Multimodal Text and Image Data to Model Human Conceptual Knowledge

    Distributional models provide a convenient way to model semantics using dense embedding spaces derived from unsupervised learning algorithms. However, the dimensions of dense embedding spaces are not designed to resemble human semantic knowledge. Moreover, embeddings are often built from a single source of information (typically text data), even though neurocognitive research suggests that semantics is deeply linked to both language and perception. In this paper, we combine multimodal information from both text- and image-based representations derived from state-of-the-art distributional models to produce sparse, interpretable vectors using Joint Non-Negative Sparse Embedding. Through in-depth analyses comparing these sparse models to human-derived behavioural and neuroimaging data, we demonstrate their ability to predict interpretable linguistic descriptions of human ground-truth semantic knowledge.
    Comment: Proceedings of the 22nd Conference on Computational Natural Language Learning (CoNLL 2018), pages 260-270, Brussels, Belgium, October 31 - November 1, 2018. Association for Computational Linguistics.
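
    As a rough illustration of the kind of factorisation described above (and not the authors' Joint Non-Negative Sparse Embedding implementation), the sketch below concatenates placeholder text and image embeddings and decomposes them with scikit-learn's NMF, used here as a convenient non-negative stand-in; unlike the method named in the abstract, it imposes no explicit sparsity penalty, though the resulting per-word codes can still be inspected dimension by dimension.

```python
# Loose sketch of multimodal non-negative factorisation; the embedding
# matrices are random placeholders, not real word or image features.
import numpy as np
from sklearn.decomposition import NMF
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
n_words = 500
text_emb = rng.normal(size=(n_words, 300))   # stand-in for text-derived vectors
image_emb = rng.normal(size=(n_words, 128))  # stand-in for image-derived features

# Early fusion: one row per word, text and image features side by side.
joint = np.hstack([text_emb, image_emb])

# NMF needs non-negative input, so rescale each feature to [0, 1].
joint = MinMaxScaler().fit_transform(joint)

# Factorise into a small number of non-negative components; each code
# dimension can then be inspected via the words that load on it most.
model = NMF(n_components=50, init="nndsvd", max_iter=500, random_state=0)
codes = model.fit_transform(joint)        # shape (n_words, 50)
print(codes.shape, (codes == 0).mean())   # share of exactly-zero entries
```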

    Multimodal Grounding for Language Processing

    This survey discusses how recent developments in multimodal processing facilitate conceptual grounding of language. We categorize the information flow in multimodal processing with respect to cognitive models of human information processing and analyze different methods for combining multimodal representations. Based on this methodological inventory, we discuss the benefit of multimodal grounding for a variety of language processing tasks and the challenges that arise. We particularly focus on multimodal grounding of verbs, which play a crucial role for the compositional power of language.
    Comment: The paper has been published in the Proceedings of the 27th International Conference on Computational Linguistics. Please refer to this version for citations: https://www.aclweb.org/anthology/papers/C/C18/C18-1197
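
    Two of the simplest schemes for combining modality-specific representations that such surveys cover, early fusion by concatenation and weighted fusion in a shared space, can be sketched as follows; the vectors and projection matrices are random placeholders rather than anything from the paper.

```python
# Minimal sketch of two common fusion schemes for multimodal grounding;
# all values below are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(1)
text_vec = rng.normal(size=300)    # placeholder linguistic embedding
image_vec = rng.normal(size=2048)  # placeholder visual (e.g. CNN) features

# 1) Early fusion: keep both feature sets side by side.
concat = np.concatenate([text_vec, image_vec])           # 2348-dim vector

# 2) Projection + weighted fusion: map both modalities into a shared
#    256-dim space (random projections stand in for learned ones) and mix.
W_text = rng.normal(size=(256, 300)) / np.sqrt(300)
W_image = rng.normal(size=(256, 2048)) / np.sqrt(2048)
alpha = 0.5                                               # modality weight
fused = alpha * (W_text @ text_vec) + (1 - alpha) * (W_image @ image_vec)

print(concat.shape, fused.shape)
```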

    Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure

    Big data research has attracted great attention in science, technology, industry and society. It is developing with the evolving scientific paradigm, the fourth industrial revolution, and the transformational innovation of technologies. However, its nature and fundamental challenge have not been recognized, and its own methodology has not been formed. This paper explores and answers the following questions: What is big data? What are the basic methods for representing, managing and analyzing big data? What is the relationship between big data and knowledge? Can we find a mapping from big data into knowledge space? What kind of infrastructure is required to support not only big data management and analysis but also knowledge discovery, sharing and management? What is the relationship between big data and the science paradigm? What is the nature and fundamental challenge of big data computing? A multi-dimensional perspective is presented toward a methodology of big data computing.
    Comment: 59 pages.