Mining Meaning from Text by Harvesting Frequent and Diverse Semantic Itemsets

Abstract

In this paper, we present a novel and completely unsupervised approach to unraveling meanings (or senses) from linguistic constructions found in large corpora by introducing the concept of a semantic vector. A semantic vector is a space-transformed vector whose features represent fine-grained units of semantic information, instead of co-occurrence values within a collection of texts. In more detail, instead of seeing words as vectors of frequency values, we propose to first explode words into a multitude of tiny pieces of semantic information retrieved from existing resources like WordNet and ConceptNet, and then cluster them into frequent and diverse patterns. This way, on the one hand, we are able to model linguistic data with a larger but much denser and more informative semantic feature space. On the other hand, because the model is based on basic conceptual information, we are also able to generate new data by querying the above-mentioned semantic resources with the features contained in the extracted patterns. We tested the idea on a dataset of 640 million subject-verb-object triples to automatically induce senses for specific input verbs, demonstrating the validity and the potential of the presented approach in modeling and understanding natural language.
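
To make the idea concrete, the following is a minimal Python sketch of the two steps described above: exploding the words of subject-verb-object triples into semantic features, and then harvesting the feature combinations that recur across the triples of a given verb. It is an illustration under stated assumptions, not the implementation used in this work. The lexicon `TOY_FEATURES`, the feature labels, and the helper names are all hypothetical stand-ins for real WordNet and ConceptNet lookups. The brute-force counting only covers the "frequent" part of the patterns; the "diverse" criterion is not modeled here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon standing in for WordNet/ConceptNet lookups.
# Feature labels such as "isa:food" are invented for this illustration.
TOY_FEATURES = {
    "boy": {"isa:person", "isa:animate"},
    "girl": {"isa:person", "isa:animate"},
    "dog": {"isa:animal", "isa:animate"},
    "apple": {"isa:food", "isa:fruit"},
    "bread": {"isa:food"},
    "book": {"isa:artifact", "usedfor:reading"},
}


def triple_to_transaction(subject, obj, features=TOY_FEATURES):
    """Explode the subject and object of a triple into role-tagged features."""
    subj_items = {f"subj|{f}" for f in features.get(subject, ())}
    obj_items = {f"obj|{f}" for f in features.get(obj, ())}
    return frozenset(subj_items | obj_items)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count every itemset of up to max_size features and keep the frequent ones.

    Brute-force enumeration, chosen for readability only; it would not scale
    to hundreds of millions of triples.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(transaction)
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


triples = [
    ("boy", "eat", "apple"),
    ("girl", "eat", "bread"),
    ("dog", "eat", "bread"),
    ("boy", "read", "book"),
]

# Induce candidate patterns for one input verb.
eat_transactions = [triple_to_transaction(s, o) for s, v, o in triples if v == "eat"]
patterns = frequent_itemsets(eat_transactions, min_support=3)

for itemset, support in sorted(patterns.items()):
    print(itemset, support)
```

In this toy run, the only two-feature pattern shared by all three "eat" triples pairs an animate subject with a food object, which is the kind of semantic itemset a sense of the verb could be built from. Features that occur in only some of the triples, such as the subject being a person, fall below the support threshold and are discarded.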
