Mining Meaning from Text by Harvesting Frequent and Diverse Semantic Itemsets

Boella, Guido; Di Caro, Luigi

Authors: Guido Boella, Luigi Di Caro
Publication date: 1 January 2014
Publisher: CEUR-WS.org

Abstract

In this paper, we present a novel and completely unsupervised approach to unravel meanings (or senses) from linguistic constructions found in large corpora by introducing the concept of semantic vector. A semantic vector is a space-transformed vector whose features represent fine-grained semantic information units, instead of co-occurrence values within a collection of texts. In more detail, instead of seeing words as vectors of frequency values, we propose to first explode words into a multitude of tiny pieces of semantic information retrieved from existing resources like WordNet and ConceptNet, and then to cluster them into frequent and diverse patterns. This way, on the one hand, we are able to model linguistic data with a larger but much denser and more informative semantic feature space. On the other hand, because the model is based on basic and conceptual information, we are also able to generate new data by querying the above-mentioned semantic resources with the features contained in the extracted patterns. We tested the idea on a dataset of 640 million subject-verb-object triples to automatically induce senses for specific input verbs, demonstrating the validity and the potential of the presented approach in modeling and understanding natural language.
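The pipeline outlined in the abstract (expand words into semantic features, then mine frequent and diverse patterns) can be illustrated with a small sketch. This is not the authors' implementation: the `TOY_FEATURES` lexicon is a hypothetical stand-in for WordNet and ConceptNet lookups, the triples for the verb "play" are invented for illustration, the itemset miner is a brute-force counter, and the greedy Jaccard filter in `select_diverse` is only one assumed way of enforcing diversity.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon standing in for WordNet/ConceptNet lookups.
# Each word is "exploded" into a set of small semantic features.
TOY_FEATURES = {
    "pianist": {"isa:person", "capableof:perform"},
    "child": {"isa:person", "capableof:have_fun"},
    "team": {"isa:group", "capableof:compete"},
    "sonata": {"isa:music", "usedfor:listening"},
    "guitar": {"isa:instrument", "usedfor:music"},
    "football": {"isa:sport", "usedfor:competition"},
    "chess": {"isa:game", "usedfor:competition"},
}


def to_transaction(subject, obj):
    """Turn one (subject, object) pair into a set of role-tagged features."""
    subj_feats = {"subj-" + f for f in TOY_FEATURES.get(subject, set())}
    obj_feats = {"obj-" + f for f in TOY_FEATURES.get(obj, set())}
    return frozenset(subj_feats | obj_feats)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force count of all itemsets up to max_size; keep the frequent ones."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(transaction)
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def jaccard(a, b):
    """Jaccard similarity between two non-empty sets."""
    return len(a & b) / len(a | b)


def select_diverse(itemsets, max_overlap=0.5):
    """Greedily keep high-support itemsets that overlap little with those kept so far."""
    ranked = sorted(
        itemsets.items(),
        key=lambda kv: (-kv[1], -len(kv[0]), sorted(kv[0])),
    )
    chosen = []
    for itemset, support in ranked:
        if all(jaccard(itemset, kept) <= max_overlap for kept, _ in chosen):
            chosen.append((itemset, support))
    return chosen


# Invented subject-verb-object triples for the input verb "play".
triples = [
    ("pianist", "play", "sonata"),
    ("child", "play", "guitar"),
    ("team", "play", "football"),
    ("child", "play", "chess"),
    ("team", "play", "chess"),
]

transactions = [to_transaction(s, o) for s, _, o in triples]
frequent = frequent_itemsets(transactions, min_support=2)
patterns = select_diverse(frequent)

for itemset, support in patterns:
    print(support, sorted(itemset))
```

In this sketch each surviving pattern is a candidate sense description for the verb, for example a group subject combined with a competition-related object. With real resources, the features in a pattern could in turn be used to query the resource for further subjects and objects that fit the same sense.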

Available Versions

CiteSeerX

oai:CiteSeerX.psu:10.1.1.661.3...

Last time updated on 29/10/2017

Institutional Research Information System University of Turin

oai:iris.unito.it:2318/1566784

Last time updated on 18/04/2020