566 research outputs found

    Distributional Tensor Space Model of Natural Language Semantics

    We propose a novel Distributional Tensor Space Model of natural language semantics employing third-order tensors that accounts for order-dependent word contexts and assigns to words characteristic matrices such that semantic composition can be realized in a linguistically and cognitively plausible way. The proposed model achieves state-of-the-art results for important tasks of linguistic semantics by using a relatively small text corpus and without any sophisticated preprocessing.

    ItEM: A Vector Space Model to Bootstrap an Italian Emotive Lexicon

    In recent years, computational linguistics has seen a rising interest in subjectivity, opinions, feelings and emotions. Even though great attention has been given to polarity recognition, research in emotion detection has had to rely on small emotion resources. In this paper, we present a methodology to build emotive lexicons by jointly exploiting vector space models and human annotation, and we provide the first results of the evaluation with a crowdsourcing experiment.

    A Socio-mathematical and Structure-Based Approach to Model Sentiment Dynamics in Event-Based Text

    Natural language texts are often meant to express or impact the emotions of individuals. Recognizing the underlying emotions expressed in or triggered by textual content is essential if one is to arrive at an understanding of the full meaning that textual content conveys. Sentiment analysis (SA) researchers are becoming increasingly interested in investigating natural language processing techniques as well as emotion theory in order to detect, extract, and classify the sentiments that natural language text expresses. Most SA research is focused on the analysis of subjective documents from the writer’s perspective and their classification into categorical labels or sentiment polarity, in which text is associated with a descriptive label or a point on a continuum between two polarities. Researchers often perform sentiment or polarity classification tasks using machine learning (ML) techniques, sentiment lexicons, or hybrid-based approaches. Most ML methods rely on count-based word representations that fail to take word order into account. Despite the successful use of these flat word representations in topic-modelling problems, SA problems require a deeper understanding of sentence structure, since the entire meaning of words can be reversed through negations or word modifiers. On the other hand, approaches based on semantic lexicons are limited by the relatively small number of words they contain, which do not begin to embody the extensive and growing vocabulary on the Internet. The research presented in this thesis represents an effort to tackle the problem of sentiment analysis from a different viewpoint than those underlying current mainstream studies in this research area. A cross-disciplinary approach is proposed that incorporates affect control theory (ACT) into a structured model for determining the sentiment polarity of event-based articles from the perspectives of readers and interactants. 
A socio-mathematical theory, ACT provides valuable resources for handling interactions between words (event entities) and for predicting situational sentiments triggered by social events. ACT models human emotions arising from social event terms through the use of multidimensional representations that have been verified both empirically and theoretically. To model human emotions regarding textual content, the first step was to develop a fine-grained event extraction algorithm that extracts events and their entities from event-based textual information using semantic and syntactic parsing techniques. The results of the event extraction method were compared against a supervised learning approach on two human-coded corpora (a grammatically correct and a grammatically incorrect structured corpus). For both corpora, the semantic-syntactic event extraction method yielded a higher degree of accuracy than the supervised learning approach. The three-dimensional ACT lexicon was also augmented in a semi-supervised fashion using graph-based label propagation built from semantic and neural network word embeddings. The word embeddings were obtained through the training of commonly used count-based and neural-network-based algorithms on a single corpus, and each method was evaluated with respect to the reconstruction of a sentiment lexicon. The results show that, relative to other word embeddings and state-of-the-art methods, combining both semantic and neural word embeddings yielded the highest correlation scores and lowest error rates. Using the augmented lexicon and ACT mathematical equations, human emotions were modelled according to different levels of granularity (i.e., at the sentence and document levels). The initial stage involved the development of a proposed entity-based SA approach that models reader emotions triggered by event-based sentences. 
The emotions are modelled in a three-dimensional space based on reader sentiment toward different entities (e.g., subject and object) in the sentence. The new approach was evaluated using a human-annotated news-headline corpus; the results revealed the proposed method to be competitive with benchmark ML techniques. The second phase entailed the creation of a proposed ACT-based model for predicting the temporal progression of the emotions of the interactants and their optimal behaviour over a sequence of interactions. The model was evaluated using three different corpora: fairy tales, news articles, and a handcrafted corpus. The results produced by the proposed model demonstrate that, despite the challenging sentence structure, a reasonable agreement was achieved between the estimated emotions and behaviours and the corresponding ground truth.

    Three Essays on Trust Mining in Online Social Networks

    This dissertation research consists of three essays on studying trust in online social networks. Trust plays a critical role in online social relationships, because of the high levels of risk and uncertainty involved. Guided by relevant social science and computational graph theories, I develop conceptual and predictive models to gain insights into trusting behaviors in online social relationships. In the first essay, I propose a conceptual model of trust formation in online social networks. This is the first study that integrates the existing graph-based view of trust formation in social networks with socio-psychological theories of trust to provide a richer understanding of trusting behaviors in online social networks. I introduce new behavioral antecedents of trusting behaviors and redefine and integrate existing graph-based concepts to develop the proposed conceptual model. The empirical findings indicate that both socio-psychological and graph-based trust-related factors should be considered in studying trust formation in online social networks. In the second essay, I propose a theory-based predictive model to predict trust and distrust links in online social networks. Previous trust prediction models used limited network structural data to predict future trust/distrust relationships, ignoring the underlying behavioral trust-inducing factors. I identify a comprehensive set of behavioral and structural predictors of trust/distrust links based on related theories, and then build multiple supervised classification models to predict trust/distrust links in online social networks. The empirical results confirm the superior fit and predictive performance of the proposed model over the baselines. In the third essay, I propose a lexicon-based text mining model to mine trust related user-generated content (UGC). This is the first theory-based text mining model to examine important factors in online trusting decisions from UGC. 
I build domain-specific trustworthiness lexicons for online social networks based on related behavioral foundations and text mining techniques. Next, I propose a lexicon-based text mining model that automatically extracts and classifies trustworthiness characteristics from trust reviews. The empirical evaluations show the superior performance of the proposed text mining system over the baselines.

    Rapid Exploitation and Analysis of Documents


    Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

    This dissertation focuses on effective combination of data-driven natural language processing (NLP) approaches with linguistic knowledge sources that are based on manual text annotation or word grouping according to semantic commonalities. I gainfully apply fine-grained linguistic soft constraints -- of syntactic or semantic nature -- on statistical NLP models, evaluated in end-to-end state-of-the-art statistical machine translation (SMT) systems. The introduction of semantic soft constraints involves intrinsic evaluation on word-pair similarity ranking tasks, extension from words to phrases, application in a novel distributional paraphrase generation technique, and an introduction of a generalized framework of which these soft semantic and syntactic constraints can be viewed as instances, and in which they can be potentially combined. Fine granularity is key in the successful combination of these soft constraints, in many cases. I show how to softly constrain SMT models by adding fine-grained weighted features, each preferring translation of only a specific syntactic constituent. Previous attempts using coarse-grained features yielded negative results. I also show how to softly constrain corpus-based semantic models of words (“distributional profiles”) to effectively create word-sense-aware models, by using semantic word grouping information found in a manually compiled thesaurus. Previous attempts, using hard constraints and resulting in aggregated, coarse-grained models, yielded lower gains. A novel paraphrase generation technique incorporating these soft semantic constraints is then also evaluated in a SMT system. This paraphrasing technique is based on the Distributional Hypothesis. The main advantage of this novel technique over current “pivoting” techniques for paraphrasing is the independence from parallel texts, which are a limited resource. 
The evaluation is done by augmenting translation models with paraphrase-based translation rules, where fine-grained scoring of paraphrase-based rules yields significantly higher gains. The model augmentation includes a novel semantic reinforcement component: in many cases there are alternative paths of generating a paraphrase-based translation rule. Each of these paths reinforces a dedicated score for the “goodness” of the new translation rule. This augmented score is then used as a soft constraint, in a weighted log-linear feature, letting the translation model learn how much to “trust” the paraphrase-based translation rules. The work reported here is the first to use distributional semantic similarity measures to improve performance of an end-to-end phrase-based SMT system. The unified framework for statistical NLP models with soft linguistic constraints enables, in principle, the combination of both semantic and syntactic constraints -- and potentially other constraints, too -- in a single SMT model.