2,074 research outputs found

    On the Promotion of the Social Web Intelligence

    Given the ever-growing information generated through various online social outlets, analytical research on social media has intensified in the past few years from all walks of life. In particular, works on social Web intelligence foster and benefit from the wisdom of the crowds and attempt to derive actionable information from such data. In the form of collective intelligence, crowds gather together and contribute to solving problems that may be difficult or impossible to solve by individuals and single computers. In addition, the consumer insight revealed from social footprints can be leveraged to build powerful business intelligence tools, enabling efficient and effective decision-making processes. This dissertation is broadly concerned with the intelligence that can emerge from the social Web platforms. In particular, the two phenomena of social privacy and online persuasion are identified as the two pillars of the social Web intelligence, studying which is essential in the promotion and advancement of both collective and business intelligence. The first part of the dissertation is focused on the phenomenon of social privacy. This work is mainly motivated by the privacy dichotomy problem. Users often face difficulties specifying privacy policies that are consistent with their actual privacy concerns and attitudes. As such, before making use of social data, it is imperative to employ multiple safeguards beyond the current privacy settings of users. As a possible solution, we utilize user social footprints to detect their privacy preferences automatically. An unsupervised collaborative filtering approach is proposed to characterize the attributes of publicly available accounts that are intended to be private. Unlike the majority of earlier studies, a variety of social data types is taken into account, including the social context, the published content, as well as the profile attributes of users. 
Our approach can provide support in making an informed decision whether to exploit one's publicly available data to draw intelligence. With the aim of gaining insight into the strategies behind online persuasion, the second part of the dissertation studies written comments in online deliberations. Specifically, we explore different dimensions of the language, the temporal aspects of the communication, as well as the attributes of the participating users to understand what makes people change their beliefs. In addition, we investigate the factors that are perceived to be the reasons behind persuasion by the users. We link our findings to traditional persuasion research, hoping to uncover when and how they apply to online persuasion. A set of rhetorical relations is known to be of importance in persuasive discourse. We further study the automatic identification and disambiguation of such rhetorical relations, aiming to take a step closer towards automatic analysis of online persuasion. Finally, a small proof of concept tool is presented, showing the value of our persuasion and rhetoric studies.

    Human-Level Performance on Word Analogy Questions by Latent Relational Analysis

    This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, machine translation, and information retrieval. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason/stone is analogous to the pair carpenter/wood; the relations between mason and stone are highly similar to the relations between carpenter and wood. Past work on semantic similarity measures has mainly been concerned with attributional similarity. For instance, Latent Semantic Analysis (LSA) can measure the degree of similarity between two words, but not between two relations. Recently the Vector Space Model (VSM) of information retrieval has been adapted to the task of measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus (they are not predefined), (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data (it is also used this way in LSA), and (3) automatically generated synonyms are used to explore reformulations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying noun-modifier relations, LRA achieves similar gains over the VSM, while using a smaller corpus.
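To make the pattern-frequency idea in this abstract concrete, here is a minimal sketch of relational similarity in the VSM style: each word pair is represented by counts of the word sequences that join its two words, and two pairs are compared by the cosine of those count vectors. The toy corpus, the function names, the pattern definition (the words between the pair) and the use of cosine are all illustrative assumptions, not details given in the abstract; the SVD smoothing and synonym reformulation steps of LRA are deliberately left out.

```python
import math
from collections import Counter

# Hypothetical toy corpus (an assumption for illustration);
# the abstract refers to a large corpus.
CORPUS = [
    "the mason cut the stone",
    "a mason works with stone",
    "the carpenter cut the wood",
    "a carpenter works with wood",
    "the doctor visited the hospital",
]


def pattern_vector(a, b, corpus):
    """Count the word sequences that appear between a and b in each sentence."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        if a in tokens and b in tokens:
            i, j = tokens.index(a), tokens.index(b)
            if i < j:
                # The joining words act as the "pattern" for this pair.
                counts[" ".join(tokens[i + 1:j])] += 1
    return counts


def cosine(u, v):
    """Cosine of two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def relational_similarity(pair1, pair2, corpus=CORPUS):
    """Compare two word pairs by the patterns that connect their words."""
    return cosine(pattern_vector(*pair1, corpus), pattern_vector(*pair2, corpus))


if __name__ == "__main__":
    print(relational_similarity(("mason", "stone"), ("carpenter", "wood")))
    print(relational_similarity(("mason", "stone"), ("doctor", "hospital")))
```

On this toy corpus the mason/stone and carpenter/wood pairs share all of their joining patterns and so score close to 1, while mason/stone and doctor/hospital share none and score 0. With realistic data the count vectors are sparse and noisy, which is the situation the abstract's SVD smoothing step is meant to address.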

    Anaphor resolution and the scope of syntactic constraints

    An anaphor resolution algorithm is presented which relies on a combination of strategies for narrowing down and selecting from antecedent sets for reflexive pronouns, nonreflexive pronouns, and common nouns. The work focuses on syntactic restrictions which are derived from Chomsky's Binding Theory. It is discussed how these constraints can be incorporated adequately in an anaphor resolution algorithm. Moreover, by showing that pragmatic inferences may be necessary, the limits of syntactic restrictions are elucidated.

    Answering Causal Questions and Developing Tool Support


    PersoNER: Persian named-entity recognition

    © 1963-2018 ACL. Named-Entity Recognition (NER) is still a challenging task for languages with low digital resources. The main difficulties arise from the scarcity of annotated corpora and the consequent problematic training of an effective NER pipeline. To bridge this gap, in this paper we target the Persian language, which is spoken by a population of over a hundred million people world-wide. We first present and provide ArmanPersoNERCorpus, the first manually-annotated Persian NER corpus. Then, we introduce PersoNER, an NER pipeline for Persian that leverages a word embedding and a sequential max-margin classifier. The experimental results show that the proposed approach is capable of achieving interesting MUC7 and CoNLL scores while outperforming two alternatives based on a CRF and a recurrent neural network.

    The Impact of Procedural Meaning on Second Language Processing: A Study on Connectives

    Utterances are minimal ostensive stimuli produced by a speaker for an interlocutor, who interprets them by decoding linguistic input and by carrying out inferential processes (Sperber & Wilson 1995[1986]). Inferencing implies performing computations to obtain and connect mental representations with each other and with the context and obtain cognitive effects from the processed utterances (idem). These inferential processes are often guided by linguistic expressions with procedural meaning, among which are discourse markers. Discourse markers act as instructions during discourse processing by constraining contextual access (Blakemore 1987, 2002; Portolés 2001[1998]; Loureda & Acín 2010). As procedural-meaning expressions, their semantics is rigid and asymmetric as to concepts: instructions encoded in the meaning of a discourse marker must necessarily be executed (Leonetti & Escandell Vidal 2004; Escandell Vidal et al. 2011). In this dissertation, we investigate experimentally how concepts and instructions interact in discourse processing, and how such interaction is managed by speakers with different degrees of competence in an L2, as compared to native speakers. Processing data were gathered in an eye-tracking reading experiment for four discourse-related phenomena and three groups of readers. The two participant groups consisted of speakers with an intermediate level (B1 CEFR, n = 58) and with a proficiency level (C1 CEFR, n = 49) in Spanish as an L2; the control group consisted of native speakers of Spanish (n = 102).
At the discourse level, we compared processing of different argumentative discourse relations (causality versus counter-argumentation signaled by the Spanish connectives por tanto ‘therefore’ and sin embargo ‘however’, study 1); how the presence of a procedural interpretive guide influences processing of causal relations (implicit versus explicit causality marked by por tanto, study 2); and how congruency between procedural meaning and mind-stored assumptions impacts discourse processing (plausible versus implausible causality marked by por tanto, study 3, and plausible versus implausible causality marked by sin embargo, study 4). In general, results show that discourse relations are approached differently in cognitive terms depending on an individual’s degree of linguistic and pragmatic competence. Most frequently, the patterns obtained point to a direct correlation between proficiency and degree of nativelikeness in L2 performance, both in the strategies deployed and in the effort allocated to processing causality and counter-argumentation and to resolving pragmatic mismatches. Specifically, feasibility and relevance in discourse override discursive differences (the type of discourse relation at issue) from a certain degree of communicative competence on. By contrast, when pragmatic and linguistic competence are not sufficiently developed, relevance and discourse feasibility do not seem to offset the higher cognitive complexity of a certain discourse relation (study 1) and the absence of processing instructions (study 2). Communicative competence also determines whether and how readers perform the accommodation strategies needed to process utterances in which procedural meaning leads toward recovery of a communicated assumption that clashes with mind-stored assumptions. Accommodation is cognitively demanding and, therefore, effortful, but only given a certain degree of communicative competence to perform a certain task (studies 3 and 4).
In complex and highly complex tasks, processing by less proficient language users is shallow (cf. Clahsen & Felser 2006a, 2006b, 2006c) compared to that of more proficient and native readers. As a result, when the cognitive constraints imposed by the task are very high (study 4), readers fail to carry out the accommodation processes needed to recover the assumption communicated in the utterance. From a theoretical perspective, this study may contribute to the refinement of theories on L2 discourse processing, particularly with respect to how non-native language users cognitively manage discourse marking; from an applied perspective, the experimental evidence provided may serve as a basis for future studies to determine if empirically observed processing strategies in an L2 correlate with the thresholds and the content-sequencing established in frameworks of reference for the teaching and learning of second languages in relation to discourse marking and, in general, to contents at the discourse level for any language skill.

    Basic tasks of sentiment analysis

    Subjectivity detection is the task of distinguishing objective sentences from subjective ones. Objective sentences are those which do not exhibit any sentiment. It is therefore desirable for a sentiment analysis engine to find and separate out the objective sentences before further analysis, e.g., polarity detection. In subjective sentences, opinions can often be expressed on one or multiple topics. Aspect extraction is a subtask of sentiment analysis that consists in identifying opinion targets in opinionated text, i.e., in detecting the specific aspects of a product or service the opinion holder is either praising or complaining about.