
    Embedding Predications

    Written communication is rarely a sequence of simple assertions. More often, in addition to simple assertions, authors express subjectivity, such as beliefs, speculations, opinions, intentions, and desires. Furthermore, they link statements of various kinds to form a coherent discourse that reflects their pragmatic intent. In computational semantics, extraction of simple assertions (propositional meaning) has attracted the greatest attention, while research that focuses on extra-propositional aspects of meaning has remained sparse overall and has been largely limited to narrowly defined categories, such as hedging or sentiment analysis, treated in isolation. In this thesis, we contribute to the understanding of extra-propositional meaning in natural language understanding, by providing a comprehensive account of the semantic phenomena that occur beyond simple assertions and examining how a coherent discourse is formed from lower level semantic elements. Our approach is linguistically based, and we propose a general, unified treatment of the semantic phenomena involved, within a computationally viable framework. We identify semantic embedding as the core notion involved in expressing extra-propositional meaning. The embedding framework is based on the structural distinction between embedding and atomic predications, the former corresponding to extra-propositional aspects of meaning. It incorporates the notions of predication source, modality scale, and scope. We develop an embedding categorization scheme and a dictionary based on it, which provide the necessary means to interpret extra-propositional meaning with a compositional semantic interpretation methodology. Our syntax-driven methodology exploits syntactic dependencies to construct a semantic embedding graph of a document. 
Traversing the graph in a bottom-up manner guided by compositional operations, we construct predications corresponding to extra-propositional semantic content, which form the basis for addressing practical tasks. We focus on text from two distinct domains: news articles from the Wall Street Journal, and scientific articles focusing on molecular biology. Adopting a task-based evaluation strategy, we treat the ease with which the core framework adapts to practical tasks involving some extra-propositional aspect as a measure of its success. The computational tasks we consider include hedge/uncertainty detection, scope resolution, negation detection, biological event extraction, and attribution resolution. Our competitive results in these tasks demonstrate the viability of our proposal.

    Knowledge modeling of phishing emails

    This dissertation investigates whether malicious phishing emails are detected better when a meaningful representation of the email bodies is available. The natural language processing theory of Ontological Semantics Technology is used for its ability to model the knowledge representation present in the email messages. Known good and phishing emails were analyzed and their meaning representations fed into machine learning binary classifiers. Unigram language models of the same emails were used as a baseline for comparing the performance of the meaningful data. The end results show that a binary classifier trained on meaningful data is better at detecting phishing emails than a unigram language model binary classifier, at least for some of the selected machine learning algorithms.

    Messaging Forensic Framework for Cybercrime Investigation

    Online predators, botmasters, and terrorists abuse the Internet and associated web technologies by conducting illegitimate activities such as bullying, phishing, and threatening. These activities often involve online messages between a criminal and a victim, or between criminals themselves. The forensic analysis of online messages to collect empirical evidence that can be used to prosecute cybercriminals in a court of law is one way to minimize most cybercrimes. The challenge is to develop innovative tools and techniques to precisely analyze large volumes of suspicious online messages. We develop a forensic analysis framework to help an investigator analyze the textual content of online messages with two main objectives. First, we apply our novel authorship analysis techniques for collecting patterns of authorial attributes to address the problem of anonymity in online communication. Second, we apply the proposed knowledge discovery and semantic analysis techniques for identifying criminal networks and their illegal activities. The focus of the framework is to collect credible, intuitive, and interpretable evidence for both technical and non-technical professional experts, including law enforcement personnel and jury members. To evaluate our proposed methods, we share our collaborative work with a local law enforcement agency. The experimental results on real-life data suggest that the presented forensic analysis framework is effective for cybercrime investigation.

    Inquiries into the lexicon-syntax relations in Basque

    Index:- Foreword. B. Oyharçabal.- Morphosyntactic disambiguation and shallow parsing in computational processing in Basque. I. Aduriz, A. Díaz de Ilarraza.- The transitivity of borrowed verbs in Basque: an outline. X. Alberdi.- Patrixa: a unification-based parser for Basque and its application to the automatic analysis of verbs. I. Aldezabal, M. J. Aranzabe, A. Atutxa, K. Gojenola, K. Sarasola.- Learning argument/adjunct distinction for Basque. I. Aldezabal, M. J. Aranzabe, K. Gojenola, K. Sarasola, A. Atutxa.- Analyzing verbal subcategorization aimed at its computational application. I. Aldezabal, P. Goenaga.- Automatic extraction of verb patterns from “hauta-lanerako euskal hiztegia”. J. M. Arriola, X. Artola, A. Soroa.- The case of an enlightening, provoking and admirable Basque derivational suffix with implications for the theory of argument structure. X. Artiagoitia.- Verb-deriving processes in Basque. J. C. Odriozola.- Lexical causatives and causative alternation in Basque. B. Oyharçabal.- Causation and semantic control; diagnosis of incorrect use in minorized languages. I. Zabala.- Subject index.- Contributions.

    Evaluative rhetorical strategies in the broadsheet review genre: the case of four British broadsheets

    The thesis investigates rhetorical evaluative strategies in four British broadsheets: The Daily Telegraph, The Guardian, The Independent and The Times Literary Supplement. This study views writing in the interpersonal domain, where language is shaped by social needs, politeness rules and a notion of appropriacy that is not absolute but mediated by the reading public. Broadsheet reviews come across as highly interactional texts where the voice of the reviewer overlaps with the voice of the reader and the voice of the author of the book. These voices are carefully orchestrated and framed within an argumentative discourse that aims at maintaining non-conflictual relationships that respect the public’s Face in the sense that Brown and Levinson (1978) give to the word. However, broadsheet reviewers also fulfil genre expectations that a review be honest and balanced. A corpus of 72 reviews was coded and analysed in order to detect the ways in which broadsheet reviewers select certain rhetorical evaluative strategies to judge the book and the work of the author. As these evaluative strategies seem to cluster round the conjunct BUT, and this is a key hub of evaluation in the broadsheet genre, a database of 111 sentences featuring the conjunct is established. It is found that evaluative strategies clustering round the conjunct BUT are carefully planned by reviewers, who distribute them in salient parts of the text. The choice of linguistic resources to judge a book is dictated by interpersonal needs aimed at reducing the Face Threat to authors and readers. Consequently, the Praise and Criticism Pair, which has a huge hedging potential, is often chosen to evaluate the work of authors, while Criticism is hardly ever placed at the beginning of the review. Interaction with the readers seems to impact the evaluative patterns that occur in broadsheet reviews.
The clauses before BUT act as a prelude for evaluative acts, while the clauses after BUT are the locus where evaluation is presented to the reader. Both the Praise and Criticism Pair and Hedges ensure mitigated evaluative acts that are framed in a cogent line of argumentation, which makes them acceptable to readers. The skillful use of hedging allows broadsheet reviewers to be critical towards the Author and Specific Aspects of the book, which are the recurring targets of the BUT Node. One of the main claims of this thesis is that broadsheet reviews are argumentative texts where the key organizational principle underpinning discourse is the concern to justify the judgement presented about the book read. This justification is framed within argumentation.