
    Acts of killing, acts of meaning: an application of corpus pattern analysis to language of animal-killing

    We are currently witnessing unprecedented levels of ecological destruction and violence visited upon nonhumans. Study of the more-than-human world is now being enthusiastically taken up across a range of disciplines, in what has been called the ‘scholarly animal turn’. This thesis brings together the concerns of Critical Animal Studies – along with related threads of posthumanism and new materialist thinking – and Corpus Linguistics, specifically Corpus Pattern Analysis (CPA), to produce a data-driven, lexicocentric study of the discourse of animal-killing. CPA, which has been employed predominantly in corpus lexicography, provides a robust and empirically well-founded basis for the analysis of verbs. Verbs are chosen because they act as the pivot of a clause; analysing them also uncovers their arguments – in this case, participants in material-discursive ‘killing’ events. This project analyses 15 ‘killing’ verbs using CPA as a basis, in what I term a corpus-lexicographical discourse analysis. The data is sampled from an animal-themed corpus of around 9 million words of contemporary British English, and the British National Corpus is used for reference. The findings are both methodological and substantive. CPA is found to be a reliable empirical starting point for discourse analysis, and the lexicographical practice of establishing linguistic ‘norms’ is critical to the identification of anomalous uses. The thesis presents evidence of anthropocentrism inherent in the English lexicon, and demonstrates several ways in which distance is created between participants in ‘killing’ constructions. The analysis also reveals specific ways that verbs can obfuscate, deontologise and deindividualise their arguments. The recommendations for discourse analysts include the adoption of CPA and critical analysis of its resulting patterns, in order to demonstrate the precise mechanisms by which verb use can either oppress or empower individuals. Social justice advocates are also alerted to potentially harmful language that might undermine their cause.
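
    To illustrate the verb-centred analysis that CPA supports, below is a minimal sketch (not the thesis's actual tooling) that uses spaCy to collect the subject and object arguments of a few ‘killing’ verbs, the raw material from which patterns are identified. The verb list and example sentences are placeholders, not the thesis's data.

```python
# A minimal, illustrative sketch of verb-argument extraction in the spirit of
# Corpus Pattern Analysis: for each 'killing' verb, tally the grammatical
# subjects and objects it occurs with. Requires: python -m spacy download en_core_web_sm
import spacy
from collections import Counter

nlp = spacy.load("en_core_web_sm")
KILL_VERBS = {"kill", "slaughter", "cull", "destroy"}  # illustrative, not the thesis's 15

def argument_patterns(sentences):
    """Count (subject, verb, object) lemma triples for the target verbs."""
    patterns = Counter()
    for doc in nlp.pipe(sentences):
        for token in doc:
            if token.pos_ == "VERB" and token.lemma_ in KILL_VERBS:
                subj = next((c.lemma_ for c in token.children
                             if c.dep_ in ("nsubj", "nsubjpass")), None)
                obj = next((c.lemma_ for c in token.children
                            if c.dep_ == "dobj"), None)
                patterns[(subj, token.lemma_, obj)] += 1
    return patterns

print(argument_patterns(["The farmer culled the herd.",
                         "Thousands of badgers were killed."]))
```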

    Comparing two thesaurus representations for Russian

    In this paper we present RuWordNet, a new Russian wordnet obtained semi-automatically by transformation of the existing Russian thesaurus RuThes. In the first step, the basic wordnet structure was reproduced: a synset hierarchy for each part of speech and the basic set of relations between synsets (hyponym–hypernym, part–whole, antonymy). In the second stage, we added causation, entailment and domain relations between synsets. Derivation relations were also established for single words, as was the component structure for phrases included in RuWordNet. The described transformation procedure highlights the specific features of each type of thesaurus representation.
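
    As a rough illustration of the target representation, the sketch below models a wordnet-style synset store with typed relations. The class, IDs and relation labels are invented for illustration, not RuWordNet's actual schema.

```python
# A minimal sketch of a wordnet-style synset store with typed relations,
# illustrating the kind of structure a RuThes -> RuWordNet transformation
# targets. Names, IDs and relation labels are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Synset:
    synset_id: str
    pos: str                                                   # 'N', 'V', ...
    lemmas: list[str]
    relations: dict[str, set[str]] = field(default_factory=dict)  # type -> target ids

    def add_relation(self, rel_type: str, target_id: str) -> None:
        self.relations.setdefault(rel_type, set()).add(target_id)

# Step 1: basic structure (hypernymy within each part of speech)
dog = Synset("ru-n-0001", "N", ["собака"])
animal = Synset("ru-n-0002", "N", ["животное"])
dog.add_relation("hypernym", animal.synset_id)

# Step 2: further relation types (causation, entailment, domain) added later
bark = Synset("ru-v-0001", "V", ["лаять"])
bark.add_relation("domain", dog.synset_id)  # illustrative only
print(dog.relations, bark.relations)
```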

    Creating large semantic lexical resources for the Finnish language

    Finnish belongs to the Finno-Ugric language family and is spoken by the vast majority of the people living in Finland. The motivation for this thesis is to contribute to the development of a semantic tagger for Finnish. This tool is a counterpart to the English Semantic Tagger, which has been developed at the University Centre for Computer Corpus Research on Language (UCREL) at Lancaster University since the beginning of the 1990s and which has over the years proven to be a very powerful tool in the automatic semantic analysis of English spoken and written data. The English Semantic Tagger has various successful applications in the fields of natural language processing and corpus linguistics, and new application areas emerge all the time. The semantic lexical resources that I have created in this thesis provide the knowledge base for the Finnish Semantic Tagger. My main contributions are the lexical resources themselves, along with a set of methods and guidelines for their creation and expansion, both as a general language resource and as tailored for domain-specific applications. Furthermore, I propose and carry out several methods for evaluating semantic lexical resources. In addition to the English Semantic Tagger, which was developed first, and the Finnish Semantic Tagger, which followed, equivalent semantic taggers have now been developed for Czech, Chinese, Dutch, French, Italian, Malay, Portuguese, Russian, Spanish, Urdu, and Welsh. Taken together, these semantic taggers form a program framework called the UCREL Semantic Analysis System (USAS), which enables the development of not only monolingual but also various types of multilingual applications. Large-scale semantic lexical resources designed for Finnish using semantic fields as the organizing principle have not been attempted previously; the Finnish semantic lexicons created in this thesis are thus a unique and novel resource. The lexical coverage on the test corpora containing general modern standard Finnish, which has been the focus of the lexicon development, ranges from 94.58% to 97.91%. The results are also very promising in the analysis of domain-specific text (95.36%), older Finnish text (92.11–93.05%), and Internet discussions (91.97–94.14%). The results of the evaluation of lexical coverage are comparable to those obtained with the English equivalents, and thus indicate that the Finnish semantic lexical resources indeed cover the majority of core Finnish vocabulary.
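
    The coverage figures reported above are, in principle, simple to compute: lexical coverage is the percentage of corpus tokens that receive at least one tag from the lexicon. A minimal sketch follows, with a toy lexicon and a naive tokeniser standing in for the real USAS resources (real Finnish text would additionally need lemmatisation).

```python
# A minimal sketch of lexical-coverage evaluation: the percentage of corpus
# tokens that are found in the semantic lexicon. The toy lexicon and regex
# tokeniser below are placeholders, not the Finnish Semantic Tagger's resources.
import re

semantic_lexicon = {"koira": "living-creatures", "juosta": "moving"}  # toy entries

def lexical_coverage(text: str, lexicon: dict[str, str]) -> float:
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    covered = sum(1 for t in tokens if t in lexicon)
    return 100.0 * covered / len(tokens)

print(f"{lexical_coverage('Koira juosta nyt', semantic_lexicon):.2f}%")  # 66.67%
```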

    Room for chaos?: authenticity and performance in undergraduate spatial design students’ accounts of ideational work

    A thesis submitted to the University of Bedfordshire, in fulfilment of the requirements for the degree of Professional Doctorate. This study was prompted by my suspicion that spatial design undergraduates’ production of paper-based freehand sketches during design ideation was in decline. Seeking to find out why, I conducted video-recorded focused interviews with undergraduates from a range of UK spatial design degrees, during which we examined their sketchbook material and discussed their ideational activities (termed ‘ideational moves’). I subjected the data to a form of content analysis, but the outcomes appeared to contradict my initial premise whilst revealing that the interactions during the interviews between myself, the respondents and the sketchbook material (termed ‘discursive moves’) warranted examination. This persuaded me that the study’s focus should emerge through ‘evolved’ grounded theory rather than being stated a priori, which highlighted my presence in, and impact on, the data and prompted me to adopt a constructivist grounded theorising approach in combination with actor-network theory’s concepts of translation and circulating references. This study has thus been qualitative, relativist, iterative and multi-modal. Grounded theorising led to the identification of a number of categories and sub-categories of ideational move across the sample, and indicated that the respondents had used a ‘core’ of each. ‘Core’ categories comprised: making paper-based ideational moves, carrying out research and using photographic material. Several respondents also evidenced producing digital imagery and physical models. ‘Core’ sub-categories comprised using paper-based freehand perspective sketches, sketch diagrams and word-based approaches, plus supporting visuo-spatial research. Several respondents also evidenced producing paper-based freehand plan, section and elevation sketches, plus collage. Grounded theorising also revealed that each respondent had utilised a different combination of sub-categories, with different degrees of connectedness. I did not set out to evaluate the design outcomes showcased, but, as a spatial design academic and practitioner, I felt compelled to. This led to the tentative conclusion that respondents who added to the ‘core’ of categories and sub-categories and worked with greater connectedness appeared to produce more thoroughly-considered work, whilst those who forsook the ‘core’ and worked with less connectedness appeared to produce more unexpected results by allowing ‘…room for chaos…’: periods of confusion and surprise. Regarding the discursive moves, grounded theorising indicated that the sketchbook material tabled by each respondent during the study was not one fixed thing, but an abstraction using placing-for and directing-to techniques to focus attention on certain ideational moves and away from others. This made the sketchbook material a performance within the network of human and non-human actors who, in effect, co-constructed it as a temporary reality without necessarily realising this. Research into sketchbook material appears to regard it, once shared with others, as having the candour of a secret diary, and as eligible for formative and summative assessment because it documents design process authentically. My study, whilst not claiming generalisability, suggests that this view should be challenged. The new knowledge is now informing my future teaching practice and will, I hope, prompt other academics to investigate whether their own students manifest similar outcomes and, through this, contribute to wider discussions on the formative and summative assessment of undergraduate spatial design development activity.

    Social work with airport passengers

    Social work at the airport consists in offering social services to passengers. The main methodological position is that passengers are under stress, which is characterized by a particular set of features in appearance and behavior. In such circumstances a passenger's actions attract attention. Only a person whom the passenger trusts can help, whether with documents or psychologically.

    Fine-grained emotion detection in microblog text

    Automatic emotion detection in text is concerned with using natural language processing techniques to recognize emotions expressed in written discourse. Endowing computers with the ability to recognize emotions in a particular kind of text, microblogs, has important applications in sentiment analysis and affective computing. In order to build computational models that can recognize the emotions represented in tweets, we need to identify a set of suitable emotion categories. Prior work has mainly focused on building computational models for only a small set of six basic emotions (happiness, sadness, fear, anger, disgust, and surprise). This thesis describes a taxonomy of 28 emotion categories, an expansion of these six basic emotions, developed inductively from data. This set of 28 emotion categories is fine-grained and representative of the range of emotions expressed in tweets, microblog posts on Twitter. The ability of humans to recognize these fine-grained emotion categories is characterized using inter-annotator reliability measures based on annotations provided by expert and novice annotators. A set of 15,553 human-annotated tweets forms a gold-standard corpus, EmoTweet-28. For each emotion category, we have extracted a set of linguistic cues (i.e., punctuation marks, emoticons, emojis, abbreviated forms, interjections, lemmas, hashtags and collocations) that can serve as salient indicators for that emotion category. We evaluated the performance of automatic classification techniques on the set of 28 emotion categories through a series of experiments using several classifier and feature combinations. Our results show that it is feasible to extend machine learning classification to fine-grained emotion detection in tweets (i.e., as many as 28 emotion categories) with results comparable to those of state-of-the-art classifiers that detect six to eight basic emotions in text. Classifiers using features extracted from the linguistic cues associated with each category equal or exceed the performance of conventional corpus-based and lexicon-based features for fine-grained emotion classification. This thesis makes an important theoretical contribution in the development of a taxonomy of emotion in text. In addition, this research makes several practical contributions, particularly in the creation of language resources (i.e., corpus and lexicon) and machine learning models for fine-grained emotion detection in text.
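
    As a rough sketch of how surface cues and conventional features can be combined in such a classifier, the example below uses scikit-learn with a tokeniser permissive enough that emoticons and hashtags survive as features. The tweets, labels and category names are invented placeholders, not drawn from EmoTweet-28.

```python
# A rough sketch of fine-grained emotion classification over tweets, combining
# word n-grams with surface cues (emoticons, hashtags) kept intact by a
# whitespace token pattern. Training data and labels are invented toy examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = ["so excited for tonight!!! :D #hyped",
          "i miss you so much :(",
          "ugh my flight got cancelled AGAIN"]
labels = ["excitement", "sadness", "frustration"]  # a few illustrative categories

clf = make_pipeline(
    # \S+ keeps ':D', ':(' and '#hyped' as tokens instead of discarding them
    TfidfVectorizer(token_pattern=r"\S+", ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(tweets, labels)
print(clf.predict(["cannot wait!!! :D"]))  # likely 'excitement' on this toy data
```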

    The development of a framework for semantic similarity measures for the Arabic language

    This thesis presents a novel framework for developing an Arabic Short Text Semantic Similarity (STSS) measure, namely NasTa. STSS measures are developed for short texts of 10–25 words. The algorithm calculates STSS based on Part of Speech (POS) tagging, Arabic Word Sense Disambiguation (WSD), semantic nets and corpus statistics. The proposed framework is founded on word similarity measures. Firstly, a novel Arabic noun similarity measure is created using information sources extracted from a lexical database known as Arabic WordNet. Secondly, a novel verb similarity algorithm is created based on the assumption that words sharing a common root usually have a related meaning, a central characteristic of the Arabic language. Two Arabic word benchmark datasets, one of nouns and one of verbs, are created to evaluate these measures; they are the first of their kind for Arabic. Their creation methodologies use the best available experimental techniques to create materials and collect human ratings from representative samples of the Arabic-speaking population. Experimental evaluation indicates that the Arabic noun and verb measures performed well, achieving good correlations with average human performance on the noun and verb benchmark datasets respectively. Specific features of the Arabic language are addressed. A new Arabic WSD algorithm is created to address the challenge of ambiguity caused by missing diacritics in the contemporary Arabic writing system. The algorithm disambiguates all words (nouns and verbs) in Arabic short texts without requiring any manual training data. Moreover, a novel algorithm is presented to identify the similarity score between two words belonging to different POS, either a pair comprising a noun and a verb or a verb and a noun. This algorithm performs Arabic WSD based on the concept of noun semantic similarity. Important benchmark datasets for text similarity are presented: ASTSS-68 and ASTSS-21. Experimental results indicate that the performance of the Arabic STSS algorithm achieved a good correlation with average human performance on ASTSS-68, which was statistically significant.
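
    The root-sharing intuition behind the verb measure can be sketched as follows. The root table is a hard-coded stub (a real system would use a morphological analyser), and the scores are illustrative rather than the thesis's actual formula.

```python
# A minimal sketch of the intuition behind root-based Arabic verb similarity:
# verbs sharing a triliteral root are assumed to be semantically related.
# The root lookup is a hard-coded stub, not a real morphological analyser.
ROOTS = {
    "كتب": "ك ت ب",     # 'to write'
    "استكتب": "ك ت ب",  # 'to ask someone to write' (same root)
    "درس": "د ر س",     # 'to study'
}

def verb_similarity(v1: str, v2: str) -> float:
    """Return a high score for shared roots, a low one otherwise (illustrative)."""
    r1, r2 = ROOTS.get(v1), ROOTS.get(v2)
    if r1 is None or r2 is None:
        return 0.0
    return 1.0 if r1 == r2 else 0.1

print(verb_similarity("كتب", "استكتب"))  # 1.0: both share the root k-t-b
```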

    From semantic to emotional space in sense sentiment analysis

    Ph.D. (Doctor of Philosophy)

    An analysis and comparison of predominant word sense disambiguation algorithms

    This thesis investigates research performed in the area of natural language processing. The aim of this research is to compare a selection of predominant word sense disambiguation algorithms, and to determine whether they can be optimised by small changes to the parameters they use. To perform this research, several word sense disambiguation algorithms were implemented in Java and run on a range of test corpora. The algorithms are judged on metrics such as speed and accuracy, along with any other results obtained; while an algorithm may be fast and accurate, there may be other factors making it less desirable. Finally, to demonstrate the purpose and usefulness of better algorithms, the algorithms are used in conjunction with a real-world application. Five algorithms were used in this research: the standard Lesk algorithm, the simplified Lesk algorithm, a Lesk algorithm variant using hypernyms, a Lesk algorithm variant using synonyms, and a baseline performance algorithm. While the baseline algorithm should have been less accurate than the others, testing found that it could disambiguate words more accurately than any of them, seemingly because the baseline makes use of statistical data in WordNet, the machine-readable dictionary used for testing, which the other algorithms were unable to use. However, with a few modifications, the simplified Lesk algorithm was able to reach performance just a few percent lower than that of the baseline algorithm. A further aim of this research is to apply word sense disambiguation to automatic concept mapping, to determine whether more accurate algorithms produce noticeably better results in a real-world application. Testing found that the overall accuracy of the algorithm had little effect on the quality of the concept maps produced; quality depended instead on the text being examined.
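
    For reference, a compact rendering of the simplified Lesk algorithm using NLTK's WordNet interface: each sense is scored by the overlap between its gloss (plus usage examples) and the sentence context. Stopword filtering and lemmatisation are omitted for brevity.

```python
# A compact rendering of the simplified Lesk algorithm: pick the WordNet sense
# whose gloss and examples share the most words with the sentence context.
# Run nltk.download('wordnet') once beforehand.
from nltk.corpus import wordnet as wn

def simplified_lesk(word: str, sentence: str):
    context = set(sentence.lower().split())
    best_sense, best_overlap = None, -1
    for sense in wn.synsets(word):
        # the 'signature' is the sense's definition plus its example sentences
        signature = set(sense.definition().lower().split())
        for example in sense.examples():
            signature |= set(example.lower().split())
        overlap = len(signature & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

sense = simplified_lesk("bank", "i deposited cash at the bank")
print(sense, "->", sense.definition())
```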

    Meaning construction in popular science : an investigation into cognitive, digital, and empirical approaches to discourse reification

    This thesis uses cognitive linguistics and digital humanities techniques to analyse abstract conceptualization in a corpus of popular science texts. Combining techniques from Conceptual Integration Theory, corpus linguistics, data-mining, cognitive pragmatics and computational linguistics, it presents a unified approach to understanding cross-domain mappings in this area and, through case studies of key extracts, describes how concept integration in these texts operates. In more detail, Part I of the thesis describes and implements a comprehensive procedure for semantically analysing large bodies of text using the recently-completed database of the Historical Thesaurus of English. Using log-likelihood statistical measures and semantic annotation techniques on a 600,000-word corpus of abstract popular science, this part establishes both the existence and the extent of significant analogical content in the corpus. Part II then identifies samples from the corpus which are particularly high in analogical content, and proposes an adaptation of empirical and corpus methods to support and enhance conceptual integration (sometimes called conceptual blending) analyses, informed by Part I’s methodologies for the study of analogy on a wider scale. Finally, the thesis closes with a detailed analysis, using this methodology, of examples taken from the example corpus. This analysis illustrates the conclusions which can be drawn from such work, completing the methodological chain of reasoning from wide-scale corpora to narrow-focus semantics, and providing data about the nature of highly-abstract popular science as a genre. The thesis’ original contribution to knowledge is therefore twofold: while contributing to the understanding of the reification of abstractions in discourse, it also focuses on methodological enhancements to existing tools and approaches, aiming to contribute to the established tradition of both analytic and procedural work advancing the digital humanities in the area of language and discourse.
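
    The log-likelihood measure referred to here is standardly Dunning's G2 statistic, computed from a word's observed frequencies in the study corpus and a reference corpus. A minimal sketch, with invented counts:

```python
# A minimal sketch of Dunning's log-likelihood (G2) keyness statistic, the
# standard corpus-linguistic measure for comparing a word's frequency in a
# study corpus against a reference corpus. Counts below are invented examples.
import math

def log_likelihood(freq_study: int, size_study: int,
                   freq_ref: int, size_ref: int) -> float:
    """G2 = 2 * sum(observed * ln(observed / expected)), skipping zero cells."""
    total = size_study + size_ref
    expected_study = size_study * (freq_study + freq_ref) / total
    expected_ref = size_ref * (freq_study + freq_ref) / total
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2

# e.g. a word with 120 hits in a 600,000-word corpus vs 40 in a 1M-word reference
print(round(log_likelihood(120, 600_000, 40, 1_000_000), 2))
```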