6 research outputs found

    Visualizing Incongruity: Visual Data Mining Strategies for Modeling Humor in Text

    The goal of this project is to investigate the use of visual data mining to model verbal humor. We explored various means of text visualization to identify key features of garden path jokes as compared with non-jokes. In a garden path joke, one interpretation is established in the setup, but new information pointing to an alternative interpretation triggers a resolution process that leads to a new interpretation. For this project we visualize text in three novel ways, assisted by some web mining to build an informal ontology, that allow us to see the differences between garden path jokes and non-jokes of similar form. We used the results of the visualizations to build a rule-based model, which was then compared with models from traditional data mining to show the use of visual data mining. Additional experiments were conducted with other forms of incongruity, including visualization of 'shilling', the introduction of false reviews into a product review set. The results are very similar to those for garden path jokes and start to show us that there is a shape to incongruity. Overall, this project shows us that the proposed methodologies and tools offer a new approach to testing and generating hypotheses related to theories of humor as well as other phenomena involving opposition, incongruities, and shifts in classification.

    Have media texts become more humorous?

    As a research topic, humour has drawn much attention from multiple disciplines, including linguistics. Based on Engelthaler & Hills’ (2018) humour scale, this study developed a measure named Humour Index (HMI) to quantify the degree of humour of texts. This measure was applied to examine the diachronic changes in the degree of humour of American newspapers and magazines across a time span of 118 years (1900-2017), using texts from the Corpus of Historical American English (COHA). The study also discussed the contributions of different types of words to the degree of humour in the two genres. The results show significant uptrends in the degree of humour of both newspapers and magazines in the examined period. Moreover, derogatory and offensive words are found to be less frequently used than other categories of words in both genres. This study provides both theoretical and methodological implications for humour studies and for claims or hypotheses of previous research, such as infotainment and linguistic positivity bias.

    Automatic Humor Evaluation

    The aim of this thesis is to create a system for automatic humor evaluation. The system predicts the funniness and the category of input given in English. The core of the work is to create a classifier and train the model on the created datasets to get the best possible results. The classifier architecture is based on neural networks. The system also includes a web user interface for communication with the user. The result is a web application linked to the classifier that evaluates user input and collects feedback from users.

    Demographic-Aware Natural Language Processing

    The underlying traits of our demographic group affect and shape our thoughts, and therefore surface in the way we express ourselves and employ language in our day-to-day life. Understanding and analyzing language use in people from different demographic backgrounds helps uncover their demographic particularities. Conversely, leveraging these differences could lead to the development of better language representations, thus enabling further demographic-focused refinements in natural language processing (NLP) tasks. In this thesis, I employ methods rooted in computational linguistics to better understand various demographic groups through their language use. The thesis makes two main contributions. First, it provides empirical evidence that words are indeed used differently by different demographic groups in naturally occurring text. Through experiments conducted on large datasets which display usage scenarios for hundreds of frequent words, I show that automatic classification methods can be effective in distinguishing between word usages of different demographic groups. I compare the encoding ability of the utilized features by conducting feature analyses, and shed light on how various attributes contribute to highlighting the differences. Second, the thesis explores whether demographic differences in word usage by different groups can inform the development of more refined approaches to NLP tasks. Specifically, I start by investigating the task of word association prediction. The thesis shows that, going beyond the traditional "one-size-fits-all" approach, demographic-aware models achieve better performance than generic ones in predicting word associations for different demographic groups.
Next, I investigate the impact of demographic information on part-of-speech tagging and syntactic parsing, and the experiments reveal numerous part-of-speech tags and syntactic relations whose predictions benefit from the prevalence of a specific group in the training data. Finally, I explore demographic-specific humor generation, and develop a humor generation framework that fills in the blanks to generate funny stories while taking into account people's demographic backgrounds. PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/155164/1/gaparna_1.pd

    Proceedings of the Annual Linguistics Seminar of Universitas Pendidikan Indonesia (SETALI 2016), International Level: Language Analysis from the Perspective of Forensic Linguistics

    “Language can be used to conceal thought”: a statement worth examining further. Such an examination is felt to be especially relevant in the world of law enforcement. In this context, linguistics, and Forensic Linguistics in particular, has the potential to contribute to efforts to find and disclose valid information about a case of legal violation through a series of systematic steps for analyzing the relevant language data (corpus). It is hoped that, by making the most of the study of the various modes available, language being one of them, the quality of law enforcement will improve and justice can be better safeguarded. A number of interesting phenomena are currently found in law enforcement, particularly in the Criminal Investigation Units (Unit Reskrim) within the jurisdiction of the West Java Regional Police (Polda Jabar), concerning the investigation of criminal acts whose evidence consists of language data: (1) the proliferation of new modes of crime and new criminal acts with language data as evidence, and (2) the difficulty investigators face when drawing up the case position for criminal cases of insult, defamation, slander, and forgery, because the criteria for fulfilling the elements of these offences are, linguistically, not regulated in Articles 310, 311, and 335 of KUHAP or in Article 27 paragraph 3 of the ITE Law (UU ITE), the legal sources governing these criminal acts. Such conditions demand the approach and application of modern science (in this case forensic linguistics) that is axiologically capable of fully unravelling criminal cases with language data as evidence. To that end, the Linguistics Study Program of SPs UPI, in cooperation with the professional organization Masyarakat Linguistik Indonesia (MLI) and the Faculty of Language and Literature Education (FPBS) UPI, is once again holding the Annual Linguistics Seminar (SETALI), now in its 4th edition, with the theme Forensic Linguistics for Justice. The event is intended to provide a space for those interested in language studies to disseminate ideas and findings related to their research.
There are three main activities in this year's SETALI: Pre-SETALI on Monday and Tuesday, 30-31 May 2016, a workshop with the theme Language Analysis from the Perspective of Forensic Analysis; SETALI on Wednesday and Thursday, 1-2 June 2016, with the theme Forensic Linguistics for Justice; and Post-SETALI on Friday, 3 June 2016, a Public Lecture for language researchers, observers, teachers, and students with the theme Towards Clearer Jury Instruction. On this occasion, we express our thanks to all parties, especially to all of you, the participants of SETALI. Without your support, presence, and participation, and the permission of the Almighty, there would be no SETALI. In closing, we wish you fruitful discussion and sharing of knowledge and experience. Bumi Siliwangi, 27 June 2016. Person in Charge, Dr. Dadang Sudana, M.A.