46 research outputs found

    Misogyny Detection in Social Media on the Twitter Platform

    Get PDF
    The thesis is devoted to the problem of misogyny detection in social media. In the work we analyse the difference between all offensive language and misogyny language in social media, and review the best existing approaches to detect offensive and misogynistic language, which are based on classical machine learning and neural networks. We also review recent shared tasks aimed to detect misogyny in social media, several of which we have participated in. We propose an approach to the detection and classification of misogyny in texts, based on the construction of an ensemble of models of classical machine learning: Logistic Regression, Naive Bayes, Support Vectors Machines. Also, at the preprocessing stage we used some linguistic features, and novel approaches which allow us to improve the quality of classification. We tested the model on the real datasets both English and multilingual corpora. The results we achieved with our model are highly competitive in this area and demonstrate the capability for future improvement

    Designing Service-Oriented Chatbot Systems Using a Construction Grammar-Driven Natural Language Generation System

    Get PDF
    Service oriented chatbot systems are used to inform users in a conversational manner about a particular service or product on a website. Our research shows that current systems are time consuming to build and not very accurate or satisfying to users. We find that natural language understanding and natural language generation methods are central to creating an e�fficient and useful system. In this thesis we investigate current and past methods in this research area and place particular emphasis on Construction Grammar and its computational implementation. Our research shows that users have strong emotive reactions to how these systems behave, so we also investigate the human computer interaction component. We present three systems (KIA, John and KIA2), and carry out extensive user tests on all of them, as well as comparative tests. KIA is built using existing methods, John is built with the user in mind and KIA2 is built using the construction grammar method. We found that the construction grammar approach performs well in service oriented chatbots systems, and that users preferred it over other systems

    Use of linguistic markers in the identification and analysis of chief executives’ hubris

    Get PDF
    This research seeks to provide an insight into the identification and understanding of linguistic markers of Chief Executive Officer (CEO) hubris. It analyses spoken and written discourse samples of CEOs deemed to be hubristic and benchmarks the results against those of a sample of non-hubristic CEOs. In doing so it explores the hypothesis that the linguistic utterances of hubristic CEOs show consistent differences from the language produced by CEOs who have not been identified as possessing hubristic tendencies. This thesis presents a review of academic literatures pertaining to personality, hubris and natural language use. The review of these three domains leads to the conclusion that certain personality traits are antecedents to hubris and can be identified in one’s language use. The word count strategies are reviewed in depth as a framework for measuring hubris at-a-distance through the assessment of CEO’s linguistic utterances. In addition to word count strategies, this thesis also proposes a new approach – applying machine learning techniques to the analysis of language - for identifying CEO hubris. The research consists of a Pilot Study and three main studies (Study 1, Study 2, Study 3 (comprising two sub-studies, 3a and 3b)). It describes in detail the process, methods and materials used, summarises findings and explains the implications of the results obtained for further research into linguistic markers of hubris. For the purpose of this research, Hubris Syndrome is conceptualised as proposed by Owen and Davidson (2009), including all 14 proposed symptoms for Hubris Syndrome (Owen & Davidson, 2009). This research focuses explicitly on leaders who occupy or have occupied the position of CEO for a significant amount of time and were identified by other researchers, subject matter experts and media as having exhibited the features of Hubris Syndrome during their time in office. This research proposes several innovative techniques to identify the differences between hubristic and non-hubristic language, and documents subtle differences identified. Findings from this doctoral study suggest that the high use of impersonal pronouns, the total count of pronouns, auxiliary verbs, common verbs and tentative tone indicate CEO hubris. All in all, exploring if and how hubris’ symptoms manifests in CEO language use and what are characteristic features of hubristic discourse, contributes to wider research regarding the diagnosis and prevention of this phenomenon. This study seeks to mitigate the risk of potentially destructive CEO behaviour for the organisation and prevent organisational failures induced or aggravated by Hubris Syndrome

    A model for mobile, context-aware in-car communication systems to reduce driver distractions

    Get PDF
    Driver distraction remains a matter of concern throughout the world as the number of car accidents caused by distracted driving is still unacceptably high. Industry and academia are working intensively to design new techniques that will address all types of driver distraction including visual, manual, auditory and cognitive distraction. This research focuses on an existing technology, namely in-car communication systems (ICCS). ICCS allow drivers to interact with their mobile phones without touching or looking at them. Previous research suggests that ICCS have reduced visual and manual distraction. Two problems were identified in this research: existing ICCS are still expensive and only available in limited models of car. As a result of that, only a small number of drivers can obtain a car equipped with an ICCS, especially in developing countries. The second problem is that existing ICCS are not aware of the driving context, which plays a role in distracting drivers. This research project was based on the following thesis statement: A mobile, context-aware model can be designed to reduce driver distraction caused by the use of ICCS. A mobile ICCS is portable and can be used in any car, addressing the first problem. Context-awareness will be used to detect possible situations that contribute to distracting drivers and the interaction with the mobile ICCS will be adapted so as to avert calls and text messages. This will address the second problem. As the driving context is dynamic, drivers may have to deal with critical safety-related tasks while they are using an existing ICCS. The following steps were taken in order to validate the thesis statement. An investigation was conducted into the causes and consequences of driver distraction. A review of literature was conducted on context-aware techniques that could potentially be used. The design of a model was proposed, called the Multimodal Interface for Mobile Info-communication with Context (MIMIC) and a preliminary usability evaluation was conducted in order to assess the feasibility of a speech-based, mobile ICCS. Despite some problems with the speech recognition, the results were satisfying and showed that the proposed model for mobile ICCS was feasible. Experiments were conducted in order to collect data to perform supervised learning to determine the driving context. The aim was to select the most effective machine learning techniques to determine the driving context. Decision tree and instance-based algorithms were found to be the best performing algorithms. Variables such as speed, acceleration and linear acceleration were found to be the most important variables according to an analysis of the decision tree. The initial MIMIC model was updated to include several adaptation effects and the resulting model was implemented as a prototype mobile application, called MIMIC-Prototype

    Automatic correction of grammatical errors in non-native English text

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.Cataloged from PDF version of thesis.Includes bibliographical references (p. 99-107).Learning a foreign language requires much practice outside of the classroom. Computer-assisted language learning systems can help fill this need, and one desirable capability of such systems is the automatic correction of grammatical errors in texts written by non-native speakers. This dissertation concerns the correction of non-native grammatical errors in English text, and the closely related task of generating test items for language learning, using a combination of statistical and linguistic methods. We show that syntactic analysis enables extraction of more salient features. We address issues concerning robustness in feature extraction from non-native texts; and also design a framework for simultaneous correction of multiple error types. Our proposed methods are applied on some of the most common usage errors, including prepositions, verb forms, and articles. The methods are evaluated on sentences with synthetic and real errors, and in both restricted and open domains. A secondary theme of this dissertation is that of user customization. We perform a detailed analysis on a non-native corpus, illustrating the utility of an error model based on the mother tongue. We study the benefits of adjusting the correction models based on the quality of the input text; and also present novel methods to generate high-quality multiple-choice items that are tailored to the interests of the user.by John Sie Yuen Lee.Ph.D

    Community based Question Answer Detection

    Get PDF
    Each day, millions of people ask questions and search for answers on the World Wide Web. Due to this, the Internet has grown to a world wide database of questions and answers, accessible to almost everyone. Since this database is so huge, it is hard to find out whether a question has been answered or even asked before. As a consequence, users are asking the same questions again and again, producing a vicious circle of new content which hides the important information. One platform for questions and answers are Web forums, also known as discussion boards. They present discussions as item streams where each item contains the contribution of one author. These contributions contain questions and answers in human readable form. People use search engines to search for information on such platforms. However, current search engines are neither optimized to highlight individual questions and answers nor to show which questions are asked often and which ones are already answered. In order to close this gap, this thesis introduces the \\emph{Effingo} system. The Effingo system is intended to extract forums from around the Web and find question and answer items. It also needs to link equal questions and aggregate associated answers. That way it is possible to find out whether a question has been asked before and whether it has already been answered. Based on these information it is possible to derive the most urgent questions from the system, to determine which ones are new and which ones are discussed and answered frequently. As a result, users are prevented from creating useless discussions, thus reducing the server load and information overload for further searches. The first research area explored by this thesis is forum data extraction. The results from this area are intended be used to create a database of forum posts as large as possible. Furthermore, it uses question-answer detection in order to find out which forum items are questions and which ones are answers and, finally, topic detection to aggregate questions on the same topic as well as discover duplicate answers. These areas are either extended by Effingo, using forum specific features such as the user graph, forum item relations and forum link structure, or adapted as a means to cope with the specific problems created by user generated content. Such problems arise from poorly written and very short texts as well as from hidden or distributed information

    Advances in knowledge discovery and data mining Part II

    Get PDF
    19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II</p

    TME Volume 12, Numbers 1, 2, and 3

    Get PDF
    corecore