1,210 research outputs found
Recommended from our members
Data Scarcity in Event Analysis and Abusive Language Detection
Lack of data is almost always the cause of the suboptimal performance of neural networks. Even though data scarce scenarios can be simulated for any task by assuming limited access to training data, we study two problem areas where data scarcity is a practical challenge: event analysis and abusive content detection} Journalists, social scientists and political scientists need to retrieve and analyze event mentions in unstructured text to compute useful statistical information to understand society. We claim that it is hard to specify information need about events using keyword-based representation and propose a Query by Example (QBE) setting for event retrieval. In the QBE setting, we assume that there are a few example sentences mentioning the event class a user is interested in and we aim to retrieve relevant events using only the examples as a query. Traditional event detection approaches are not applicable in this setting as event detection datasets are constructed based on pre-defined schemas which limits them to a small set of event and event-argument types. Moreover, the amount of annotated data in event detection datasets is limited that only allows us to build a retrieval corpus for evaluation. Thus we assume that there are no relevance judgments to train an event retrieval model -- except for the few examples of a specific event type. We create three QBE evaluation settings from three event detection datasets: PoliceKilling, ACE, and IndiaPoliceEvents. For the PoliceKilling dataset, where a relevant sentence describes a police killing event, we show that a query model constructed from the NLP features extracted from the few given examples is effective compared to event detection baselines. For the ACE dataset, where there are thirty-three types of events, we construct a QBE setting for each type and show that a sentence embedding approach effectively transfers for event matching. Finally, we conducted a unified evaluation of all three datasets using the sentence-embedding-based model and showed that it outperforms strong baselines.
We further examine the effect of data scarcity in abusive language detection. We first study a specific type of abusive language -- hate speech. Neural hate speech detection models trained from one dataset poorly generalize to another dataset from a different domain. This is because characteristics of hate speech vary based on racial and cultural aspects. Our data scarcity scenario assumes that we have a hate speech dataset from a domain and it needs to generalize to a test set from another domain using the unlabeled data from the test domain only. Thus we assume zero target domain data in this scenario. To tackle the data scarcity, we propose an unsupervised domain adaptation approach to augment labeled data for hate speech detection. We evaluate the approach with three different models (character CNNs, BiLSTMs, and BERT) on three different collections. We show our approach improves Area under the Precision/Recall curve by as much as 42% and recall by as much as 278%, with no loss (and in some cases a significant gain) in precision.
Finally, we examine the cross-lingual abusive language detection problem. Abusive language is a superclass of hate speech that includes profanity, aggression, offensiveness, cyberbullying, toxicity, and hate speech itself. There is a large collection of abusive language detection datasets in English such as Jigsaw. For other languages there exist datasets for abusive language detection but with very limited data. We propose a cross-lingual transfer learning approach to learn an effective neural abusive language classifier for such low-resource languages with help from a dataset from a resource-rich language. The framework is based on a nearest-neighbor architecture and is thus interpretable by design. It is a modern instantiation of the classic k-nearest neighbor model, as we use transformer representations in all its components. Unlike prior work on neighborhood-based approaches, we encode the neighborhood information based on query-neighbor interactions. We propose two encoding schemes and show their effectiveness using both qualitative and quantitative analyses. Our evaluation results on eight languages from two different datasets for abusive language detection show sizable improvements in F1 over strong baselines
CONNECTING THE DOTS OF AN OPAQUE CRIME: ANALYZING THE INFORMATION-SHARING FRAMEWORK AND PRACTICES OF CALIFORNIA’S HUMAN-TRAFFICKING TASK FORCES
This thesis explores existing frameworks and common challenges with information sharing among California’s anti–human trafficking specialty units. This research aimed to contextualize current gaps and barriers in the collection and dissemination process of sensitive and confidential human-trafficking information. The research identified social, economic, and human interpersonal factors affecting group work and illustrated how a nuanced application of the social identity analytical method might decrease interpersonal misunderstandings and miscommunications, thus increasing the volume and quantity of anti-trafficking information sharing. The findings of this research indicate that when anti-trafficking specialty units do not work together seamlessly, they foster programmatic and societal shadows that traffickers rely on to exploit their victims. Gaining an in-depth perspective on working group members’ social identities will increase trust within the groups, thereby promoting cooperation, coordination, and collaboration. Elevating all forms of group work is likely to spur analytical insights into the evolving tactics, techniques, and procedures of the threat actors, not to mention identify previously unrecognized victims while building more successful prosecutions.Civilian, California Governor's Office of Emergency Services (CalOES)Approved for public release. Distribution is unlimited
Challenges and perspectives of hate speech research
This book is the result of a conference that could not take place. It is a collection of 26 texts that address and discuss the latest developments in international hate speech research from a wide range of disciplinary perspectives. This includes case studies from Brazil, Lebanon, Poland, Nigeria, and India, theoretical introductions to the concepts of hate speech, dangerous speech, incivility, toxicity, extreme speech, and dark participation, as well as reflections on methodological challenges such as scraping, annotation, datafication, implicity, explainability, and machine learning. As such, it provides a much-needed forum for cross-national and cross-disciplinary conversations in what is currently a very vibrant field of research
Cyber Security Politics
This book examines new and challenging political aspects of cyber security and presents it as an issue defined by socio-technological uncertainty and political fragmentation. Structured along two broad themes and providing empirical examples for how socio-technical changes and political responses interact, the first part of the book looks at the current use of cyber space in conflictual settings, while the second focuses on political responses by state and non-state actors in an environment defined by uncertainties. Within this, it highlights four key debates that encapsulate the complexities and paradoxes of cyber security politics from a Western perspective – how much political influence states can achieve via cyber operations and what context factors condition the (limited) strategic utility of such operations; the role of emerging digital technologies and how the dynamics of the tech innovation process reinforce the fragmentation of the governance space; how states attempt to uphold stability in cyberspace and, more generally, in their strategic relations; and how the shared responsibility of state, economy, and society for cyber security continues to be re-negotiated in an increasingly trans-sectoral and transnational governance space. This book will be of much interest to students of cyber security, global governance, technology studies, and international relations
Critical dialogues: slow readings of English literary texts
The reader will find gathered together in this volume a selection of articles and essays that have been separately published over the past three decades as a result of my teaching practice and research activity. In chronological terms, they span a period from 1985 until May 2008. These texts were chosen because they are considered to be representative of the type of work I usually do in my classes, undoubtedly as a result of my own academic background, based on a deliberate and specific form of approach to literary texts, my own choice of English authors and, lastly, my most recent interest in inter-art studies.Fundação para a Ciência e a Tecnologi
Cyber Security Politics
This book examines new and challenging political aspects of cyber security and presents it as an issue defined by socio-technological uncertainty and political fragmentation. Structured along two broad themes and providing empirical examples for how socio-technical changes and political responses interact, the first part of the book looks at the current use of cyber space in conflictual settings, while the second focuses on political responses by state and non-state actors in an environment defined by uncertainties. Within this, it highlights four key debates that encapsulate the complexities and paradoxes of cyber security politics from a Western perspective – how much political influence states can achieve via cyber operations and what context factors condition the (limited) strategic utility of such operations; the role of emerging digital technologies and how the dynamics of the tech innovation process reinforce the fragmentation of the governance space; how states attempt to uphold stability in cyberspace and, more generally, in their strategic relations; and how the shared responsibility of state, economy, and society for cyber security continues to be re-negotiated in an increasingly trans-sectoral and transnational governance space. This book will be of much interest to students of cyber security, global governance, technology studies, and international relations
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020
On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges
- …