6 research outputs found

PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue Twisters Automatically

Authors: Alikhani Malihe; Feng Steven Y.; Gangal Varun; Hovy Eduard; Keh Sedrick Scott
Publication date: 14/02/2023

Tongue twisters are meaningful sentences that are difficult to pronounce. The process of automatically generating tongue twisters is challenging since the generated utterance must satisfy two conditions at once: phonetic difficulty and semantic meaning. Furthermore, phonetic difficulty is itself hard to characterize and is expressed in natural tongue twisters through a heterogeneous mix of phenomena such as alliteration and homophony. In this paper, we propose PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue Twisters Automatically. We leverage phoneme representations to capture the notion of phonetic difficulty, and we train language models to generate original tongue twisters on two proposed task settings. To do this, we curate a dataset called PANCETTA, consisting of existing English tongue twisters. Through automatic and human evaluation, as well as qualitative analysis, we show that PANCETTA generates novel, phonetically difficult, fluent, and semantically meaningful tongue twisters.

Comment: EACL 2023. Code at https://github.com/sedrickkeh/PANCETT
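To make the idea of phoneme-level difficulty concrete, here is a minimal sketch of two toy signals computed over phoneme sequences: how often words share an initial phoneme (a rough proxy for alliteration) and how much adjacent words overlap in their phoneme sets. This is not the method from the paper. The lexicon, the function names, and both scores are assumptions made for illustration, and the phoneme transcriptions are hand-written in an ARPAbet-like style.

```python
from collections import Counter

# Hypothetical toy lexicon: hand-written ARPAbet-style phonemes, stress omitted.
TOY_LEXICON = {
    "she": ["SH", "IY"],
    "sells": ["S", "EH", "L", "Z"],
    "sea": ["S", "IY"],
    "shells": ["SH", "EH", "L", "Z"],
}


def to_phonemes(sentence, lexicon=TOY_LEXICON):
    """Map each known word to its phoneme list; unknown words are skipped."""
    return [lexicon[w] for w in sentence.lower().split() if w in lexicon]


def initial_repetition(phoneme_words):
    """Fraction of words that start with the most frequent initial phoneme."""
    if not phoneme_words:
        return 0.0
    initials = Counter(word[0] for word in phoneme_words)
    return initials.most_common(1)[0][1] / len(phoneme_words)


def adjacent_overlap(phoneme_words):
    """Mean Jaccard overlap between the phoneme sets of adjacent words."""
    pairs = list(zip(phoneme_words, phoneme_words[1:]))
    if not pairs:
        return 0.0
    total = 0.0
    for left, right in pairs:
        a, b = set(left), set(right)
        total += len(a & b) / len(a | b)
    return total / len(pairs)


words = to_phonemes("She sells sea shells")
print(initial_repetition(words), adjacent_overlap(words))
```

Scores like these could serve as simple features or sanity checks; a real system would need a full pronunciation dictionary and a better notion of confusable sounds.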

arXiv.org e-Print Archive

EUREKA: EUphemism Recognition Enhanced through Knn-based methods and Augmentation

Authors: Emmy Liu; Roberto Navigli; Rohit K. Bharadwaj; Sedrick Scott Keh; Simone Tedeschi; Varun Gangal
Publication date: 01/01/2022

We introduce EUREKA, an ensemble-based approach for performing automatic euphemism detection. We (1) identify and correct potentially mislabelled rows in the dataset, (2) curate an expanded corpus called EuphAug, (3) leverage model representations of Potentially Euphemistic Terms (PETs), and (4) explore using representations of semantically close sentences to aid in classification. Using our augmented dataset and kNN-based methods, EUREKA achieved state-of-the-art results on the public leaderboard of the Euphemism Detection Shared Task, ranking first with a macro F1 score of 0.881.
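For readers unfamiliar with the two technical terms in this abstract, the following is a minimal sketch of a k-nearest-neighbour vote over sentence vectors and of the macro F1 metric. It is not the EUREKA implementation. The function names, the cosine similarity choice, and the tiny two-dimensional vectors are assumptions for illustration; in practice the vectors would come from a language model.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query, examples, k=3):
    """Majority label among the k labelled examples most similar to query.

    examples is a list of (vector, label) pairs.
    """
    ranked = sorted(examples, key=lambda ex: cosine(query, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labelled vectors: 1 = euphemistic, 0 = literal.
examples = [([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.0, 1.0], 0), ([0.1, 0.9], 0)]
print(knn_predict([0.8, 0.2], examples, k=3))
print(round(macro_f1([1, 1, 0, 0], [1, 0, 0, 0]), 4))
```

Macro F1 weights every class equally, so a classifier cannot score well by favouring only the majority class.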

Archivio della ricerca- Università di Roma La Sapienza

PINEAPPLE: Personifying INanimate Entities by Acquiring Parallel Personification data for Learning Enhanced generation

Authors: Alikhani Malihe; Feng Steven Y.; Gangal Varun; Hovy Eduard; Jhamtani Harsh; Keh Sedrick Scott; Lu Kevin
Publication date: 16/09/2022

A personification is a figure of speech that endows inanimate entities with properties and actions typically seen as requiring animacy. In this paper, we explore the task of personification generation. To this end, we propose PINEAPPLE: Personifying INanimate Entities by Acquiring Parallel Personification data for Learning Enhanced generation. We curate a corpus of personifications called PersonifCorp, together with automatically generated de-personified literalizations of these personifications. We demonstrate the usefulness of this parallel corpus by training a seq2seq model to personify a given literal input. Both automatic and human evaluations show that fine-tuning with PersonifCorp leads to significant gains in personification-related qualities such as animacy and interestingness. A detailed qualitative analysis also highlights key strengths and imperfections of PINEAPPLE over baselines, demonstrating a strong ability to generate diverse and creative personifications that enhance the overall appeal of a sentence.

Comment: Accepted to COLING 2022; official Github repo at https://github.com/sedrickkeh/PINEAPPL

arXiv.org e-Print Archive

NewsPanda: Media Monitoring for Timely Conservation Action

Authors: Bhagabati Nirmal; Dewan Karun; Fang Fei; Gopala Areendran; Izquierdo Pablo; Keh Sedrick Scott; Mallick Debojyoti; Patterson David J.; Sharma Ambika; Shi Zheyuan Ryan; Shrestha Pooja
Publication date: 30/04/2023

Non-governmental organizations for environmental conservation have a significant interest in monitoring conservation-related media and getting timely updates about infrastructure construction projects, because such projects can have a massive impact on key conservation areas. Such monitoring, however, is difficult and time-consuming. We introduce NewsPanda, a toolkit that automatically detects and analyzes online articles related to environmental conservation and infrastructure construction. We fine-tune a BERT-based model using active learning methods and noise correction algorithms to identify articles that are relevant to conservation and infrastructure construction. For the identified articles, we perform further analysis, extracting keywords and finding potentially related sources. NewsPanda has been successfully deployed by the World Wide Fund for Nature teams in the UK, India, and Nepal since February 2022. It currently monitors over 80,000 websites and 1,074 conservation sites across India and Nepal, saving more than 30 hours of human effort weekly. We have now scaled it up to cover 60,000 conservation sites globally.

Comment: Accepted to IAAI-23: 35th Annual Conference on Innovative Applications of Artificial Intelligence. Winner of IAAI Deployed Application Award. Code at https://github.com/NewsPanda-WWF-CMU/weekly-pipelin

arXiv.org e-Print Archive

NewsPanda: Media Monitoring for Timely Conservation Action

Authors: Bhagabati Nirmal; Dewan Karun; Fang Fei; Gopala Areendran; Izquierdo Pablo; Keh Sedrick Scott; Mallick Debojyoti; Patterson David J.; Sharma Ambika; Shi Zheyuan Ryan; Shrestha Pooja
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 06/09/2023

Association for the Advancement of Artificial Intelligence: AAAI Publications