11 research outputs found
Novel Intent Detection and Active Learning Based Classification (Student Abstract)
Novel intent class detection is an important problem in real-world scenarios
for conversational agents engaged in continuous interaction. Several research
works have addressed novel intent detection in monolingual (primarily English)
texts and images. However, current systems lack an end-to-end universal framework
to detect novel intents across different languages with less human
annotation effort for misclassified and system-rejected samples. This paper
proposes NIDAL (Novel Intent Detection and Active Learning based
classification), a semi-supervised framework to detect novel intents while
reducing human annotation cost. Empirical results on various benchmark datasets
demonstrate that this system outperforms the baseline methods by a margin of
more than 10% in accuracy and macro-F1. The system achieves this while keeping
the overall annotation cost at just ~6-10% of the unlabeled data available to
the system. Comment: AAAI 2023 Student Abstract
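The abstract does not describe how NIDAL chooses which samples go to a human annotator. As a rough illustration only, the sketch below uses a generic confidence-threshold rule with a fixed annotation budget; the function name `select_for_annotation`, the `threshold` and `budget_fraction` parameters, and the selection rule itself are assumptions, not the paper's method.

```python
from typing import Dict, List, Tuple


def select_for_annotation(
    scores: List[Dict[str, float]],
    threshold: float = 0.5,
    budget_fraction: float = 0.1,
) -> Tuple[List[int], List[Tuple[int, str]]]:
    """Split samples into those sent to an annotator and those auto-labeled.

    scores: one {intent_label: probability} dict per unlabeled sample.
    threshold: hypothetical cut-off; samples whose top probability is below
        it are treated as rejected (possible novel intent or misclassified).
    budget_fraction: hypothetical cap on the share of samples annotated.
    """
    budget = int(len(scores) * budget_fraction)

    # Record the top label and its confidence for every sample.
    ranked = []
    for idx, dist in enumerate(scores):
        label, conf = max(dist.items(), key=lambda kv: kv[1])
        ranked.append((conf, idx, label))

    # Least confident samples first, so the budget goes to the hardest cases.
    ranked.sort()
    to_annotate = [idx for conf, idx, _ in ranked if conf < threshold][:budget]

    # Everything else keeps the classifier's own top label.
    chosen = set(to_annotate)
    auto_labeled = sorted(
        (idx, label) for _, idx, label in ranked if idx not in chosen
    )
    return to_annotate, auto_labeled
```

With a budget fraction between 0.06 and 0.10, a rule of this kind would keep the annotated share in the range the abstract reports, although the actual mechanism used to reach that figure is not given here.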
Understanding Psycholinguistic Behavior of predominant drunk texters in Social Media
In the last decade, social media has evolved into one of the leading platforms
to create, share, or exchange information; it is commonly used as a way for
individuals to maintain social connections. In this online digital world,
people post texts or pictures to express their views socially and create
user-user engagement through discussions and conversations. Thus, social media
has established itself as a carrier of signals relating to human behavior. One
can easily design a user characteristic network by scraping someone's social
media profiles. In this paper, we investigate the potential of social media in
characterizing and understanding predominant drunk texters from the perspective
of their social, psychological and linguistic behavior as evident from the
content generated by them. Our research aims to analyze the behavior of drunk
texters on social media and to contrast it with that of non-drunk texters. We
use Twitter to obtain the sets of drunk texters and non-drunk texters
and show that we can classify users into these two sets using
various psycholinguistic features with an overall average accuracy of 96.78%,
along with very high precision and recall. Note that such automatic
classification can have far-reaching impact: (i) on health research related to
addiction prevention and control, and (ii) in eliminating abusive and vulgar
content from Twitter, borne by the tweets of drunk texters. Comment: 6 pages, 8 Figures, ISCC 2018 Workshops - ICTS4eHealth 201
Intent Identification and Entity Extraction for Healthcare Queries in Indic Languages
Scarcity of data and technological limitations for resource-poor languages in
developing countries like India pose a threat to the development of
sophisticated NLU systems for healthcare. To assess the current status of
various state-of-the-art language models in healthcare, this paper studies the
problem by initially proposing two different Healthcare datasets, Indian
Healthcare Query Intent-WebMD and 1mg (IHQID-WebMD and IHQID-1mg) and one real
world Indian hospital query data in English and multiple Indic languages
(Hindi, Bengali, Tamil, Telugu, Marathi and Gujarati) which are annotated with
the query intents as well as entities. Our aim is to detect query intents and
extract the corresponding entities. We perform extensive experiments on a set of
models in various realistic settings and explore two scenarios based on
access to English data only (less costly) and access to target language data
(more expensive). We analyze context-specific practical relevance through
empirical analysis. The results, expressed in terms of overall F1 score, show
that our approach is practically useful for identifying intents and entities.
Long Dialog Summarization: An Analysis
Dialog summarization has become increasingly important in managing and
comprehending large-scale conversations across various domains. This task
presents unique challenges in capturing the key points, context, and nuances of
multi-turn long conversations for summarization. Summarization techniques may
vary with specific requirements: in a shopping-chatbot scenario, the dialog
summary helps to learn user preferences, whereas in a customer call center,
the summary may cover the problem attributes that a user specified and the
final resolution provided. This work emphasizes the significance of creating
coherent and contextually rich summaries for effective communication in
various applications. We explore current state-of-the-art approaches for long
dialog summarization in different domains, and benchmark-metric-based
evaluations show that no single model performs well across the various areas
for distinct summarization tasks.
MatSciRE: Leveraging Pointer Networks to Automate Entity and Relation Extraction for Material Science Knowledge-base Construction
Material science literature is a rich source of factual information about
various categories of entities (like materials and compositions) and various
relations between these entities, such as conductivity, voltage, etc.
Automatically extracting this information to generate a material science
knowledge base is a challenging task. In this paper, we propose MatSciRE
(Material Science Relation Extractor), a Pointer Network-based encoder-decoder
framework, to jointly extract entities and relations from material science
articles as a triplet (entity1, relation, entity2). Specifically, we target
the battery materials and identify five relations to work on - conductivity,
coulombic efficiency, capacity, voltage, and energy. Our proposed approach
achieved an F1-score of 0.771, higher than a previous attempt using
ChemDataExtractor (0.716). The overall graphical framework of MatSciRE is shown
in Fig 1. The material information is extracted from material science
literature in the form of entity-relation triplets using MatSciRE.
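To make the triplet output and the reported F1-score concrete, here is a minimal sketch of scoring predicted triplets against gold ones. The `Triplet` field names, the exact-match criterion, and the toy values in the usage note are assumptions made for illustration; the abstract does not state how matches were counted.

```python
from typing import NamedTuple, Set, Tuple


class Triplet(NamedTuple):
    """One extracted fact; field names are illustrative assumptions."""
    entity1: str
    relation: str
    entity2: str


def triplet_f1(
    predicted: Set[Triplet], gold: Set[Triplet]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact triplet matching."""
    # A prediction counts only if all three fields match a gold triplet.
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, with two hypothetical gold triplets and two predictions of which one matches exactly, `triplet_f1` gives precision, recall, and F1 of 0.5 each.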
Fine-grained Intent Classification in the Legal Domain
A law practitioner has to read through many lengthy legal case proceedings. To
understand the motivation behind the actions of different parties/individuals
in a legal case, it is essential that the parts of the document that express an
intent corresponding to the case be clearly understood. In this paper, we
introduce a dataset of 93 legal documents, belonging to the case categories of
either Murder, Land Dispute, Robbery, or Corruption, in which phrases expressing
the same intent as the category of the document are annotated. We also annotate
fine-grained intents for each such phrase to enable a deeper understanding of
the case for a reader. Finally, we analyze the performance of several
transformer-based models in automating the extraction of intent phrases
(at both a coarse and a fine-grained level) and the classification of a
document into one of the 4 possible categories, and observe that our dataset is
challenging, especially for fine-grained intent classification. Comment: 4 pages, 7 tables, 1 figure, appeared in the AAAI-22 workshop on
Scientific Document Understanding