11 research outputs found

Novel Intent Detection and Active Learning Based Classification (Student Abstract)

Author: Mullick Ankan
Publication date: 22/02/2023

Novel intent class detection is an important problem in real-world scenarios for conversational agents engaged in continuous interaction. Several research works have addressed detecting novel intents in monolingual (primarily English) texts and images. However, current systems lack an end-to-end universal framework to detect novel intents across different languages with less human annotation effort for misclassified and system-rejected samples. This paper proposes NIDAL (Novel Intent Detection and Active Learning based classification), a semi-supervised framework to detect novel intents while reducing human annotation cost. Empirical results on various benchmark datasets demonstrate that this system outperforms the baseline methods by a margin of more than 10% in accuracy and macro-F1. The system achieves this while keeping the overall annotation cost at just ~6-10% of the unlabeled data available to the system. Comment: AAAI 2023 Student Abstract

arXiv.org e-Print Archive

Understanding Psycholinguistic Behavior of predominant drunk texters in Social Media

Author: Bahety Sudhanshu
Dhamnani Sunny
Ghosh Surjya
Kumar Anil
Maity Suman Kalyan
Mukherjee Animesh
Mullick Ankan
Publication date: 28/05/2018

In the last decade, social media has evolved into one of the leading platforms to create, share, or exchange information; it is commonly used as a way for individuals to maintain social connections. In this online digital world, people post texts or pictures to express their views socially and create user-user engagement through discussions and conversations. Thus, social media has established itself as a carrier of signals relating to human behavior. One can easily design a user characteristic network by scraping someone's social media profiles. In this paper, we investigate the potential of social media in characterizing and understanding predominant drunk texters from the perspective of their social, psychological and linguistic behavior as evident from the content generated by them. Our research aims to analyze the behavior of drunk texters on social media and to contrast this with non-drunk texters. We use Twitter to obtain the sets of drunk texters and non-drunk texters and show that we can classify users into these two sets using various psycholinguistic features with an overall average accuracy of 96.78% and very high precision and recall. Note that such an automatic classification can have far-reaching impact: (i) on health research related to addiction prevention and control, and (ii) in eliminating abusive and vulgar content from Twitter, borne by the tweets of drunk texters. Comment: 6 pages, 8 Figures, ISCC 2018 Workshops - ICTS4eHealth 2018

arXiv.org e-Print Archive

Intent Identification and Entity Extraction for Healthcare Queries in Indic Languages

Author: Chaitanya G Sai
Goyal Pawan
Mondal Ishani
Mullick Ankan
Raghav R
Ray Sourjyadip
Publication date: 19/02/2023

Scarcity of data and technological limitations for resource-poor languages in developing countries like India pose a threat to the development of sophisticated NLU systems for healthcare. To assess the current status of various state-of-the-art language models in healthcare, this paper studies the problem by first proposing two different healthcare datasets, Indian Healthcare Query Intent-WebMD and 1mg (IHQID-WebMD and IHQID-1mg), and one real-world Indian hospital query dataset, in English and multiple Indic languages (Hindi, Bengali, Tamil, Telugu, Marathi and Gujarati), all annotated with the query intents as well as entities. Our aim is to detect query intents and extract the corresponding entities. We perform extensive experiments on a set of models in various realistic settings and explore two scenarios based on access to English data only (less costly) and access to target language data (more expensive). We analyze context-specific practical relevancy through empirical analysis. The results, expressed in terms of overall F1 score, show that our approach is practically useful for identifying intents and entities.

arXiv.org e-Print Archive

Long Dialog Summarization: An Analysis

Author: Bhowmick Ayan Kumar
Dey Prasenjit
Ganguly Niloy
Goyal Pawan
Kokku Ravi
Mullick Ankan
R Raghav
Publication date: 26/02/2024

Dialog summarization has become increasingly important in managing and comprehending large-scale conversations across various domains. This task presents unique challenges in capturing the key points, context, and nuances of long multi-turn conversations for summarization. It is worth noting that summarization techniques may vary based on specific requirements: in a shopping-chatbot scenario, the dialog summary helps to learn user preferences, whereas in a customer call center, the summary may involve the problem attributes that a user specified and the final resolution provided. This work emphasizes the significance of creating coherent and contextually rich summaries for effective communication in various applications. We explore current state-of-the-art approaches for long dialog summarization in different domains, and evaluations based on benchmark metrics show that no single model performs well across various areas for distinct summarization tasks.

arXiv.org e-Print Archive

MatSciRE: Leveraging Pointer Networks to Automate Entity and Relation Extraction for Material Science Knowledge-base Construction

Author: Bhattacharjee Satadeep
Chaitanya G Sai
Ghosh Akash
Ghui Samir
Goyal Pawan
Lee Seung-Cheol
Mullick Ankan
Nayak Tapas
Publication date: 18/01/2024

Material science literature is a rich source of factual information about various categories of entities (like materials and compositions) and various relations between these entities, such as conductivity, voltage, etc. Automatically extracting this information to generate a material science knowledge base is a challenging task. In this paper, we propose MatSciRE (Material Science Relation Extractor), a Pointer Network-based encoder-decoder framework, to jointly extract entities and relations from material science articles as a triplet (entity1, relation, entity2). Specifically, we target the battery materials and identify five relations to work on - conductivity, coulombic efficiency, capacity, voltage, and energy. Our proposed approach achieved a much better F1-score (0.771) than a previous attempt using ChemDataExtractor (0.716). The overall graphical framework of MatSciRE is shown in Fig 1. The material information is extracted from material science literature in the form of entity-relation triplets using MatSciRE.

arXiv.org e-Print Archive

Understanding psycholinguistic behavior of predominant drunk texters in social media

Author: Bahety S. (Sudhansu)
Dhamnani S. (Sunny)
Ghosh S. (Surjya)
Kumar A. (Anil)
Maity S.K. (Suman Kalyan)
Mukherjee A. (Animesh)
Mullick A. (Ankan)
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 25/06/2018

CWI's Institutional Repository

Novel Intent Detection and Active Learning Based Classification (Student Abstract)

Author: Mullick Ankan
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 06/09/2023

Association for the Advancement of Artificial Intelligence: AAAI Publications

Fine-grained Intent Classification in the Legal Domain

Author: Kapadnis Manav Nitin
Mullick Ankan
Nandy Abhilash
Patnaik Sohan
Raghav R
Publication date: 06/05/2022

A law practitioner has to go through many long legal case proceedings. To understand the motivation behind the actions of different parties/individuals in a legal case, it is essential that the parts of the document that express an intent corresponding to the case be clearly understood. In this paper, we introduce a dataset of 93 legal documents, belonging to the case categories of either Murder, Land Dispute, Robbery, or Corruption, where phrases expressing the same intent as the category of the document are annotated. We also annotate fine-grained intents for each such phrase to enable a deeper understanding of the case for a reader. Finally, we analyze the performance of several transformer-based models in automating the process of extracting intent phrases (at both a coarse and a fine-grained level) and classifying a document into one of the 4 possible categories, and observe that our dataset is challenging, especially in the case of fine-grained intent classification. Comment: 4 pages, 7 tables, 1 figure, appeared in the AAAI-22 workshop on Scientific Document Understanding

arXiv.org e-Print Archive