5,590 research outputs found

Harnessing Deep Learning Techniques for Text Clustering and Document Categorization

Author: Kancherla Gangadhara Rao
Paladugu Rama Krishna
Publication venue: Auricle Global Society of Education and Research
Publication date: 20/09/2023

This research paper delves into the realm of deep text clustering algorithms with the aim of enhancing the accuracy of document classification. In recent years, the fusion of deep learning techniques and text clustering has shown promise in extracting meaningful patterns and representations from textual data. This paper provides an in-depth exploration of various deep text clustering methodologies, assessing their efficacy in improving document classification accuracy. Delving into the core of deep text clustering, the paper investigates various feature representation techniques, ranging from conventional word embeddings to contextual embeddings furnished by BERT and GPT models. By critically reviewing and comparing these algorithms, we shed light on their strengths, limitations, and potential applications. Through this comprehensive study, we offer insights into the evolving landscape of document analysis and classification, driven by the power of deep text clustering algorithms. Through an original synthesis of existing literature, this research serves as a beacon for researchers and practitioners in harnessing the prowess of deep learning to enhance the accuracy of document classification endeavors.

International Journal on Recent and Innovation Trends in Computing and Communication

Workshop on Extracting and Using Constructions in Computational Linguistics

Author: Knutsson Ola
Sahlgren Magnus
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2010

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Recommended from our members

OBOME - Ontology based opinion mining in UBIPOL

Author: Husani M
Ko A
Kocyigit A
Lee H
Tapucu D
Publication venue: Brunel University
Publication date: 01/01/2012

Ontologies have a special role in the UBIPOL system: they help to structure the policy-related context, provide a conceptualization of the policy domain, and are used in the opinion mining process. In this work we presented a system called Ontology Based Opinion Mining Engine (OBOME) for analyzing a domain-specific opinion corpus by first assisting the user with the creation of a domain ontology from the corpus and then determining the polarity of opinion on the various domain aspects. In the former step, the policy domain aspects are identified (namely, which policy category is represented by the concept). This identification is supported by the policy modelling ontology, which describes the most important policy-related classes and structure. Then the most informative documents are extracted from the corpus, and the user is asked to create a set of aspects and related keywords using these documents. In the latter step, we used the corpus-specific ontology to model the domain and extracted aspect-polarity associations using grammatical dependencies between words. Later, summarized results are shown to the user to analyze and store. Finally, in an offline process, the policy modelling ontology is updated.

Brunel University Research Archive

Concept Based Labeling of Text Documents Using Support Vector Machine

Author: K. Nithya, M. Saranya, C. R. Dhivyaa
Publication venue: Auricle Technologies, Pvt., Ltd.
Publication date: 31/03/2014

Classification plays a vital role in many information management and retrieval tasks. Text classification uses labeled training data to learn the classification system and then automatically classifies the remaining text using the learned system. Classification follows various techniques such as text processing, feature extraction, feature vector construction and final classification. The proposed mining model consists of sentence-based concept analysis, document-based concept analysis, corpus-based concept analysis, and a concept-based similarity measure. The proposed model can efficiently find significant matching concepts between documents, according to the semantics of their sentences. The similarity between documents is calculated based on a similarity measure. Then we analyze the term that contributes to the sentence semantics on the sentence, document, and corpus levels rather than the traditional analysis of the document only. With the extracted feature vector for each new document, the Support Vector Machine (SVM) algorithm is applied for document classification. The approach enhances the text classification accuracy.
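
As a rough illustration of the "feature vector construction" and document-similarity steps mentioned in this abstract, the sketch below builds plain term-frequency vectors and compares them with cosine similarity. This is a stand-in only: the abstract does not specify the concept-based similarity measure, and the function names `term_vector` and `cosine_similarity` and the sample sentences are hypothetical, not taken from the paper.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Build a sparse term-frequency vector (assumed stand-in for the
    paper's concept-based features, which the abstract does not detail)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, for illustration only.
doc_a = term_vector("Text classification uses labeled training data")
doc_b = term_vector("Labeled data is used to train a text classifier")
doc_c = term_vector("Ontologies structure the policy domain")

sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
print(f"a vs b: {sim_ab:.3f}")
print(f"a vs c: {sim_ac:.3f}")
```

In a pipeline like the one described, vectors of this kind (or richer concept-based ones) would then be passed to an SVM classifier; that step is omitted here because it would need a third-party library.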

International Journal on Recent and Innovation Trends in Computing and Communication